All articles on how to create and orchestrate LLM-powered AI Agents:
- Getting Started with AI Agents
- Prerequisites: Set up your AI Agent's brain
- Create your AI Agent's persona
- Give your AI Agent a Job
- Make knowledge available to your AI Agent
- Give your AI Agent access to memory
- Deploy and use your AI Agent
- Improve your AI Agent’s skills using Tool Actions
- Enable your AI Agent to understand images
- Talk to your AI Agent via voice or phone
- Debugging your AI Agent
Cognigy’s AI Agents are multimodal, capable of understanding both text and images. To enable this functionality, you’ll need a storage provider and a vision-enabled large language model such as GPT-4o (see “Prerequisites”).
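Before wiring a model into Cognigy, you may want to confirm that it can actually see images. Here is a minimal standalone smoke test, a sketch assuming the OpenAI Node SDK in TypeScript and an OPENAI_API_KEY in the environment (the image URL is a placeholder):

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

// Send a text prompt plus an image URL to a vision-enabled model.
async function describeImage(imageUrl: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // must be a vision-enabled model
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Briefly describe this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}

// Placeholder URL; any publicly reachable image works.
describeImage("https://example.com/sample.png").then(console.log);
```

If the model returns a sensible description, it can handle the image inputs your AI Agent will pass along.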
Follow these steps to activate image processing for your AI Agent:
- Enable image processing. Activate Process Images in the Image Handling section of your AI Agent Node. Here you can also specify exactly how images should appear in the transcript.
- Enable attachments to allow image uploads. Activate "Attachment Upload" in the Webchat Behavior section of your Webchat Endpoint settings. You’ll need a storage provider such as AWS, Azure, or Google (see the storage sketch after these steps). Refer to Cognigy’s documentation to see which other Endpoints allow attachment uploads. (Example image: a barcode.)
In the webchat, you can now upload an image of a tracking number directly into your conversation, eliminating the need to type in all the details manually.
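To make that concrete, here is a sketch, under the same OpenAI SDK assumptions as the earlier snippet, that asks the model to read the tracking number off an uploaded image and applies a quick plausibility check (the regex is a loose illustration, not a real carrier format):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // assumes OPENAI_API_KEY is set

// Ask the vision model to read the tracking number off the uploaded image.
async function extractTrackingNumber(
  attachmentUrl: string
): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Read the tracking number in this image. Reply with the number only.",
          },
          { type: "image_url", image_url: { url: attachmentUrl } },
        ],
      },
    ],
  });
  const candidate = response.choices[0].message.content?.trim() ?? "";
  // Loose plausibility check: 8 to 30 letters, digits, or spaces.
  return /^[A-Z0-9 ]{8,30}$/i.test(candidate) ? candidate : null;
}
```

Validating the model’s answer before acting on it is a good habit: vision models occasionally misread characters, and a null result lets your flow ask the user to retake the photo.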
➡️ As a next step, get your AI Agent ready for voice experiences.