All articles on how to create and orchestrate LLM-powered AI Agents:
- Getting Started with AI Agents
- Prerequisites: Set up your AI Agent's brain
- Create your AI Agent's persona
- Give your AI Agent a Job
- Make knowledge available to your AI Agent
- Give your AI Agent access to memory
- Deploy and use your AI Agent
- Improve your AI Agent’s skills using Tool Actions
- Enable your AI Agent to understand images
- Talk to your AI Agent via voice or phone
- Debugging your AI Agent
Cognigy’s AI Agents are multimodal, capable of understanding both text and images. To enable this functionality, you’ll need a storage provider and a vision-enabled large language model such as GPT-4o (see “Prerequisites”).
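Before wiring a model into Cognigy, you may want to confirm that it can actually see images. Here is a minimal standalone smoke test, a sketch assuming the OpenAI Node SDK in TypeScript and an OPENAI_API_KEY in the environment (the image URL is a placeholder):

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

// Send a text prompt plus an image URL to a vision-enabled model.
async function describeImage(imageUrl: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // must be a vision-enabled model
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Briefly describe this image." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}

// Placeholder URL; any publicly reachable image works.
describeImage("https://example.com/sample.png").then(console.log);
```

If the model returns a sensible description, it can handle the image inputs your AI Agent will pass along.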
Follow these steps to activate image processing for your AI Agent:
- Enable image processing. Activate Process Images in the Image Handling section of your AI Agent Node. Here you can also specify exactly how images should appear in the transcript.
- Enable attachments to allow image uploads. Activate "Attachment Upload" in the Webchat Behavior section of your Webchat Endpoint settings. You’ll need a storage provider such as AWS, Azure, or Google (see the storage sketch after these steps). Refer to Cognigy’s documentation to see which other Endpoints allow attachment uploads. (Example image: a barcode.)
In the webchat, you can now upload an image of a tracking number directly into your conversation, eliminating the need to type in all the details manually.
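To make that concrete, here is a sketch, under the same OpenAI SDK assumptions as the earlier snippet, that asks the model to read the tracking number off an uploaded image and applies a quick plausibility check (the regex is a loose illustration, not a real carrier format):

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // assumes OPENAI_API_KEY is set

// Ask the vision model to read the tracking number off the uploaded image.
async function extractTrackingNumber(
  attachmentUrl: string
): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Read the tracking number in this image. Reply with the number only.",
          },
          { type: "image_url", image_url: { url: attachmentUrl } },
        ],
      },
    ],
  });
  const candidate = response.choices[0].message.content?.trim() ?? "";
  // Loose plausibility check: 8 to 30 letters, digits, or spaces.
  return /^[A-Z0-9 ]{8,30}$/i.test(candidate) ? candidate : null;
}
```

Validating the model’s answer before acting on it is a good habit: vision models occasionally misread characters, and a null result lets your flow ask the user to retake the photo.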
➡️ As a next step, get your AI Agent ready for voice experiences.