Image, Video, and Audio Chat
The chat API supports multimodal inputs, including images, videos, and audio.
You can insert multimodal content in the conversation by using media content types. The supported types are: image_url
, video_url
, audio_url
, and pdf_url
.
Below is an example of sending an image of a cat by URL:
This will output a response like:
The animal in the image is a domestic cat. Specifically, it appears to be a ginger or orange tabby cat, which is characterized by its reddish-brown fur with darker stripes or patches. The cat is engaging in a common feline behavior of sniffing or licking objects, which in this case is a computer keyboard. Cats are known for their curiosity and often explore their environment by using their sense of smell, which is highly developed. The act of licking or sniffing can also be a way for cats to mark their territory with pheromones from their saliva.
Data URLs
The API supports sending media via data URLs, for example you could URL-encode a jpeg image and then set image_url
to a value like "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAASABIAAD/4QmoRXhpZgAATU0AKgAAAAgADQEPAAIAA..."
.
Multiple media
You can send multiple media files in your request by appending them to a content
for a user
like this:
Streaming, Async, and other advanced usage
Please see the guide for text-only chat for more guidance on the advanced features of the chat API, which also work for multimodal inputs.