Image, Video, and Audio Chat

The chat API supports multimodal inputs, including images, videos, audio, and PDF documents.

To include multimodal content in a conversation, add an entry with a media content type to the content list of a message. The supported media types are image_url, video_url, audio_url, and pdf_url.
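A minimal sketch of building media content entries, assuming every media type follows the same shape as the image_url examples on this page: the value of "type" is also the key that holds the URL. The helper media_item, the SUPPORTED_MEDIA_TYPES set, and the example URL are hypothetical and are not part of the reka SDK.

```python
# Media types listed in this guide.
SUPPORTED_MEDIA_TYPES = {"image_url", "video_url", "audio_url", "pdf_url"}


def media_item(media_type: str, url: str) -> dict:
    """Build one content entry for a media URL (hypothetical helper).

    Assumes the "type" value doubles as the key carrying the URL,
    e.g. {"type": "image_url", "image_url": "..."}.
    """
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"Unsupported media type: {media_type!r}")
    return {"type": media_type, media_type: url}


# Example: a content list that pairs a video with a text prompt.
content = [
    media_item("video_url", "https://example.com/clip.mp4"),
    {"type": "text", "text": "Describe this video briefly"},
]
```

The resulting list can be passed as the content of a user ChatMessage, exactly like the hand-written dictionaries in the examples below.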

Below is an example of sending an image of a cat by URL:

Image of a cat on a keyboard
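from reka import ChatMessage
from reka.client import Reka

# Create a client; by default it reads the API key from the REKA_API_KEY environment variable.
client = Reka()
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "image_url", "image_url": ""},  # set to the image URL
                {"type": "text", "text": "What animal is this? Answer briefly"},
            ],
            role="user",
        )
    ],
    model="reka-core-20240501",
)
# Print the text of the model's reply.
print(response.responses[0].message.content)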

This will output a response like:

The animal in the image is a domestic cat. Specifically, it appears to be a ginger or orange tabby cat, which is characterized by its reddish-brown fur with darker stripes or patches. The cat is engaging in a common feline behavior of sniffing or licking objects, which in this case is a computer keyboard. Cats are known for their curiosity and often explore their environment by using their sense of smell, which is highly developed. The act of licking or sniffing can also be a way for cats to mark their territory with pheromones from their saliva.

Data URLs

The API also accepts media as data URLs. For example, you can base64-encode a JPEG image and set image_url to a value like "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAASABIAAD/4QmoRXhpZgAATU0AKgAAAAgADQEPAAIAA...".
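A minimal sketch of turning a local file into a data URL using only the Python standard library. The helper names encode_data_url and file_to_data_url are hypothetical; the MIME type is guessed from the file extension, which is an assumption that may not hold for every file.

```python
import base64
import mimetypes
from pathlib import Path


def encode_data_url(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a file and return it as a data URL (hypothetical helper)."""
    # Guess the MIME type from the extension, e.g. "image/jpeg" for ".jpg".
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type for {path!r}")
    return encode_data_url(Path(path).read_bytes(), mime_type)
```

The returned string can be used as the image_url value in place of a regular https URL. A JPEG file starts with the bytes FF D8 FF, which is why the base64 payload in the example above begins with "/9j/".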

Multiple media

You can send multiple media items in one request by adding several media entries to the content list of a user message:

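# Reuses the client created in the first example.
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "image_url", "image_url": ""},  # set to the first image URL
                {"type": "image_url", "image_url": ""},  # set to the second image URL
                {"type": "text", "text": "What colours and shapes are present in both images?"},
            ],
            role="user",
        )
    ],
    model="reka-core-20240501",
)
print(response.responses[0].message.content)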
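When the number of images varies, the content list can be built programmatically. A minimal sketch; the helper multi_image_content is hypothetical and simply reproduces the ordering used in the example above (all media entries first, then the text prompt).

```python
def multi_image_content(image_urls: list[str], question: str) -> list[dict]:
    """Return a user-message content list: one entry per image, then the question."""
    items = [{"type": "image_url", "image_url": url} for url in image_urls]
    # The text prompt goes last, after all media entries.
    items.append({"type": "text", "text": question})
    return items
```

The returned list can be passed directly as ChatMessage(content=..., role="user").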

Streaming, Async, and other advanced usage

See the text-only chat guide for details on advanced chat API features such as streaming and async usage; these features also work with multimodal inputs.