Chat with Image, Video, and Audio

The Chat API supports conversations that include images, short videos, and audio.

Our Chat API performs best with videos shorter than 30 seconds.

For longer videos, use our Vision API instead. After you upload a video, a processing pipeline optimizes it so that our models can query its content.

For one-shot flows, see the Working with video section below.

You can insert multimodal content in the conversation by using media content types. The supported types are: image_url, video_url, audio_url, and pdf_url.
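As a rough sketch of how these content types fit into a message, the helpers below build content entries as plain dictionaries. The names media_part and text_part are hypothetical and not part of the reka SDK. The entry layout (a "type" field plus a field of the same name holding the URL) follows the image_url and video_url examples in this guide; it is assumed, not confirmed here, that audio_url and pdf_url use the same layout.

```python
# Content types listed in this guide.
SUPPORTED_MEDIA_TYPES = ("image_url", "video_url", "audio_url", "pdf_url")


def media_part(media_type: str, url: str) -> dict:
    """Build one media entry for a content list (hypothetical helper).

    ASSUMPTION: every media type uses the {"type": T, T: url} layout that
    this guide shows for image_url and video_url.
    """
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported media type: {media_type!r}")
    return {"type": media_type, media_type: url}


def text_part(text: str) -> dict:
    """Build a text entry for a content list (hypothetical helper)."""
    return {"type": "text", "text": text}


# Placeholder URL, for illustration only.
content = [
    media_part("audio_url", "https://example.com/clip.mp3"),
    text_part("Summarize this audio clip."),
]
```

The resulting list can be passed as the content of a user ChatMessage. Rejecting unknown types locally surfaces typos before a request is sent.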

Below is an example of sending an image of a cat by URL:

[Image: a cat on a keyboard]
from reka import ChatMessage
from reka.client import Reka

client = Reka()
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "image_url", "image_url": "https://v0.docs.reka.ai/_images/000000245576.jpg"},
                {"type": "text", "text": "What animal is this? Answer briefly"}
            ],
            role="user",
        )
    ],
    model="reka-flash",
)
print(response.responses[0].message.content)

This will output a response like:

The animal in the image is a domestic cat. Specifically, it appears to be a ginger or orange tabby cat, which is characterized by its reddish-brown fur with darker stripes or patches. The cat is engaging in a common feline behavior of sniffing or licking objects, which in this case is a computer keyboard. Cats are known for their curiosity and often explore their environment by using their sense of smell, which is highly developed. The act of licking or sniffing can also be a way for cats to mark their territory with pheromones from their saliva.

Data URLs

The API supports sending media via data URLs. For example, you could encode a JPEG image as a data URL (typically with base64) and then set image_url to a value like "...".
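A data URL has the general form data:<MIME type>;base64,<payload>. The following is a minimal sketch of building one from a local file with the Python standard library. The helper names are hypothetical, and it is assumed that the API accepts base64 data URLs for images in the same way this guide later shows for video frames.

```python
import base64
import mimetypes


def bytes_to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a file and encode it as a data URL.

    The MIME type is guessed from the file extension; a generic binary
    type is used when the extension is not recognized.
    """
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        return bytes_to_data_url(f.read(), mime_type or "application/octet-stream")
```

For example, {"type": "image_url", "image_url": file_to_data_url("cat.jpg")} could take the place of a remote URL in a content entry, where cat.jpg is a placeholder file name.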

Multiple media

You can send multiple media files in your request by appending them to the content array for a user message:

# Reuses `client` and `ChatMessage` from the example above.
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "image_url", "image_url": "https://example.com/image_1.jpg"},
                {"type": "image_url", "image_url": "https://example.com/image_2.jpg"},
                {"type": "text", "text": "What colours and shapes are present in both images?"}
            ],
            role="user",
        )
    ],
    model="reka-flash",
)
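This guide notes further down that there is currently a limit of 6 media items per user turn. A small sketch of building the content array from a list of image URLs while checking that limit locally is shown below. The helper name images_with_prompt and the constant are hypothetical, and the limit value may change.

```python
MAX_MEDIA_PER_TURN = 6  # limit stated in this guide at the time of writing


def images_with_prompt(image_urls: list[str], prompt: str) -> list[dict]:
    """Build a content array with several images followed by a text prompt.

    Hypothetical helper: raises ValueError before any request is made if
    the number of images exceeds the documented per-turn limit.
    """
    if len(image_urls) > MAX_MEDIA_PER_TURN:
        raise ValueError(
            f"{len(image_urls)} media items exceed the limit of {MAX_MEDIA_PER_TURN}"
        )
    content = [{"type": "image_url", "image_url": url} for url in image_urls]
    content.append({"type": "text", "text": prompt})
    return content
```

The returned list can be passed directly as the content argument of a user ChatMessage.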

Working with video

Using the Chat API with a video URL

Note that this method only works if the video URL is unprotected. Providers like YouTube employ defensive measures to prevent video download.

Our Vision API provides a managed service that helps you download and process videos from YouTube.
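One way to act on this note is to check the URL's host before choosing an API. The sketch below uses only the standard library. The function name and the host list are illustrative assumptions; the list is not exhaustive, and other providers may also block direct downloads.

```python
from urllib.parse import urlparse

# Common YouTube hostnames; an illustrative, non-exhaustive list.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL points at a known YouTube host."""
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS
```

A caller could route URLs for which is_youtube_url returns True to the Vision API and send other short, directly downloadable videos to the Chat API.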

If you have a short video (less than 30 seconds), you can pass it into the Chat API using the video_url content type:

from reka import ChatMessage
from reka.client import Reka

client = Reka()
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "video_url", "video_url": "https://example.com/short_video.mp4"},
                {"type": "text", "text": "Describe what happens in this video."}
            ],
            role="user",
        )
    ],
    model="reka-flash",
)
print(response.responses[0].message.content)

For longer videos, use our Vision API. After you upload a video, a pipeline processes it and extracts information so that our models can query its content. The Vision API also supports downloads from YouTube, so all you need to do is specify a video URL.

Longer videos using the Chat API - processing videos yourself

If you prefer to handle the processing of longer videos yourself (for example, to control frame sampling or reduce upload size), you can send pre-extracted frames to the Chat API using the video/jpeg MIME type.

First, extract frames from your video using ffmpeg. This command extracts one frame per second:

$ ffmpeg -i input.mp4 -vf "fps=1" -q:v 2 frame_%03d.jpg
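If you would rather drive this step from Python, the same command can be assembled and run with subprocess. This is a sketch that assumes ffmpeg is installed and on your PATH; the function names and default values are hypothetical and simply mirror the command above.

```python
import subprocess


def ffmpeg_frame_command(
    video_path: str,
    fps: float = 1.0,
    pattern: str = "frame_%03d.jpg",
    quality: int = 2,
) -> list[str]:
    """Build the ffmpeg argument list for extracting JPEG frames.

    fps is the number of frames kept per second of video, so fps=0.5
    keeps one frame every two seconds.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps}",
        "-q:v", str(quality),
        pattern,
    ]


def extract_frames(video_path: str, fps: float = 1.0) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(ffmpeg_frame_command(video_path, fps), check=True)
```

Lowering fps is the simplest way to reduce the number of frames, and therefore the upload size, for long videos.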

Then encode each frame as base64 and join them with commas:

Error handling is omitted from these examples for brevity. In production code, you should handle file I/O errors and HTTP response errors appropriately.
import base64
import glob
from reka import ChatMessage
from reka.client import Reka

# Read and encode frames extracted by ffmpeg
frames = []
for frame_path in sorted(glob.glob("frame_*.jpg")):
    with open(frame_path, "rb") as f:
        base64_frame = base64.b64encode(f.read()).decode("utf-8")
    frames.append(base64_frame)

# Join frames with commas using the video/jpeg MIME type
video_data_url = "data:video/jpeg;base64," + ",".join(frames)

client = Reka()
response = client.chat.create(
    messages=[
        ChatMessage(
            content=[
                {"type": "video_url", "video_url": video_data_url},
                {"type": "text", "text": "Describe what happens in this video."}
            ],
            role="user",
        )
    ],
    model="reka-flash",
)
print(response.responses[0].message.content)

In practice, this is not too different from sending multiple image_url entries. However, there is currently a limit of 6 media items per user turn, so video_url lets you send more frames at once than image_url would.
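When a long video yields more frames than you want to upload, one option is to keep an evenly spaced subset before encoding. The helper below is a sketch of that idea; its name and the max_items parameter are hypothetical, and this guide does not state a frame limit for video_url, so the cap is yours to choose.

```python
def pick_evenly_spaced(items: list, max_items: int) -> list:
    """Return at most max_items elements, evenly spaced across the list.

    The first and last elements are always kept when max_items >= 2.
    """
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    n = len(items)
    if n <= max_items:
        return list(items)
    if max_items == 1:
        return [items[0]]
    # Integer arithmetic keeps indices exact and strictly increasing.
    return [items[i * (n - 1) // (max_items - 1)] for i in range(max_items)]
```

For example, applying pick_evenly_spaced to the sorted list of frame paths, before the base64 encoding loop, caps the size of the resulting data URL while still covering the whole video.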

Streaming, Async, and other advanced usage

Please see the guide for text-only chat for more guidance on the advanced features of the Chat API, which also work for multimodal inputs.