Chat with Image, Video, and Audio

The Chat API supports conversations that include images, short videos, and audio.

Our Chat API performs best with videos shorter than 30 seconds.

For longer videos, use our Vision API instead. After you upload a video, a processing pipeline prepares it so that our models can query its content.

For one-shot flows, see the Working with video section below.

You can insert multimodal content in the conversation by using media content types. The supported types are: image_url, video_url, audio_url, and pdf_url.

Below is an example of sending an image of a cat by URL:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.reka.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="reka-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://v0.docs.reka.ai/_images/000000245576.jpg"}},
                {"type": "text", "text": "What animal is this? Answer briefly"}
            ],
        }
    ],
)
print(response.choices[0].message.content)

This will output a response like:

The animal in the image is a domestic cat. Specifically, it appears to be a ginger or orange tabby cat, which is characterized by its reddish-brown fur with darker stripes or patches. The cat is engaging in a common feline behavior of sniffing or licking objects, which in this case is a computer keyboard. Cats are known for their curiosity and often explore their environment by using their sense of smell, which is highly developed. The act of licking or sniffing can also be a way for cats to mark their territory with pheromones from their saliva.

Data URLs

The API supports sending media via data URLs. For example, you could base64-encode a JPEG image and then set the url field of image_url to a value like "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAASABIAAD/4QmoRXhpZgAATU0AKgAAAAgADQEPAAIAA...".
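A data URL can be built from a local file with the Python standard library. The following is a minimal sketch: the helper names file_to_data_url and image_part are hypothetical (they are not part of any SDK), and the MIME type is guessed from the file extension, which is an assumption that may not hold for unusual file names.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path: str, default_mime: str = "application/octet-stream") -> str:
    """Return a base64 data URL for the file at `path` (hypothetical helper)."""
    # Guess the MIME type from the extension, e.g. ".jpg" -> "image/jpeg".
    mime, _ = mimetypes.guess_type(path)
    # Base64-encode the raw bytes; the result is plain ASCII text.
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or default_mime};base64,{payload}"


def image_part(path: str) -> dict:
    """Wrap a local image file as an image_url content entry."""
    return {"type": "image_url", "image_url": {"url": file_to_data_url(path)}}
```

The dictionary returned by image_part can be placed in the content array of a user message in the same position as the URL-based image_url entry shown earlier.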

Multiple media

You can send multiple media files in your request by appending them to the content array for a user message:

response = client.chat.completions.create(
    model="reka-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image_1.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/image_2.jpg"}},
                {"type": "text", "text": "What colours and shapes are present in both images?"}
            ],
        }
    ],
)

Working with video

Using the Chat API with a video URL

If you have a short video (less than 30 seconds), you can pass it to the Chat API using the video_url content type. This method only works if the video URL is unprotected: providers like YouTube employ defensive measures to prevent video download. For YouTube videos, our Vision API provides a managed service that downloads and processes the video for you.

The following example sends a short video by URL:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.reka.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="reka-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": "https://example.com/short_video.mp4"},
                {"type": "text", "text": "Describe what happens in this video."}
            ],
        }
    ],
)
print(response.choices[0].message.content)
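If you are unsure whether a local video falls under the 30-second guidance, you can check its duration before choosing an API. The sketch below assumes ffprobe (distributed with ffmpeg) is installed and on your PATH. The function names and the CHAT_API_MAX_SECONDS constant are hypothetical, and the threshold reflects the guidance on this page rather than a limit the API is known to enforce.

```python
import subprocess

# Guidance from this page, not a confirmed hard limit of the API.
CHAT_API_MAX_SECONDS = 30.0


def video_duration_seconds(path: str) -> float:
    """Ask ffprobe for the container duration of a local video, in seconds."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe fails
    )
    return float(result.stdout.strip())


def fits_chat_api(duration_seconds: float) -> bool:
    """Return True when the video is short enough for the video_url path."""
    return duration_seconds < CHAT_API_MAX_SECONDS
```

For example, fits_chat_api(video_duration_seconds("input.mp4")) tells you whether to send the video with video_url or to use one of the approaches for longer videos.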

Longer videos using the Vision API (recommended)

For longer videos, use our Vision API. After you upload a video, a processing pipeline extracts information from it so that our models can query its content. The Vision API also supports downloads from YouTube, so in that case all you need to do is specify a video URL.

Longer videos using the Chat API - processing videos yourself

If you prefer to handle video processing yourself (for example, to control frame sampling or reduce upload size), you can extract frames from your video and send them as multiple image_url content entries.

First, extract frames from your video using ffmpeg. This command extracts one frame per second:

$ ffmpeg -i input.mp4 -vf "fps=1" -q:v 2 frame_%03d.jpg

Then encode each frame as a base64 data URL and send the frames as separate image_url entries, as in the example below. Error handling is omitted for brevity; production code should handle file I/O errors and HTTP response errors appropriately.

import base64
import glob
from openai import OpenAI

# Read and encode frames extracted by ffmpeg
image_content = []
for frame_path in sorted(glob.glob("frame_*.jpg")):
    with open(frame_path, "rb") as f:
        base64_frame = base64.b64encode(f.read()).decode("utf-8")
        image_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_frame}"}
        })

client = OpenAI(
    base_url="https://api.reka.ai/v1",
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="reka-flash",
    messages=[
        {
            "role": "user",
            "content": [
                *image_content,
                {"type": "text", "text": "Describe what happens in this video."}
            ],
        }
    ],
)
print(response.choices[0].message.content)

This uses the same image_url content type shown in the Multiple media section above, with each video frame sent as a separate base64-encoded image.
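At one frame per second, a long video produces many frames, and you may want to cap how many you send to keep the request small. The sketch below picks an evenly spaced subset of the frame paths. The sample_evenly helper and the cap value are hypothetical; this page does not state a maximum number of images per request.

```python
def sample_evenly(items: list, max_items: int) -> list:
    """Return at most `max_items` entries spread evenly across `items`."""
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    if len(items) <= max_items:
        return list(items)  # already within the cap, keep everything
    if max_items == 1:
        return [items[0]]
    # Spacing between chosen indices so the first and last items are kept.
    step = (len(items) - 1) / (max_items - 1)
    return [items[round(i * step)] for i in range(max_items)]


# Ten frame names in the pattern produced by the ffmpeg command above.
frames = [f"frame_{i:03d}.jpg" for i in range(1, 11)]
print(sample_evenly(frames, 4))
# ['frame_001.jpg', 'frame_004.jpg', 'frame_007.jpg', 'frame_010.jpg']
```

You can apply sample_evenly to the sorted list of frame paths before encoding them, so that only the selected frames are read and uploaded.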

Streaming, Async, and other advanced usage

Please see the guide for text-only chat for more guidance on the advanced features of the Chat API, which also work for multimodal inputs.
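As an illustration of combining streaming with a multimodal input, the sketch below uses the stream=True option of the OpenAI Python SDK. It assumes streaming behaves for image inputs as the text-only guide describes; the helper names build_messages, join_stream, and stream_image_answer are hypothetical, and client is expected to be an OpenAI client configured as in the earlier examples.

```python
from typing import Any, Iterable


def build_messages(image_url: str, prompt: str) -> list[dict]:
    """Build a single user message containing one image and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def join_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas from a stream of chat completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def stream_image_answer(client: Any, image_url: str, prompt: str,
                        model: str = "reka-flash") -> str:
    """Request a streamed answer about an image and return the full text."""
    stream = client.chat.completions.create(
        model=model,
        messages=build_messages(image_url, prompt),
        stream=True,  # yields chunks as they are generated
    )
    return join_stream(stream)
```

To display text as it arrives instead of collecting it, iterate over the stream yourself and print each delta inside the loop.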