Vision API

The Reka Vision API provides video processing and analysis: you can upload, manage, and search videos, and ask questions about their content using AI-powered question answering.

The Vision API is a managed service: it pre-processes and stores your videos and embeddings for you. If you prefer to handle this yourself, you can use the Chat API with multimodal inputs instead.

Key Features

  • Video Management: Upload, retrieve, list, and delete videos
  • Video Search: Find videos using semantic search
  • Video Q&A: Ask questions about video content and get AI-powered answers with streaming support
  • MCP Server: Connect agent clients to Reka Vision for video research workflows
  • Metadata Tagging: Generate tags for videos
  • Highlight Clip Generation: Generate shorter highlight clips from your longer videos

Getting Started

  1. Upload a video using the /v1/videos/upload endpoint
  2. Search your videos using the /v1/videos/search endpoint
  3. Ask questions about your videos using the /v1/qa/chat endpoint
  4. Generate tags for your videos using the /v1/qa/indexedtag endpoint
  5. Generate highlight clips from your videos using the /v1/clips endpoint
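
As a rough orientation, the endpoint paths listed above can be combined with the service base URL (https://vision-agent.api.reka.ai) as in the sketch below. The dictionary keys and the helper name endpoint_url are illustrative, not part of the API, and HTTP methods and request bodies are deliberately left out because this overview does not specify them.

```python
from urllib.parse import urljoin

BASE_URL = "https://vision-agent.api.reka.ai"

# Paths are copied from the Getting Started steps; the keys are illustrative labels.
ENDPOINTS = {
    "upload": "/v1/videos/upload",
    "search": "/v1/videos/search",
    "chat": "/v1/qa/chat",
    "tag": "/v1/qa/indexedtag",
    "clips": "/v1/clips",
}


def endpoint_url(name: str) -> str:
    """Return the full URL for one of the endpoints named in ENDPOINTS."""
    return urljoin(BASE_URL, ENDPOINTS[name])


# Print the full URL for each step, in the order listed above.
for name in ENDPOINTS:
    print(f"{name:>6}: {endpoint_url(name)}")
```

Keeping the paths in one place makes it easy to adjust them if the endpoint reference differs from this overview.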

Vision MCP Server

Connect Claude Desktop, Cursor, Claude Code, and other MCP clients to Reka Vision through the Reka MCP server. Your agent can upload, manage, search, understand, and analyze videos using Reka Vision without calling the API directly.

Authentication

All Vision API requests require authentication using an API key in the X-Api-Key header:

X-Api-Key: YOUR_API_KEY
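
A minimal sketch of attaching this header from Python, using only the standard library. The environment variable name REKA_API_KEY and the helper build_request are assumptions made for illustration. The /health path is used only because it is the one endpoint this page shows being called with a plain GET, and the request is built but not sent.

```python
import os
import urllib.request

BASE_URL = "https://vision-agent.api.reka.ai"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return an unsent request for `path` that carries the API key header."""
    return urllib.request.Request(BASE_URL + path, headers={"X-Api-Key": api_key})


# REKA_API_KEY is an assumed variable name; fall back to the placeholder if unset.
api_key = os.environ.get("REKA_API_KEY", "YOUR_API_KEY")
req = build_request("/health", api_key)

# Nothing is sent here; this only shows what would go over the wire.
print(req.get_method(), req.full_url)
```

To actually send the request, pass req to urllib.request.urlopen with a timeout. Reading the key from the environment keeps it out of source control.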

Base URL

https://vision-agent.api.reka.ai

Healthcheck

Quickly verify service availability by pinging /health.

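curl -s https://vision-agent.api.reka.ai/health | jq .

import requests

def health_check():
    """Call the /health endpoint and report the HTTP status."""
    url = "https://vision-agent.api.reka.ai/health"
    # Fail fast if the service does not respond within 5 seconds.
    resp = requests.get(url, timeout=5)
    # Raise requests.HTTPError on any 4xx or 5xx response.
    resp.raise_for_status()
    print("[HEALTH ✓]", resp.status_code)

health_check()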

Finetuning

We offer finetuning services with our engineering team.

Pricing starts at $0.50 per input video minute (including output text), with discounts for larger-scale training.
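
For budgeting, the listed starting rate can be turned into a rough estimate as sketched below. The function name is illustrative, and the calculation assumes the flat starting rate with no volume discount, so actual quotes may differ.

```python
# Listed starting price per input video minute, in USD.
STARTING_RATE_USD_PER_MINUTE = 0.5


def estimate_finetuning_cost(
    total_video_minutes: float,
    rate: float = STARTING_RATE_USD_PER_MINUTE,
) -> float:
    """Estimate cost at a flat per-minute rate; discounts are not modeled."""
    if total_video_minutes < 0:
        raise ValueError("total_video_minutes must be non-negative")
    return total_video_minutes * rate


hours = 10
cost = estimate_finetuning_cost(hours * 60)
print(f"{hours} hours of video: ${cost:,.2f}")  # prints "10 hours of video: $300.00"
```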