Quickstart

This page shows you how to get started with the Reka API and how to run the Reka Edge model locally.

  • Run models on the Reka Platform
  • Run Reka Edge locally (macOS)
  • Run Reka Edge locally using OpenAI-compatible server (Linux)

Run models on the Reka Platform

Create a free account on the Reka Platform to access your API key.

Keep your API key secure. Never expose it in client-side code or share it publicly.
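One common way to keep the key out of source code is to store it in an environment variable and read it at runtime; `REKA_API_KEY` below is a name chosen for illustration.

```shell
# Export the key in your shell (or shell profile) rather than
# hard-coding it in source files or notebooks.
# REKA_API_KEY is an illustrative name; any variable name works.
export REKA_API_KEY="your-key-here"
```

In Python, you can then pass it when constructing the client, e.g. `api_key=os.environ["REKA_API_KEY"]`.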

Install the SDK

Install the OpenAI Python SDK — our API is fully OpenAI-compatible.

$ pip install openai

Make your first request

First request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.reka.ai/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="reka-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://v0.docs.reka.ai/_images/000000245576.jpg"}},
                {"type": "text", "text": "What do you like about this image?"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
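The `content` field of a message is a list of typed parts, so a single request can mix images and text. As a small sketch, the same payload can be built and inspected as plain data before sending:

```python
# Build the multimodal message as a plain dict, useful for logging
# or validating the payload before making the API call.
message = {
    "role": "user",
    "content": [
        {"type": "image_url",
         "image_url": {"url": "https://v0.docs.reka.ai/_images/000000245576.jpg"}},
        {"type": "text", "text": "What do you like about this image?"},
    ],
}

# Each content part declares its type first.
part_types = [part["type"] for part in message["content"]]
print(part_types)  # → ['image_url', 'text']
```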

Run Reka Edge locally (macOS)

See our HuggingFace repository for instructions on running Reka Edge locally.

Requirements

  • OS: macOS 13+
  • Hardware: Apple Silicon Mac with 32 GB+ unified memory (M1 Pro/Max or later recommended)
  • Python: 3.12+
  • uv (recommended) — handles dependencies automatically
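Before following the repository instructions, you can confirm the Python and uv requirements from a terminal; a minimal sketch, assuming `python3` is on your PATH:

```shell
# Print the Python version to confirm it meets the 3.12+ requirement.
python3 --version

# uv manages the virtual environment and dependencies automatically;
# check whether it is already installed.
if command -v uv >/dev/null 2>&1; then
  echo "uv is installed"
else
  echo "uv not found - see https://docs.astral.sh/uv/ for install steps"
fi
```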

Run Reka Edge locally using OpenAI-compatible server (Linux)

For high-throughput serving, you can use the vllm-reka plugin that extends standard vLLM to support Reka’s custom architectures and optimized tokenizer. Please follow our vllm-reka installation instructions to install the plugin along with vLLM.

Requirements

  • OS: Linux with CUDA. macOS is not supported for serving.
  • Hardware: NVIDIA GPU, ideally with ≥24 GB VRAM. Tested on an RTX 3090, which achieves 40–50 tokens/s.
  • Python: 3.10 ≤ x < 3.14
  • vLLM: 0.15.x (0.15.0 ≤ x < 0.16.0)
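Once the plugin is installed, serving follows the standard vLLM workflow. A sketch of launching the OpenAI-compatible server, where the model identifier is a placeholder — substitute the checkpoint name given in the vllm-reka installation instructions:

```shell
# Launch an OpenAI-compatible server on port 8000 using the standard
# vLLM CLI. "RekaAI/reka-edge" is an assumed model id for illustration.
vllm serve RekaAI/reka-edge --port 8000
```

You can then point the OpenAI SDK from the first section at `base_url="http://localhost:8000/v1"` instead of the hosted API.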

Next up

Explore the Reka API’s capabilities through the following guides: