Speech API Overview

Speech API

The Reka Speech API provides powerful audio processing capabilities, enabling you to transcribe audio, translate speech across languages, and generate translated audio output using AI-powered models.

Key Features

  • Audio Transcription: Convert speech to text with word-level timestamps
  • Speech Translation: Translate audio from one language to another with high accuracy
  • Speech-to-Speech Translation: Generate translated audio output that preserves the original speaking style
  • Multi-language Support: English, French, Spanish, Japanese, Chinese, Korean, Italian, Portuguese, and German

Getting Started

  1. Prepare your audio: encode the file as a base64 data URI, or host it and provide its URL
  2. Transcribe audio using the /v1/transcription_or_translation endpoint
  3. Translate speech by adding the target_language parameter
  4. Generate translated audio by setting return_translation_audio: true
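
The steps above can be sketched as a small helper that assembles the request body for each mode. This is a minimal sketch under stated assumptions, not the official client: the function name `build_speech_payload` is hypothetical, the value format for `target_language` (shown here as `"french"`) is an assumption, and the rule that translated audio requires a target language is inferred from the step order rather than stated by the API.

```python
def build_speech_payload(audio_url, target_language=None, return_translation_audio=False):
    """Assemble a JSON-serializable body for /v1/transcription_or_translation.

    Hypothetical helper: only the field names audio_url, target_language, and
    return_translation_audio come from the documentation above.
    """
    payload = {"audio_url": audio_url}
    if target_language is not None:
        # Step 3: adding target_language switches from transcription to translation.
        payload["target_language"] = target_language
    if return_translation_audio:
        # Assumption: translated audio only makes sense with a target language.
        if target_language is None:
            raise ValueError("return_translation_audio requires target_language")
        # Step 4: request translated audio output.
        payload["return_translation_audio"] = True
    return payload


# Transcription only (step 2)
transcribe = build_speech_payload("https://example.com/audio.wav")
# Speech translation (step 3); the "french" value format is an assumption
translate = build_speech_payload("https://example.com/audio.wav", target_language="french")
# Speech-to-speech translation (step 4)
speech = build_speech_payload(
    "https://example.com/audio.wav",
    target_language="french",
    return_translation_audio=True,
)
```

Keeping payload construction separate from the network call makes each mode easy to inspect before any request is sent.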

Authentication

All Speech API requests require authentication using an API key in the X-Api-Key header:

X-Api-Key: YOUR_API_KEY

Base URL

https://api.reka.ai
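
As an illustration of how the base URL, the endpoint path, and the `X-Api-Key` header fit together, the sketch below builds (but does not send) a request using only the Python standard library. The use of `POST` with a JSON body and a `Content-Type: application/json` header is an assumption, and `build_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.reka.ai"
ENDPOINT = "/v1/transcription_or_translation"


def build_request(api_key, payload):
    """Build an authenticated request object without sending it.

    Assumptions: the endpoint accepts POST with a JSON body.
    """
    return urllib.request.Request(
        url=BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,  # authentication header from the docs
            "Content-Type": "application/json",  # assumed body format
        },
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    {"audio_url": "https://example.com/audio.wav"},
)
# To actually send it (requires a valid key):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```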

Supported Languages

The Speech API supports translation between English and the following languages:

  • French
  • Spanish
  • Japanese
  • Chinese
  • Korean
  • Italian
  • Portuguese
  • German

Audio Format Requirements

  • Format: WAV
  • Sampling Rate: 16,000 Hz (recommended)
  • Encoding: Base64 or URL to hosted audio file
  • Input Methods:
    • audio_url: URL to audio file (http/https or data URI)
    • Base64-encoded audio as data URI: data:audio/wav;base64,<base64_string>
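
To make the requirements above concrete, the following sketch reads the sampling rate of a WAV payload and wraps it in a `data:audio/wav;base64,...` URI using only the standard library. The helper name `wav_to_data_uri` is hypothetical, and the one-second silent clip is generated purely for demonstration. Note that Python's `wave` module reads uncompressed PCM WAV only, so this check is a local convenience and not a statement of what the API itself accepts.

```python
import base64
import io
import wave

RECOMMENDED_RATE = 16_000  # Hz, the recommended sampling rate


def wav_to_data_uri(wav_bytes):
    """Return (data_uri, sample_rate) for WAV bytes.

    Raises wave.Error if the bytes are not a PCM WAV file.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return f"data:audio/wav;base64,{encoded}", sample_rate


# Demonstration input: one second of 16-bit mono silence built in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    out.setframerate(RECOMMENDED_RATE)
    out.writeframes(b"\x00\x00" * RECOMMENDED_RATE)

data_uri, rate = wav_to_data_uri(buffer.getvalue())
if rate != RECOMMENDED_RATE:
    print(f"Sampling rate is {rate} Hz; consider resampling to {RECOMMENDED_RATE} Hz")
```

The resulting `data_uri` string can be passed as the `audio_url` value in place of an http/https URL.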

Models

The Speech API supports the reka-tiny-asr model, a fast and efficient model for transcription.

Example: Prepare Audio

Before calling the API, prepare your audio file by resampling it to 16,000 Hz and encoding it as a base64 data URI:

Python

import base64
import io

import librosa
import soundfile

SAMPLING_RATE = 16_000  # Hz, the recommended sampling rate

with soundfile.SoundFile("/path/to/audio.wav") as sound_file:
    # Load the audio and resample it to the target sampling rate
    waveform, _ = librosa.load(
        sound_file,
        sr=SAMPLING_RATE,
    )
    # Write the resampled audio to an in-memory WAV file
    cache = io.BytesIO()
    soundfile.write(cache, waveform, SAMPLING_RATE, format="WAV")
    cache.seek(0)
    # Base64-encode the WAV bytes
    audio_in_base64 = base64.b64encode(cache.read()).decode("ascii")

# Wrap the encoded audio in a data URI to pass as audio_url
audio_url = f"data:audio/wav;base64,{audio_in_base64}"

Rate Limits

Please contact us for information about rate limits and pricing for the Speech API.