Speech API Overview

The Reka Speech API provides powerful audio processing capabilities, enabling you to transcribe audio, translate speech across languages, and generate translated audio output using AI-powered models.

Key Features

  • Audio Transcription: Convert speech to text with word-level timestamps
  • Speech Translation: Translate audio between English and any of the other supported languages
  • Speech-to-Speech Translation: Generate translated audio output that preserves the original speaking style
  • Multi-language Support: English, French, Spanish, Japanese, Chinese, Korean, Italian, Portuguese, and German

Getting Started

  1. Prepare your audio: encode it as a base64 data URI, or host the file and provide its URL
  2. Transcribe the audio by sending it to the /v1/transcription_or_translation endpoint
  3. Translate the speech by adding the target_language parameter
  4. Generate translated audio by also setting return_translation_audio: true
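The steps above differ only in which fields the request body carries. The minimal sketch below shows one way to build each variant. It assumes a JSON body with a model field alongside the documented audio_url, target_language, and return_translation_audio parameters; the helper name build_speech_payload and the language value "french" are hypothetical, so check the API reference for the exact field names and accepted language values.

```python
from typing import Any, Dict, Optional


def build_speech_payload(
    audio_url: str,
    model: str,
    target_language: Optional[str] = None,
    return_translation_audio: bool = False,
) -> Dict[str, Any]:
    """Build a request body for /v1/transcription_or_translation (hypothetical helper).

    Assumption: the body is JSON and the model is passed in a "model" field.
    """
    # Step 2: transcription needs only the audio and a model
    payload: Dict[str, Any] = {"model": model, "audio_url": audio_url}
    if target_language is not None:
        # Step 3: adding target_language requests a translation
        payload["target_language"] = target_language
        if return_translation_audio:
            # Step 4: also ask for synthesized audio of the translation
            payload["return_translation_audio"] = True
    elif return_translation_audio:
        # Assumption: translated audio only makes sense once a target language is set
        raise ValueError("return_translation_audio requires target_language")
    return payload


# Usage: one payload per Getting Started step ("french" is a hypothetical language value)
transcribe = build_speech_payload("https://example.com/audio.wav", model="reka-tiny-asr")
translate = build_speech_payload(
    "https://example.com/audio.wav", model="reka-spark", target_language="french"
)
speech_to_speech = build_speech_payload(
    "https://example.com/audio.wav",
    model="reka-spark",
    target_language="french",
    return_translation_audio=True,
)
```

Keeping the validation client-side means a request that could never return translated audio fails before it reaches the network.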

Authentication

All Speech API requests require authentication using an API key in the X-Api-Key header:

X-Api-Key: YOUR_API_KEY

Base URL

https://api.reka.ai
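Putting the header and the base URL together, a request can be built with Python's standard library, as in the minimal sketch below. It assumes the endpoint accepts a JSON body over POST; make_request and the REKA_API_KEY environment variable are hypothetical names, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.reka.ai"
ENDPOINT = "/v1/transcription_or_translation"


def make_request(payload: Dict[str, Any], api_key: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, an authenticated POST request (hypothetical helper).

    Assumptions: the endpoint takes a JSON body, and REKA_API_KEY is an
    environment variable name chosen for this example.
    """
    key = api_key if api_key is not None else os.environ["REKA_API_KEY"]
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Every Speech API request carries the key in the X-Api-Key header
        headers={"X-Api-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen and read the response body; the response format is not described on this page. urllib stores header names in capitalized form (X-Api-Key becomes X-api-key), which is harmless because HTTP header names are case-insensitive.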

Supported Languages

The Speech API supports translation between English and the following languages:

  • French
  • Spanish
  • Japanese
  • Chinese
  • Korean
  • Italian
  • Portuguese
  • German
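Because every supported pair has English on one side, a client can reject unsupported pairs before making a request. This hypothetical helper assumes languages are identified by their lowercase English names; the API may expect different values, such as language codes.

```python
# Languages listed on this page; lowercase names are an assumed identifier format.
SUPPORTED_NON_ENGLISH = frozenset(
    {"french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"}
)


def is_supported_pair(source: str, target: str) -> bool:
    """Return True if the pair is English plus one listed language, in either direction."""
    s, t = source.strip().lower(), target.strip().lower()
    if s == "english":
        return t in SUPPORTED_NON_ENGLISH
    if t == "english":
        return s in SUPPORTED_NON_ENGLISH
    # Pairs that do not involve English are not listed as supported
    return False
```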

Audio Format Requirements

  • Format: WAV
  • Sampling Rate: 16,000 Hz (recommended)
  • Encoding: Base64-encoded WAV data, or a URL to a hosted audio file
  • Input Methods (both are passed in the audio_url field):
    • URL to a hosted audio file (http or https)
    • Base64-encoded audio as a data URI: data:audio/wav;base64,<base64_string>
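If your file is already a 16 kHz WAV, the data URI can be built with the standard library alone. This minimal sketch (the function name wav_to_data_uri is hypothetical) reads the header with Python's wave module, warns when the sampling rate differs from the recommended 16,000 Hz, and wraps the bytes in the data:audio/wav;base64, prefix described above.

```python
import base64
import io
import warnings
import wave

RECOMMENDED_RATE = 16_000  # Hz, the rate recommended on this page


def wav_to_data_uri(wav_bytes: bytes) -> str:
    """Check that wav_bytes is a WAV file, then wrap it in a base64 data URI."""
    # wave.open accepts file-like objects and raises wave.Error without a RIFF header
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
    if rate != RECOMMENDED_RATE:
        # 16 kHz is recommended rather than required, so warn instead of failing
        warnings.warn(f"sampling rate is {rate} Hz; {RECOMMENDED_RATE} Hz is recommended")
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return f"data:audio/wav;base64,{encoded}"
```

Files at other rates still need resampling, for example with the librosa-based preparation example on this page.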

Models

The Speech API offers two models, each suited to a different task:

  • reka-tiny-asr: Fast and efficient model for transcription
  • reka-spark: Advanced model for translation with high accuracy

Example: Prepare Audio

Before calling the API, resample your audio to 16 kHz and encode it as a base64 data URI. This example uses librosa and soundfile:

Python

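```python
import base64
import io

import librosa
import soundfile

SAMPLING_RATE = 16_000  # recommended sampling rate for the Speech API

with soundfile.SoundFile("/path/to/audio.wav") as sound_file:
    # Load the audio and resample it to 16 kHz (librosa also downmixes to mono by default)
    waveform, _ = librosa.load(sound_file, sr=SAMPLING_RATE)

# Re-encode the resampled waveform as an in-memory WAV file
cache = io.BytesIO()
soundfile.write(cache, waveform, SAMPLING_RATE, format="WAV")
cache.seek(0)

# Base64-encode the WAV bytes and wrap them in a data URI for the audio_url field
audio_in_base64 = base64.b64encode(cache.read()).decode("ascii")
audio_url = f"data:audio/wav;base64,{audio_in_base64}"
```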

Rate Limits

Please contact us for information about rate limits and pricing for the Speech API.