Speech Translation

The Speech API provides speech translation: it transcribes spoken audio in one language and translates it into text in another while preserving meaning and context. This is useful for multilingual content, international communication, and accessibility.

Translate Speech

Transcribe audio in its source language and translate the transcript into a target language in a single request.

Endpoint

  • POST /v1/transcription_or_translation

Request

Bash

curl -X POST https://api.reka.ai/v1/transcription_or_translation \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "data:audio/wav;base64,<your_base64_encoded_audio>",
    "sampling_rate": 16000,
    "target_language": "chinese",
    "is_translate": true,
    "parallel_mode": true,
    "temperature": 0.0,
    "max_tokens": 1024
  }'

Python

import base64
import io

import httpx
import librosa
import soundfile

REKA_API_KEY = "YOUR_API_KEY"
SAMPLING_RATE = 16_000

# Prepare audio: decode, resample to 16 kHz mono, and re-encode as WAV in memory
with soundfile.SoundFile("/path/to/audio.wav") as sound_file:
    waveform, _ = librosa.load(
        sound_file,
        sr=SAMPLING_RATE,
    )
    cache = io.BytesIO()
    soundfile.write(cache, waveform, SAMPLING_RATE, format="WAV")
    cache.seek(0)
    audio_in_base64 = base64.b64encode(cache.read()).decode("ascii")

audio_url = f"data:audio/wav;base64,{audio_in_base64}"

# Make request
with httpx.Client(timeout=180, follow_redirects=True) as client:
    response = client.request(
        method="POST",
        url="https://api.reka.ai/v1/transcription_or_translation",
        json={
            "audio_url": audio_url,
            "sampling_rate": SAMPLING_RATE,
            "target_language": "chinese",
            "temperature": 0.0,
            "max_tokens": 1024,
            "is_translate": True,
        },
        headers={
            "X-Api-Key": REKA_API_KEY,
        },
    )
    # Raise on HTTP errors instead of failing later with a KeyError
    response.raise_for_status()
    result = response.json()
    print("Original transcript:", result["transcript"])
    print("Translation:", result["translation"])
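The librosa step above decodes and resamples any input to 16 kHz. When a file is already a 16 kHz WAV, a lighter sketch using only the Python standard library can build the same kind of data URI (the value the `<your_base64_encoded_audio>` placeholder in the curl example expects). The function name `wav_file_to_data_uri` is illustrative, not part of any Reka SDK, and the sketch assumes the file's actual sampling rate is the one you send as `sampling_rate`.

```python
from __future__ import annotations

import base64
from pathlib import Path


def wav_file_to_data_uri(path: str | Path) -> str:
    """Base64-encode a WAV file as-is and wrap it in a data URI for audio_url.

    Hypothetical helper: no decoding or resampling happens here, so the file
    must already be at the rate you pass as sampling_rate (16 kHz recommended).
    """
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:audio/wav;base64,{encoded}"
```

Skipping librosa avoids a heavy dependency, at the cost of no automatic resampling or mono downmix.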

Parameters

  • audio_url (required): URL of the audio file, or base64-encoded audio as a data URI (data:audio/wav;base64,...)
  • sampling_rate (required): Sampling rate of the submitted audio in Hz (recommended: 16000)
  • target_language (required): Target language for translation. Supported: "french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"
  • temperature (optional): Controls randomness in generation. Use 0.0 for deterministic output. Default: 0.0
  • max_tokens (optional): Maximum number of tokens to generate. Default: 1024
  • is_translate (required): Set to true to request a translation in addition to the transcript
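The parameters above map directly onto the JSON request body. A minimal sketch of assembling that body with the documented defaults follows; the function name `build_translation_payload` and its range checks are illustrative client-side assumptions, not validation rules published for the API.

```python
from typing import Any


def build_translation_payload(
    audio_url: str,
    target_language: str,
    sampling_rate: int = 16_000,
    temperature: float = 0.0,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble the JSON body for POST /v1/transcription_or_translation.

    Hypothetical helper: defaults mirror the documented ones, and the checks
    below are assumed client-side sanity checks, not API-enforced rules.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be a positive number of Hz")
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "audio_url": audio_url,
        "sampling_rate": sampling_rate,
        "target_language": target_language,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "is_translate": True,  # required to request a translation
    }
```

The returned dict can be passed as `json=` to `client.request(...)` in the Python example.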

Response

Returns a translation result with:

{
  "transcript": "Original transcribed text in source language",
  "translation": "Translated text in target language"
}

  • transcript: Transcribed text in the original language
  • translation: Translated text in the target language
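A minimal sketch of turning the decoded JSON (for example, `response.json()`) into a typed result. It assumes only the two documented fields; `TranslationResult` and `parse_translation_response` are hypothetical names, and any extra keys the API might return are ignored.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TranslationResult:
    transcript: str   # text in the source language
    translation: str  # text in the target language


def parse_translation_response(body: Mapping[str, Any]) -> TranslationResult:
    """Extract the two documented fields, failing loudly if either is missing."""
    missing = [key for key in ("transcript", "translation") if key not in body]
    if missing:
        raise KeyError(f"response is missing field(s): {', '.join(missing)}")
    return TranslationResult(
        transcript=str(body["transcript"]),
        translation=str(body["translation"]),
    )
```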

Supported Languages

The Speech API supports translation between English and the following languages:

  • French ("french")
  • Spanish ("spanish")
  • Japanese ("japanese")
  • Chinese ("chinese")
  • Korean ("korean")
  • Italian ("italian")
  • Portuguese ("portuguese")
  • German ("german")
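The `target_language` values are the lowercase English names listed above. A small sketch that normalizes user-facing names such as "French" to those values and rejects unsupported ones before any request is sent; the helper name and the idea of checking client-side are assumptions, and the set must be kept in sync by hand if the supported list changes.

```python
# Copied from the Supported Languages list; update manually if it changes.
SUPPORTED_TARGET_LANGUAGES = frozenset({
    "french", "spanish", "japanese", "chinese",
    "korean", "italian", "portuguese", "german",
})


def normalize_target_language(name: str) -> str:
    """Map a display name such as 'French' to the API value 'french'."""
    value = name.strip().lower()
    if value not in SUPPORTED_TARGET_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_TARGET_LANGUAGES))
        raise ValueError(
            f"unsupported target_language {name!r}; expected one of: {supported}"
        )
    return value
```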

Use Cases

  • Multilingual Content: Translate videos, podcasts, or audio content for international audiences
  • International Communication: Enable real-time communication across language barriers
  • Accessibility: Make content accessible to speakers of different languages
  • Localization: Adapt audio content for different markets

Performance Tips

  • Set temperature to 0.0 for consistent, deterministic translations
  • Send audio at the recommended sampling rate of 16000 Hz (16 kHz) for best results
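Before sending a file without resampling, it can help to confirm its rate really is 16 kHz. A minimal sketch using the standard-library `wave` module, assuming uncompressed PCM WAV input (which is what `wave` reads); `WavInfo`, `inspect_wav`, and `is_recommended_rate` are hypothetical names.

```python
import io
import wave
from dataclasses import dataclass

RECOMMENDED_RATE_HZ = 16_000


@dataclass(frozen=True)
class WavInfo:
    sampling_rate: int
    channels: int
    duration_s: float


def inspect_wav(wav_bytes: bytes) -> WavInfo:
    """Read the header of a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(
            sampling_rate=rate,
            channels=wav.getnchannels(),
            duration_s=wav.getnframes() / rate,
        )


def is_recommended_rate(info: WavInfo) -> bool:
    """True if the audio already matches the recommended 16 kHz rate."""
    return info.sampling_rate == RECOMMENDED_RATE_HZ
```

If the check fails, resample first (for example with the librosa step in the Python request example) rather than sending a mismatched `sampling_rate`.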