Speech-to-Speech Translation

The Speech API provides speech-to-speech translation capabilities that not only translate the content but also generate audio output in the target language. This allows you to create fully translated audio content while preserving the natural flow of speech.

Translate and Generate Speech

Translate audio from one language to another and receive both text and audio output.

Endpoint

  • POST /v1/transcription_or_translation

Request

Bash

curl -X POST https://api.reka.ai/v1/transcription_or_translation \
  -H "X-Api-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "data:audio/wav;base64,<your_base64_encoded_audio>",
    "sampling_rate": 16000,
    "target_language": "chinese",
    "is_translate": true,
    "return_translation_audio": true,
    "temperature": 0.0,
    "max_tokens": 1024
  }'

Python

import base64
import io
import os

import httpx
import librosa
import soundfile

REKA_API_KEY = "YOUR_API_KEY"
SAMPLING_RATE = 16_000

# Load the audio and resample it to SAMPLING_RATE
with soundfile.SoundFile("/path/to/audio.wav") as sound_file:
    waveform, _ = librosa.load(sound_file, sr=SAMPLING_RATE)

# Re-encode the resampled waveform as WAV and base64-encode it
cache = io.BytesIO()
soundfile.write(cache, waveform, SAMPLING_RATE, format="WAV")
audio_in_base64 = base64.b64encode(cache.getvalue()).decode("ascii")

# Wrap the base64 payload in a data URI
audio_url = f"data:audio/wav;base64,{audio_in_base64}"

# Send the translation request
with httpx.Client(timeout=180, follow_redirects=True) as client:
    response = client.post(
        "https://api.reka.ai/v1/transcription_or_translation",
        json={
            "audio_url": audio_url,
            "sampling_rate": SAMPLING_RATE,
            "target_language": "chinese",
            "is_translate": True,
            "return_translation_audio": True,
            "temperature": 0.0,
            "max_tokens": 1024,
        },
        headers={"X-Api-Key": REKA_API_KEY},
    )

# Stop on HTTP errors before parsing the body
response.raise_for_status()
result = response.json()
print("Original transcript:", result["transcript"])
print("Translation:", result["translation"])

# Decode the translated speech and save it as a WAV file
audio_data = base64.b64decode(result["audio_base64"])
with open("translated.wav", "wb") as f:
    f.write(audio_data)

out_path = os.path.abspath("translated.wav")
print(f"Saved audio file: {out_path}")

Parameters

  • audio_url (required): URL of the audio file, or base64-encoded audio as a data URI
  • sampling_rate (required): Audio sampling rate in Hz (recommended: 16000)
  • target_language (required): Target language for translation. Supported: "french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"
  • return_translation_audio (required): Set to true to receive translated audio output
  • temperature (optional): Controls randomness in generation. Use 0.0 for deterministic output. Default: 0.0
  • max_tokens (optional): Maximum number of tokens to generate. Default: 1024
  • is_translate (required): Set to true to indicate translation request
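
The parameters above can be collected by a small helper that assembles the request body and rejects unsupported languages before any network call is made. This is a minimal sketch: `build_translation_payload` and `SUPPORTED_LANGUAGES` are hypothetical names, not part of the API. The field names, defaults, and language list are taken from the list above, and lowercasing the language name is an assumption made to match the documented values.

```python
import json

# Languages listed as supported in the parameter reference above.
SUPPORTED_LANGUAGES = frozenset({
    "french", "spanish", "japanese", "chinese",
    "korean", "italian", "portuguese", "german",
})


def build_translation_payload(audio_url, target_language, sampling_rate=16000,
                              temperature=0.0, max_tokens=1024):
    """Assemble the JSON body for a speech-to-speech translation request.

    Hypothetical helper: it mirrors the documented fields only and does
    not contact the API.
    """
    # Assumption: normalize to the lowercase names shown in the docs.
    language = target_language.strip().lower()
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported target_language: {target_language!r}")
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be a positive number of Hz")
    return {
        "audio_url": audio_url,
        "sampling_rate": sampling_rate,
        "target_language": language,
        "is_translate": True,
        "return_translation_audio": True,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


# "AAAA" is placeholder data, not real audio.
payload = build_translation_payload("data:audio/wav;base64,AAAA", "Chinese")
print(json.dumps(payload, indent=2))
```

The returned dictionary can be passed as the `json=` argument of the HTTP client call shown in the Python example.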

Response

Returns a translation result with audio:

{
  "transcript": "Original transcribed text in source language",
  "translation": "Translated text in target language",
  "audio_base64": "base64_encoded_translated_audio_data"
}

  • transcript: Transcribed text in the original language
  • translation: Translated text in the target language
  • audio_base64: Base64-encoded WAV audio of the translated speech
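
The three response fields can be unpacked defensively before anything is written to disk. The sketch below assumes the response has already been parsed into a dictionary; `extract_translation` is a hypothetical helper, and the RIFF/WAVE header check is an extra sanity test added here, not documented API behavior. The sample response is built locally as stand-in data.

```python
import base64
import binascii
import io
import wave


def extract_translation(result):
    """Return (transcript, translation, audio_bytes) from a response dict."""
    required = ("transcript", "translation", "audio_base64")
    missing = [key for key in required if key not in result]
    if missing:
        raise ValueError(f"Response is missing fields: {missing}")
    try:
        audio = base64.b64decode(result["audio_base64"])
    except binascii.Error as exc:
        raise ValueError("audio_base64 is not valid base64") from exc
    # A WAV file starts with "RIFF", followed by a size field and "WAVE".
    if audio[:4] != b"RIFF" or audio[8:12] != b"WAVE":
        raise ValueError("Decoded audio does not look like a WAV file")
    return result["transcript"], result["translation"], audio


# Stand-in response; a real one would come from response.json().
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

fake_result = {
    "transcript": "hello",
    "translation": "你好",
    "audio_base64": base64.b64encode(buffer.getvalue()).decode("ascii"),
}

transcript, translation, audio = extract_translation(fake_result)
print(transcript, translation, len(audio), "bytes of WAV audio")
```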

Use Cases

  • Video Dubbing: Create dubbed versions of videos in different languages
  • Voice Translation: Translate voice messages or audio notes
  • Multilingual Audio Content: Generate audio content in multiple languages from a single source
  • Accessibility: Provide audio translations for users who rely on spoken content, such as visually impaired users, and prefer audio in their native language
  • Language Learning: Create parallel audio content for language learning applications

Performance Tips

  • Set temperature: 0.0 for consistent, deterministic output
  • The translated audio will be in WAV format at the same sampling rate as the input
  • Processing time increases with audio length; plan accordingly for long-form content
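
The sampling rate and duration of the returned audio can be checked with Python's standard `wave` module, for example to confirm that the output rate matches the input. This is a sketch under the assumption that the returned WAV is PCM-encoded, which is what `wave` reads; `wav_info` and `make_silent_wav` are hypothetical helpers, and the silent clip stands in for decoded `audio_base64` data.

```python
import io
import wave


def wav_info(audio_bytes):
    """Return (sample_rate_hz, duration_seconds) for PCM WAV bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def make_silent_wav(sample_rate, seconds):
    """Build a silent mono 16-bit WAV, used here only as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


INPUT_SAMPLING_RATE = 16000

# Stand-in for base64.b64decode(result["audio_base64"]).
translated = make_silent_wav(INPUT_SAMPLING_RATE, 0.5)

rate, duration = wav_info(translated)
print(f"{rate} Hz, {duration:.2f} s, matches input rate: {rate == INPUT_SAMPLING_RATE}")
```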