Speech-to-Speech Translation

The Speech API provides speech-to-speech translation capabilities that not only translate the content but also generate audio output in the target language. This allows you to create fully translated audio content while preserving the natural flow of speech.

Translate and Generate Speech

Translate audio from one language to another and receive both text and audio output.

Endpoint

POST /v1/transcription_or_translation

Request

Bash

curl -X POST https://api.reka.ai/v1/transcription_or_translation \
    -H "X-Api-Key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "audio_url": "data:audio/wav;base64,<your_base64_encoded_audio>",
    "sampling_rate": 16000,
    "target_language": "chinese",
    "return_translation_audio": true,
    "is_translate": true,
    "temperature": 0.0,
    "max_tokens": 1024
    }'

Python

import base64
import io
import os

import httpx
import librosa
import soundfile

REKA_API_KEY = "YOUR_API_KEY"
SAMPLING_RATE = 16_000

# Prepare audio: resample to SAMPLING_RATE and re-encode as an in-memory WAV
with soundfile.SoundFile("/path/to/audio.wav") as sound_file:
    waveform, _ = librosa.load(
        sound_file,
        sr=SAMPLING_RATE,
    )
    cache = io.BytesIO()
    soundfile.write(cache, waveform, SAMPLING_RATE, format="WAV")
    cache.seek(0)
    audio_in_base64 = base64.b64encode(cache.read()).decode("ascii")

# Wrap the base64 payload in a data URI
audio_url = f"data:audio/wav;base64,{audio_in_base64}"

# Make request
with httpx.Client(timeout=180, follow_redirects=True) as client:
    response = client.request(
        method="POST",
        url="https://api.reka.ai/v1/transcription_or_translation",
        json={
            "audio_url": audio_url,
            "sampling_rate": SAMPLING_RATE,
            "target_language": "chinese",
            "return_translation_audio": True,
            "temperature": 0.0,
            "max_tokens": 1024,
            "is_translate": True,
        },
        headers={
            "X-Api-Key": REKA_API_KEY,
        },
    )
    # Fail early on HTTP errors instead of parsing an error body
    response.raise_for_status()
    result = response.json()
    print("Original transcript:", result["transcript"])
    print("Translation:", result["translation"])

    # Decode the translated speech and save it as a WAV file
    audio_data = base64.b64decode(result["audio_base64"])
    with open("translated.wav", "wb") as f:
        f.write(audio_data)

    out_path = os.path.abspath("translated.wav")
    print(f"Saved audio file: {out_path}")

Parameters

audio_url (required): URL to the audio file or base64-encoded audio as data URI
sampling_rate (required): Audio sampling rate in Hz (recommended: 16000)
target_language (required): Target language for translation. Supported: "french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"
return_translation_audio (required): Set to true to receive translated audio output
temperature (optional): Controls randomness in generation. Use 0.0 for deterministic output. Default: 0.0
max_tokens (optional): Maximum number of tokens to generate. Default: 1024
is_translate (required): Set to true to indicate translation request
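
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Reka SDK: build_translation_payload and SUPPORTED_LANGUAGES are hypothetical names introduced here, and the validation covers only what this page documents.

```python
import base64

# Languages this page lists as supported values for target_language.
SUPPORTED_LANGUAGES = frozenset({
    "french", "spanish", "japanese", "chinese",
    "korean", "italian", "portuguese", "german",
})


def build_translation_payload(
    wav_bytes: bytes,
    target_language: str,
    sampling_rate: int = 16_000,
    temperature: float = 0.0,
    max_tokens: int = 1024,
) -> dict:
    """Assemble the JSON body for a speech-to-speech translation request.

    Hypothetical helper: it only combines the fields documented above.
    """
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported target_language: {target_language!r}")
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be a positive integer")

    # Embed the audio as a base64 data URI, as in the request example.
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "audio_url": f"data:audio/wav;base64,{encoded}",
        "sampling_rate": sampling_rate,
        "target_language": target_language,
        "return_translation_audio": True,
        "is_translate": True,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

The returned dictionary can be passed as the json argument of the HTTP client call shown in the Python request example. Rejecting an unsupported language locally avoids a round trip for a request that would likely fail.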

Response

Returns a translation result with audio:

{
    "transcript": "Original transcribed text in source language",
    "translation": "Translated text in target language",
    "audio_base64": "base64_encoded_translated_audio_data"
}

transcript: Transcribed text in the original language
translation: Translated text in the target language
audio_base64: Base64-encoded WAV audio of the translated speech
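
One way to handle these three fields is sketched below. It assumes the response body has already been parsed into a dictionary; save_translation is a hypothetical helper name, and the error handling is an illustration rather than documented API behavior.

```python
import base64
import binascii
from pathlib import Path

# Fields this page documents in the response body.
EXPECTED_FIELDS = ("transcript", "translation", "audio_base64")


def save_translation(result: dict, out_path: str = "translated.wav") -> Path:
    """Validate a parsed response and write the translated audio to disk.

    Hypothetical helper; returns the absolute path of the written file.
    """
    missing = [name for name in EXPECTED_FIELDS if name not in result]
    if missing:
        raise KeyError(f"Response is missing fields: {missing}")

    try:
        audio_bytes = base64.b64decode(result["audio_base64"])
    except binascii.Error as exc:
        raise ValueError("audio_base64 is not valid base64") from exc

    path = Path(out_path).resolve()
    path.write_bytes(audio_bytes)
    return path
```

Checking for the expected fields first gives a clearer error than a bare key lookup when the service returns an error payload instead of a translation.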

Use Cases

Video Dubbing: Create dubbed versions of videos in different languages
Voice Translation: Translate voice messages or audio notes
Multilingual Audio Content: Generate audio content in multiple languages from a single source
Accessibility: Provide spoken translations for users who rely on or prefer audio in their native language
Language Learning: Create parallel audio content for language learning applications

Performance Tips

Set temperature to 0.0 for consistent, deterministic output
The translated audio will be in WAV format at the same sampling rate as the input
Processing time increases with audio length; plan accordingly for long-form content
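
The sampling rate and duration of the returned audio can be inspected with the Python standard library. This is a minimal sketch under one assumption: the returned WAV is PCM-encoded, which is what the wave module reads. describe_wav is a hypothetical helper name.

```python
import base64
import io
import wave


def describe_wav(audio_base64: str) -> tuple[int, float]:
    """Return (sampling rate in Hz, duration in seconds) of a base64 WAV.

    Hypothetical helper; assumes PCM WAV data, as read by the wave module.
    """
    wav_bytes = base64.b64decode(audio_base64)
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate
    return rate, duration
```

Comparing the returned rate with the sampling_rate sent in the request is a quick check on the output, and the duration helps when estimating processing time for longer recordings.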