Speech-to-Speech Translation
The Speech API provides speech-to-speech translation capabilities that not only translate the content but also generate audio output in the target language. This allows you to create fully translated audio content while preserving the natural flow of speech.
Translate and Generate Speech
Translate audio from one language to another and receive both text and audio output.
Endpoint
POST /v1/transcription_or_translation
Request
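The request can be sketched in Python as follows. This is a minimal example using only the standard library, not an official client: the host in BASE_URL, the Bearer-token header, and the helper names build_payload and translate_speech are assumptions made for illustration. Only the endpoint path and the body fields come from this page.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real base URL of the service.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/transcription_or_translation"


def build_payload(audio_url, target_language, sampling_rate=16000):
    """Assemble the JSON body from the documented parameters."""
    return {
        "audio_url": audio_url,
        "sampling_rate": sampling_rate,
        "target_language": target_language,
        "is_translate": True,
        "return_translation_audio": True,
        "temperature": 0.0,   # documented default, deterministic output
        "max_tokens": 1024,   # documented default
    }


def translate_speech(payload, api_key):
    """POST the payload and return the parsed JSON response.

    Assumption: the service accepts a JSON body and Bearer-token auth.
    """
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) a request body for a French translation.
payload = build_payload("https://example.com/sample.wav", "french")
print(json.dumps(payload, indent=2))
```

Calling translate_speech(payload, api_key) would send the request; it is left uncalled here because the host and authentication scheme are placeholders.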
Parameters
- audio_url (required): URL to the audio file, or base64-encoded audio as a data URI
- sampling_rate (required): Audio sampling rate in Hz (recommended: 16000)
- target_language (required): Target language for translation. Supported: "french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"
- return_translation_audio (required): Set to true to receive translated audio output
- temperature (optional): Controls randomness in generation. Use 0.0 for deterministic output. Default: 0.0
- max_tokens (optional): Maximum number of tokens to generate. Default: 1024
- is_translate (required): Set to true to indicate a translation request
Response
Returns a translation result with audio:
- transcript: Transcribed text in the original language
- translation: Translated text in the target language
- audio_base64: Base64-encoded WAV audio of the translated speech
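Because the translated speech arrives as a base64 string, it has to be decoded before it can be played or saved. The sketch below shows one way to do that, assuming the response has already been parsed into a Python dict. The helper name decode_translation_audio and the sample values are hypothetical, and the audio field holds placeholder bytes rather than real speech.

```python
import base64
import tempfile
from pathlib import Path


def decode_translation_audio(result):
    """Return the raw WAV bytes held in the response's audio_base64 field."""
    return base64.b64decode(result["audio_base64"])


# Hypothetical response: the text values are illustrative and the audio
# field holds placeholder bytes, not real speech.
sample_result = {
    "transcript": "Good morning",
    "translation": "Bonjour",
    "audio_base64": base64.b64encode(b"placeholder-wav-bytes").decode("ascii"),
}

wav_bytes = decode_translation_audio(sample_result)

# Write the decoded audio to the system temp directory.
out_path = Path(tempfile.gettempdir()) / "translated.wav"
out_path.write_bytes(wav_bytes)
print(f"Wrote {len(wav_bytes)} bytes to {out_path}")
```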
Use Cases
- Video Dubbing: Create dubbed versions of videos in different languages
- Voice Translation: Translate voice messages or audio notes
- Multilingual Audio Content: Generate audio content in multiple languages from a single source
- Accessibility: Provide audio translations for hearing-impaired users who prefer audio in their native language
- Language Learning: Create parallel audio content for language learning applications
Performance Tips
- Set temperature: 0.0 for consistent, deterministic output
- The translated audio will be in WAV format at the same sampling rate as the input
- Processing time increases with audio length; plan accordingly for long-form content
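To confirm that returned audio matches the input sampling rate, the WAV header can be inspected with Python's standard wave module. The sketch below is illustrative only: make_silent_wav is a hypothetical helper that generates a silent test clip to stand in for decoded response audio.

```python
import io
import wave


def wav_sampling_rate(wav_bytes):
    """Read the sampling rate (Hz) from the header of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def make_silent_wav(sampling_rate, seconds=1):
    """Build a mono, 16-bit silent WAV clip as a stand-in for real audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(b"\x00\x00" * sampling_rate * seconds)
    return buffer.getvalue()


input_rate = 16000
returned_audio = make_silent_wav(input_rate)  # stand-in for decoded audio_base64
if wav_sampling_rate(returned_audio) != input_rate:
    raise ValueError("Translated audio sampling rate differs from the input")
print("Sampling rate matches:", wav_sampling_rate(returned_audio))
```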