Speech-to-Speech Translation
The Speech API supports speech-to-speech translation: it translates the spoken content and also generates audio output in the target language. This lets you create fully translated audio content while preserving the natural flow of speech.
Translate and Generate Speech
Translate audio from one language to another and receive both text and audio output.
Endpoint
- POST /v1/transcription_or_translation
Request
Parameters
- audio_url (required): URL of the audio file, or base64-encoded audio as a data URI
- sampling_rate (required): Audio sampling rate in Hz (recommended: 16000)
- target_language (required): Target language for translation. Supported: "french", "spanish", "japanese", "chinese", "korean", "italian", "portuguese", "german"
- return_translation_audio (required): Set to true to receive translated audio output
- temperature (optional): Controls randomness in generation. Use 0.0 for deterministic output. Default: 0.0
- max_tokens (optional): Maximum number of tokens to generate. Default: 1024
- is_translate (required): Set to true to indicate a translation request
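A request built from the parameters above might look like the following minimal Python sketch. It uses only the standard library. The base URL, the Bearer authorization header, the JSON request body, and the audio/wav MIME type in the data URI are assumptions for illustration and are not confirmed by this page; the helper names (to_data_uri, build_payload, translate_speech) are hypothetical.

```python
import base64
import json
import urllib.request

# Hypothetical base URL; replace with the host of your deployment.
API_BASE = "https://api.example.com"
ENDPOINT = "/v1/transcription_or_translation"

# Target languages listed in the parameter reference.
SUPPORTED_LANGUAGES = {
    "french", "spanish", "japanese", "chinese",
    "korean", "italian", "portuguese", "german",
}


def to_data_uri(audio_bytes, mime="audio/wav"):
    """Encode raw audio bytes as a base64 data URI (MIME type is an assumption)."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(audio_url, target_language, sampling_rate=16000,
                  temperature=0.0, max_tokens=1024):
    """Assemble the request body from the documented parameters."""
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported target_language: {target_language!r}")
    return {
        "audio_url": audio_url,
        "sampling_rate": sampling_rate,
        "target_language": target_language,
        "return_translation_audio": True,
        "is_translate": True,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def translate_speech(payload, api_key, timeout=120):
    """POST the payload as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        API_BASE + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer authentication is assumed, not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For a local file, pass to_data_uri(open("speech.wav", "rb").read()) as audio_url; for hosted audio, pass the URL string directly. Validating target_language on the client side avoids a round trip for a value the endpoint does not list as supported.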
Response
Returns a translation result with audio:
- transcript: Transcribed text in the original language
- translation: Translated text in the target language
- audio_base64: Base64-encoded WAV audio of the translated speech
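The audio_base64 field has to be decoded before it can be played or saved. The sketch below shows one way to do that in Python, assuming the response has already been parsed into a dict with the three fields listed above and that the WAV data is PCM-encoded (the standard wave module cannot read other encodings). The function name save_translated_audio is hypothetical.

```python
import base64
import io
import wave


def save_translated_audio(result, path):
    """Decode the audio_base64 field, write it to path, and return WAV metadata."""
    audio_bytes = base64.b64decode(result["audio_base64"])
    with open(path, "wb") as f:
        f.write(audio_bytes)
    # Read the header back to confirm the payload parses as a WAV file.
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        return {
            "sampling_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "duration_seconds": wav.getnframes() / wav.getframerate(),
        }
```

The returned sampling_rate can be compared with the sampling_rate sent in the request as a quick sanity check on the output.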
Use Cases
- Video Dubbing: Create dubbed versions of videos in different languages
- Voice Translation: Translate voice messages or audio notes
- Multilingual Audio Content: Generate audio content in multiple languages from a single source
- Accessibility: Provide audio translations for users who rely on or prefer audio in their native language
- Language Learning: Create parallel audio content for language learning applications
Performance Tips
- Set temperature: 0.0 for consistent, deterministic output
- The translated audio will be in WAV format at the same sampling rate as the input
- Processing time increases with audio length; plan accordingly for long-form content
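One possible way to plan for long-form content is to split a recording into shorter segments on the client and submit them one at a time. The sketch below does this for PCM WAV input with Python's standard wave module. The 30-second default is an arbitrary illustration, not a documented limit, and the function name split_wav is hypothetical. Fixed-length cuts can fall mid-sentence and may degrade translation quality at the boundaries, so splitting at pauses is preferable where that is feasible.

```python
import io
import wave


def split_wav(wav_bytes, chunk_seconds=30.0):
    """Split PCM WAV bytes into consecutive WAV chunks of at most chunk_seconds."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * params.framerate)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sampling rate")
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            # Write each segment as a standalone WAV with the source's format.
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each chunk keeps the sampling rate of the source, so the same sampling_rate value applies to every request.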