POST https://api.reka.ai/v1/transcription_or_translation
import requests

url = "https://api.reka.ai/v1/transcription_or_translation"

# audio_url accepts a URL or a base64 data URI; sampling_rate is in Hz.
payload = {
    "audio_url": "data:audio/wav;base64,<base64_encoded_audio>",
    "sampling_rate": 16000,
    "temperature": 0,
    "max_tokens": 1024,
}
headers = {
    "X-Api-Key": "<apiKey>",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
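
The request above leaves `<base64_encoded_audio>` as a placeholder. A minimal sketch of one way to fill it from a local WAV file follows; the helper name `wav_to_data_uri` is hypothetical, and reading the sampling rate from the WAV header with the standard `wave` module is an assumption about how you might choose the `sampling_rate` value, not something this page prescribes.

```python
import base64
import wave
from pathlib import Path


def wav_to_data_uri(path: str) -> tuple[str, int]:
    """Return (data URI, sampling rate in Hz) for a local WAV file.

    Hypothetical helper: the endpoint only requires that audio_url be a
    URL or a data URI of the form data:audio/wav;base64,...
    """
    raw = Path(path).read_bytes()
    # Read the sampling rate from the WAV header so it matches the audio.
    with wave.open(path, "rb") as wav_file:
        sampling_rate = wav_file.getframerate()
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:audio/wav;base64,{encoded}", sampling_rate
```

The two return values can be passed as `audio_url` and `sampling_rate` in the request payload.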
200 Transcribe Audio with Timestamps
{
  "transcript": "Example transcribed text from the audio",
  "transcript_translation_with_timestamp": [
    {
      "start": 0,
      "end": 0.5,
      "transcript": "Example"
    },
    {
      "start": 0.5,
      "end": 1.2,
      "transcript": "transcribed"
    }
  ]
}
Transcribe or Translate

Transcribe audio to text, translate speech to another language, or generate translated audio output.

This endpoint supports multiple modes:

  • Transcription only: Convert speech to text with optional word-level timestamps
  • Translation: Translate audio from one language to another
  • Speech-to-speech translation: Generate translated audio output

Authentication

X-Api-Key (string)
API Key authentication via header
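
As a small illustration of the header-based authentication described here, the sketch below builds the request headers from an environment variable instead of hard-coding the key. The variable name `REKA_API_KEY` and the helper `build_headers` are assumptions made for this example; only the `X-Api-Key` header itself comes from this page.

```python
import os


def build_headers(env_var: str = "REKA_API_KEY") -> dict[str, str]:
    """Build request headers, reading the API key from the environment.

    The environment variable name is a hypothetical convention.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"X-Api-Key": api_key, "Content-Type": "application/json"}
```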

Request

This endpoint expects an object.
audio_url (string, required)
URL of the audio file, or base64-encoded audio as a data URI (data:audio/wav;base64,...)
sampling_rate (integer, required, defaults to 16000)
Audio sampling rate in Hz
target_language (enum, optional)
Target language for translation
is_translate (boolean, optional, defaults to false)
Set to true to request translation
return_translation_audio (boolean, optional, defaults to false)
If true, the response includes base64-encoded audio of the translated speech
temperature (double, optional)
Controls randomness in generation. Use 0.0 for deterministic output
max_tokens (integer, optional, >= 1, defaults to 1024)
Maximum number of tokens to generate
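
To show how these parameters might combine for the three modes, here is a hedged sketch of a payload builder. The function `build_payload` is hypothetical. It assumes that a translation request sets both `is_translate` and `target_language`, and that speech-to-speech translation additionally sets `return_translation_audio`; this page does not spell out those combinations. The accepted `target_language` values are not listed here, so the example leaves a placeholder.

```python
from typing import Any, Optional


def build_payload(
    audio_url: str,
    sampling_rate: int = 16000,
    target_language: Optional[str] = None,
    return_audio: bool = False,
    temperature: float = 0.0,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble a request body for one of the three modes (a sketch)."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be >= 1")
    if return_audio and target_language is None:
        # Assumption: translated audio only makes sense with a target language.
        raise ValueError("speech-to-speech translation needs a target_language")

    payload: dict[str, Any] = {
        "audio_url": audio_url,
        "sampling_rate": sampling_rate,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if target_language is not None:
        # Assumption: translation requests send both fields together.
        payload["target_language"] = target_language
        payload["is_translate"] = True
    if return_audio:
        payload["return_translation_audio"] = True
    return payload


audio = "data:audio/wav;base64,<base64_encoded_audio>"
transcription = build_payload(audio)
translation = build_payload(audio, target_language="<target_language>")
speech_to_speech = build_payload(
    audio, target_language="<target_language>", return_audio=True
)
```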

Response

Successful transcription or translation
transcript (string or null)
Transcribed text in the original language
translation (string or null)
Translated text in the target language (only if target_language is specified)
transcript_translation_with_timestamp (list of objects or null)
Word-level timestamps
audio_base64 (string or null)
Base64-encoded WAV audio of the translated speech (only if return_translation_audio is true)
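
Because every response field may be null, client code needs to handle missing values. The following sketch shows one way to do that; `summarize_response` is a hypothetical helper, and the `start`, `end`, and `transcript` keys of each timestamp object are taken from this page's 200 example rather than from a formal schema.

```python
import base64
from typing import Any


def summarize_response(body: dict[str, Any]) -> dict[str, Any]:
    """Normalize a response body, tolerating null or absent fields."""
    # A null timestamp list is treated as empty.
    words = body.get("transcript_translation_with_timestamp") or []
    audio_b64 = body.get("audio_base64")
    return {
        "transcript": body.get("transcript"),
        "translation": body.get("translation"),
        "words": [(w["start"], w["end"], w["transcript"]) for w in words],
        # Decoded WAV bytes, present only when translated audio was requested.
        "audio_bytes": base64.b64decode(audio_b64) if audio_b64 else None,
    }


# Sample body copied from the 200 example on this page.
sample = {
    "transcript": "Example transcribed text from the audio",
    "transcript_translation_with_timestamp": [
        {"start": 0, "end": 0.5, "transcript": "Example"},
        {"start": 0.5, "end": 1.2, "transcript": "transcribed"},
    ],
}

result = summarize_response(sample)
for start, end, word in result["words"]:
    print(f"{start:.1f}-{end:.1f}s  {word}")
```

When `audio_bytes` is not None, it can be written to a `.wav` file with `Path(...).write_bytes(...)`.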

Errors
