For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DiscordGet API Key
  • Getting Started
    • Overview
    • Quickstart
    • Errors
    • Pricing
  • Chat
    • Overview
    • Chat with Image, Video, and Audio
    • Function Calling
    • Models
  • Vision
    • Overview
    • Rate Limits
    • Pricing
    • MCP Server
    • Video Management
    • Video Group Management
    • Video Search
    • Video QA
    • Clip Generation
    • Metadata Tagging
    • Image Management
    • Image Search
  • Research
    • Overview
    • Streaming
    • Reasoning Steps
    • Web Search
    • Structured Output
    • Parallel Thinking
    • Best Practices
    • Errors
    • Examples
  • Speech
    • Overview
    • Audio Transcription
    • Speech Translation
    • Speech-to-Speech Translation
  • Resources
    • FAQs
    • Changelog
    • System Status
LogoLogo
DiscordGet API Key
On this page
  • Example Usage
  • Performance Impact
Research

Parallel Thinking

Was this page helpful?
Previous

Best Practices for Reka Research

Next
Built with

Reka Research supports a version of test time compute called Parallel Thinking for tasks and applications that require the highest quality results.

Example Usage

To enable this, you need to set mode to either "high" or "low". This controls the amount of compute used to generate the answer. The default, "none", is equivalent to the default Reka Research behavior.

Parallel Thinking requests are not supported when streaming is enabled.

1completion = client.chat.completions.create(
2 model="reka-flash-research",
3 messages=[
4 {
5 "role": "user",
6 "content": "Provide in in-depth comparison between the iPhone 16 and the Samsung Galaxy S25.",
7 },
8 ],
9 extra_body={
10 "research": {
11 "parallel_thinking": {
12 "mode": "high"
13 },
14 }
15 },
16)

The response is in the exact same format as a response from a standard Reka Research request. This allows you to seamlessly transition between the two modes and decide dynamically when you need the extra compute and quality that parallel thinking gives you.

Performance Impact

Under the hood, Reka Research is using more compute to generate a higher quality response. This means that there will be a slight performance impact on the request.