Parallel Thinking

Reka Research supports a version of test time compute called Parallel Thinking for tasks and applications that require the highest quality results.

Example Usage

To enable this, you need to set mode to either "high" or "low". This controls the amount of compute used to generate the answer. The default, "none", is equivalent to the default Reka Research behavior.

Parallel Thinking requests are not supported when streaming is enabled.

1 completion = client.chat.completions.create(
2     model="reka-flash-research",
3     messages=[
4         {
5             "role": "user",
6             "content": "Provide in in-depth comparison between the iPhone 16 and the Samsung Galaxy S25.",
7         },
8     ],
9     extra_body={
10         "research": {
11             "parallel_thinking": {
12                 "mode": "high"
13             },
14         }
15     },   
16 )

The response is in the exact same format as a response from a standard Reka Research request. This allows you to seamlessly transition between the two modes and decide dynamically when you need the extra compute and quality that parallel thinking gives you.

Performance Impact

Under the hood, Reka Research is using more compute to generate a higher quality response. This means that there will be a slight performance impact on the request.