Overview
This page gives an introduction to using our Chat API, which supports conversations that include images, short videos, and audio.
Quickstart
First, obtain an API key by setting up an account in the Reka Platform.
For Python, install the Reka Python SDK with pip install "reka-api>=2.0.0".
You can then use your API key to query the models:
This will print a response like:
The fifth prime number is 11. Here’s a quick breakdown of the first five prime numbers in order: 2, 3, 5, 7, 11.
Single Turn Prompting
A simple single turn request can be made as follows:
This will return a response like:
See Available Models for details on valid model names.
Multiple Turn Conversations
You can request a response to a multiple turn conversation by adding more messages in the history. For example:
This will return a response like:
Assistant Completions
We support guiding the assistant output (e.g. prompting it to output a structured JSON response), by allowing the developer to specify how the assistant response should start. This is done by adding a partial assistant response as the last message:
This will output:
Useful Parameters
The parameters of the Chat API are fully documented in the API reference, but some particularly useful parameters are listed below:
- temperature: Typically between 0 and 1. Values close to 0 will result in less varied generations, and higher values will result in more variation and creativity.
- max_tokens: The maximum number of tokens that should be returned. Increase this if generations are being truncated, i.e. the
finish_reasonin the response is"length". - stop: A list of strings that should stop the generation. This can be used to stop after generating a code block, when reaching a certain number in a list etc.
Streaming
The Chat API supports streaming with the create_stream method in the Python SDK,
or by setting stream to true in the HTTP API.
Async
The Python SDK also exports an async client so that you can make non-blocking calls to our API. This can be useful to make batch requests. The following code illustrates how to batch calls to the API, by creating a list of async tasks, and gathering them with asyncio.gather. The Semaphore limits the number of concurrent requests to the API.