Overview
This page gives an introduction to using the Chat API via the Python SDK "reka-api>=2.0.0"
.
Find the documentation for API v0 (Python SDK < 2.0.0) at: https://v0.docs.reka.ai/.
Quickstart
First, obtain an API key by setting up an account in the Reka Platform.
Then, install the Reka Python SDK with pip install "reka-api>=2.0.0"
.
You can then use your API key to query the models:
This will print a response like:
The fifth prime number is 11. Here’s a quick breakdown of the first five prime numbers in order: 2, 3, 5, 7, 11.
Single Turn Prompting
A simple single turn request can be made as follows:
This will return a response like:
See Available Models for details on valid model names.
Multiple Turn Conversations
You can request a response to a multiple turn conversation by adding more messages in the history. For example:
This will return a response like:
Assistant Completions
We support guiding the assistant output (e.g. prompting it to output a structured JSON response), by allowing the developer to specify how the assistant response should start. This is done by adding a partial assistant response as the last message:
This will output:
Useful Parameters
The parameters of the Chat API are fully documented in the API reference, but some particularly useful parameters are listed below:
- temperature: Typically between 0 and 1. Values close to 0 will result in less varied generations, and higher values will result in more variation and creativity.
- max_tokens: The maximum number of tokens that should be returned. Increase this if generations are being truncated, i.e. the
finish_reason
in the response is"length"
. - stop: A list of strings that should stop the generation. This can be used to stop after generating a code block, when reaching a certain number in a list etc.
Streaming
The Chat API supports streaming with the chat_stream
function in the Python SDK,
or by setting stream
to true
in the HTTP API. Below is an example of streaming in Python:
Async
The Python SDK also exports an async client so that you can make non-blocking calls to our API. This can be useful to make batch requests. The following code illustrates how to batch calls to the API, by creating a list of async tasks, and gathering them with asyncio.gather
. The Semaphore
limits the number of concurrent requests to the API.