Streaming
Get partial responses in real time with stream=True. Ideal for low-latency apps that thrive on fast feedback.
The Chat Completions API uses server-sent events (SSE) to stream a sequence of chunks. Handle them in real time to update your UI as the agent reasons, searches, and responds.
Here's how to do this in Python and TypeScript.
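A minimal Python sketch of the consuming side. It parses raw SSE `data:` lines in the Chat Completions chunk format and yields each content delta as it arrives; the sample payloads here are illustrative, and in a real app the official SDK opens the connection and hands you parsed chunk objects.

```python
import json

def iter_deltas(sse_lines):
    """Yield content deltas from raw SSE 'data:' lines.

    Assumes the Chat Completions chunk shape:
    {"choices": [{"delta": {"content": "..."}}]}.
    """
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel marking the end of the stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]

# Illustrative stream, in the shape the API sends over SSE
raw = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]

text = "".join(iter_deltas(raw))
print(text)  # "Hello!" — update your UI incrementally instead of printing
```

Note that the first chunk carries only the role, with no content; checking `delta.get("content")` skips such chunks cleanly.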
What you get back
Each streamed chunk contains a delta: a partial result from the model. You'll get reasoning steps, tool calls, and final answers in real time, exactly as the agent generates them.
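Because each chunk carries only a fragment, clients typically fold deltas into a running message state. A hedged sketch under the same assumed chunk shapes: tool-call arguments arrive as string fragments keyed by `index` and must be concatenated before they can be parsed as JSON.

```python
def fold_delta(state, delta):
    """Merge one delta dict into accumulated message state."""
    if delta.get("content"):
        state["content"] += delta["content"]
    for tc in delta.get("tool_calls", []):
        i = tc["index"]  # index ties fragments to one tool call
        calls = state["tool_calls"]
        while len(calls) <= i:
            calls.append({"id": "", "name": "", "arguments": ""})
        if tc.get("id"):
            calls[i]["id"] = tc["id"]
        fn = tc.get("function", {})
        if fn.get("name"):
            calls[i]["name"] = fn["name"]
        calls[i]["arguments"] += fn.get("arguments", "")
    return state

# Illustrative deltas for a single streamed tool call
state = {"content": "", "tool_calls": []}
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]
for d in deltas:
    fold_delta(state, d)

print(state["tool_calls"][0]["arguments"])  # '{"city": "Paris"}'
```

The `get_weather` call and its arguments are made up for illustration; the point is that only the fully accumulated `arguments` string is valid JSON.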
Output