    Streaming and Chat Completions

    When you use ChatGPT or Claude, you see words appearing one at a time — that's streaming. Without streaming, users wait in silence until the entire response is ready. Let's learn how to implement streaming in your applications.

    Why Streaming Matters for UX

    Without Streaming

    User sends prompt → Waits 3-10 seconds → Entire response appears at once
    

    The user stares at a loading spinner for several seconds with no feedback. This feels slow and unresponsive.

    With Streaming

    User sends prompt → Words start appearing in ~200ms → Response builds in real-time
    

    The user sees the response forming immediately. Even though the total time is the same, the perceived speed is much faster.

    Streaming is essential for any user-facing AI feature. It's what makes AI chat feel conversational rather than like submitting a form and waiting.
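    The perceived-latency effect can be demonstrated locally with an async generator standing in for the network stream. This is a sketch with illustrative names, not a real API call:

```typescript
// Simulate a streamed response with a local async generator (no API involved)
async function* fakeStream(words: string[]): AsyncGenerator<string> {
  for (const w of words) {
    yield w; // in a real API, each chunk arrives over the network
  }
}

// The first chunk is renderable immediately, long before the full text exists
async function consume(
  words: string[]
): Promise<{ first: string | null; full: string }> {
  let first: string | null = null;
  let full = "";
  for await (const chunk of fakeStream(words)) {
    if (first === null) first = chunk; // this is what the user sees first
    full += chunk;
  }
  return { first, full };
}
```

    The total work is identical either way; the difference is that the first chunk is available to render almost immediately.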

    Streaming with OpenAI

    Basic Streaming

    import OpenAI from "openai";
    
    const openai = new OpenAI();
    
    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Explain how streaming works in AI APIs" }],
      stream: true,  // Enable streaming
    });
    
    // Process each chunk as it arrives
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        process.stdout.write(content);  // Print without newline
      }
    }

    The key difference is stream: true — instead of getting one response object, you get an async iterable of chunks. Each chunk contains a small piece of the response.
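    The accumulation logic in that loop can be isolated as a pure function. The `Chunk` interface below is a simplified stand-in for the SDK's `ChatCompletionChunk` type, kept only to the fields the loop reads:

```typescript
// Simplified shape of a streamed chunk (the real SDK type is ChatCompletionChunk)
interface Chunk {
  choices: { delta?: { content?: string } }[];
}

// Concatenate the content deltas, skipping chunks with no text
// (e.g. the final chunk that carries only a finish reason)
function accumulate(chunks: Chunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) text += content;
  }
  return text;
}
```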

    Collecting the Full Response

    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Write a short poem" }],
      stream: true,
    });
    
    let fullResponse = "";
    
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullResponse += content;
        // Update your UI with each chunk here
        updateUI(fullResponse);
      }
    }
    
    // fullResponse now contains the complete text
    console.log("Complete:", fullResponse);

    Streaming with Anthropic

    Basic Streaming

    import Anthropic from "@anthropic-ai/sdk";
    
    const anthropic = new Anthropic();
    
    const stream = anthropic.messages.stream({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Explain how streaming works" }],
    });
    
    // Process text events
    stream.on("text", (text) => {
      process.stdout.write(text);
    });
    
    // Wait for completion
    const finalMessage = await stream.finalMessage();
    console.log("\nDone. Total tokens:", finalMessage.usage.output_tokens);

    Alternative: Async Iterator

    const stream = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Write a haiku" }],
      stream: true,
    });
    
    for await (const event of stream) {
      if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      }
    }

    Streaming with Google Gemini

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const result = await model.generateContentStream("Explain streaming in AI APIs");
    
    for await (const chunk of result.stream) {
      const text = chunk.text();
      process.stdout.write(text);
    }

    Building a Streaming Chat UI

    Here's how to build a streaming chat in a web application using server-sent events (SSE).

    Backend: API Route (Next.js Example)

    // app/api/chat/route.ts
    import OpenAI from "openai";
    
    const openai = new OpenAI();
    
    export async function POST(req: Request) {
      const { messages } = await req.json();
    
      const stream = await openai.chat.completions.create({
        model: "gpt-4o",
        messages,
        stream: true,
      });
    
      // Create a ReadableStream to send chunks to the client
      const encoder = new TextEncoder();
      const readableStream = new ReadableStream({
        async start(controller) {
          for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
              controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
            }
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          controller.close();
        },
      });
    
      return new Response(readableStream, {
        headers: {
          "Content-Type": "text/event-stream",
          "Cache-Control": "no-cache",
          Connection: "keep-alive",
        },
      });
    }
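    The `data: ...\n\n` framing in the route above is easy to get subtly wrong. One option is to pull it into a small helper (the names here are hypothetical, not part of any SDK):

```typescript
// Format one SSE event: "data: <payload>" terminated by a blank line
function sseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The terminal sentinel the client watches for
const SSE_DONE = "data: [DONE]\n\n";
```

    With this, the loop body becomes `controller.enqueue(encoder.encode(sseEvent({ content })))`, and the framing lives in exactly one place.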

    Frontend: React Component

    import { useState } from "react";
    
    interface Message {
      role: "user" | "assistant";
      content: string;
    }
    
    export default function Chat() {
      const [messages, setMessages] = useState<Message[]>([]);
      const [input, setInput] = useState("");
      const [isStreaming, setIsStreaming] = useState(false);
    
      const sendMessage = async () => {
        if (!input.trim() || isStreaming) return;
    
        const userMessage: Message = { role: "user", content: input };
        const newMessages = [...messages, userMessage];
        setMessages(newMessages);
        setInput("");
        setIsStreaming(true);
    
        // Add empty assistant message that we'll fill with streamed content
        setMessages([...newMessages, { role: "assistant", content: "" }]);
    
        const response = await fetch("/api/chat", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ messages: newMessages }),
        });
    
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let assistantContent = "";
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // SSE events can split across network chunks, so buffer partial lines
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";  // keep any incomplete trailing line

      for (const line of lines) {
        if (line.startsWith("data: ") && line !== "data: [DONE]") {
          const data = JSON.parse(line.slice(6));
          assistantContent += data.content;

          // Update the last message with new content
          setMessages((prev) => [
            ...prev.slice(0, -1),
            { role: "assistant", content: assistantContent },
          ]);
        }
      }
    }
    
        setIsStreaming(false);
      };
    
      return (
        <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
          <div className="flex-1 overflow-y-auto space-y-4">
            {messages.map((msg, i) => (
              <div
                key={i}
                className={msg.role === "user" ? "text-right" : "text-left"}
              >
                <div
                  className={`inline-block p-3 rounded-lg ${
                    msg.role === "user"
                      ? "bg-blue-500 text-white"
                      : "bg-gray-100 text-gray-900"
                  }`}
                >
                  {msg.content}
                </div>
              </div>
            ))}
          </div>
    
          <div className="flex gap-2 pt-4">
            <input
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyDown={(e) => e.key === "Enter" && sendMessage()}
              placeholder="Type a message..."
              className="flex-1 p-2 border rounded"
              disabled={isStreaming}
            />
            <button
              onClick={sendMessage}
              disabled={isStreaming}
              className="px-4 py-2 bg-blue-500 text-white rounded disabled:opacity-50"
            >
              Send
            </button>
          </div>
        </div>
      );
    }

    What to ask your AI: "Build a streaming chat UI with React and the [OpenAI/Anthropic] API. Include message history, loading state, and auto-scroll."

    Server-Sent Events (SSE) Explained

    Server-Sent Events (SSE) is the protocol most AI APIs use for streaming. It's a one-way channel from server to client that runs over plain HTTP.

    How SSE Works

    Client sends POST request → Server starts streaming
                              ← data: {"content": "Hello"}
                              ← data: {"content": " world"}
                              ← data: {"content": "!"}
                              ← data: [DONE]
    

    SSE Format

    Each event is a line starting with data: followed by the payload, terminated by a blank line (two consecutive newlines):

    data: {"content": "Hello"}
    
    
    data: {"content": " world"}
    
    
    data: [DONE]
    
    
    
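    Parsing this format on the client is a few lines. Here's a sketch as a pure function (a hypothetical helper matching the framing shown above, without the chunk-boundary buffering a production reader also needs):

```typescript
// Extract the JSON payloads from a block of SSE text, ignoring the [DONE] sentinel
function parseSSE(text: string): { content: string }[] {
  const events: { content: string }[] = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      events.push(JSON.parse(line.slice(6)));
    }
  }
  return events;
}
```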

    Why SSE Instead of WebSockets?

    Feature        SSE                       WebSockets
    Direction      Server → Client only      Bidirectional
    Complexity     Simple HTTP               Separate protocol
    Reconnection   Built-in                  Manual
    Best for       Streaming AI responses    Real-time chat, games

    SSE is perfect for AI streaming because data only flows in one direction (server to client). WebSockets would be overkill.

    Streaming Tips

    1. Handle Errors Gracefully

    try {
      for await (const chunk of stream) {
        // process chunk
      }
    } catch (error) {
      // Stream was interrupted — show error to user
      console.error("Stream error:", error);
      setMessages((prev) => [
        ...prev.slice(0, -1),
        { role: "assistant", content: "Sorry, an error occurred. Please try again." },
      ]);
    }

    2. Allow Cancellation

    const controller = new AbortController();
    
    const response = await fetch("/api/chat", {
      method: "POST",
      body: JSON.stringify({ messages }),
      signal: controller.signal,  // Pass abort signal
    });
    
    // To cancel:
    controller.abort();
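    Aborting the fetch stops the network request, but the read loop should also check the signal so it stops consuming chunks already in flight. A minimal sketch, with a local async generator standing in for the stream (illustrative names):

```typescript
// Local stand-in for a network stream
async function* sampleChunks(items: string[]) {
  for (const item of items) yield item;
}

// Stop consuming as soon as the signal fires, even mid-stream
async function readWithCancel(
  signal: AbortSignal,
  items: string[]
): Promise<string> {
  let out = "";
  for await (const chunk of sampleChunks(items)) {
    if (signal.aborted) break; // user pressed "Stop"
    out += chunk;
  }
  return out;
}
```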

    3. Auto-Scroll to Bottom

    import { useEffect, useRef } from "react";
    
    const messagesEndRef = useRef<HTMLDivElement>(null);
    
    useEffect(() => {
      messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
    }, [messages]);
    
    // Render <div ref={messagesEndRef} /> after the last message in the list

    What's Next?

    Now let's learn about the practical side — API keys, costs, and rate limits — so you can build production applications without surprises on your bill.

    What to ask your AI: "Add streaming support to my existing chat API route. I'm using [provider] with [framework]."


    🌐 www.genai-mentor.ai