    Streaming and Chat Completions

    When you use ChatGPT or Claude, you see words appearing one at a time — that's streaming. Without streaming, users wait in silence until the entire response is ready. Let's learn how to implement streaming in your applications.

    Why Streaming Matters for UX

    Without Streaming

    User sends prompt → Waits 3-10 seconds → Entire response appears at once
    

    The user stares at a loading spinner for several seconds with no feedback. This feels slow and unresponsive.

    With Streaming

    User sends prompt → Words start appearing in ~200ms → Response builds in real-time
    

    The user sees the response forming immediately. Even though the total time is the same, the perceived speed is much faster.

    Streaming is essential for any user-facing AI feature. It's what makes AI chat feel conversational rather than like submitting a form and waiting.
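    The perceived-latency effect can be demonstrated locally with an async generator standing in for the network stream. This is a sketch with illustrative names, not a real API call:

```typescript
// Simulate a streamed response with a local async generator (no API involved)
async function* fakeStream(words: string[]): AsyncGenerator<string> {
  for (const w of words) {
    yield w; // in a real API, each chunk arrives over the network
  }
}

// The first chunk is renderable immediately, long before the full text exists
async function consume(
  words: string[]
): Promise<{ first: string | null; full: string }> {
  let first: string | null = null;
  let full = "";
  for await (const chunk of fakeStream(words)) {
    if (first === null) first = chunk; // this is what the user sees first
    full += chunk;
  }
  return { first, full };
}
```

    The total work is identical either way; the difference is that the first chunk is available to render almost immediately.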

    Streaming with OpenAI

    Basic Streaming

    import OpenAI from "openai";
    
    const openai = new OpenAI();
    
    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Explain how streaming works in AI APIs" }],
      stream: true,  // Enable streaming
    });
    
    // Process each chunk as it arrives
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        process.stdout.write(content);  // Print without newline
      }
    }

    The key difference is stream: true — instead of getting one response object, you get an async iterable of chunks. Each chunk contains a small piece of the response.
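    The accumulation logic in that loop can be isolated as a pure function. The `Chunk` interface below is a simplified stand-in for the SDK's `ChatCompletionChunk` type, kept only to the fields the loop reads:

```typescript
// Simplified shape of a streamed chunk (the real SDK type is ChatCompletionChunk)
interface Chunk {
  choices: { delta?: { content?: string } }[];
}

// Concatenate the content deltas, skipping chunks with no text
// (e.g. the final chunk that carries only a finish reason)
function accumulate(chunks: Chunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) text += content;
  }
  return text;
}
```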

    Collecting the Full Response

    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Write a short poem" }],
      stream: true,
    });
    
    let fullResponse = "";
    
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullResponse += content;
        // Update your UI with each chunk here
        updateUI(fullResponse);
      }
    }
    
    // fullResponse now contains the complete text
    console.log("Complete:", fullResponse);

    Streaming with Anthropic

    Basic Streaming

    import Anthropic from "@anthropic-ai/sdk";
    
    const anthropic = new Anthropic();
    
    const stream = anthropic.messages.stream({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Explain how streaming works" }],
    });
    
    // Process text events
    stream.on("text", (text) => {
      process.stdout.write(text);
    });
    
    // Wait for completion
    const finalMessage = await stream.finalMessage();
    console.log("\nDone. Total tokens:", finalMessage.usage.output_tokens);

    Alternative: Async Iterator

    const stream = await anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Write a haiku" }],
      stream: true,
    });
    
    for await (const event of stream) {
      if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
        process.stdout.write(event.delta.text);
      }
    }

    Streaming with Google Gemini

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const result = await model.generateContentStream("Explain streaming in AI APIs");
    
    for await (const chunk of result.stream) {
      const text = chunk.text();
      process.stdout.write(text);
    }

    Building a Streaming Chat UI

    Here's how to build a streaming chat in a web application using server-sent events (SSE).

    Backend: API Route (Next.js Example)

    // app/api/chat/route.ts
    import OpenAI from "openai";
    
    const openai = new OpenAI();
    
    export async function POST(req: Request) {
      const { messages } = await req.json();
    
      const stream = await openai.chat.completions.create({
        model: "gpt-4o",
        messages,
        stream: true,
      });
    
      // Create a ReadableStream to send chunks to the client
      const encoder = new TextEncoder();
      const readableStream = new ReadableStream({
        async start(controller) {
          for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
              controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
            }
          }
          controller.enqueue(encoder.encode("data: [DONE]\n\n"));
          controller.close();
        },
      });
    
      return new Response(readableStream, {
        headers: {
          "Content-Type": "text/event-stream",
          "Cache-Control": "no-cache",
          Connection: "keep-alive",
        },
      });
    }
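    The `data: ...\n\n` framing in the route above is easy to get subtly wrong. One option is to pull it into a small helper (the names here are hypothetical, not part of any SDK):

```typescript
// Format one SSE event: "data: <payload>" terminated by a blank line
function sseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The terminal sentinel the client watches for
const SSE_DONE = "data: [DONE]\n\n";
```

    With this, the loop body becomes `controller.enqueue(encoder.encode(sseEvent({ content })))`, and the framing lives in exactly one place.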

    Frontend: React Component

    import { useState } from "react";
    
    interface Message {
      role: "user" | "assistant";
      content: string;
    }
    
    export default function Chat() {
      const [messages, setMessages] = useState<Message[]>([]);
      const [input, setInput] = useState("");
      const [isStreaming, setIsStreaming] = useState(false);
    
      const sendMessage = async () => {
        if (!input.trim() || isStreaming) return;
    
        const userMessage: Message = { role: "user", content: input };
        const newMessages = [...messages, userMessage];
        setMessages(newMessages);
        setInput("");
        setIsStreaming(true);
    
        // Add empty assistant message that we'll fill with streamed content
        setMessages([...newMessages, { role: "assistant", content: "" }]);
    
        const response = await fetch("/api/chat", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ messages: newMessages }),
        });
    
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    let assistantContent = "";
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // SSE events can split across network chunks, so buffer partial lines
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";  // keep any incomplete trailing line

      for (const line of lines) {
        if (line.startsWith("data: ") && line !== "data: [DONE]") {
          const data = JSON.parse(line.slice(6));
          assistantContent += data.content;

          // Update the last message with new content
          setMessages((prev) => [
            ...prev.slice(0, -1),
            { role: "assistant", content: assistantContent },
          ]);
        }
      }
    }
    
        setIsStreaming(false);
      };
    
      return (
        <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
          <div className="flex-1 overflow-y-auto space-y-4">
            {messages.map((msg, i) => (
              <div
                key={i}
                className={msg.role === "user" ? "text-right" : "text-left"}
              >
                <div
                  className={`inline-block p-3 rounded-lg ${
                    msg.role === "user"
                      ? "bg-blue-500 text-white"
                      : "bg-gray-100 text-gray-900"
                  }`}
                >
                  {msg.content}
                </div>
              </div>
            ))}
          </div>
    
          <div className="flex gap-2 pt-4">
            <input
              value={input}
              onChange={(e) => setInput(e.target.value)}
              onKeyDown={(e) => e.key === "Enter" && sendMessage()}
              placeholder="Type a message..."
              className="flex-1 p-2 border rounded"
              disabled={isStreaming}
            />
            <button
              onClick={sendMessage}
              disabled={isStreaming}
              className="px-4 py-2 bg-blue-500 text-white rounded disabled:opacity-50"
            >
              Send
            </button>
          </div>
        </div>
      );
    }

    What to ask your AI: "Build a streaming chat UI with React and the [OpenAI/Anthropic] API. Include message history, loading state, and auto-scroll."

    Server-Sent Events (SSE) Explained

    Server-Sent Events (SSE) is the protocol most AI APIs use for streaming. It's a one-way channel from server to client that runs over plain HTTP.

    How SSE Works

    Client sends POST request → Server starts streaming
                              ← data: {"content": "Hello"}
                              ← data: {"content": " world"}
                              ← data: {"content": "!"}
                              ← data: [DONE]
    

    SSE Format

    Each event is a line starting with data: followed by the payload, terminated by a blank line (two consecutive newlines):

    data: {"content": "Hello"}
    
    
    data: {"content": " world"}
    
    
    data: [DONE]
    
    
    
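    Parsing this format on the client is a few lines. Here's a sketch as a pure function (a hypothetical helper matching the framing shown above, without the chunk-boundary buffering a production reader also needs):

```typescript
// Extract the JSON payloads from a block of SSE text, ignoring the [DONE] sentinel
function parseSSE(text: string): { content: string }[] {
  const events: { content: string }[] = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("data: ") && line !== "data: [DONE]") {
      events.push(JSON.parse(line.slice(6)));
    }
  }
  return events;
}
```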

    Why SSE Instead of WebSockets?

    Feature        SSE                       WebSockets
    Direction      Server → Client only      Bidirectional
    Complexity     Simple HTTP               Separate protocol
    Reconnection   Built-in                  Manual
    Best for       Streaming AI responses    Real-time chat, games

    SSE is perfect for AI streaming because data only flows in one direction (server to client). WebSockets would be overkill.

    Streaming Tips

    1. Handle Errors Gracefully

    try {
      for await (const chunk of stream) {
        // process chunk
      }
    } catch (error) {
      // Stream was interrupted — show error to user
      console.error("Stream error:", error);
      setMessages((prev) => [
        ...prev.slice(0, -1),
        { role: "assistant", content: "Sorry, an error occurred. Please try again." },
      ]);
    }

    2. Allow Cancellation

    const controller = new AbortController();
    
    const response = await fetch("/api/chat", {
      method: "POST",
      body: JSON.stringify({ messages }),
      signal: controller.signal,  // Pass abort signal
    });
    
    // To cancel:
    controller.abort();
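    Aborting the fetch stops the network request, but the read loop should also check the signal so it stops consuming chunks already in flight. A minimal sketch, with a local async generator standing in for the stream (illustrative names):

```typescript
// Local stand-in for a network stream
async function* sampleChunks(items: string[]) {
  for (const item of items) yield item;
}

// Stop consuming as soon as the signal fires, even mid-stream
async function readWithCancel(
  signal: AbortSignal,
  items: string[]
): Promise<string> {
  let out = "";
  for await (const chunk of sampleChunks(items)) {
    if (signal.aborted) break; // user pressed "Stop"
    out += chunk;
  }
  return out;
}
```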

    3. Auto-Scroll to Bottom

    import { useEffect, useRef } from "react";
    
    const messagesEndRef = useRef<HTMLDivElement>(null);
    
    useEffect(() => {
      messagesEndRef.current?.scrollIntoView({ behavior: "smooth" });
    }, [messages]);
    
    // Render <div ref={messagesEndRef} /> after the last message in the list

    What's Next?

    Now let's learn about the practical side — API keys, costs, and rate limits — so you can build production applications without surprises on your bill.

    What to ask your AI: "Add streaming support to my existing chat API route. I'm using [provider] with [framework]."


    🌐 www.genai-mentor.ai