
    Google Gemini API

    Google's Gemini is a family of multimodal AI models that can process text, images, audio, and video. Let's learn how to use the Gemini API in your applications.

    Setting Up the Google AI SDK

    Install

    npm install @google/generative-ai

    Configure Your API Key

    Get your key from aistudio.google.com/apikey, then add it to your .env:

    GOOGLE_API_KEY=AIzaSy-your-key-here

    Initialize the Client

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

    Unlike the OpenAI and Anthropic SDKs, the Google SDK doesn't read the API key from an environment variable automatically; you have to pass it explicitly.
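
    Because the key is passed by hand, a missing variable only surfaces later as a confusing auth error. A fail-fast guard makes the problem obvious at startup. This is a minimal sketch; `requireApiKey` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: fail fast with a clear message if the key is
// missing, since the SDK won't read GOOGLE_API_KEY on its own.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.GOOGLE_API_KEY;
  if (!key) {
    throw new Error("GOOGLE_API_KEY is not set: add it to your .env file");
  }
  return key;
}
```

    Then initialize with `new GoogleGenerativeAI(requireApiKey(process.env))` instead of passing `process.env.GOOGLE_API_KEY!` directly.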

    Gemini Models Overview

    Model            | Best For                               | Speed     | Cost
    Gemini 2.5 Flash | Most tasks, balanced quality and speed | Fast      | Low
    Gemini 2.5 Pro   | Complex reasoning, coding, multimodal  | Moderate  | Medium
    Gemini 2.0 Flash | Simple tasks, very high speed          | Very fast | Very low

    Model Selection

    // General purpose — great default
    const flash = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    // Complex reasoning
    const pro = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

    Basic Text Generation

    Simple Prompt

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const result = await model.generateContent("What is TypeScript?");
    const text = result.response.text();
    
    console.log(text);

    With System Instructions

    const model = genAI.getGenerativeModel({
      model: "gemini-2.5-flash",
      systemInstruction: "You are a senior software engineer. Give concise, practical answers with code examples.",
    });
    
    const result = await model.generateContent("How do I handle errors in async/await?");
    console.log(result.response.text());

    With Generation Config

    const model = genAI.getGenerativeModel({
      model: "gemini-2.5-flash",
      generationConfig: {
        temperature: 0.7,
        topP: 0.9,
        maxOutputTokens: 1024,
      },
    });
    
    const result = await model.generateContent("Write a haiku about programming");
    console.log(result.response.text());

    Multi-Turn Conversations (Chat)

    Gemini has a built-in chat interface that manages conversation history for you:

    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const chat = model.startChat({
      history: [],
    });
    
    // Turn 1
    const result1 = await chat.sendMessage("What is a closure in JavaScript?");
    console.log(result1.response.text());
    
    // Turn 2 — automatically has context from Turn 1
    const result2 = await chat.sendMessage("Can you show me a practical example?");
    console.log(result2.response.text());
    
    // Turn 3 — the conversation continues
    const result3 = await chat.sendMessage("How does this relate to React hooks?");
    console.log(result3.response.text());

    This is simpler than the OpenAI and Anthropic SDKs, where you have to track the message history manually.
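
    If you need the accumulated transcript, e.g. to log it or persist it between sessions, the chat object can hand it back via `getHistory()`. Here is a small formatting sketch; the `HistoryEntry` interface and `formatHistory` helper are illustrative, mirroring the role-plus-parts shape the SDK uses:

```typescript
// Illustrative shape mirroring the SDK's history entries (role + parts).
interface HistoryEntry {
  role: string;
  parts: { text?: string }[];
}

// Flatten a chat history into a readable transcript, e.g. for logging
// or saving to a database between sessions.
function formatHistory(history: HistoryEntry[]): string {
  return history
    .map((m) => `${m.role}: ${m.parts.map((p) => p.text ?? "").join("")}`)
    .join("\n");
}
```

    Usage: `const transcript = formatHistory(await chat.getHistory());`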

    Chat with Initial History

    const chat = model.startChat({
      history: [
        {
          role: "user",
          parts: [{ text: "My name is Alex and I'm learning TypeScript." }],
        },
        {
          role: "model",
          parts: [{ text: "Nice to meet you, Alex! I'd be happy to help you learn TypeScript." }],
        },
      ],
    });
    
    const result = await chat.sendMessage("What should I learn first?");
    // The model remembers the context about Alex and TypeScript

    What to ask your AI: "Build a chatbot using the Gemini API with conversation history and system instructions."

    Multimodal: Text + Images

    One of Gemini's standout features is native multimodal support — you can send images alongside text:

    Analyze an Image from a File

    import { GoogleGenerativeAI } from "@google/generative-ai";
    import * as fs from "fs";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    // Read image as base64
    const imageBuffer = fs.readFileSync("./screenshot.png");
    const imageBase64 = imageBuffer.toString("base64");
    
    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: imageBase64,
        },
      },
      "Describe what you see in this image. If it's code, explain what it does.",
    ]);
    
    console.log(result.response.text());
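
    The `mimeType` has to match the actual file format. If users can upload arbitrary images, a small lookup avoids hard-coding `"image/png"`. `mimeTypeFor` is a hypothetical helper covering common image types only:

```typescript
// Map common image file extensions to the mimeType Gemini expects.
const MIME_TYPES: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
  gif: "image/gif",
};

// Infer the inlineData mimeType from a filename, rejecting unknown types.
function mimeTypeFor(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const mime = MIME_TYPES[ext];
  if (!mime) throw new Error(`Unsupported image type: ${filename}`);
  return mime;
}
```

    Usage: `{ inlineData: { mimeType: mimeTypeFor("./screenshot.png"), data: imageBase64 } }`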

    Analyze an Image from a URL

    const imageUrl = "https://example.com/chart.png";
    const imageResponse = await fetch(imageUrl);
    const imageArrayBuffer = await imageResponse.arrayBuffer();
    const imageBase64 = Buffer.from(imageArrayBuffer).toString("base64");
    
    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: imageBase64,
        },
      },
      "What data does this chart show? Summarize the key trends.",
    ]);

    Multiple Images

    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: image1Base64,
        },
      },
      {
        inlineData: {
          mimeType: "image/png",
          data: image2Base64,
        },
      },
      "Compare these two designs. Which one has better visual hierarchy and why?",
    ]);
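
    When the number of images varies at runtime, wrapping each one in a part keeps the prompt assembly readable. `toImagePart` is a hypothetical helper over the same `inlineData` shape used above:

```typescript
// Hypothetical helper: wrap a base64 string as an inlineData part.
function toImagePart(base64: string, mimeType: string = "image/png") {
  return { inlineData: { mimeType, data: base64 } };
}
```

    Usage: `await model.generateContent([...imagesBase64.map((img) => toImagePart(img)), "Compare these designs."])`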

    What to ask your AI: "Build a feature that lets users upload an image and get an AI description using the Gemini API."

    Gemini vs. OpenAI vs. Claude

    Feature       | OpenAI (GPT-4o) | Anthropic (Claude Sonnet) | Google (Gemini Flash)
    SDK           | openai          | @anthropic-ai/sdk         | @google/generative-ai
    Env variable  | OPENAI_API_KEY  | ANTHROPIC_API_KEY         | Manual
    System prompt | Message role    | Top-level param           | Model config
    Chat history  | Manual array    | Manual array              | Built-in chat object
    Multimodal    | Yes (images)    | Yes (images)              | Yes (images, audio, video)
    Free tier     | No              | No                        | Yes (generous)
    Streaming     | Yes             | Yes                       | Yes

    When to Choose Gemini

    • Free tier needed — Gemini has a generous free tier for prototyping
    • Multimodal first — Best native support for images, audio, and video
    • Google ecosystem — If you're already using Firebase, Google Cloud, etc.
    • Long documents — Gemini has very large context windows

    When to Choose OpenAI

    • Widest ecosystem — Most tutorials, tools, and integrations
    • Function calling — Most mature tool-use implementation
    • ChatGPT familiarity — Same models as the chat product

    When to Choose Anthropic

    • Instruction following — Claude excels at following detailed instructions
    • Long, careful analysis — Strong at reading and analyzing long documents
    • Safety-conscious — Designed to be helpful, harmless, and honest

    Error Handling

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    try {
      const result = await model.generateContent("Hello!");
      console.log(result.response.text());
    } catch (error) {
      if (error instanceof Error) {
        console.error("Error:", error.message);
    
        if (error.message.includes("429")) {
          console.error("Rate limited — wait and retry");
        }
        if (error.message.includes("API_KEY")) {
          console.error("Invalid API key — check your GOOGLE_API_KEY");
        }
      }
    }
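
    The 429 case is worth automating. Here is a minimal retry sketch with exponential backoff; the string match on "429" mirrors the check above, so adjust it to your SDK version's error shape:

```typescript
// Re-run a call on rate-limit errors, waiting 1s, 2s, 4s, ... between
// attempts. Non-429 errors and exhausted attempts are re-thrown.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (attempt >= maxAttempts || !message.includes("429")) throw error;
      // Exponential backoff before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

    Usage: `const result = await withRetry(() => model.generateContent("Hello!"));`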

    What's Next?

    Now that you know all three major providers, let's learn about streaming — sending responses to the user in real-time as they're generated.

    What to ask your AI: "Help me build a multimodal app with Gemini that can analyze images and answer questions about them."
