
    Google Gemini API

    Google's Gemini is a family of multimodal AI models that can process text, images, audio, and video. Let's learn how to use the Gemini API in your applications.

    Setting Up the Google AI SDK

    Install

    npm install @google/generative-ai

    Configure Your API Key

    Get your key from aistudio.google.com/apikey, then add it to your .env:

    GOOGLE_API_KEY=AIzaSy-your-key-here

    Initialize the Client

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);

    Unlike the OpenAI and Anthropic SDKs, the Google SDK doesn't read the API key from an environment variable automatically; you have to pass it explicitly.
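
    Because the key is passed by hand, a missing variable only surfaces later as a confusing auth error. A fail-fast guard makes the problem obvious at startup. This is a minimal sketch; `requireApiKey` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: fail fast with a clear message if the key is
// missing, since the SDK won't read GOOGLE_API_KEY on its own.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.GOOGLE_API_KEY;
  if (!key) {
    throw new Error("GOOGLE_API_KEY is not set: add it to your .env file");
  }
  return key;
}
```

    Then initialize with `new GoogleGenerativeAI(requireApiKey(process.env))` instead of passing `process.env.GOOGLE_API_KEY!` directly.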

    Gemini Models Overview

    Model            | Best For                               | Speed     | Cost
    Gemini 2.5 Flash | Most tasks, balanced quality and speed | Fast      | Low
    Gemini 2.5 Pro   | Complex reasoning, coding, multimodal  | Moderate  | Medium
    Gemini 2.0 Flash | Simple tasks, very high speed          | Very fast | Very low

    Model Selection

    // General purpose — great default
    const flash = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    // Complex reasoning
    const pro = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });

    Basic Text Generation

    Simple Prompt

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const result = await model.generateContent("What is TypeScript?");
    const text = result.response.text();
    
    console.log(text);

    With System Instructions

    const model = genAI.getGenerativeModel({
      model: "gemini-2.5-flash",
      systemInstruction: "You are a senior software engineer. Give concise, practical answers with code examples.",
    });
    
    const result = await model.generateContent("How do I handle errors in async/await?");
    console.log(result.response.text());

    With Generation Config

    const model = genAI.getGenerativeModel({
      model: "gemini-2.5-flash",
      generationConfig: {
        temperature: 0.7,
        topP: 0.9,
        maxOutputTokens: 1024,
      },
    });
    
    const result = await model.generateContent("Write a haiku about programming");
    console.log(result.response.text());

    Multi-Turn Conversations (Chat)

    Gemini has a built-in chat interface that manages conversation history for you:

    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    const chat = model.startChat({
      history: [],
    });
    
    // Turn 1
    const result1 = await chat.sendMessage("What is a closure in JavaScript?");
    console.log(result1.response.text());
    
    // Turn 2 — automatically has context from Turn 1
    const result2 = await chat.sendMessage("Can you show me a practical example?");
    console.log(result2.response.text());
    
    // Turn 3 — the conversation continues
    const result3 = await chat.sendMessage("How does this relate to React hooks?");
    console.log(result3.response.text());

    This is simpler than the OpenAI and Anthropic SDKs, where you have to track the message history manually.
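
    If you need the accumulated transcript, e.g. to log it or persist it between sessions, the chat object can hand it back via `getHistory()`. Here is a small formatting sketch; the `HistoryEntry` interface and `formatHistory` helper are illustrative, mirroring the role-plus-parts shape the SDK uses:

```typescript
// Illustrative shape mirroring the SDK's history entries (role + parts).
interface HistoryEntry {
  role: string;
  parts: { text?: string }[];
}

// Flatten a chat history into a readable transcript, e.g. for logging
// or saving to a database between sessions.
function formatHistory(history: HistoryEntry[]): string {
  return history
    .map((m) => `${m.role}: ${m.parts.map((p) => p.text ?? "").join("")}`)
    .join("\n");
}
```

    Usage: `const transcript = formatHistory(await chat.getHistory());`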

    Chat with Initial History

    const chat = model.startChat({
      history: [
        {
          role: "user",
          parts: [{ text: "My name is Alex and I'm learning TypeScript." }],
        },
        {
          role: "model",
          parts: [{ text: "Nice to meet you, Alex! I'd be happy to help you learn TypeScript." }],
        },
      ],
    });
    
    const result = await chat.sendMessage("What should I learn first?");
    // The model remembers the context about Alex and TypeScript

    What to ask your AI: "Build a chatbot using the Gemini API with conversation history and system instructions."

    Multimodal: Text + Images

    One of Gemini's standout features is native multimodal support — you can send images alongside text:

    Analyze an Image from a File

    import { GoogleGenerativeAI } from "@google/generative-ai";
    import * as fs from "fs";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    // Read image as base64
    const imageBuffer = fs.readFileSync("./screenshot.png");
    const imageBase64 = imageBuffer.toString("base64");
    
    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: imageBase64,
        },
      },
      "Describe what you see in this image. If it's code, explain what it does.",
    ]);
    
    console.log(result.response.text());
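
    The `mimeType` has to match the actual file format. If users can upload arbitrary images, a small lookup avoids hard-coding `"image/png"`. `mimeTypeFor` is a hypothetical helper covering common image types only:

```typescript
// Map common image file extensions to the mimeType Gemini expects.
const MIME_TYPES: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  webp: "image/webp",
  gif: "image/gif",
};

// Infer the inlineData mimeType from a filename, rejecting unknown types.
function mimeTypeFor(filename: string): string {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const mime = MIME_TYPES[ext];
  if (!mime) throw new Error(`Unsupported image type: ${filename}`);
  return mime;
}
```

    Usage: `{ inlineData: { mimeType: mimeTypeFor("./screenshot.png"), data: imageBase64 } }`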

    Analyze an Image from a URL

    const imageUrl = "https://example.com/chart.png";
    const imageResponse = await fetch(imageUrl);
    const imageArrayBuffer = await imageResponse.arrayBuffer();
    const imageBase64 = Buffer.from(imageArrayBuffer).toString("base64");
    
    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: imageBase64,
        },
      },
      "What data does this chart show? Summarize the key trends.",
    ]);

    Multiple Images

    const result = await model.generateContent([
      {
        inlineData: {
          mimeType: "image/png",
          data: image1Base64,
        },
      },
      {
        inlineData: {
          mimeType: "image/png",
          data: image2Base64,
        },
      },
      "Compare these two designs. Which one has better visual hierarchy and why?",
    ]);
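
    When the number of images varies at runtime, wrapping each one in a part keeps the prompt assembly readable. `toImagePart` is a hypothetical helper over the same `inlineData` shape used above:

```typescript
// Hypothetical helper: wrap a base64 string as an inlineData part.
function toImagePart(base64: string, mimeType: string = "image/png") {
  return { inlineData: { mimeType, data: base64 } };
}
```

    Usage: `await model.generateContent([...imagesBase64.map((img) => toImagePart(img)), "Compare these designs."])`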

    What to ask your AI: "Build a feature that lets users upload an image and get an AI description using the Gemini API."

    Gemini vs. OpenAI vs. Claude

    Feature       | OpenAI (GPT-4o) | Anthropic (Claude Sonnet) | Google (Gemini Flash)
    SDK           | openai          | @anthropic-ai/sdk         | @google/generative-ai
    Env variable  | OPENAI_API_KEY  | ANTHROPIC_API_KEY         | Manual
    System prompt | Message role    | Top-level param           | Model config
    Chat history  | Manual array    | Manual array              | Built-in chat object
    Multimodal    | Yes (images)    | Yes (images)              | Yes (images, audio, video)
    Free tier     | No              | No                        | Yes (generous)
    Streaming     | Yes             | Yes                       | Yes

    When to Choose Gemini

    • Free tier needed — Gemini has a generous free tier for prototyping
    • Multimodal first — Best native support for images, audio, and video
    • Google ecosystem — If you're already using Firebase, Google Cloud, etc.
    • Long documents — Gemini has very large context windows

    When to Choose OpenAI

    • Widest ecosystem — Most tutorials, tools, and integrations
    • Function calling — Most mature tool-use implementation
    • ChatGPT familiarity — Same models as the chat product

    When to Choose Anthropic

    • Instruction following — Claude excels at following detailed instructions
    • Long, careful analysis — Strong at reading and analyzing long documents
    • Safety-conscious — Designed to be helpful, harmless, and honest

    Error Handling

    import { GoogleGenerativeAI } from "@google/generative-ai";
    
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
    const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
    
    try {
      const result = await model.generateContent("Hello!");
      console.log(result.response.text());
    } catch (error) {
      if (error instanceof Error) {
        console.error("Error:", error.message);
    
        if (error.message.includes("429")) {
          console.error("Rate limited — wait and retry");
        }
        if (error.message.includes("API_KEY")) {
          console.error("Invalid API key — check your GOOGLE_API_KEY");
        }
      }
    }
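
    The 429 case is worth automating. Here is a minimal retry sketch with exponential backoff; the string match on "429" mirrors the check above, so adjust it to your SDK version's error shape:

```typescript
// Re-run a call on rate-limit errors, waiting 1s, 2s, 4s, ... between
// attempts. Non-429 errors and exhausted attempts are re-thrown.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      if (attempt >= maxAttempts || !message.includes("429")) throw error;
      // Exponential backoff before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

    Usage: `const result = await withRetry(() => model.generateContent("Hello!"));`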

    What's Next?

    Now that you know all three major providers, let's learn about streaming — sending responses to the user in real-time as they're generated.

    What to ask your AI: "Help me build a multimodal app with Gemini that can analyze images and answer questions about them."
