Google Gemini API
Google's Gemini is a family of multimodal AI models that can process text, images, audio, and video. Let's learn how to use the Gemini API in your applications.
Setting Up the Google AI SDK
Install
```bash
npm install @google/generative-ai
```
Configure Your API Key
Get your key from aistudio.google.com/apikey, then add it to your .env:
```
GOOGLE_API_KEY=AIzaSy-your-key-here
```
Initialize the Client
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
```
Unlike OpenAI and Anthropic, the Google SDK doesn't automatically read from environment variables — you need to pass the key explicitly.
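Because the key is passed explicitly, the non-null assertion (`!`) only silences the compiler; it won't catch a missing key at runtime. A small guard fails fast with a clear message instead. This is a sketch using a hypothetical `requireEnv` helper, not part of the SDK:

```typescript
// Hypothetical helper: read a required environment variable,
// throwing immediately if it is missing or empty.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage (assumes GOOGLE_API_KEY is set in your environment):
// const genAI = new GoogleGenerativeAI(requireEnv("GOOGLE_API_KEY"));
```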
Gemini Models Overview
| Model | Best For | Speed | Cost |
|---|---|---|---|
| Gemini 2.5 Flash | Most tasks, balanced quality and speed | Fast | Low |
| Gemini 2.5 Pro | Complex reasoning, coding, multimodal | Moderate | Medium |
| Gemini 2.0 Flash | Simple tasks, very high speed | Very fast | Very low |
Model Selection
```typescript
// General purpose — great default
const flash = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

// Complex reasoning
const pro = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });
```
Basic Text Generation
Simple Prompt
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

const result = await model.generateContent("What is TypeScript?");
const text = result.response.text();
console.log(text);
```
With System Instructions
```typescript
const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  systemInstruction:
    "You are a senior software engineer. Give concise, practical answers with code examples.",
});

const result = await model.generateContent("How do I handle errors in async/await?");
console.log(result.response.text());
```
With Generation Config
```typescript
const model = genAI.getGenerativeModel({
  model: "gemini-2.5-flash",
  generationConfig: {
    temperature: 0.7,
    topP: 0.9,
    maxOutputTokens: 1024,
  },
});

const result = await model.generateContent("Write a haiku about programming");
console.log(result.response.text());
```
Multi-Turn Conversations (Chat)
Gemini has a built-in chat interface that manages conversation history for you:
```typescript
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

const chat = model.startChat({ history: [] });

// Turn 1
const result1 = await chat.sendMessage("What is a closure in JavaScript?");
console.log(result1.response.text());

// Turn 2 — automatically has context from Turn 1
const result2 = await chat.sendMessage("Can you show me a practical example?");
console.log(result2.response.text());

// Turn 3 — the conversation continues
const result3 = await chat.sendMessage("How does this relate to React hooks?");
console.log(result3.response.text());
```
This is simpler than the OpenAI and Anthropic SDKs, where you have to track the message history yourself in an array and resend it with every request.
Chat with Initial History
```typescript
const chat = model.startChat({
  history: [
    {
      role: "user",
      parts: [{ text: "My name is Alex and I'm learning TypeScript." }],
    },
    {
      role: "model",
      parts: [{ text: "Nice to meet you, Alex! I'd be happy to help you learn TypeScript." }],
    },
  ],
});

const result = await chat.sendMessage("What should I learn first?");
// The model remembers the context about Alex and TypeScript
```
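If you persist conversations in a simpler shape (say, rows in a database), you'll need to convert them into the role/parts structure that `startChat({ history })` expects. A minimal sketch, using a hypothetical `toGeminiHistory` converter:

```typescript
// Simple turn shape you might store in your own database.
type SimpleTurn = { role: "user" | "model"; text: string };

// The shape Gemini's chat history expects: a role plus a parts array.
type GeminiContent = { role: string; parts: { text: string }[] };

// Hypothetical helper: wrap each stored turn's text in a parts array.
function toGeminiHistory(turns: SimpleTurn[]): GeminiContent[] {
  return turns.map((turn) => ({
    role: turn.role,
    parts: [{ text: turn.text }],
  }));
}

// Usage:
// const chat = model.startChat({ history: toGeminiHistory(savedTurns) });
```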
What to ask your AI: "Build a chatbot using the Gemini API with conversation history and system instructions."
Multimodal: Text + Images
One of Gemini's standout features is native multimodal support — you can send images alongside text:
Analyze an Image from a File
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import * as fs from "fs";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

// Read image as base64
const imageBuffer = fs.readFileSync("./screenshot.png");
const imageBase64 = imageBuffer.toString("base64");

const result = await model.generateContent([
  {
    inlineData: {
      mimeType: "image/png",
      data: imageBase64,
    },
  },
  "Describe what you see in this image. If it's code, explain what it does.",
]);

console.log(result.response.text());
```
Analyze an Image from a URL
```typescript
const imageUrl = "https://example.com/chart.png";
const imageResponse = await fetch(imageUrl);
const imageArrayBuffer = await imageResponse.arrayBuffer();
const imageBase64 = Buffer.from(imageArrayBuffer).toString("base64");

const result = await model.generateContent([
  {
    inlineData: {
      mimeType: "image/png",
      data: imageBase64,
    },
  },
  "What data does this chart show? Summarize the key trends.",
]);
```
Multiple Images
```typescript
const result = await model.generateContent([
  { inlineData: { mimeType: "image/png", data: image1Base64 } },
  { inlineData: { mimeType: "image/png", data: image2Base64 } },
  "Compare these two designs. Which one has better visual hierarchy and why?",
]);
```
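The read-file-then-wrap pattern repeats for every image, so it's worth factoring out. A minimal sketch of a hypothetical `fileToImagePart` helper (not part of the SDK) that produces the `inlineData` part shape shown above:

```typescript
import * as fs from "fs";

// Hypothetical helper: turn a local file into the inlineData part
// shape that generateContent accepts for images.
function fileToImagePart(path: string, mimeType: string) {
  return {
    inlineData: {
      mimeType,
      data: fs.readFileSync(path).toString("base64"),
    },
  };
}

// Usage:
// const result = await model.generateContent([
//   fileToImagePart("./design-a.png", "image/png"),
//   fileToImagePart("./design-b.png", "image/png"),
//   "Compare these two designs.",
// ]);
```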
What to ask your AI: "Build a feature that lets users upload an image and get an AI description using the Gemini API."
Gemini vs. OpenAI vs. Claude
| Feature | OpenAI (GPT-4o) | Anthropic (Claude Sonnet) | Google (Gemini Flash) |
|---|---|---|---|
| SDK | openai | @anthropic-ai/sdk | @google/generative-ai |
| Env Variable | OPENAI_API_KEY | ANTHROPIC_API_KEY | None (key passed explicitly) |
| System prompt | Message role | Top-level param | Model config |
| Chat history | Manual array | Manual array | Built-in chat object |
| Multimodal | Yes (images) | Yes (images) | Yes (images, audio, video) |
| Free tier | No | No | Yes (generous) |
| Streaming | Yes | Yes | Yes |
When to Choose Gemini
- Free tier needed — Gemini has a generous free tier for prototyping
- Multimodal first — Best native support for images, audio, and video
- Google ecosystem — If you're already using Firebase, Google Cloud, etc.
- Long documents — Gemini has very large context windows
When to Choose OpenAI
- Widest ecosystem — Most tutorials, tools, and integrations
- Function calling — Most mature tool-use implementation
- ChatGPT familiarity — Same models as the chat product
When to Choose Anthropic
- Instruction following — Claude excels at following detailed instructions
- Long, careful analysis — Strong at reading and analyzing long documents
- Safety-conscious — Designed to be helpful, harmless, and honest
Error Handling
```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

try {
  const result = await model.generateContent("Hello!");
  console.log(result.response.text());
} catch (error) {
  if (error instanceof Error) {
    console.error("Error:", error.message);
    if (error.message.includes("429")) {
      console.error("Rate limited — wait and retry");
    }
    if (error.message.includes("API_KEY")) {
      console.error("Invalid API key — check your GOOGLE_API_KEY");
    }
  }
}
```
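For rate limits specifically, retrying after a growing delay usually succeeds. A sketch of a hypothetical `withRetry` wrapper that applies exponential backoff, using the same message-based 429 check as the error handling above (assumed, since the SDK surfaces these as plain `Error` objects):

```typescript
// Hypothetical helper: retry an async request with exponential backoff.
// Retries only on errors that look like rate limits (message contains "429");
// any other error is rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const isRateLimit =
        error instanceof Error && error.message.includes("429");
      if (!isRateLimit || attempt === maxAttempts - 1) throw error;
      // Wait 500ms, 1000ms, 2000ms, ... before the next attempt.
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** attempt)
      );
    }
  }
  throw lastError;
}

// Usage:
// const result = await withRetry(() => model.generateContent("Hello!"));
```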
What's Next?
Now that you know all three major providers, let's learn about streaming — sending responses to the user in real-time as they're generated.
What to ask your AI: "Help me build a multimodal app with Gemini that can analyze images and answer questions about them."