API Keys, Costs, and Rate Limits
Building with AI APIs means managing keys, understanding costs, and handling limits. This tutorial covers everything you need to know to use AI APIs responsibly and efficiently.
Managing API Keys Securely
The Golden Rule
Never hardcode API keys in your source code. If your key ends up in a Git repository, bots will find it within minutes and rack up charges on your account.
Environment Variables
The standard approach is environment variables:
```bash
# .env (local development)
OPENAI_API_KEY=sk-proj-abc123...
ANTHROPIC_API_KEY=sk-ant-abc123...
GOOGLE_API_KEY=AIzaSy...
```
```typescript
// Access in your code
const openai = new OpenAI(); // reads OPENAI_API_KEY automatically
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY automatically
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
```
Always Create a .env.example
```bash
# .env.example (commit this — shows what keys are needed without real values)
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-google-ai-key-here
```
Always Gitignore Your .env
```bash
# .gitignore
.env
.env.local
.env.*.local
```
Key Security Checklist
1. Never hardcode keys in source code
2. Add .env to .gitignore BEFORE creating it
3. Create a .env.example with placeholder values
4. Use different keys for development and production
5. Set spending limits on your API accounts
6. Rotate keys regularly (every 90 days)
7. If a key is exposed, revoke it immediately
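To complement the checklist, it helps to fail fast at startup when a key is missing rather than get a confusing authentication error deep inside a request handler. A minimal sketch (the `requireEnv` helper is our own, not part of any SDK):

```typescript
// Fail fast at startup if any required environment variable is missing.
function requireEnv(names: string[]): Record<string, string> {
  const missing = names.filter((n) => !process.env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(names.map((n) => [n, process.env[n] as string]));
}

// Call once at startup, before constructing any API clients:
// const keys = requireEnv(["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]);
```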
What to ask your AI: "Set up secure API key management for my Node.js project. I need keys for OpenAI, Anthropic, and Google AI."
Environment Variables by Framework
| Framework | File | Prefix | Access |
|---|---|---|---|
| Node.js | .env | None | process.env.KEY |
| Next.js (server) | .env.local | None | process.env.KEY |
| Next.js (client) | .env.local | NEXT_PUBLIC_ | process.env.NEXT_PUBLIC_KEY |
| Vite (client) | .env | VITE_ | import.meta.env.VITE_KEY |
Important: AI API keys should NEVER be exposed on the client side. Always call AI APIs from your server (API routes, Cloud Functions, etc.).
```typescript
// WRONG — exposes key to the browser
const openai = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_KEY, // Anyone can see this!
  dangerouslyAllowBrowser: true,
});

// RIGHT — call from a server-side API route
// app/api/chat/route.ts (Next.js server)
const openai = new OpenAI(); // Key stays on the server
```
Understanding Pricing: Tokens and Models
AI APIs charge based on tokens — the units of text the model processes.
What is a Token?
- ~4 characters in English
- ~0.75 words
- "Hello, world!" = ~4 tokens
- A 500-word article = ~667 tokens
- A typical chat message = 20-100 tokens
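The ratios above can be turned into a rough estimator. This is only a heuristic; for exact counts, use the provider's tokenizer (e.g. OpenAI's tiktoken library):

```typescript
// Rough token estimate using the ~4-characters-per-token heuristic for English.
// Real token counts vary by model and language; use a tokenizer for billing-accurate numbers.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// "Hello, world!" is 13 characters → estimate of 4 tokens, matching the rule of thumb above.
```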
Pricing by Provider (Approximate)
Prices change frequently. Check each provider's pricing page for current rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Haiku | $0.25 | $1.25 |
| Gemini 2.5 Flash | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
What Does This Mean in Practice?
| Task | Tokens Used | Approx. Cost (GPT-4o) |
|---|---|---|
| Single chat message | ~200 total | $0.001 |
| 10-message conversation | ~2,000 total | $0.01 |
| Summarize a blog post | ~1,500 total | $0.008 |
| 1,000 API calls/day | ~200,000 total | $1.00 |
For learning and development, costs are typically under $5/month. Production costs depend entirely on usage volume.
Input vs. Output Tokens
- Input tokens — Everything you send: system prompt + conversation history + user message
- Output tokens — Everything the model generates in response
Output tokens are more expensive (typically 3-5x the input price, as the table above shows) because generation requires more computation than reading input.
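Since input and output are priced separately, a per-call cost estimate is just two multiplications. A sketch using the GPT-4o figures from the table above ($2.50 in / $10.00 out per 1M tokens); check the provider's pricing page for current rates:

```typescript
// Estimate the dollar cost of one call from token counts and per-1M-token prices.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// e.g. ~1,500 input + ~500 output tokens on GPT-4o:
// estimateCost(1500, 500, 2.5, 10) → $0.00875
```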
What to ask your AI: "Estimate the monthly cost if my app makes [X] API calls per day with [Y] average tokens per call using [model]."
Rate Limits and How to Handle Them
Rate limits prevent any single user from overwhelming the API. When you hit a limit, you get a 429 (Too Many Requests) error.
Types of Rate Limits
| Limit Type | What It Means |
|---|---|
| Requests per minute (RPM) | How many API calls you can make per minute |
| Tokens per minute (TPM) | How many total tokens you can process per minute |
| Tokens per day (TPD) | Daily token quota (some tiers) |
Rate Limits Vary by Tier
Most providers have usage tiers. As you spend more, your limits increase:
| Tier | Typical RPM | How to Reach |
|---|---|---|
| Free / Tier 1 | 60-500 RPM | Sign up |
| Tier 2 | 500-5,000 RPM | Spend $50+ |
| Tier 3 | 5,000+ RPM | Spend $500+ |
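Knowing your tier's RPM, you can pace requests proactively instead of only reacting to 429s. A minimal sliding-window limiter sketch for a single process (the clock is injectable for testing; multi-server apps would need a shared store like Redis instead):

```typescript
// Allows at most `maxPerWindow` calls per `windowMs` (e.g. your tier's RPM per 60s).
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxPerWindow: number,
    private windowMs: number = 60_000,
    private now: () => number = Date.now
  ) {}

  // Returns true if a call is allowed right now (and records it).
  tryAcquire(): boolean {
    const cutoff = this.now() - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```

Before each API call, check `tryAcquire()`; if it returns false, wait or queue the request rather than sending it and eating a 429.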
Handling Rate Limits in Code
```typescript
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status === 429 && attempt < maxRetries) {
        // Exponential backoff: 1s, 2s, 4s
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise((resolve) => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
  throw new Error("Max retries exceeded");
}

// Usage
const response = await callWithRetry(() =>
  openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  })
);
```
Batch Processing with Rate Limiting
```typescript
async function processInBatches<T, R>(
  items: T[],
  processFn: (item: T) => Promise<R>,
  batchSize = 5,
  delayMs = 1000
): Promise<R[]> {
  const results: R[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(batch.map(processFn));
    results.push(...batchResults);

    // Wait between batches to avoid rate limits
    if (i + batchSize < items.length) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  return results;
}

// Process 100 items in batches of 5 with 1-second delays
const prompts = ["prompt1", "prompt2", /* ... */];
const results = await processInBatches(
  prompts,
  async (prompt) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    });
    return res.choices[0].message.content;
  },
  5,
  1000
);
```
What to ask your AI: "Add rate limiting and retry logic to my AI API calls. I'm making [X] requests per minute."
Cost Optimization Strategies
1. Use the Cheapest Model That Works
Don't default to the most powerful model. Test with cheaper models first:
```typescript
// Start here — very cheap
let model = "gpt-4o-mini"; // or "claude-haiku" or "gemini-2.5-flash"

// Only upgrade if quality isn't good enough
model = "gpt-4o"; // or "claude-sonnet-4" or "gemini-2.5-pro"
```
2. Optimize Your Prompts
Shorter prompts = fewer input tokens = lower cost:
```typescript
// Expensive — long system prompt repeated every call
let system =
  "You are an incredibly talented and experienced senior software engineer with over 20 years of experience in building scalable distributed systems. You have expertise in TypeScript, React, Node.js, cloud computing, databases, and system design. When answering questions, please provide comprehensive, well-structured responses...";

// Cheaper — concise system prompt
system = "Senior software engineer. Give concise answers with code examples.";
```
3. Cache Responses
If the same questions come up often, cache the answers:
```typescript
const cache = new Map<string, string>();

async function getCachedResponse(prompt: string): Promise<string> {
  if (cache.has(prompt)) {
    return cache.get(prompt)!;
  }

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });

  const text = response.choices[0].message.content!;
  cache.set(prompt, text);
  return text;
}
```
4. Set Max Tokens
Prevent unexpectedly long (and expensive) responses:
```typescript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this article" }],
  max_tokens: 300, // Limit response length
});
```
5. Limit Conversation History
Don't send the entire conversation every time — trim old messages:
```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

function trimHistory(messages: Message[], maxMessages = 20): Message[] {
  if (messages.length <= maxMessages) return messages;

  // Keep the system message and the most recent messages
  const system = messages.filter((m) => m.role === "system");
  const recent = messages
    .filter((m) => m.role !== "system")
    .slice(-maxMessages);
  return [...system, ...recent];
}
```
Monitoring Usage
Check Usage Dashboards
| Provider | Dashboard URL |
|---|---|
| OpenAI | platform.openai.com/usage |
| Anthropic | console.anthropic.com/settings/usage |
| Google AI | aistudio.google.com |
Set Spending Limits
All providers let you set monthly spending limits. Always set these, especially when learning:
- OpenAI: Settings → Limits → Set monthly budget
- Anthropic: Settings → Spending → Set spending limit
- Google: Free tier has built-in limits
Track Usage in Code
```typescript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

// Log token usage
console.log("Tokens used:", {
  input: response.usage?.prompt_tokens,
  output: response.usage?.completion_tokens,
  total: response.usage?.total_tokens,
});
```
What's Next?
You now understand the practical side of AI APIs. The final tutorial is your AI APIs Cheat Sheet — a quick reference with side-by-side comparisons, code templates, and model selection guides.
What to ask your AI: "Help me set up cost tracking and spending alerts for my AI API usage."