API Keys, Costs, and Rate Limits
Building with AI APIs means managing keys, understanding costs, and handling limits. This tutorial covers everything you need to know to use AI APIs responsibly and efficiently.
Managing API Keys Securely
The Golden Rule
Never hardcode API keys in your source code. If your key ends up in a Git repository, bots will find it within minutes and rack up charges on your account.
Environment Variables
The standard approach is environment variables:
```bash
# .env (local development)
OPENAI_API_KEY=sk-proj-abc123...
ANTHROPIC_API_KEY=sk-ant-abc123...
GOOGLE_API_KEY=AIzaSy...
```
```typescript
// Access in your code
const openai = new OpenAI(); // reads OPENAI_API_KEY automatically
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY automatically
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
```
Always Create a .env.example
```bash
# .env.example (commit this — shows what keys are needed without real values)
OPENAI_API_KEY=sk-proj-your-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here
GOOGLE_API_KEY=your-google-ai-key-here
```
Always Gitignore Your .env
```bash
# .gitignore
.env
.env.local
.env.*.local
```
Key Security Checklist
1. Never hardcode keys in source code
2. Add .env to .gitignore BEFORE creating it
3. Create a .env.example with placeholder values
4. Use different keys for development and production
5. Set spending limits on your API accounts
6. Rotate keys regularly (every 90 days)
7. If a key is exposed, revoke it immediately
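To complement the checklist, it helps to fail fast at startup when a key is missing rather than get a confusing authentication error deep inside a request handler. A minimal sketch (the `requireEnv` helper is our own, not part of any SDK):

```typescript
// Fail fast at startup if any required environment variable is missing.
function requireEnv(names: string[]): Record<string, string> {
  const missing = names.filter((n) => !process.env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return Object.fromEntries(names.map((n) => [n, process.env[n] as string]));
}

// Call once at startup, before constructing any API clients:
// const keys = requireEnv(["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]);
```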
What to ask your AI: "Set up secure API key management for my Node.js project. I need keys for OpenAI, Anthropic, and Google AI."
Environment Variables by Framework
| Framework | File | Prefix | Access |
|---|---|---|---|
| Node.js | .env | None | process.env.KEY |
| Next.js (server) | .env.local | None | process.env.KEY |
| Next.js (client) | .env.local | NEXT_PUBLIC_ | process.env.NEXT_PUBLIC_KEY |
| Vite (client) | .env | VITE_ | import.meta.env.VITE_KEY |
Important: AI API keys should NEVER be exposed on the client side. Always call AI APIs from your server (API routes, Cloud Functions, etc.).
```typescript
// WRONG — exposes key to the browser
const openai = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_KEY, // Anyone can see this!
  dangerouslyAllowBrowser: true,
});

// RIGHT — call from a server-side API route
// app/api/chat/route.ts (Next.js server)
const openai = new OpenAI(); // Key stays on the server
```
Understanding Pricing: Tokens and Models
AI APIs charge based on tokens — the units of text the model processes.
What is a Token?
- ~4 characters in English
- ~0.75 words
- "Hello, world!" = ~4 tokens
- A 500-word article = ~667 tokens
- A typical chat message = 20-100 tokens
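The ratios above can be turned into a rough estimator. This is only a heuristic; for exact counts, use the provider's tokenizer (e.g. OpenAI's tiktoken library):

```typescript
// Rough token estimate using the ~4-characters-per-token heuristic for English.
// Real token counts vary by model and language; use a tokenizer for billing-accurate numbers.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// "Hello, world!" is 13 characters → estimate of 4 tokens, matching the rule of thumb above.
```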
Pricing by Provider (Approximate)
Prices change frequently. Check each provider's pricing page for current rates.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | $2.50 | $10.00 |
| GPT-4o-mini | $0.15 | $0.60 |
| Claude Sonnet 4 | $3.00 | $15.00 |
| Claude Haiku | $0.25 | $1.25 |
| Gemini 2.5 Flash | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
What Does This Mean in Practice?
| Task | Tokens Used | Approx. Cost (GPT-4o) |
|---|---|---|
| Single chat message | ~200 total | $0.001 |
| 10-message conversation | ~2,000 total | $0.01 |
| Summarize a blog post | ~1,500 total | $0.008 |
| 1,000 API calls/day | ~200,000 total | $1.00 |
For learning and development, costs are typically under $5/month. Production costs depend entirely on usage volume.
Input vs. Output Tokens
- Input tokens — Everything you send: system prompt + conversation history + user message
- Output tokens — Everything the model generates in response
Output tokens are more expensive (typically 3-5x the input price, as the table above shows) because generation requires more computation than reading input.
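Since input and output are priced separately, a per-call cost estimate is just two multiplications. A sketch using the GPT-4o figures from the table above ($2.50 in / $10.00 out per 1M tokens); check the provider's pricing page for current rates:

```typescript
// Estimate the dollar cost of one call from token counts and per-1M-token prices.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// e.g. ~1,500 input + ~500 output tokens on GPT-4o:
// estimateCost(1500, 500, 2.5, 10) → $0.00875
```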
What to ask your AI: "Estimate the monthly cost if my app makes [X] API calls per day with [Y] average tokens per call using [model]."
Rate Limits and How to Handle Them
Rate limits prevent any single user from overwhelming the API. When you hit a limit, you get a 429 (Too Many Requests) error.
Types of Rate Limits
| Limit Type | What It Means |
|---|---|
| Requests per minute (RPM) | How many API calls you can make per minute |
| Tokens per minute (TPM) | How many total tokens you can process per minute |
| Tokens per day (TPD) | Daily token quota (some tiers) |
Rate Limits Vary by Tier
Most providers have usage tiers. As you spend more, your limits increase:
| Tier | Typical RPM | How to Reach |
|---|---|---|
| Free / Tier 1 | 60-500 RPM | Sign up |
| Tier 2 | 500-5,000 RPM | Spend $50+ |
| Tier 3 | 5,000+ RPM | Spend $500+ |
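Knowing your tier's RPM, you can pace requests proactively instead of only reacting to 429s. A minimal sliding-window limiter sketch for a single process (the clock is injectable for testing; multi-server apps would need a shared store like Redis instead):

```typescript
// Allows at most `maxPerWindow` calls per `windowMs` (e.g. your tier's RPM per 60s).
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxPerWindow: number,
    private windowMs: number = 60_000,
    private now: () => number = Date.now
  ) {}

  // Returns true if a call is allowed right now (and records it).
  tryAcquire(): boolean {
    const cutoff = this.now() - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxPerWindow) return false;
    this.timestamps.push(this.now());
    return true;
  }
}
```

Before each API call, check `tryAcquire()`; if it returns false, wait or queue the request rather than sending it and eating a 429.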
Handling Rate Limits in Code
```typescript
async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status === 429 && attempt < maxRetries) {
        // Exponential backoff: 1s, 2s, 4s
        const delay = baseDelay * Math.pow(2, attempt);
        console.log(`Rate limited. Retrying in ${delay}ms...`);
        await new Promise((resolve) => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
  throw new Error("Max retries exceeded");
}

// Usage
const response = await callWithRetry(() =>
  openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  })
);
```
Batch Processing with Rate Limiting
```typescript
async function processInBatches<T, R>(
  items: T[],
  processFn: (item: T) => Promise<R>,
  batchSize = 5,
  delayMs = 1000
): Promise<R[]> {
  const results: R[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(batch.map(processFn));
    results.push(...batchResults);

    // Wait between batches to avoid rate limits
    if (i + batchSize < items.length) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  return results;
}

// Process 100 items in batches of 5 with 1-second delays
const prompts = ["prompt1", "prompt2", /* ... */];
const results = await processInBatches(
  prompts,
  async (prompt) => {
    const res = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    });
    return res.choices[0].message.content;
  },
  5,
  1000
);
```
What to ask your AI: "Add rate limiting and retry logic to my AI API calls. I'm making [X] requests per minute."
Cost Optimization Strategies
1. Use the Cheapest Model That Works
Don't default to the most powerful model. Test with cheaper models first:
```typescript
// Start here — very cheap
let model = "gpt-4o-mini"; // or "claude-haiku" or "gemini-2.5-flash"

// Only upgrade if quality isn't good enough
model = "gpt-4o"; // or "claude-sonnet-4" or "gemini-2.5-pro"
```
2. Optimize Your Prompts
Shorter prompts = fewer input tokens = lower cost:
```typescript
// Expensive — long system prompt repeated every call
let system =
  "You are an incredibly talented and experienced senior software engineer with over 20 years of experience in building scalable distributed systems. You have expertise in TypeScript, React, Node.js, cloud computing, databases, and system design. When answering questions, please provide comprehensive, well-structured responses...";

// Cheaper — concise system prompt
system = "Senior software engineer. Give concise answers with code examples.";
```
3. Cache Responses
If the same questions come up often, cache the answers:
```typescript
const cache = new Map<string, string>();

async function getCachedResponse(prompt: string): Promise<string> {
  if (cache.has(prompt)) {
    return cache.get(prompt)!;
  }

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
  });

  const text = response.choices[0].message.content!;
  cache.set(prompt, text);
  return text;
}
```
4. Set Max Tokens
Prevent unexpectedly long (and expensive) responses:
```typescript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this article" }],
  max_tokens: 300, // Limit response length
});
```
5. Limit Conversation History
Don't send the entire conversation every time — trim old messages:
```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

function trimHistory(messages: Message[], maxMessages = 20): Message[] {
  if (messages.length <= maxMessages) return messages;

  // Keep the system message and the most recent messages
  const system = messages.filter((m) => m.role === "system");
  const recent = messages
    .filter((m) => m.role !== "system")
    .slice(-maxMessages);
  return [...system, ...recent];
}
```
Monitoring Usage
Check Usage Dashboards
| Provider | Dashboard URL |
|---|---|
| OpenAI | platform.openai.com/usage |
| Anthropic | console.anthropic.com/settings/usage |
| Google AI | aistudio.google.com |
Set Spending Limits
All providers let you set monthly spending limits. Always set these, especially when learning:
- OpenAI: Settings → Limits → Set monthly budget
- Anthropic: Settings → Spending → Set spending limit
- Google: Free tier has built-in limits
Track Usage in Code
```typescript
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

// Log token usage
console.log("Tokens used:", {
  input: response.usage?.prompt_tokens,
  output: response.usage?.completion_tokens,
  total: response.usage?.total_tokens,
});
```
What's Next?
You now understand the practical side of AI APIs. The final tutorial is your AI APIs Cheat Sheet — a quick reference with side-by-side comparisons, code templates, and model selection guides.
What to ask your AI: "Help me set up cost tracking and spending alerts for my AI API usage."