
    API Keys, Costs, and Rate Limits

    Building with AI APIs means managing keys, understanding costs, and handling limits. This tutorial covers everything you need to know to use AI APIs responsibly and efficiently.

    Managing API Keys Securely

    The Golden Rule

    Never hardcode API keys in your source code. If your key ends up in a Git repository, bots will find it within minutes and rack up charges on your account.

    Environment Variables

    The standard approach is environment variables:

    # .env (local development)
    OPENAI_API_KEY=sk-proj-abc123...
    ANTHROPIC_API_KEY=sk-ant-abc123...
    GOOGLE_API_KEY=AIzaSy...

    // Access in your code
    const openai = new OpenAI(); // reads OPENAI_API_KEY automatically
    const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY automatically
    const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
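A useful habit is to fail fast at startup when a key is missing, instead of getting a confusing 401 deep inside a request handler. Here is a minimal sketch; `requireEnv` is a hypothetical helper name, not part of any SDK. Note that plain Node.js does not load `.env` by itself: use `node --env-file=.env app.js` (Node 20.6+) or the `dotenv` package, while frameworks like Next.js and Vite load it for you.

```typescript
// Throws at startup if a required environment variable is missing or empty.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// At startup, validate every key your app needs, e.g.:
// const openaiKey = requireEnv("OPENAI_API_KEY");
```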

    Always Create a .env.example

    # .env.example (commit this — shows what keys are needed without real values)
    OPENAI_API_KEY=sk-proj-your-key-here
    ANTHROPIC_API_KEY=sk-ant-your-key-here
    GOOGLE_API_KEY=your-google-ai-key-here

    Always Gitignore Your .env

    # .gitignore
    .env
    .env.local
    .env.*.local

    Key Security Checklist

    1. Never hardcode keys in source code
    2. Add .env to .gitignore BEFORE creating it
    3. Create a .env.example with placeholder values
    4. Use different keys for development and production
    5. Set spending limits on your API accounts
    6. Rotate keys regularly (every 90 days)
    7. If a key is exposed, revoke it immediately
    

    What to ask your AI: "Set up secure API key management for my Node.js project. I need keys for OpenAI, Anthropic, and Google AI."

    Environment Variables by Framework

    Framework | File | Prefix | Access
    Node.js | .env | None | process.env.KEY
    Next.js (server) | .env.local | None | process.env.KEY
    Next.js (client) | .env.local | NEXT_PUBLIC_ | process.env.NEXT_PUBLIC_KEY
    Vite (client) | .env | VITE_ | import.meta.env.VITE_KEY

    Important: AI API keys should NEVER be exposed on the client side. Always call AI APIs from your server (API routes, Cloud Functions, etc.).

    // WRONG — exposes key to the browser
    const openai = new OpenAI({
      apiKey: import.meta.env.VITE_OPENAI_KEY, // Anyone can see this!
      dangerouslyAllowBrowser: true,
    });
    
    // RIGHT — call from server-side API route
    // app/api/chat/route.ts (Next.js server)
    const openai = new OpenAI(); // Key stays on the server

    Understanding Pricing: Tokens and Models

    AI APIs charge based on tokens — the units of text the model processes.

    What is a Token?

    • ~4 characters in English
    • ~0.75 words
    • "Hello, world!" = ~4 tokens
    • A 500-word article = ~667 tokens
    • A typical chat message = 20-100 tokens
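The "~4 characters per token" rule of thumb is enough to sketch a quick estimator. `estimateTokens` is a hypothetical helper; for exact counts, use the provider's own tokenizer (e.g. OpenAI's tiktoken).

```typescript
// Rough token estimate for English text: ~4 characters per token.
// Good enough for budgeting; not an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

For example, `estimateTokens("Hello, world!")` returns 4, matching the estimate above.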

    Pricing by Provider (Approximate)

    Prices change frequently. Check each provider's pricing page for current rates.

    Model | Input (per 1M tokens) | Output (per 1M tokens)
    GPT-4o | $2.50 | $10.00
    GPT-4o-mini | $0.15 | $0.60
    Claude Sonnet 4 | $3.00 | $15.00
    Claude Haiku | $0.25 | $1.25
    Gemini 2.5 Flash | $0.15 | $0.60
    Gemini 2.5 Pro | $1.25 | $10.00

    What Does This Mean in Practice?

    Task | Tokens Used | Approx. Cost (GPT-4o)
    Single chat message | ~200 total | $0.001
    10-message conversation | ~2,000 total | $0.01
    Summarize a blog post | ~1,500 total | $0.008
    1,000 API calls/day | ~200,000 total | $1.00

    For learning and development, costs are typically under $5/month. Production costs depend entirely on usage volume.

    Input vs. Output Tokens

    • Input tokens — Everything you send: system prompt + conversation history + user message
    • Output tokens — Everything the model generates in response

    Output tokens are more expensive (typically 3-5x) because they require more computation.
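Because input and output tokens are priced separately, a per-call cost estimate needs both counts and both rates. A minimal sketch (`estimateCost` is a hypothetical helper; the rates in the example are the approximate GPT-4o prices from the table above, so check your provider's pricing page for current numbers):

```typescript
// Dollar cost of one call, given token counts and per-1M-token rates.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPerMillion: number,
  outputPerMillion: number
): number {
  return (
    (inputTokens / 1_000_000) * inputPerMillion +
    (outputTokens / 1_000_000) * outputPerMillion
  );
}

// A typical chat message, 150 input + 50 output tokens on GPT-4o:
// estimateCost(150, 50, 2.5, 10) ≈ $0.000875
```

That result lines up with the "~$0.001 per chat message" figure in the table above.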

    What to ask your AI: "Estimate the monthly cost if my app makes [X] API calls per day with [Y] average tokens per call using [model]."

    Rate Limits and How to Handle Them

    Rate limits prevent any single user from overwhelming the API. When you hit a limit, you get a 429 (Too Many Requests) error.

    Types of Rate Limits

    Limit Type | What It Means
    Requests per minute (RPM) | How many API calls you can make per minute
    Tokens per minute (TPM) | How many total tokens you can process per minute
    Tokens per day (TPD) | Daily token quota (some tiers)
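You can also stay under an RPM limit proactively, by gating calls on the client side before they are sent. Below is a minimal sliding-window limiter sketch; `RequestLimiter` is a hypothetical class for illustration, and production apps often reach for a library such as bottleneck or p-limit instead.

```typescript
// Allows at most `maxRequests` calls per sliding time window.
class RequestLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxRequests: number,
    private windowMs: number,
    private now: () => number = () => Date.now() // injectable clock for testing
  ) {}

  // Returns true if a call is allowed right now, false if you should wait.
  tryAcquire(): boolean {
    const cutoff = this.now() - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(this.now());
    return true;
  }
}

// Example: at most 60 requests per minute
// const limiter = new RequestLimiter(60, 60_000);
// if (limiter.tryAcquire()) { /* make the API call */ }
```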

    Rate Limits Vary by Tier

    Most providers have usage tiers. As you spend more, your limits increase:

    Tier | Typical RPM | How to Reach
    Free / Tier 1 | 60-500 RPM | Sign up
    Tier 2 | 500-5,000 RPM | Spend $50+
    Tier 3 | 5,000+ RPM | Spend $500+

    Handling Rate Limits in Code

    async function callWithRetry<T>(
      fn: () => Promise<T>,
      maxRetries = 3,
      baseDelay = 1000
    ): Promise<T> {
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          return await fn();
        } catch (error: any) {
          if (error?.status === 429 && attempt < maxRetries) {
            // Exponential backoff: 1s, 2s, 4s
            const delay = baseDelay * Math.pow(2, attempt);
            console.log(`Rate limited. Retrying in ${delay}ms...`);
            await new Promise((resolve) => setTimeout(resolve, delay));
          } else {
            throw error;
          }
        }
      }
      throw new Error("Max retries exceeded");
    }
    
    // Usage
    const response = await callWithRetry(() =>
      openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: "Hello!" }],
      })
    );

    Batch Processing with Rate Limiting

    async function processInBatches<T, R>(
      items: T[],
      processFn: (item: T) => Promise<R>,
      batchSize = 5,
      delayMs = 1000
    ): Promise<R[]> {
      const results: R[] = [];
    
      for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        const batchResults = await Promise.all(batch.map(processFn));
        results.push(...batchResults);
    
        // Wait between batches to avoid rate limits
        if (i + batchSize < items.length) {
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    
      return results;
    }
    
    // Process 100 items in batches of 5 with 1-second delays
    const prompts = ["prompt1", "prompt2", /* ... */];
    const results = await processInBatches(
      prompts,
      async (prompt) => {
        const res = await openai.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
        });
        return res.choices[0].message.content;
      },
      5,
      1000
    );

    What to ask your AI: "Add rate limiting and retry logic to my AI API calls. I'm making [X] requests per minute."

    Cost Optimization Strategies

    1. Use the Cheapest Model That Works

    Don't default to the most powerful model. Test with cheaper models first:

    // Start here — very cheap
    const model = "gpt-4o-mini"; // or "claude-haiku" or "gemini-2.5-flash"
    
    // Only upgrade if quality isn't good enough:
    // const model = "gpt-4o"; // or "claude-sonnet-4" or "gemini-2.5-pro"

    2. Optimize Your Prompts

    Shorter prompts = fewer input tokens = lower cost:

    // Expensive — long system prompt repeated every call
    const verboseSystem = "You are an incredibly talented and experienced senior software engineer with over 20 years of experience in building scalable distributed systems. You have expertise in TypeScript, React, Node.js, cloud computing, databases, and system design. When answering questions, please provide comprehensive, well-structured responses...";
    
    // Cheaper — concise system prompt
    const conciseSystem = "Senior software engineer. Give concise answers with code examples.";

    3. Cache Responses

    If the same questions come up often, cache the answers:

    // Simple in-memory cache (unbounded; add a size limit or TTL in production)
    const cache = new Map<string, string>();
    
    async function getCachedResponse(prompt: string): Promise<string> {
      if (cache.has(prompt)) {
        return cache.get(prompt)!;
      }
    
      const response = await openai.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
      });
    
      const text = response.choices[0].message.content!;
      cache.set(prompt, text);
      return text;
    }

    4. Set Max Tokens

    Prevent unexpectedly long (and expensive) responses:

    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Summarize this article" }],
      max_tokens: 300,  // Limit response length
    });

    5. Limit Conversation History

    Don't send the entire conversation every time — trim old messages:

    type Message = { role: "system" | "user" | "assistant"; content: string };
    
    function trimHistory(
      messages: Message[],
      maxMessages = 20
    ): Message[] {
      if (messages.length <= maxMessages) return messages;
    
      // Keep the system message and the most recent messages
      const system = messages.filter((m) => m.role === "system");
      const recent = messages.filter((m) => m.role !== "system").slice(-maxMessages);
      return [...system, ...recent];
    }

    Monitoring Usage

    Check Usage Dashboards

    Every provider has a usage dashboard in its developer console showing token consumption and spend over time. Check it regularly, especially while you're learning.

    Set Spending Limits

    All providers let you set monthly spending limits. Always set these, especially when learning:

    • OpenAI: Settings → Limits → Set monthly budget
    • Anthropic: Settings → Spending → Set spending limit
    • Google: Free tier has built-in limits

    Track Usage in Code

    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello!" }],
    });
    
    // Log token usage
    console.log("Tokens used:", {
      input: response.usage?.prompt_tokens,
      output: response.usage?.completion_tokens,
      total: response.usage?.total_tokens,
    });
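If you want a running total rather than per-call logs, a small accumulator works. `UsageTracker` is a hypothetical helper; the rates passed in are the approximate per-1M prices from the pricing table above, so substitute current numbers from your provider.

```typescript
// Accumulates token usage across calls and converts it to an
// approximate dollar total using per-1M-token rates.
class UsageTracker {
  private inputTokens = 0;
  private outputTokens = 0;

  constructor(
    private inputPerMillion: number,
    private outputPerMillion: number
  ) {}

  record(input: number, output: number): void {
    this.inputTokens += input;
    this.outputTokens += output;
  }

  totalCost(): number {
    return (
      (this.inputTokens / 1_000_000) * this.inputPerMillion +
      (this.outputTokens / 1_000_000) * this.outputPerMillion
    );
  }
}

// After each call:
// tracker.record(
//   response.usage?.prompt_tokens ?? 0,
//   response.usage?.completion_tokens ?? 0
// );
```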

    What's Next?

    You now understand the practical side of AI APIs. The final tutorial is your AI APIs Cheat Sheet — a quick reference with side-by-side comparisons, code templates, and model selection guides.

    What to ask your AI: "Help me set up cost tracking and spending alerts for my AI API usage."


    🌐 www.genai-mentor.ai