Books/Deploying AI Apps/Monitoring and Logging

    Monitoring and Logging

    Your AI app is deployed and running. But how do you know if it's actually working? Are users getting errors? Is the AI responding too slowly? Are you burning through your API budget? Monitoring and logging answer these questions.

    Why Monitoring Matters for AI Apps

    AI apps have unique challenges that traditional web apps don't:

    Challenge | Why It Matters
    API outages | OpenAI, Anthropic, and other providers sometimes go down
    Slow responses | AI API calls can take 2-30+ seconds
    Cost spikes | A bug could trigger thousands of unnecessary API calls
    Quality degradation | Model updates can change response quality
    Rate limiting | You might hit API rate limits without knowing
    Token overuse | Badly constructed prompts waste tokens and money

    Without monitoring, you'll only know about problems when users complain — and by then, you might have a $500 API bill.

    Logging API Calls and Responses

    The most important thing to log in an AI app is every API call. Here's a logging wrapper pattern:

    // src/lib/aiLogger.ts
    interface AILogEntry {
      timestamp: string;
      model: string;
      promptTokens: number;
      completionTokens: number;
      totalTokens: number;
      latencyMs: number;
      status: "success" | "error";
      error?: string;
      userId?: string;
      endpoint: string;
    }
    
    export function logAICall(entry: AILogEntry): void {
      // Log to console (visible in hosting platform logs)
      console.log(JSON.stringify({
        type: "ai_api_call",
        ...entry,
      }));
    
      // Optionally: persist to a database for analytics
      // (make this function async first, since the call below awaits)
      // await db.collection("ai_logs").add(entry);
    }

    Wrapping Your AI API Calls

    // src/services/aiService.ts
    import OpenAI from "openai";
    import { logAICall } from "@/lib/aiLogger";
    
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    
    export async function generateResponse(
      prompt: string,
      userId?: string
    ): Promise<string> {
      const startTime = Date.now();
    
      try {
        const response = await openai.chat.completions.create({
          model: "gpt-4o",
          messages: [{ role: "user", content: prompt }],
          max_tokens: 1000,
        });
    
        const latencyMs = Date.now() - startTime;
        const usage = response.usage;
    
        logAICall({
          timestamp: new Date().toISOString(),
          model: "gpt-4o",
          promptTokens: usage?.prompt_tokens ?? 0,
          completionTokens: usage?.completion_tokens ?? 0,
          totalTokens: usage?.total_tokens ?? 0,
          latencyMs,
          status: "success",
          userId,
          endpoint: "chat.completions",
        });
    
        return response.choices[0]?.message?.content ?? "";
      } catch (error) {
        const latencyMs = Date.now() - startTime;
    
        logAICall({
          timestamp: new Date().toISOString(),
          model: "gpt-4o",
          promptTokens: 0,
          completionTokens: 0,
          totalTokens: 0,
          latencyMs,
          status: "error",
          error: error instanceof Error ? error.message : "Unknown error",
          userId,
          endpoint: "chat.completions",
        });
    
        throw error;
      }
    }

    This pattern gives you a complete picture of every AI interaction: how long it took, how many tokens it used, and whether it succeeded.

    What to ask your AI: "Create a logging wrapper for my [OpenAI/Anthropic/Google AI] API calls that tracks latency, token usage, and errors."

    Error Tracking

    Console logs work, but dedicated error tracking tools give you much more: stack traces, user context, error frequency, and alerts.

    Sentry

    Sentry is the most popular error tracking tool. Free tier includes 5,000 errors/month.

    npm install @sentry/nextjs
    # or
    npm install @sentry/react

    // src/lib/sentry.ts
    import * as Sentry from "@sentry/nextjs";
    
    Sentry.init({
      dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
      environment: process.env.NODE_ENV,
      tracesSampleRate: 0.1, // Track 10% of transactions for performance
    });

    Now errors are automatically captured and sent to your Sentry dashboard with full context.

    Custom Error Handling for AI

    // src/lib/errorHandler.ts
    import * as Sentry from "@sentry/nextjs";
    
    export function handleAIError(error: unknown, context: Record<string, unknown>) {
      // Add context for debugging
      Sentry.withScope((scope) => {
        scope.setTag("service", "ai_api");
        scope.setContext("ai_call", context);
    
        if (error instanceof Error) {
          // Categorize common AI API errors
          if (error.message.includes("rate_limit")) {
            scope.setTag("error_type", "rate_limit");
          } else if (error.message.includes("insufficient_quota")) {
            scope.setTag("error_type", "quota_exceeded");
          } else if (error.message.includes("timeout")) {
            scope.setTag("error_type", "timeout");
          }
        }

        // Capture non-Error values too, so nothing is silently dropped
        Sentry.captureException(error);
      });
    }

    LogRocket

    LogRocket records user sessions so you can replay exactly what happened when an error occurred. Great for debugging "it doesn't work" reports.

    npm install logrocket

    import LogRocket from "logrocket";
    LogRocket.init("your-app-id/your-project");
    
    // Identify users
    LogRocket.identify(userId, {
      name: user.name,
      email: user.email,
    });

    What to ask your AI: "Set up Sentry error tracking for my Next.js app. Include custom error handling for AI API failures."

    Performance Monitoring

    Slow AI responses kill user experience. Monitor performance to catch issues early.

    What to Track

    Metric | Target | Why
    AI API latency | < 3 seconds | Users leave if responses are too slow
    Time to First Token | < 1 second | For streaming responses
    Page load time | < 2 seconds | Standard web performance
    API error rate | < 1% | Reliability target
    P95 latency | < 10 seconds | Worst-case experience
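    Time to First Token can only be measured from the stream itself. Below is a minimal sketch; `measureStreamTiming` is an illustrative helper (not part of any SDK) that works with any async iterable of chunks, such as the stream the OpenAI SDK returns when you pass `stream: true`:

```typescript
// Measures time-to-first-token (TTFT) and total duration for any
// async iterable of chunks, e.g. an AI SDK streaming response.
export async function measureStreamTiming<T>(
  stream: AsyncIterable<T>,
  onChunk?: (chunk: T) => void
): Promise<{ ttftMs: number; totalMs: number; chunks: number }> {
  const start = Date.now();
  let ttftMs = -1; // stays -1 if the stream yields nothing
  let chunks = 0;

  for await (const chunk of stream) {
    if (ttftMs < 0) ttftMs = Date.now() - start; // first chunk arrived
    chunks++;
    onChunk?.(chunk); // e.g. forward the chunk to the client
  }

  return { ttftMs, totalMs: Date.now() - start, chunks };
}
```

    In a streaming route handler you would pass the SDK's stream here, forward each chunk to the client from the `onChunk` callback, and log `ttftMs` alongside the other metrics from your logging wrapper.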

    Vercel Analytics

    If you're on Vercel, enable Web Analytics and Speed Insights:

    npm install @vercel/analytics @vercel/speed-insights

    // app/layout.tsx (Next.js)
    import { Analytics } from "@vercel/analytics/react";
    import { SpeedInsights } from "@vercel/speed-insights/next";
    
    export default function RootLayout({ children }: { children: React.ReactNode }) {
      return (
        <html>
          <body>
            {children}
            <Analytics />
            <SpeedInsights />
          </body>
        </html>
      );
    }

    Firebase Performance Monitoring

    import { getPerformance, trace } from "firebase/performance";
    import { generateResponse } from "@/services/aiService";

    // Initialize
    const perf = getPerformance();

    // Custom trace around an AI call
    async function trackedAICall(prompt: string) {
      const t = trace(perf, "ai_response");
      t.start();
      t.putAttribute("model", "gpt-4o");

      try {
        const result = await generateResponse(prompt);
        t.putMetric("response_length", result.length);
        return result;
      } finally {
        t.stop(); // stop the trace even if the call throws
      }
    }

    AI-Specific Monitoring

    Beyond standard web monitoring, AI apps need specialized tracking.

    Token Usage Dashboard

    Build a simple dashboard to track token consumption:

    // src/services/tokenTracker.ts
    interface TokenUsage {
      date: string;
      model: string;
      promptTokens: number;
      completionTokens: number;
      estimatedCost: number;
    }
    
    const COSTS_PER_1K_TOKENS: Record<string, { input: number; output: number }> = {
      "gpt-4o": { input: 0.0025, output: 0.01 },
      "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
      "claude-3-5-sonnet": { input: 0.003, output: 0.015 },
    };
    
    export function calculateCost(
      model: string,
      promptTokens: number,
      completionTokens: number
    ): number {
      const costs = COSTS_PER_1K_TOKENS[model];
      if (!costs) return 0;
    
      return (
        (promptTokens / 1000) * costs.input +
        (completionTokens / 1000) * costs.output
      );
    }
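    To fill the `TokenUsage` rows above from raw logs, you can aggregate entries by day and model. A sketch: `aggregateDailyUsage` and the `LogRow` type are illustrative, and the cost function is passed in so it can be backed by `calculateCost` from above:

```typescript
interface LogRow {
  timestamp: string; // ISO string, e.g. "2024-06-01T12:00:00.000Z"
  model: string;
  promptTokens: number;
  completionTokens: number;
}

interface DailyUsage {
  date: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  estimatedCost: number;
}

// Groups log rows into per-day, per-model totals with an estimated cost.
export function aggregateDailyUsage(
  rows: LogRow[],
  costFn: (model: string, promptTokens: number, completionTokens: number) => number
): DailyUsage[] {
  const byKey = new Map<string, DailyUsage>();

  for (const row of rows) {
    const date = row.timestamp.split("T")[0]; // YYYY-MM-DD
    const key = `${date}|${row.model}`;
    const entry = byKey.get(key) ?? {
      date,
      model: row.model,
      promptTokens: 0,
      completionTokens: 0,
      estimatedCost: 0,
    };
    entry.promptTokens += row.promptTokens;
    entry.completionTokens += row.completionTokens;
    entry.estimatedCost = costFn(row.model, entry.promptTokens, entry.completionTokens);
    byKey.set(key, entry);
  }

  return [...byKey.values()];
}
```

    Feed it the entries your logging wrapper produced and render the result as a table or chart.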

    Latency Monitoring

    Track how long AI calls take over time:

    // Log percentiles
    function trackLatency(latencies: number[]) {
      const sorted = [...latencies].sort((a, b) => a - b);
      const p50 = sorted[Math.floor(sorted.length * 0.5)];
      const p95 = sorted[Math.floor(sorted.length * 0.95)];
      const p99 = sorted[Math.floor(sorted.length * 0.99)];
    
      console.log(`Latency: p50=${p50}ms, p95=${p95}ms, p99=${p99}ms`);
    }
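    The console output above works for spot checks; if you want the percentile values back as numbers (for dashboards or alert thresholds), the same logic can be factored into a small helper. A sketch, where `percentile` is an illustrative function that clamps the index so empty arrays and p100 are safe:

```typescript
// Returns the p-th percentile (0-100) from a list of latencies in ms.
export function percentile(latencies: number[], p: number): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor((sorted.length * p) / 100));
  return sorted[index];
}
```

    With values instead of log lines, an alert becomes a simple comparison, e.g. `percentile(latencies, 95) > 15000`.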

    Cost Alerts

    Set up alerts when spending crosses thresholds:

    // In your logging function. Assumes entries are persisted to a
    // database (`db`) with `date` and `estimatedCost` fields added at write time
    const DAILY_BUDGET = 10; // $10 per day

    async function checkBudget() {
      const today = new Date().toISOString().split("T")[0];
      const todayLogs = await db.collection("ai_logs")
        .where("date", "==", today)
        .get();

      let totalCost = 0;
      todayLogs.forEach(doc => {
        totalCost += doc.data().estimatedCost ?? 0;
      });

      if (totalCost > DAILY_BUDGET * 0.8) {
        // 80% of daily budget used: send an alert
        console.warn(`⚠️ AI spending alert: $${totalCost.toFixed(2)} of $${DAILY_BUDGET} daily budget used`);
        // Send email/Slack notification
      }
    }
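    The "Send email/Slack notification" step can be filled in with a Slack incoming webhook. A sketch: the `SLACK_WEBHOOK_URL` environment variable and the `buildBudgetAlert` helper are assumptions for illustration, while the `{ text: ... }` payload is the standard shape Slack incoming webhooks accept:

```typescript
// Builds the alert payload; kept separate from sending so it's easy to test.
export function buildBudgetAlert(totalCost: number, budget: number): { text: string } {
  const pct = Math.round((totalCost / budget) * 100);
  return {
    text: `AI spending alert: $${totalCost.toFixed(2)} of $${budget.toFixed(2)} daily budget used (${pct}%)`,
  };
}

// Posts the alert to a Slack incoming webhook, if one is configured.
export async function sendBudgetAlert(totalCost: number, budget: number): Promise<void> {
  const url = process.env.SLACK_WEBHOOK_URL; // assumed env var
  if (!url) return; // no webhook configured, nothing to do

  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBudgetAlert(totalCost, budget)),
  });
}
```

    Call `sendBudgetAlert(totalCost, DAILY_BUDGET)` from inside the budget check above.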

    What to ask your AI: "Create a monitoring dashboard for my AI app that tracks token usage, costs, latency, and error rates by day."

    Setting Up Alerts

    Don't wait for users to tell you something is broken. Set up alerts:

    What to Alert On

    Alert | Threshold | Action
    Error rate spike | > 5% of requests | Check AI API status page
    High latency | P95 > 15 seconds | Check if model is overloaded
    Daily cost exceeded | > 80% of budget | Review usage patterns
    AI API down | Health check fails | Switch to fallback or show maintenance page

    Simple Health Check Endpoint

    // app/api/health/route.ts (Next.js)
    import { NextResponse } from "next/server";
    
    export async function GET() {
      const checks = {
        server: "ok",
        aiApi: "unknown",
        // add e.g. a database check here if your app depends on one
      };
    
      // Check AI API
      try {
        const response = await fetch("https://api.openai.com/v1/models", {
          headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
        });
        checks.aiApi = response.ok ? "ok" : "degraded";
      } catch {
        checks.aiApi = "down";
      }
    
      const allOk = Object.values(checks).every(v => v === "ok");
    
      return NextResponse.json(checks, {
        status: allOk ? 200 : 503,
      });
    }

    Monitoring Checklist

    ✅ AI API calls are logged with latency, tokens, and status
    ✅ Error tracking is set up (Sentry or similar)
    ✅ Performance monitoring is active
    ✅ Token usage is tracked per user/day
    ✅ Cost estimates are calculated and logged
    ✅ Alerts are set for error spikes and budget thresholds
    ✅ Health check endpoint is available
    ✅ Logs are structured (JSON) for easy parsing
    

    What's Next?

    You know how to monitor your app. The next tutorial focuses on the business side: cost management and scaling — keeping your AI app affordable as it grows.

    What to ask your AI: "Help me set up a monitoring stack for my AI app. I want to track errors, performance, and AI API costs."

