Books/Deploying AI Apps/Monitoring and Logging

    Monitoring and Logging

    Your AI app is deployed and running. But how do you know if it's actually working? Are users getting errors? Is the AI responding too slowly? Are you burning through your API budget? Monitoring and logging answer these questions.

    Why Monitoring Matters for AI Apps

    AI apps have unique challenges that traditional web apps don't:

    Challenge | Why It Matters
    API outages | OpenAI, Anthropic, and other providers sometimes go down
    Slow responses | AI API calls can take 2-30+ seconds
    Cost spikes | A bug could trigger thousands of unnecessary API calls
    Quality degradation | Model updates can change response quality
    Rate limiting | You might hit API rate limits without knowing
    Token overuse | Badly constructed prompts waste tokens and money

    Without monitoring, you'll only know about problems when users complain — and by then, you might have a $500 API bill.

    Logging API Calls and Responses

    The most important thing to log in an AI app is every API call. Here's a logging wrapper pattern:

    // src/lib/aiLogger.ts
    interface AILogEntry {
      timestamp: string;
      model: string;
      promptTokens: number;
      completionTokens: number;
      totalTokens: number;
      latencyMs: number;
      status: "success" | "error";
      error?: string;
      userId?: string;
      endpoint: string;
    }
    
    export function logAICall(entry: AILogEntry): void {
      // Log to console (visible in hosting platform logs)
      console.log(JSON.stringify({
        type: "ai_api_call",
        ...entry,
      }));
    
      // Optionally: persist to a database for analytics
      // (make this function async first, since the call below awaits)
      // await db.collection("ai_logs").add(entry);
    }

    Wrapping Your AI API Calls

    // src/services/aiService.ts
    import OpenAI from "openai";
    import { logAICall } from "@/lib/aiLogger";
    
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    
    export async function generateResponse(
      prompt: string,
      userId?: string
    ): Promise<string> {
      const startTime = Date.now();
    
      try {
        const response = await openai.chat.completions.create({
          model: "gpt-4o",
          messages: [{ role: "user", content: prompt }],
          max_tokens: 1000,
        });
    
        const latencyMs = Date.now() - startTime;
        const usage = response.usage;
    
        logAICall({
          timestamp: new Date().toISOString(),
          model: "gpt-4o",
          promptTokens: usage?.prompt_tokens ?? 0,
          completionTokens: usage?.completion_tokens ?? 0,
          totalTokens: usage?.total_tokens ?? 0,
          latencyMs,
          status: "success",
          userId,
          endpoint: "chat.completions",
        });
    
        return response.choices[0]?.message?.content ?? "";
      } catch (error) {
        const latencyMs = Date.now() - startTime;
    
        logAICall({
          timestamp: new Date().toISOString(),
          model: "gpt-4o",
          promptTokens: 0,
          completionTokens: 0,
          totalTokens: 0,
          latencyMs,
          status: "error",
          error: error instanceof Error ? error.message : "Unknown error",
          userId,
          endpoint: "chat.completions",
        });
    
        throw error;
      }
    }

    This pattern gives you a complete picture of every AI interaction: how long it took, how many tokens it used, and whether it succeeded.

    What to ask your AI: "Create a logging wrapper for my [OpenAI/Anthropic/Google AI] API calls that tracks latency, token usage, and errors."

    Error Tracking

    Console logs work, but dedicated error tracking tools give you much more: stack traces, user context, error frequency, and alerts.

    Sentry

    Sentry is the most popular error tracking tool. Free tier includes 5,000 errors/month.

    npm install @sentry/nextjs
    # or
    npm install @sentry/react

    // src/lib/sentry.ts
    import * as Sentry from "@sentry/nextjs";
    
    Sentry.init({
      dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
      environment: process.env.NODE_ENV,
      tracesSampleRate: 0.1, // Track 10% of transactions for performance
    });

    Now errors are automatically captured and sent to your Sentry dashboard with full context.

    Custom Error Handling for AI

    // src/lib/errorHandler.ts
    import * as Sentry from "@sentry/nextjs";
    
    export function handleAIError(error: unknown, context: Record<string, unknown>) {
      // Add context for debugging
      Sentry.withScope((scope) => {
        scope.setTag("service", "ai_api");
        scope.setContext("ai_call", context);
    
        if (error instanceof Error) {
          // Categorize common AI API errors
          if (error.message.includes("rate_limit")) {
            scope.setTag("error_type", "rate_limit");
          } else if (error.message.includes("insufficient_quota")) {
            scope.setTag("error_type", "quota_exceeded");
          } else if (error.message.includes("timeout")) {
            scope.setTag("error_type", "timeout");
          }
        }

        // Capture non-Error values too, so nothing is silently dropped
        Sentry.captureException(error);
      });
    }

    LogRocket

    LogRocket records user sessions so you can replay exactly what happened when an error occurred. Great for debugging "it doesn't work" reports.

    npm install logrocket

    import LogRocket from "logrocket";
    LogRocket.init("your-app-id/your-project");
    
    // Identify users
    LogRocket.identify(userId, {
      name: user.name,
      email: user.email,
    });

    What to ask your AI: "Set up Sentry error tracking for my Next.js app. Include custom error handling for AI API failures."

    Performance Monitoring

    Slow AI responses kill user experience. Monitor performance to catch issues early.

    What to Track

    Metric | Target | Why
    AI API latency | < 3 seconds | Users leave if responses are too slow
    Time to First Token | < 1 second | For streaming responses
    Page load time | < 2 seconds | Standard web performance
    API error rate | < 1% | Reliability target
    P95 latency | < 10 seconds | Worst-case experience
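    Time to First Token can only be measured from the stream itself. Below is a minimal sketch; `measureStreamTiming` is an illustrative helper (not part of any SDK) that works with any async iterable of chunks, such as the stream the OpenAI SDK returns when you pass `stream: true`:

```typescript
// Measures time-to-first-token (TTFT) and total duration for any
// async iterable of chunks, e.g. an AI SDK streaming response.
export async function measureStreamTiming<T>(
  stream: AsyncIterable<T>,
  onChunk?: (chunk: T) => void
): Promise<{ ttftMs: number; totalMs: number; chunks: number }> {
  const start = Date.now();
  let ttftMs = -1; // stays -1 if the stream yields nothing
  let chunks = 0;

  for await (const chunk of stream) {
    if (ttftMs < 0) ttftMs = Date.now() - start; // first chunk arrived
    chunks++;
    onChunk?.(chunk); // e.g. forward the chunk to the client
  }

  return { ttftMs, totalMs: Date.now() - start, chunks };
}
```

    In a streaming route handler you would pass the SDK's stream here, forward each chunk to the client from the `onChunk` callback, and log `ttftMs` alongside the other metrics from your logging wrapper.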

    Vercel Analytics

    If you're on Vercel, enable Web Analytics and Speed Insights:

    npm install @vercel/analytics @vercel/speed-insights

    // app/layout.tsx (Next.js)
    import { Analytics } from "@vercel/analytics/react";
    import { SpeedInsights } from "@vercel/speed-insights/next";
    
    export default function RootLayout({ children }: { children: React.ReactNode }) {
      return (
        <html>
          <body>
            {children}
            <Analytics />
            <SpeedInsights />
          </body>
        </html>
      );
    }

    Firebase Performance Monitoring

    import { getPerformance, trace } from "firebase/performance";
    import { generateResponse } from "@/services/aiService";

    // Initialize
    const perf = getPerformance();

    // Custom trace around an AI call
    async function trackedAICall(prompt: string) {
      const t = trace(perf, "ai_response");
      t.start();
      t.putAttribute("model", "gpt-4o");

      try {
        const result = await generateResponse(prompt);
        t.putMetric("response_length", result.length);
        return result;
      } finally {
        t.stop(); // stop the trace even if the call throws
      }
    }

    AI-Specific Monitoring

    Beyond standard web monitoring, AI apps need specialized tracking.

    Token Usage Dashboard

    Build a simple dashboard to track token consumption:

    // src/services/tokenTracker.ts
    interface TokenUsage {
      date: string;
      model: string;
      promptTokens: number;
      completionTokens: number;
      estimatedCost: number;
    }
    
    const COSTS_PER_1K_TOKENS: Record<string, { input: number; output: number }> = {
      "gpt-4o": { input: 0.0025, output: 0.01 },
      "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
      "claude-3-5-sonnet": { input: 0.003, output: 0.015 },
    };
    
    export function calculateCost(
      model: string,
      promptTokens: number,
      completionTokens: number
    ): number {
      const costs = COSTS_PER_1K_TOKENS[model];
      if (!costs) return 0;
    
      return (
        (promptTokens / 1000) * costs.input +
        (completionTokens / 1000) * costs.output
      );
    }
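    To fill the `TokenUsage` rows above from raw logs, you can aggregate entries by day and model. A sketch: `aggregateDailyUsage` and the `LogRow` type are illustrative, and the cost function is passed in so it can be backed by `calculateCost` from above:

```typescript
interface LogRow {
  timestamp: string; // ISO string, e.g. "2024-06-01T12:00:00.000Z"
  model: string;
  promptTokens: number;
  completionTokens: number;
}

interface DailyUsage {
  date: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  estimatedCost: number;
}

// Groups log rows into per-day, per-model totals with an estimated cost.
export function aggregateDailyUsage(
  rows: LogRow[],
  costFn: (model: string, promptTokens: number, completionTokens: number) => number
): DailyUsage[] {
  const byKey = new Map<string, DailyUsage>();

  for (const row of rows) {
    const date = row.timestamp.split("T")[0]; // YYYY-MM-DD
    const key = `${date}|${row.model}`;
    const entry = byKey.get(key) ?? {
      date,
      model: row.model,
      promptTokens: 0,
      completionTokens: 0,
      estimatedCost: 0,
    };
    entry.promptTokens += row.promptTokens;
    entry.completionTokens += row.completionTokens;
    entry.estimatedCost = costFn(row.model, entry.promptTokens, entry.completionTokens);
    byKey.set(key, entry);
  }

  return [...byKey.values()];
}
```

    Feed it the entries your logging wrapper produced and render the result as a table or chart.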

    Latency Monitoring

    Track how long AI calls take over time:

    // Log percentiles
    function trackLatency(latencies: number[]) {
      const sorted = [...latencies].sort((a, b) => a - b);
      const p50 = sorted[Math.floor(sorted.length * 0.5)];
      const p95 = sorted[Math.floor(sorted.length * 0.95)];
      const p99 = sorted[Math.floor(sorted.length * 0.99)];
    
      console.log(`Latency: p50=${p50}ms, p95=${p95}ms, p99=${p99}ms`);
    }
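    The console output above works for spot checks; if you want the percentile values back as numbers (for dashboards or alert thresholds), the same logic can be factored into a small helper. A sketch, where `percentile` is an illustrative function that clamps the index so empty arrays and p100 are safe:

```typescript
// Returns the p-th percentile (0-100) from a list of latencies in ms.
export function percentile(latencies: number[], p: number): number {
  if (latencies.length === 0) return 0;
  const sorted = [...latencies].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor((sorted.length * p) / 100));
  return sorted[index];
}
```

    With values instead of log lines, an alert becomes a simple comparison, e.g. `percentile(latencies, 95) > 15000`.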

    Cost Alerts

    Set up alerts when spending crosses thresholds:

    // In your logging function. Assumes entries are persisted to a
    // database (`db`) with `date` and `estimatedCost` fields added at write time
    const DAILY_BUDGET = 10; // $10 per day

    async function checkBudget() {
      const today = new Date().toISOString().split("T")[0];
      const todayLogs = await db.collection("ai_logs")
        .where("date", "==", today)
        .get();

      let totalCost = 0;
      todayLogs.forEach(doc => {
        totalCost += doc.data().estimatedCost ?? 0;
      });

      if (totalCost > DAILY_BUDGET * 0.8) {
        // 80% of daily budget used: send an alert
        console.warn(`⚠️ AI spending alert: $${totalCost.toFixed(2)} of $${DAILY_BUDGET} daily budget used`);
        // Send email/Slack notification
      }
    }
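    The "Send email/Slack notification" step can be filled in with a Slack incoming webhook. A sketch: the `SLACK_WEBHOOK_URL` environment variable and the `buildBudgetAlert` helper are assumptions for illustration, while the `{ text: ... }` payload is the standard shape Slack incoming webhooks accept:

```typescript
// Builds the alert payload; kept separate from sending so it's easy to test.
export function buildBudgetAlert(totalCost: number, budget: number): { text: string } {
  const pct = Math.round((totalCost / budget) * 100);
  return {
    text: `AI spending alert: $${totalCost.toFixed(2)} of $${budget.toFixed(2)} daily budget used (${pct}%)`,
  };
}

// Posts the alert to a Slack incoming webhook, if one is configured.
export async function sendBudgetAlert(totalCost: number, budget: number): Promise<void> {
  const url = process.env.SLACK_WEBHOOK_URL; // assumed env var
  if (!url) return; // no webhook configured, nothing to do

  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBudgetAlert(totalCost, budget)),
  });
}
```

    Call `sendBudgetAlert(totalCost, DAILY_BUDGET)` from inside the budget check above.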

    What to ask your AI: "Create a monitoring dashboard for my AI app that tracks token usage, costs, latency, and error rates by day."

    Setting Up Alerts

    Don't wait for users to tell you something is broken. Set up alerts:

    What to Alert On

    Alert | Threshold | Action
    Error rate spike | > 5% of requests | Check AI API status page
    High latency | P95 > 15 seconds | Check if model is overloaded
    Daily cost exceeded | > 80% of budget | Review usage patterns
    AI API down | Health check fails | Switch to fallback or show maintenance page

    Simple Health Check Endpoint

    // app/api/health/route.ts (Next.js)
    import { NextResponse } from "next/server";
    
    export async function GET() {
      const checks = {
        server: "ok",
        aiApi: "unknown",
        // add e.g. a database check here if your app depends on one
      };
    
      // Check AI API
      try {
        const response = await fetch("https://api.openai.com/v1/models", {
          headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
        });
        checks.aiApi = response.ok ? "ok" : "degraded";
      } catch {
        checks.aiApi = "down";
      }
    
      const allOk = Object.values(checks).every(v => v === "ok");
    
      return NextResponse.json(checks, {
        status: allOk ? 200 : 503,
      });
    }

    Monitoring Checklist

    ✅ AI API calls are logged with latency, tokens, and status
    ✅ Error tracking is set up (Sentry or similar)
    ✅ Performance monitoring is active
    ✅ Token usage is tracked per user/day
    ✅ Cost estimates are calculated and logged
    ✅ Alerts are set for error spikes and budget thresholds
    ✅ Health check endpoint is available
    ✅ Logs are structured (JSON) for easy parsing
    

    What's Next?

    You know how to monitor your app. The next tutorial focuses on the business side: cost management and scaling — keeping your AI app affordable as it grows.

    What to ask your AI: "Help me set up a monitoring stack for my AI app. I want to track errors, performance, and AI API costs."

