Books/GenAI Fundamentals/The GenAI Landscape

    The GenAI Landscape

    The GenAI space is moving fast. New models drop every few weeks. This tutorial helps you understand the major players, compare their offerings, and make informed decisions about which models to use for your projects.

    Major Model Providers

    OpenAI

    The company that started the current AI boom with ChatGPT. They offer the GPT family of models.

    • Flagship: GPT-4o (multimodal — text, image, audio)
    • Budget: GPT-4o mini (fast and cheap)
    • API: api.openai.com
    • Chat product: ChatGPT
    • Strengths: Broad general knowledge, strong coding, massive ecosystem
    • Pricing model: Pay per token
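
    The bullets above map onto a single HTTPS call. A minimal sketch against OpenAI's Chat Completions endpoint (Node 18+ for built-in fetch; the API key is a placeholder, and model IDs change over time, so check the current model list):

```typescript
// Minimal sketch: call OpenAI's Chat Completions API and return the reply.
// YOUR_OPENAI_API_KEY is a placeholder -- supply a real key from your account.
async function askOpenAI(prompt: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_OPENAI_API_KEY",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // the assistant's reply text
}
```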

    Anthropic

    Founded by former OpenAI researchers with a focus on AI safety, Anthropic builds the Claude family of models.

    • Flagship: Claude 3.5 Sonnet (exceptional at coding and analysis)
    • Budget: Claude 3.5 Haiku (fast, affordable)
    • API: api.anthropic.com
    • Chat product: claude.ai
    • Strengths: Code generation, long-context understanding, safety, structured output
    • Pricing model: Pay per token
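
    Anthropic's Messages API is similar in spirit but differs in the details: authentication uses an x-api-key header rather than a Bearer token, an anthropic-version header is required, and so is max_tokens. A sketch (placeholder key; verify the current model ID against Anthropic's docs):

```typescript
// Minimal sketch: call Anthropic's Messages API and return the reply.
// YOUR_ANTHROPIC_API_KEY is a placeholder -- supply your own key.
async function askClaude(prompt: string): Promise<string> {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": "YOUR_ANTHROPIC_API_KEY",
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024, // required by this API
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.content[0].text; // Claude returns a list of content blocks
}
```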

    Google

    Google's DeepMind division builds the Gemini family of models.

    • Flagship: Gemini 1.5 Pro (massive 1M+ token context)
    • Budget: Gemini 1.5 Flash (fast, cheap)
    • API: Google AI Studio / Vertex AI
    • Chat product: Gemini (gemini.google.com)
    • Strengths: Huge context window, multimodal, Google integration
    • Pricing model: Pay per token (generous free tier)

    Meta

    Meta (Facebook) leads the open-source AI movement with Llama.

    • Flagship: Llama 3.1 405B
    • Popular: Llama 3 70B, Llama 3 8B
    • API: Via third-party providers (Together AI, Groq, Fireworks, etc.)
    • Chat product: meta.ai
    • Strengths: Open source, can self-host, no API costs if you run it yourself
    • Pricing model: Free to download; pay hosting providers if not self-hosting
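
    One low-friction way to try the self-hosting route is Ollama, which runs quantized Llama models on your own machine and exposes a small local HTTP API. A sketch, assuming Ollama is installed and the `llama3` model has been pulled (check Ollama's docs for the exact request and response shapes):

```typescript
// Sketch: chat with a locally hosted Llama 3 through Ollama's HTTP API.
// Assumes the Ollama server is running locally on its default port.
async function askLocalLlama(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      messages: [{ role: "user", content: prompt }],
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  const data = await response.json();
  return data.message.content;
}
```

    No API key, no per-token bill: the trade-off is that your hardware does the inference.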

    Mistral AI

    A French AI company with strong open-source and commercial offerings.

    • Flagship: Mistral Large
    • Open source: Mistral 7B, Mixtral 8x7B
    • API: api.mistral.ai
    • Strengths: Efficient architectures, European data sovereignty, strong multilingual
    • Pricing model: Pay per token

    Model Comparison Table

    Model             | Provider  | Context Window | Best For                     | Open Source | Relative Cost
    GPT-4o            | OpenAI    | 128K           | General purpose, multimodal  | No          | $$$
    GPT-4o mini       | OpenAI    | 128K           | Budget-friendly tasks        | No          | $
    Claude 3.5 Sonnet | Anthropic | 200K           | Coding, analysis, long docs  | No          | $$$
    Claude 3.5 Haiku  | Anthropic | 200K           | Fast responses, simple tasks | No          | $
    Gemini 1.5 Pro    | Google    | 1M+            | Very long context tasks      | No          | $$
    Gemini 1.5 Flash  | Google    | 1M+            | Fast, budget multimodal      | No          | $
    Llama 3.1 405B    | Meta      | 128K           | Self-hosting, no API costs   | Yes         | Free*
    Llama 3 70B       | Meta      | 8K             | Good balance of quality/size | Yes         | Free*
    Mistral Large     | Mistral   | 128K           | European, multilingual       | No          | $$
    Mixtral 8x7B      | Mistral   | 32K            | Open-source, efficient       | Yes         | Free*

    *Free to download; compute costs apply if using cloud hosting.
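
    Context windows in the table are measured in tokens, not characters. A quick back-of-envelope check for whether a document fits, using the common rough heuristic of about 4 characters per token for English text (a heuristic only; use the provider's tokenizer for exact counts):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsContext(text: string, contextWindowTokens: number): boolean {
  // Leave ~20% headroom for the system prompt and the model's reply.
  return estimateTokens(text) <= contextWindowTokens * 0.8;
}

// A ~300-page book is roughly 600,000 characters, i.e. ~150,000 tokens:
// too large for a 128K window, comfortable in Gemini 1.5 Pro's 1M window.
```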

    Open Source vs Closed Source

    Closed Source (Proprietary)

    Models like GPT-4o and Claude 3.5 Sonnet are closed source — you can only access them through APIs.

    Advantages:

    • Highest capability models (currently)
    • No infrastructure management
    • Always up-to-date
    • Easy to get started

    Disadvantages:

    • Per-token costs add up
    • Data sent to third-party servers
    • Vendor lock-in
    • Rate limits
    • Model could change without notice

    Open Source

    Models like Llama 3, Mistral, and Mixtral can be downloaded and run yourself.

    Advantages:

    • No per-token costs (only compute)
    • Full data privacy — runs on your servers
    • Customize and fine-tune for your use case
    • No rate limits
    • Reproducible — the model doesn't change

    Disadvantages:

    • Lower capability than top closed models (though the gap is closing)
    • Need GPU infrastructure (expensive)
    • You handle updates, security, scaling
    • More technical setup

    The Middle Ground: Open Models via APIs

    Services like Together AI, Groq, Fireworks AI, and Replicate host open-source models for you — so you get the benefits of open models without managing infrastructure.

    // Using Llama 3 via Together AI (same OpenAI-compatible format)
    const response = await fetch("https://api.together.xyz/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_TOGETHER_API_KEY",
      },
      body: JSON.stringify({
        model: "meta-llama/Llama-3-70b-chat-hf",
        messages: [{ role: "user", content: "Explain Docker in simple terms" }],
        temperature: 0.7,
        max_tokens: 1024,
      }),
    });

    // Parse the reply exactly as you would an OpenAI response
    const data = await response.json();
    console.log(data.choices[0].message.content);

    When to Use Which Model

    Decision Framework

    What are you building?
    │
    ├─ Production app with high quality needs?
    │  ├─ Heavy coding/analysis → Claude 3.5 Sonnet
    │  ├─ General purpose → GPT-4o
    │  └─ Very long documents → Gemini 1.5 Pro
    │
    ├─ Budget-conscious or high volume?
    │  ├─ Simple tasks → GPT-4o mini or Claude 3.5 Haiku
    │  ├─ Batch processing → Gemini 1.5 Flash
    │  └─ Self-hosting option → Llama 3 70B
    │
    ├─ Data privacy is critical?
    │  ├─ Can manage infrastructure → Llama 3 (self-hosted)
    │  └─ Need managed service → Azure OpenAI or GCP Vertex AI
    │
    └─ Experimenting / learning?
       └─ Start with → GPT-4o mini (cheapest with good quality)
    

    Quick Recommendations

    Scenario                       | Recommended Model   | Why
    Building a coding assistant    | Claude 3.5 Sonnet   | Best at code generation and analysis
    Customer support chatbot       | GPT-4o mini         | Good quality, very affordable
    Analyzing long legal documents | Gemini 1.5 Pro      | 1M+ token context window
    Processing 100K+ requests/day  | Llama 3 via Groq    | Fast inference, predictable costs
    Startup on a tight budget      | GPT-4o mini         | Cheapest with solid quality
    Enterprise with data concerns  | Llama 3 self-hosted | Full data control

    Pricing Overview

    Costs are per million tokens (as of 2024-2025). Prices change frequently — always check the provider's pricing page.

    Input Token Pricing (per 1M tokens)

    Model             | Price per 1M input tokens
    Gemini 1.5 Flash  | $0.075
    GPT-4o mini       | $0.15
    Claude 3.5 Haiku  | $0.80
    Gemini 1.5 Pro    | $1.25
    GPT-4o            | $2.50
    Claude 3.5 Sonnet | $3.00

    What Does This Mean in Practice?

    A chatbot handling 1,000 conversations/day:
      - Average 500 tokens input + 500 tokens output per conversation
      - Monthly: 1,000 × 30 × 1,000 = 30M tokens
    
      GPT-4o mini: ~$5-20/month
      GPT-4o:      ~$75-375/month
      Claude 3.5 Sonnet: ~$90-540/month
    
    For most startups and side projects, costs are very manageable.
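
    The arithmetic above is easy to wrap in a helper. Note that output tokens usually cost more than input tokens (the $0.60 per 1M output figure for GPT-4o mini used below is its published rate at the time of writing; verify current pricing):

```typescript
// Estimate monthly spend from traffic volume and per-1M-token prices.
function monthlyCostUSD(
  conversationsPerDay: number,
  inputTokensPerConv: number,
  outputTokensPerConv: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  const days = 30;
  const inputTokens = conversationsPerDay * days * inputTokensPerConv;
  const outputTokens = conversationsPerDay * days * outputTokensPerConv;
  return (inputTokens / 1_000_000) * inputPricePerM +
         (outputTokens / 1_000_000) * outputPricePerM;
}

// The chatbot above: 1,000 conversations/day, 500 input + 500 output tokens each.
// GPT-4o mini at $0.15 input / $0.60 output per 1M tokens:
const miniCost = monthlyCostUSD(1000, 500, 500, 0.15, 0.6); // ≈ $11.25/month
```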
    

    Multi-Model Strategy

    Smart teams don't use just one model. They route different tasks to different models:

    // Route to the best model for each task
    function selectModel(task: string): string {
      switch (task) {
        case "code-generation":
          return "claude-3-5-sonnet-20241022";
        case "simple-classification":
          return "gpt-4o-mini";
        case "long-document-analysis":
          return "gemini-1.5-pro";
        case "creative-writing":
          return "gpt-4o";
        default:
          return "gpt-4o-mini"; // Default to cheapest
      }
    }

    Key Takeaways

    • The "best" model depends on your specific use case, budget, and requirements
    • Closed-source models (GPT-4o, Claude) offer the highest quality but cost per token
    • Open-source models (Llama, Mistral) offer privacy and cost control at scale
    • Start cheap (GPT-4o mini), upgrade when you need more capability
    • Consider a multi-model strategy — route tasks to the best model for each job
    • Prices are dropping rapidly — what's expensive today may be cheap in 6 months

    What's Next?

    Let's go deeper into how LLMs actually work — understanding attention, embeddings, and next-token prediction will help you reason about their capabilities and limitations.

    What to ask your AI: "I'm building [describe your app]. Which AI model should I use and why? My budget is [amount] per month."
