Books/GenAI Fundamentals/The GenAI Landscape

    The GenAI Landscape

    The GenAI space is moving fast. New models drop every few weeks. This tutorial helps you understand the major players, compare their offerings, and make informed decisions about which models to use for your projects.

    Major Model Providers

    OpenAI

    The company that started the current AI boom with ChatGPT. They offer the GPT family of models.

    • Flagship: GPT-4o (multimodal — text, image, audio)
    • Budget: GPT-4o mini (fast and cheap)
    • API: api.openai.com
    • Chat product: ChatGPT
    • Strengths: Broad general knowledge, strong coding, massive ecosystem
    • Pricing model: Pay per token
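
    The bullets above map onto a single HTTPS call. A minimal sketch against OpenAI's Chat Completions endpoint (Node 18+ for built-in fetch; the API key is a placeholder, and model IDs change over time, so check the current model list):

```typescript
// Minimal sketch: call OpenAI's Chat Completions API and return the reply.
// YOUR_OPENAI_API_KEY is a placeholder -- supply a real key from your account.
async function askOpenAI(prompt: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_OPENAI_API_KEY",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content; // the assistant's reply text
}
```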

    Anthropic

    Founded by former OpenAI researchers with a focus on AI safety, Anthropic builds the Claude family of models.

    • Flagship: Claude 3.5 Sonnet (exceptional at coding and analysis)
    • Budget: Claude 3.5 Haiku (fast, affordable)
    • API: api.anthropic.com
    • Chat product: claude.ai
    • Strengths: Code generation, long-context understanding, safety, structured output
    • Pricing model: Pay per token
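
    Anthropic's Messages API is similar in spirit but differs in the details: authentication uses an x-api-key header rather than a Bearer token, an anthropic-version header is required, and so is max_tokens. A sketch (placeholder key; verify the current model ID against Anthropic's docs):

```typescript
// Minimal sketch: call Anthropic's Messages API and return the reply.
// YOUR_ANTHROPIC_API_KEY is a placeholder -- supply your own key.
async function askClaude(prompt: string): Promise<string> {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": "YOUR_ANTHROPIC_API_KEY",
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 1024, // required by this API
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.content[0].text; // Claude returns a list of content blocks
}
```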

    Google

    Google's DeepMind division builds the Gemini family of models.

    • Flagship: Gemini 1.5 Pro (massive 1M+ token context)
    • Budget: Gemini 1.5 Flash (fast, cheap)
    • API: Google AI Studio / Vertex AI
    • Chat product: Gemini (gemini.google.com)
    • Strengths: Huge context window, multimodal, Google integration
    • Pricing model: Pay per token (generous free tier)

    Meta

    Meta (Facebook) leads the open-source AI movement with Llama.

    • Flagship: Llama 3.1 405B
    • Popular: Llama 3 70B, Llama 3 8B
    • API: Via third-party providers (Together AI, Groq, Fireworks, etc.)
    • Chat product: meta.ai
    • Strengths: Open source, can self-host, no API costs if you run it yourself
    • Pricing model: Free to download; pay hosting providers if not self-hosting
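
    One low-friction way to try the self-hosting route is Ollama, which runs quantized Llama models on your own machine and exposes a small local HTTP API. A sketch, assuming Ollama is installed and the `llama3` model has been pulled (check Ollama's docs for the exact request and response shapes):

```typescript
// Sketch: chat with a locally hosted Llama 3 through Ollama's HTTP API.
// Assumes the Ollama server is running locally on its default port.
async function askLocalLlama(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3",
      messages: [{ role: "user", content: prompt }],
      stream: false, // return one JSON object instead of a token stream
    }),
  });
  const data = await response.json();
  return data.message.content;
}
```

    No API key, no per-token bill: the trade-off is that your hardware does the inference.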

    Mistral AI

    A French AI company with strong open-source and commercial offerings.

    • Flagship: Mistral Large
    • Open source: Mistral 7B, Mixtral 8x7B
    • API: api.mistral.ai
    • Strengths: Efficient architectures, European data sovereignty, strong multilingual
    • Pricing model: Pay per token

    Model Comparison Table

    Model             | Provider  | Context Window | Best For                     | Open Source | Relative Cost
    GPT-4o            | OpenAI    | 128K           | General purpose, multimodal  | No          | $$$
    GPT-4o mini       | OpenAI    | 128K           | Budget-friendly tasks        | No          | $
    Claude 3.5 Sonnet | Anthropic | 200K           | Coding, analysis, long docs  | No          | $$$
    Claude 3.5 Haiku  | Anthropic | 200K           | Fast responses, simple tasks | No          | $
    Gemini 1.5 Pro    | Google    | 1M+            | Very long context tasks      | No          | $$
    Gemini 1.5 Flash  | Google    | 1M+            | Fast, budget multimodal      | No          | $
    Llama 3.1 405B    | Meta      | 128K           | Self-hosting, no API costs   | Yes         | Free*
    Llama 3 70B       | Meta      | 8K             | Good balance of quality/size | Yes         | Free*
    Mistral Large     | Mistral   | 128K           | European, multilingual       | No          | $$
    Mixtral 8x7B      | Mistral   | 32K            | Open-source, efficient       | Yes         | Free*

    *Free to download; compute costs apply if using cloud hosting.
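
    Context windows in the table are measured in tokens, not characters. A quick back-of-envelope check for whether a document fits, using the common rough heuristic of about 4 characters per token for English text (a heuristic only; use the provider's tokenizer for exact counts):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitsContext(text: string, contextWindowTokens: number): boolean {
  // Leave ~20% headroom for the system prompt and the model's reply.
  return estimateTokens(text) <= contextWindowTokens * 0.8;
}

// A ~300-page book is roughly 600,000 characters, i.e. ~150,000 tokens:
// too large for a 128K window, comfortable in Gemini 1.5 Pro's 1M window.
```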

    Open Source vs Closed Source

    Closed Source (Proprietary)

    Models like GPT-4o and Claude 3.5 Sonnet are closed source — you can only access them through APIs.

    Advantages:

    • Highest capability models (currently)
    • No infrastructure management
    • Always up-to-date
    • Easy to get started

    Disadvantages:

    • Per-token costs add up
    • Data sent to third-party servers
    • Vendor lock-in
    • Rate limits
    • Model could change without notice

    Open Source

    Models like Llama 3, Mistral, and Mixtral can be downloaded and run yourself.

    Advantages:

    • No per-token costs (only compute)
    • Full data privacy — runs on your servers
    • Customize and fine-tune for your use case
    • No rate limits
    • Reproducible — the model doesn't change

    Disadvantages:

    • Lower capability than top closed models (though the gap is closing)
    • Need GPU infrastructure (expensive)
    • You handle updates, security, scaling
    • More technical setup

    The Middle Ground: Open Models via APIs

    Services like Together AI, Groq, Fireworks AI, and Replicate host open-source models for you — so you get the benefits of open models without managing infrastructure.

    // Using Llama 3 via Together AI (same OpenAI-compatible format)
    const response = await fetch("https://api.together.xyz/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_TOGETHER_API_KEY",
      },
      body: JSON.stringify({
        model: "meta-llama/Llama-3-70b-chat-hf",
        messages: [{ role: "user", content: "Explain Docker in simple terms" }],
        temperature: 0.7,
        max_tokens: 1024,
      }),
    });

    // Parse the reply exactly as you would an OpenAI response
    const data = await response.json();
    console.log(data.choices[0].message.content);

    When to Use Which Model

    Decision Framework

    What are you building?
    │
    ├─ Production app with high quality needs?
    │  ├─ Heavy coding/analysis → Claude 3.5 Sonnet
    │  ├─ General purpose → GPT-4o
    │  └─ Very long documents → Gemini 1.5 Pro
    │
    ├─ Budget-conscious or high volume?
    │  ├─ Simple tasks → GPT-4o mini or Claude 3.5 Haiku
    │  ├─ Batch processing → Gemini 1.5 Flash
    │  └─ Self-hosting option → Llama 3 70B
    │
    ├─ Data privacy is critical?
    │  ├─ Can manage infrastructure → Llama 3 (self-hosted)
    │  └─ Need managed service → Azure OpenAI or GCP Vertex AI
    │
    └─ Experimenting / learning?
       └─ Start with → GPT-4o mini (cheapest with good quality)
    

    Quick Recommendations

    Scenario                       | Recommended Model   | Why
    Building a coding assistant    | Claude 3.5 Sonnet   | Best at code generation and analysis
    Customer support chatbot       | GPT-4o mini         | Good quality, very affordable
    Analyzing long legal documents | Gemini 1.5 Pro      | 1M+ token context window
    Processing 100K+ requests/day  | Llama 3 via Groq    | Fast inference, predictable costs
    Startup on a tight budget      | GPT-4o mini         | Cheapest with solid quality
    Enterprise with data concerns  | Llama 3 self-hosted | Full data control

    Pricing Overview

    Costs are per million tokens (as of 2024-2025). Prices change frequently — always check the provider's pricing page.

    Input Token Pricing (per 1M tokens)

    Model             | Price per 1M input tokens
    Gemini 1.5 Flash  | $0.075
    GPT-4o mini       | $0.15
    Claude 3.5 Haiku  | $0.80
    Gemini 1.5 Pro    | $1.25
    GPT-4o            | $2.50
    Claude 3.5 Sonnet | $3.00

    What Does This Mean in Practice?

    A chatbot handling 1,000 conversations/day:
      - Average 500 tokens input + 500 tokens output per conversation
      - Monthly: 1,000 × 30 × 1,000 = 30M tokens
    
      GPT-4o mini: ~$5-20/month
      GPT-4o:      ~$75-375/month
      Claude 3.5 Sonnet: ~$90-540/month
    
    For most startups and side projects, costs are very manageable.
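
    The arithmetic above is easy to wrap in a helper. Note that output tokens usually cost more than input tokens (the $0.60 per 1M output figure for GPT-4o mini used below is its published rate at the time of writing; verify current pricing):

```typescript
// Estimate monthly spend from traffic volume and per-1M-token prices.
function monthlyCostUSD(
  conversationsPerDay: number,
  inputTokensPerConv: number,
  outputTokensPerConv: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  const days = 30;
  const inputTokens = conversationsPerDay * days * inputTokensPerConv;
  const outputTokens = conversationsPerDay * days * outputTokensPerConv;
  return (inputTokens / 1_000_000) * inputPricePerM +
         (outputTokens / 1_000_000) * outputPricePerM;
}

// The chatbot above: 1,000 conversations/day, 500 input + 500 output tokens each.
// GPT-4o mini at $0.15 input / $0.60 output per 1M tokens:
const miniCost = monthlyCostUSD(1000, 500, 500, 0.15, 0.6); // ≈ $11.25/month
```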
    

    Multi-Model Strategy

    Smart teams don't use just one model. They route different tasks to different models:

    // Route to the best model for each task
    function selectModel(task: string): string {
      switch (task) {
        case "code-generation":
          return "claude-3-5-sonnet-20241022";
        case "simple-classification":
          return "gpt-4o-mini";
        case "long-document-analysis":
          return "gemini-1.5-pro";
        case "creative-writing":
          return "gpt-4o";
        default:
          return "gpt-4o-mini"; // Default to cheapest
      }
    }

    Key Takeaways

    • The "best" model depends on your specific use case, budget, and requirements
    • Closed-source models (GPT-4o, Claude) offer the highest quality but cost per token
    • Open-source models (Llama, Mistral) offer privacy and cost control at scale
    • Start cheap (GPT-4o mini), upgrade when you need more capability
    • Consider a multi-model strategy — route tasks to the best model for each job
    • Prices are dropping rapidly — what's expensive today may be cheap in 6 months

    What's Next?

    Let's go deeper into how LLMs actually work — understanding attention, embeddings, and next-token prediction will help you reason about their capabilities and limitations.

    What to ask your AI: "I'm building [describe your app]. Which AI model should I use and why? My budget is [amount] per month."
