
    Understanding Embeddings

    Embeddings are the foundation of RAG. Before you can search your documents for relevant information, you need to convert text into a format that machines can compare. That format is vectors — arrays of numbers that capture the meaning of text.

    What Are Embeddings?

    An embedding is a numerical representation of text in a high-dimensional space. Each piece of text — a sentence, a paragraph, a document — gets converted into a list of numbers (a vector). Texts with similar meanings end up with similar vectors.

    Think of it like coordinates on a map. Just as two nearby points on a map are geographically close, two similar text embeddings are semantically close.

    "How do I reset my password?"  →  [0.021, -0.134, 0.891, 0.045, ...]
    "I forgot my login credentials" →  [0.019, -0.128, 0.887, 0.051, ...]
    "What is the weather today?"    →  [-0.523, 0.412, 0.033, -0.298, ...]
    

    Notice how the first two vectors are close together (both are about account access), while the third points in a very different direction (it is about weather).

    How Text Becomes Numbers

    The process works in three stages:

    1. Tokenization — The text is split into tokens (words or subword pieces)
    2. Neural network processing — The tokens pass through a trained transformer model
    3. Pooling — The model outputs are combined into a single fixed-length vector

    You do not need to understand the math in detail. What matters is the result: text goes in, a vector comes out, and similar texts produce similar vectors.

    Input text: "How to train a puppy"
         │
         ▼
    ┌──────────────┐
    │  Embedding   │
    │    Model     │
    └──────┬───────┘
           │
           ▼
    Output: [0.023, -0.156, 0.734, 0.091, ..., -0.045]
            (1536 numbers for OpenAI text-embedding-3-small)
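
    The pooling stage can be sketched with toy numbers. In the snippet below the 4-dimensional token vectors are made up for illustration (real models output hundreds of dimensions per token), and mean pooling — averaging across tokens — is one common pooling strategy:

    import numpy as np

    # Toy illustration of the pooling stage: suppose the model produced
    # one 4-dimensional vector per token of "How to train a puppy".
    token_vectors = np.array([
        [0.2, -0.1, 0.5, 0.0],   # "How"
        [0.4,  0.3, 0.1, -0.2],  # "to"
        [0.0,  0.6, 0.2, 0.1],   # "train"
        [0.3, -0.2, 0.4, 0.5],   # "a"
        [0.1,  0.4, 0.3, 0.2],   # "puppy"
    ])

    # Mean pooling: average across tokens to get one fixed-length vector.
    # However many tokens go in, the result always has 4 dimensions here.
    sentence_vector = token_vectors.mean(axis=0)
    print(sentence_vector.shape)  # (4,)

    This is why every text, no matter its length, ends up as a vector of the same fixed size.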
    

    The length of the vector (its dimensionality) depends on the model:

    Model                    Dimensions   Provider
    text-embedding-3-small   1536         OpenAI
    text-embedding-3-large   3072         OpenAI
    embed-english-v3.0       1024         Cohere
    all-MiniLM-L6-v2         384          Open source (Sentence Transformers)
    nomic-embed-text-v1.5    768          Nomic (open source)
    BGE-large-en-v1.5        1024         BAAI (open source)

    Semantic Similarity and Cosine Distance

    Once you have embeddings, you compare them using cosine similarity, which measures how aligned two vectors are on a scale from -1 to 1 (cosine distance is simply 1 minus this value):

    • 1.0 — Identical meaning
    • 0.7-0.9 — Very similar
    • 0.4-0.6 — Somewhat related
    • 0.0 — Unrelated
    • -1.0 — Opposite meaning

    In code, cosine similarity takes only a few lines with NumPy (embedding_of below stands in for whichever embedding function you use):
    from numpy import dot
    from numpy.linalg import norm
    
    def cosine_similarity(a, b):
        return dot(a, b) / (norm(a) * norm(b))
    
    # Example
    similarity = cosine_similarity(
        embedding_of("How do I reset my password?"),
        embedding_of("I forgot my login credentials")
    )
    # Result: ~0.91 (very similar!)

    This is the core mechanic of RAG: when a user asks a question, you embed that question, then find the stored document embeddings with the highest cosine similarity.
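
    That retrieval step can be sketched with NumPy. The vectors below are tiny made-up stand-ins for real embeddings, but the ranking logic is the same at any dimensionality: compare the query against every stored document at once, then sort by similarity:

    import numpy as np

    # Toy vectors standing in for real document embeddings
    # (in practice these come from an embedding model).
    doc_embeddings = np.array([
        [0.9, 0.1, 0.0],   # doc about passwords
        [0.1, 0.9, 0.1],   # doc about refunds
        [0.0, 0.2, 0.9],   # doc about business hours
    ])
    query_embedding = np.array([0.85, 0.15, 0.05])

    # Cosine similarity between the query and every document at once.
    norms = np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    similarities = doc_embeddings @ query_embedding / norms

    # Rank documents from most to least similar.
    ranking = np.argsort(similarities)[::-1]
    print(ranking[0])  # 0 — the password document wins

    Vector databases implement exactly this comparison, just with approximate indexes so it stays fast across millions of documents.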

    Embedding Models: Choosing the Right One

    OpenAI Embeddings

    The most popular choice for getting started. Excellent quality, simple API, pay per token.

    from openai import OpenAI
    
    client = OpenAI()
    
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input="How do I reset my password?"
    )
    
    embedding = response.data[0].embedding
    print(f"Vector length: {len(embedding)}")  # 1536
    print(f"First 5 values: {embedding[:5]}")

    Cohere Embeddings

    Strong alternative with a generous free tier. Supports different input types for better retrieval.

    import cohere
    
    co = cohere.ClientV2()
    
    response = co.embed(
        texts=["How do I reset my password?"],
        model="embed-english-v3.0",
        input_type="search_query",           # or "search_document" for stored docs
        embedding_types=["float"]
    )
    
    embedding = response.embeddings.float_[0]
    print(f"Vector length: {len(embedding)}")  # 1024

    Open-Source Embeddings (Sentence Transformers)

    Free, runs locally, no API key needed. Good for privacy-sensitive applications.

    from sentence_transformers import SentenceTransformer
    
    model = SentenceTransformer("all-MiniLM-L6-v2")
    
    sentences = [
        "How do I reset my password?",
        "I forgot my login credentials",
        "What is the weather today?"
    ]
    
    embeddings = model.encode(sentences)
    print(f"Shape: {embeddings.shape}")  # (3, 384)

    Code Example: Generating and Comparing Embeddings

    Here is a complete example that generates embeddings and finds the most similar document to a query:

    from openai import OpenAI
    from numpy import dot
    from numpy.linalg import norm
    
    client = OpenAI()
    
    def get_embedding(text: str) -> list[float]:
        """Generate an embedding for a piece of text."""
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding
    
    def cosine_similarity(a: list[float], b: list[float]) -> float:
        """Calculate cosine similarity between two vectors."""
        return dot(a, b) / (norm(a) * norm(b))
    
    # Our "knowledge base" — in a real app these come from your documents
    documents = [
        "To reset your password, go to Settings > Security > Change Password.",
        "Our refund policy allows returns within 30 days of purchase.",
        "Business hours are Monday through Friday, 9 AM to 5 PM EST.",
        "To upgrade your plan, visit the Billing section in your dashboard.",
    ]
    
    # Pre-compute embeddings for all documents
    doc_embeddings = [get_embedding(doc) for doc in documents]
    
    # User asks a question
    query = "How do I change my password?"
    query_embedding = get_embedding(query)
    
    # Find the most similar document
    similarities = [
        cosine_similarity(query_embedding, doc_emb)
        for doc_emb in doc_embeddings
    ]
    
    best_idx = similarities.index(max(similarities))
    print(f"Best match (similarity {similarities[best_idx]:.3f}):")
    print(f"  {documents[best_idx]}")
    # Output: "To reset your password, go to Settings > Security > Change Password."

    Best Practices for Embeddings

    1. Use the same model for queries and documents — Mixing models produces incompatible vectors
    2. Batch your embedding calls — Most APIs support embedding multiple texts at once, which is faster and cheaper
    3. Cache embeddings — Store computed embeddings so you do not regenerate them every time
    4. Consider dimensionality vs cost — Smaller models (e.g. 384 dims) are faster and cheaper; larger ones (e.g. 3072 dims) tend to be more accurate
    5. Normalize your text — Clean up formatting, remove excessive whitespace, and handle special characters before embedding
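
    Practices 3 and 5 can be combined in a small cache sketch. Here fake_embed is a placeholder for a real embedding call (an API request or local model); the cache keys on a hash of the normalized text, so the same text is never embedded twice:

    import hashlib

    _cache: dict[str, list[float]] = {}
    calls = 0

    def fake_embed(text: str) -> list[float]:
        """Placeholder for a real embedding call (API or local model)."""
        global calls
        calls += 1
        return [float(len(text))]  # dummy vector for illustration

    def cached_embedding(text: str) -> list[float]:
        normalized = " ".join(text.split())  # collapse stray whitespace
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = fake_embed(normalized)
        return _cache[key]

    cached_embedding("How do I  reset my password?")   # embeds once
    cached_embedding("How do I reset my password?")    # cache hit after normalization
    print(calls)  # 1

    In a real pipeline you would persist the cache (a table keyed by the hash works well) rather than holding it in memory.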

    What to ask your AI: "Help me set up an embedding pipeline that processes my [documents/PDFs/web pages] and stores the embeddings for later retrieval."

    What's Next?

    You can now generate embeddings for any text. But where do you store millions of these vectors and search through them efficiently? That is the job of a vector database, which we cover next.


    🌐 www.genai-mentor.ai