Understanding Embeddings
Embeddings are the foundation of RAG. Before you can search your documents for relevant information, you need to convert text into a format that machines can compare. That format is vectors — arrays of numbers that capture the meaning of text.
What Are Embeddings?
An embedding is a numerical representation of text in a high-dimensional space. Each piece of text — a sentence, a paragraph, a document — gets converted into a list of numbers (a vector). Texts with similar meanings end up with similar vectors.
Think of it like coordinates on a map. Just as two nearby points on a map are geographically close, two similar text embeddings are semantically close.
"How do I reset my password?" → [0.021, -0.134, 0.891, 0.045, ...]
"I forgot my login credentials" → [0.019, -0.128, 0.887, 0.051, ...]
"What is the weather today?" → [-0.523, 0.412, 0.033, -0.298, ...]
Notice how the first two vectors are very similar (both are about account access), while the third is completely different (it is about weather).
How Text Becomes Numbers
The process works in three stages:
- Tokenization — The text is split into tokens (words or subword pieces)
- Neural network processing — The tokens pass through a trained transformer model
- Pooling — The model outputs are combined into a single fixed-length vector
You do not need to understand the math in detail. What matters is the result: text goes in, a vector comes out, and similar texts produce similar vectors.
```
Input text: "How to train a puppy"
        │
        ▼
┌──────────────┐
│  Embedding   │
│    Model     │
└──────┬───────┘
       │
       ▼
Output: [0.023, -0.156, 0.734, 0.091, ..., -0.045]
        (1536 numbers for OpenAI text-embedding-3-small)
```
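The three stages can be sketched in a few lines. This is a toy illustration only: whitespace splitting stands in for real subword tokenization, and random numbers stand in for the transformer's per-token outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: tokenization (real models use subword tokenizers, not split())
tokens = "how to train a puppy".split()  # 5 tokens

# Stage 2: neural network processing. A real transformer produces one
# hidden vector per token; here random numbers stand in for those outputs.
token_vectors = rng.normal(size=(len(tokens), 384))

# Stage 3: mean pooling, combining per-token vectors into one
# fixed-length vector, regardless of how many tokens went in
embedding = token_vectors.mean(axis=0)
print(embedding.shape)  # (384,)
```

However long the input, the pooled output always has the model's fixed dimensionality.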
The length of the vector (its dimensionality) depends on the model:
| Model | Dimensions | Provider |
|---|---|---|
| text-embedding-3-small | 1536 | OpenAI |
| text-embedding-3-large | 3072 | OpenAI |
| embed-english-v3.0 | 1024 | Cohere |
| all-MiniLM-L6-v2 | 384 | Open source (Sentence Transformers) |
| nomic-embed-text-v1.5 | 768 | Nomic (open source) |
| BGE-large-en-v1.5 | 1024 | BAAI (open source) |
Semantic Similarity and Cosine Similarity
Once you have embeddings, you compare them using cosine similarity. This measures how similar two vectors are on a scale from -1 to 1:
- 1.0 — Identical meaning
- 0.7-0.9 — Very similar
- 0.4-0.6 — Somewhat related
- 0.0 — Unrelated
- -1.0 — Opposite meaning (rare in practice; most embedding models produce mostly non-negative similarities)
```python
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Example (embedding_of is a placeholder for your embedding call)
similarity = cosine_similarity(
    embedding_of("How do I reset my password?"),
    embedding_of("I forgot my login credentials"),
)
# Result: ~0.91 (very similar!)
```
This is the core mechanic of RAG: when a user asks a question, you embed that question, then find the stored document embeddings with the highest cosine similarity.
Embedding Models: Choosing the Right One
OpenAI Embeddings
The most popular choice for getting started. Excellent quality, simple API, pay per token.
```python
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I reset my password?",
)

embedding = response.data[0].embedding
print(f"Vector length: {len(embedding)}")  # 1536
print(f"First 5 values: {embedding[:5]}")
```
Cohere Embeddings
Strong alternative with a generous free tier. Supports different input types for better retrieval.
```python
import cohere

co = cohere.ClientV2()

response = co.embed(
    texts=["How do I reset my password?"],
    model="embed-english-v3.0",
    input_type="search_query",  # or "search_document" for stored docs
    embedding_types=["float"],
)

embedding = response.embeddings.float_[0]
print(f"Vector length: {len(embedding)}")  # 1024
```
Open-Source Embeddings (Sentence Transformers)
Free, runs locally, no API key needed. Good for privacy-sensitive applications.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "I forgot my login credentials",
    "What is the weather today?",
]
embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}")  # (3, 384)
```
Code Example: Generating and Comparing Embeddings
Here is a complete example that generates embeddings and finds the most similar document to a query:
```python
from openai import OpenAI
from numpy import dot
from numpy.linalg import norm

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    """Generate an embedding for a piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    return dot(a, b) / (norm(a) * norm(b))

# Our "knowledge base" — in a real app these come from your documents
documents = [
    "To reset your password, go to Settings > Security > Change Password.",
    "Our refund policy allows returns within 30 days of purchase.",
    "Business hours are Monday through Friday, 9 AM to 5 PM EST.",
    "To upgrade your plan, visit the Billing section in your dashboard.",
]

# Pre-compute embeddings for all documents
doc_embeddings = [get_embedding(doc) for doc in documents]

# User asks a question
query = "How do I change my password?"
query_embedding = get_embedding(query)

# Find the most similar document
similarities = [
    cosine_similarity(query_embedding, doc_emb)
    for doc_emb in doc_embeddings
]
best_idx = similarities.index(max(similarities))

print(f"Best match (similarity {similarities[best_idx]:.3f}):")
print(f"  {documents[best_idx]}")
# Output: "To reset your password, go to Settings > Security > Change Password."
```
Best Practices for Embeddings
- Use the same model for queries and documents — Mixing models produces incompatible vectors
- Batch your embedding calls — Most APIs support embedding multiple texts at once, which is faster and cheaper
- Cache embeddings — Store computed embeddings so you do not regenerate them every time
- Consider dimensionality vs cost — Smaller models (e.g. 384 dims) are faster and cheaper to store and search; larger ones (e.g. 3072 dims) typically retrieve more accurately
- Normalize your text — Clean up formatting, remove excessive whitespace, and handle special characters before embedding
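The batching and caching advice can be combined in one small wrapper. This is a minimal in-memory sketch: `make_cached_embedder` and `embed_batch_fn` are names introduced here, with `embed_batch_fn` standing in for whatever batch call your provider offers (for example, passing a list of strings as `input` to the OpenAI embeddings endpoint).

```python
import hashlib

def make_cached_embedder(embed_batch_fn):
    """Wrap a batch-embedding function with an in-memory cache.

    embed_batch_fn takes a list of texts and returns a list of vectors,
    issuing one batched call instead of one call per text.
    """
    cache: dict[str, list[float]] = {}

    def embed(texts: list[str]) -> list[list[float]]:
        keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
        # Only texts we have never embedded before go to the API
        missing = [t for t, k in zip(texts, keys) if k not in cache]
        if missing:
            for text, vec in zip(missing, embed_batch_fn(missing)):
                cache[hashlib.sha256(text.encode()).hexdigest()] = vec
        return [cache[k] for k in keys]

    return embed
```

Calling the wrapper twice with the same texts triggers only one upstream call; a production version would persist the cache (for example, alongside your documents in the vector store) rather than keep it in memory.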
What to ask your AI: "Help me set up an embedding pipeline that processes my [documents/PDFs/web pages] and stores the embeddings for later retrieval."
What's Next?
You can now generate embeddings for any text. But where do you store millions of these vectors and search through them efficiently? That is the job of a vector database, which we cover next.