24.3 Embeddings 101 for builders
Goal: understand embeddings well enough to debug retrieval
You don’t need to be a machine learning researcher to build RAG, but you do need a working mental model of embeddings.
The goal of this page is to make you dangerous enough to answer questions like:
- “Why are we retrieving irrelevant chunks?”
- “Why does the system miss obvious matches?”
- “Why did retrieval get worse after we changed preprocessing?”
- “How do we evaluate retrieval without guessing?”
- “When should we use hybrid search or reranking?”
What embeddings are (builder-friendly)
An embedding model converts text into a vector (a long list of numbers).
The practical idea: texts with similar meaning end up “near” each other in vector space.
- Chunk embedding: vector representing a document chunk.
- Query embedding: vector representing the user’s question.
- Similarity: a score that estimates how close the vectors are.
Retrieval usually means: embed the query, then find the nearest chunk vectors.
Similarity search can return text that is semantically close but factually irrelevant. That’s why ranking and guardrails matter.
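To make the mental model concrete, here is a minimal sketch: embed a query and a few chunks, then rank the chunks by cosine similarity. The `embed` function below is a random-vector placeholder standing in for whatever embedding API you actually call, so the ranking it prints is meaningless; only the shape of the flow matters.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder: swap in your embedding provider here.
    # A real model returns one meaningful vector per text; this one is random
    # so the sketch runs end to end without external services.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

def cosine_similarity(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    # Normalize, then take dot products: one similarity score per chunk.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    return c @ q

chunks = ["Refunds are issued within 14 days.", "Our office is closed on public holidays."]
chunk_vecs = embed(chunks)
query_vec = embed(["How long do refunds take?"])[0]

scores = cosine_similarity(query_vec, chunk_vecs)
for chunk, score in sorted(zip(chunks, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {chunk}")
```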
Similarity search in practice
Most systems do:
- Store chunk vectors in a vector index.
- Compute a query vector at runtime.
- Run nearest-neighbor search (top-k).
- Optionally rerank top candidates with a stronger model.
- Include the best chunks in the prompt.
Important practical details:
- Distance metric: cosine similarity, dot product, or L2 (Euclidean) distance; match whatever your embedding model and vector store expect.
- Approximate search: most vector stores use approximate nearest-neighbor (ANN) search for speed, so results are “close enough” rather than exact.
- Filtering: apply metadata filters (permissions, doc types) before ranking.
- Top-k tuning: retrieval k is not the same as “chunks to include in prompt.”
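Here is a sketch of that flow with a small in-memory index. It uses exact cosine similarity instead of ANN, and the metadata fields (`doc_type`, `allowed_roles`) are made up for illustration; a production vector store does the same steps through its own API.

```python
import numpy as np

def top_k(query_vec, chunk_vecs, chunk_meta, k=5, doc_type=None, user_role=None):
    """Metadata filter first, then rank the surviving chunks by cosine similarity."""
    # 1) Pre-filter on metadata (permissions, doc type) before any ranking.
    keep = [
        i for i, meta in enumerate(chunk_meta)
        if (doc_type is None or meta["doc_type"] == doc_type)
        and (user_role is None or user_role in meta["allowed_roles"])
    ]
    if not keep:
        return []  # over-filtering: log this case, it often explains "nothing retrieved"

    # 2) Exact cosine similarity on the filtered subset (a vector store would use ANN here).
    subset = chunk_vecs[keep]
    q = query_vec / np.linalg.norm(query_vec)
    sims = (subset / np.linalg.norm(subset, axis=1, keepdims=True)) @ q

    # 3) Retrieval top-k; how many of these end up in the prompt is a separate decision.
    order = np.argsort(-sims)[:k]
    return [(keep[i], float(sims[i])) for i in order]
```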
Common pitfalls that break retrieval
Embedding mismatch
- Different models: chunk embeddings created with one model, query embeddings with another.
- Different preprocessing: chunk text is normalized differently from query text.
- Different languages: your embedding model may be weaker for non-English content.
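A cheap way to rule out this class of bug is to force every embedding through one code path, so chunks and queries always see the same normalization and the same pinned model. A sketch, where `embed_fn` stands in for your provider call and the model name is only an example:

```python
EMBEDDING_MODEL = "example-embed-v1"  # illustrative name; pin and record the real one

def normalize(text: str) -> str:
    # The ONE normalization step, shared by chunk text and query text.
    return " ".join(text.strip().lower().split())

def embed_for_index(chunks: list[str], embed_fn):
    # embed_fn: (texts, model) -> list of vectors, provided by your embedding client
    return embed_fn([normalize(c) for c in chunks], model=EMBEDDING_MODEL)

def embed_for_query(query: str, embed_fn):
    return embed_fn([normalize(query)], model=EMBEDDING_MODEL)[0]
```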
Bad chunk content
- Missing keywords: the chunk lacks the terms users search for (for example, section headings were stripped during chunking).
- Too much boilerplate: repeated headers/footers dominate embeddings.
- Overlapping duplicates: overlap creates many near-identical chunks that crowd out diversity.
Query issues
- Too short: “refunds?” provides little semantic signal.
- Too specific in the wrong way: includes irrelevant details that bias similarity.
- Ambiguous intent: multiple plausible meanings without disambiguation.
Missing or wrong metadata
- No doc type tags: you retrieve tickets when you needed canonical policy.
- No permissions tags: you either leak data or over-filter and retrieve nothing.
- No versioning: you retrieve outdated chunks after an update.
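Most of these metadata gaps are easier to catch if every chunk carries an explicit record alongside its vector. The fields below are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class ChunkRecord:
    chunk_id: str                 # stable id, so citations and audits keep working
    text: str                     # the exact raw chunk text, kept next to the vector
    doc_id: str
    doc_type: str                 # e.g. "policy", "ticket", "faq"
    doc_version: str              # lets you drop or re-embed outdated chunks
    allowed_roles: list[str] = field(default_factory=list)  # permission filtering
    embedding_model: str = "example-embed-v1"  # record what produced the vector
```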
Prompting can improve faithfulness to retrieved text, but it can’t invent the missing source. Fix retrieval first.
Practical tips for embedding pipelines
- Store raw text: always keep the exact chunk text alongside the embedding.
- Deduplicate boilerplate: remove repeated headers/footers before embedding.
- Keep chunk ids stable: citations and audits depend on stable references.
- Log retrieval results: store top-k chunk ids and scores per query for debugging.
- Batch embedding: embed chunks in batches and retry failures safely.
- Version embeddings: record embedding model name/version and re-embed intentionally.
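A sketch of the batching and retry tip, with `embed_fn` standing in for your provider’s batch call; the batch size and backoff numbers are arbitrary defaults, not recommendations.

```python
import time

def embed_in_batches(texts, embed_fn, batch_size=64, max_retries=3):
    """Embed texts in batches, retrying a failed batch with exponential backoff.

    embed_fn: takes a list of strings, returns a list of vectors of the same length.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:  # rate limits / transient network errors in practice
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)
    return vectors
```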
How to measure retrieval quality
Evaluation starts with an eval set of questions.
For each question, you can label:
- relevant chunks (ideal), or
- relevant documents (good enough), or
- answerability (“should be answerable from the corpus” vs “not found”).
Useful retrieval metrics:
- Recall@k: is at least one relevant chunk in the top k?
- MRR (mean reciprocal rank): how high in the ranking does the first relevant chunk appear, averaged over questions?
- Precision@k: how many of the top k are actually relevant?
You don’t need perfect labels to get signal. Even coarse labels catch regressions.
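These metrics are small enough to compute with plain Python over your logged retrieval results. In the sketch below, `retrieved_ids` is the ranked list of chunk ids you logged for one question and `relevant_ids` is the set you labeled.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """1.0 if at least one relevant chunk appears in the top k, else 0.0."""
    return float(any(cid in relevant_ids for cid in retrieved_ids[:k]))

def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top k that are labeled relevant."""
    top = retrieved_ids[:k]
    return sum(cid in relevant_ids for cid in top) / max(len(top), 1)

def reciprocal_rank(retrieved_ids, relevant_ids):
    """1/rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, cid in enumerate(retrieved_ids, start=1):
        if cid in relevant_ids:
            return 1.0 / rank
    return 0.0

def mrr(runs):
    """Mean reciprocal rank over (retrieved_ids, relevant_ids) pairs, one per question."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / max(len(runs), 1)
```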
Copy-paste prompts
Prompt: rewrite queries for better retrieval
Rewrite this user question to improve document retrieval.
Rules:
- Keep the user intent the same.
- Expand acronyms and include likely keywords/synonyms.
- Output 3 rewritten queries: one short, one medium, one explicit.
Question: [user question]
Prompt: label retrieved chunks (quick relevance audit)
I will give you a question and 10 retrieved chunks (with ids).
Task:
1) Label each chunk as: relevant / partially relevant / irrelevant.
2) Explain why (1 sentence each).
3) Recommend how to improve retrieval (query rewrite, metadata filter, chunking fix).
Return as a table.