Part VIII — Retrieval, Grounding, and "Don't Make Stuff Up" Engineering
Overview and links for this section of the guide.
What this part is for
Part VIII is where “don’t make stuff up” stops being a vibe and becomes an engineering discipline.
You’ll learn how to build systems that:
- retrieve relevant source material from a large corpus,
- generate answers constrained to those sources,
- prove faithfulness with citations and audit logs,
- handle uncertainty and conflict instead of bluffing.
Answers are grounded when every important claim can be traced to a source you provided, and the system behaves safely when the sources are missing or conflicting.
The core problem: plausible answers without proof
LLMs are good at producing text that sounds right. When you ask them questions about your docs, policies, or internal knowledge, two failures show up fast:
- Hallucination: the model fills gaps with plausible-sounding details.
- Ungrounded synthesis: the model merges multiple sources into one confident answer without telling you which parts came from where.
If you’re building a real product, these are not “edge cases.” They are the default failure mode unless you design against them.
Grounding: what it means (practically)
Grounding is a set of system behaviors, not a single prompt trick:
- Source selection: choose which documents/chunks are allowed to influence the answer.
- Source presentation: provide sources in a format the model can cite and follow.
- Answer constraints: force “answer only from sources” and “not found” behavior.
- Verification: validate output structure and citation presence; spot-check faithfulness.
- Logging: record which sources were used so you can debug and audit.
Dumping a giant document into the prompt increases cost and reduces precision. Grounding is about selecting the right sources and forcing traceability.
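To make that concrete, here is a minimal sketch of two of those behaviors, source presentation and answer constraints, rolled into one prompt-composition step. The `Chunk` fields, the `[S<n>]` citation format, and the rule wording are illustrative assumptions, not a fixed API:

```python
# A minimal sketch of prompt composition for grounding. The chunk fields,
# citation labels, and rule wording below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Chunk:
    source_id: str  # e.g. "handbook-v3:chunk-12" (hypothetical ID scheme)
    text: str

SYSTEM_RULES = (
    "Answer ONLY from the numbered sources below. "
    "Cite every claim inline as [S<n>]. "
    "If the sources do not contain the answer, reply exactly: Not found in sources."
)

def compose_prompt(question: str, chunks: list[Chunk]) -> str:
    # Give each chunk a stable label the model can cite back to us.
    sources = "\n\n".join(
        f"[S{i}] (id: {c.source_id})\n{c.text}" for i, c in enumerate(chunks, 1)
    )
    return f"{SYSTEM_RULES}\n\nSources:\n{sources}\n\nQuestion: {question}"
```

The point of the stable `[S<n>]` labels is that the verification layer can later check mechanically that every claim carries one, and the logging layer can map each label back to a `source_id`.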
RAG in one sentence (and what it’s not)
Retrieval-Augmented Generation (RAG) is a pattern where you retrieve relevant source chunks at runtime and include them in the model prompt so the model can answer using those sources; see the minimal sketch after the list below.
RAG is not:
- a guarantee of correctness (retrieval can fetch the wrong chunks),
- a replacement for evaluation (you still need tests and review),
- a substitute for access control (retrieval must respect permissions),
- an excuse to ignore prompt injection (documents can be “attacks”).
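Here is the retrieve-then-generate shape in miniature. The keyword-overlap scorer is a deliberately naive stand-in for embedding search so the sketch runs with no dependencies, and `call_model` is a placeholder for whatever LLM client you actually use:

```python
# The RAG loop in miniature: retrieve top-k chunks, compose a grounded
# prompt, generate. The scorer is a toy stand-in for embedding search;
# call_model is a placeholder, not a real client.

def score(query: str, chunk: str) -> float:
    # Toy relevance: fraction of query words present in the chunk.
    q_words = set(query.lower().split())
    return len(q_words & set(chunk.lower().split())) / max(len(q_words), 1)

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]

def call_model(prompt: str) -> str:
    raise NotImplementedError("swap in your LLM client here")

def answer(query: str, corpus: list[str]) -> str:
    chunks = retrieve(query, corpus)
    prompt = (
        "Answer only from these sources; otherwise say 'Not found in sources.'\n\n"
        + "\n\n".join(f"[S{i}] {c}" for i, c in enumerate(chunks, 1))
        + f"\n\nQuestion: {query}"
    )
    return call_model(prompt)
```

Note that every caveat in the list above lives in this loop: if `retrieve` picks the wrong chunks, if the corpus contains documents the user shouldn't see, or if a chunk contains injected instructions, the generation step inherits the problem.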
What you’ll be able to do after Part VIII
- Decide when you need RAG versus simpler approaches.
- Chunk documents effectively (size, overlap, metadata) for retrieval and citations.
- Understand embeddings well enough to debug retrieval quality issues.
- Improve retrieval with ranking, reranking, and query strategies.
- Write prompts that force the model to stay within retrieved context.
- Build a basic RAG app end-to-end (Project 2) with evaluation and maintenance.
- Add guardrails: “sources-only,” uncertainty UX, conflict handling, escalation, auditing.
The “grounded system” mental model
Think in layers. Each layer has its own failure modes and tests:
- Corpus: what documents exist? Are they current? Are they accessible?
- Chunking: how you cut docs into retrievable units (precision vs context).
- Retrieval: how you select candidate chunks for a query (recall vs speed).
- Ranking/reranking: how you pick the best chunks from candidates.
- Prompt composition: how you present sources and rules to the model.
- Answer format: structured output + per-claim citations + “not found.”
- UX guardrails: confidence, conflict, refusal, escalation, transparency.
- Observability: logs that tie each answer to sources and versions (sketched below).
Most RAG problems are not model problems. They are pipeline problems.
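The observability layer is what makes pipeline problems debuggable. Below is a sketch of one append-only log record per answer; the field names, the `doc_version` key, and the JSONL format are illustrative choices, not a standard:

```python
# A sketch of the observability layer: one log record per answer, tying it
# to the exact sources and versions used. Field names are illustrative.
import json
import time

def log_answer(query: str, answer: str, chunks: list[dict],
               path: str = "answers.jsonl") -> None:
    record = {
        "ts": time.time(),
        "query": query,
        "answer": answer,
        # Which chunks influenced the answer, and from which document
        # versions, so a bad answer can be traced to stale or wrong sources.
        "sources": [
            {"source_id": c["source_id"], "doc_version": c["doc_version"]}
            for c in chunks
        ],
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

With records like this, a bad answer becomes a query you can replay against a known corpus version instead of a mystery.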
Part VIII map (Sections 24–26)
- 24. Retrieval-Augmented Generation Basics
- 25. Building a RAG App (Project 2)
- 26. Guardrails for Grounded Systems