Part VIII — Retrieval, Grounding, and "Don't Make Stuff Up" Engineering

Overview and links for this section of the guide.

What this part is for

Part VIII is where “don’t make stuff up” stops being a vibe and becomes an engineering discipline.

You’ll learn how to build systems that:

  • retrieve relevant source material from a large corpus,
  • generate answers constrained to those sources,
  • prove faithfulness with citations and audit logs,
  • handle uncertainty and conflict instead of bluffing.

What “grounded” means here

Answers are grounded when every important claim can be traced to a source you provided, and the system behaves safely when the sources are missing or conflicting.

The core problem: plausible answers without proof

LLMs are good at producing text that sounds right. When you ask them questions about your docs, policies, or internal knowledge, two failures show up fast:

  • Hallucination: the model fills gaps with plausible-sounding details.
  • Ungrounded synthesis: the model merges multiple sources into one confident answer without telling you which parts came from where.

If you’re building a real product, these are not “edge cases.” They are the default failure mode unless you design against them.

Grounding: what it means (practically)

Grounding is a set of system behaviors, not a single prompt trick:

  • Source selection: choose which documents/chunks are allowed to influence the answer.
  • Source presentation: provide sources in a format the model can cite and follow.
  • Answer constraints: force “answer only from sources” and “not found” behavior (a minimal prompt sketch appears below).
  • Verification: validate output structure and citation presence; spot-check faithfulness.
  • Logging: record which sources were used so you can debug and audit.

Grounding is not “paste more context”

Dumping a giant document into the prompt increases cost and reduces precision. Grounding is about selecting the right sources and forcing traceability.
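
To make “source presentation” and “answer constraints” concrete, here is a minimal sketch of a prompt builder that numbers the chosen chunks, asks for a citation after every claim, and defines the “not found” behavior. The function name, chunk fields, and rule wording are illustrative, not a fixed recipe.

```python
def build_grounded_prompt(question: str, sources: list[dict]) -> str:
    """Present numbered sources and constrain the answer to them.

    `sources` is a list of {"id": ..., "text": ...} chunks chosen by your
    retrieval step; the field names and exact wording here are illustrative.
    """
    source_block = "\n\n".join(
        f"[S{i}] ({s['id']})\n{s['text']}" for i, s in enumerate(sources, start=1)
    )
    rules = (
        "Answer using ONLY the sources below.\n"
        "Cite the source tag (e.g. [S1]) after every claim.\n"
        "If the sources do not contain the answer, reply exactly: NOT_FOUND."
    )
    return f"{rules}\n\nSources:\n{source_block}\n\nQuestion: {question}\nAnswer:"
```

If you also log which chunk ids went into each prompt, the logging behavior in the list above comes almost for free: every answer can be tied back to the exact sources it saw.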

RAG in one sentence (and what it’s not)

Retrieval-Augmented Generation (RAG) is a pattern where you retrieve relevant source chunks at runtime and include them in the model prompt so the model can answer using those sources.
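
As a sketch of that sentence, assuming you already have an embedding function and an LLM call (passed in as plain callables here, because the concrete client is your choice), the runtime loop looks roughly like this:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer_with_rag(question, corpus, embed, generate, top_k=4):
    """Retrieve the top-k chunks for the question, then answer from them only.

    `corpus` is a list of {"id", "text", "embedding"} dicts prepared offline;
    `embed` and `generate` stand in for whatever embedding model and LLM
    client you actually use (both hypothetical here).
    """
    q_emb = embed(question)
    ranked = sorted(corpus, key=lambda c: cosine(q_emb, c["embedding"]), reverse=True)
    sources = ranked[:top_k]
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in sources)
    prompt = (
        "Answer using ONLY the sources below; cite their ids. "
        "If they do not contain the answer, reply NOT_FOUND.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return generate(prompt), [c["id"] for c in sources]  # keep the ids for logging
```

Nothing in this loop verifies the answer, which is exactly why the caveats below matter.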

RAG is not:

  • a guarantee of correctness (retrieval can fetch the wrong chunks),
  • a replacement for evaluation (you still need tests and review),
  • a substitute for access control (retrieval must respect permissions),
  • an excuse to ignore prompt injection (documents can be “attacks”).

What you’ll be able to do after Part VIII

  • Decide when you need RAG versus simpler approaches.
  • Chunk documents effectively (size, overlap, metadata) for retrieval and citations; a chunking sketch follows this list.
  • Understand embeddings well enough to debug retrieval quality issues.
  • Improve retrieval with ranking, reranking, and query strategies.
  • Write prompts that force the model to stay within retrieved context.
  • Build a basic RAG app end-to-end (Project 2) with evaluation and maintenance.
  • Add guardrails: “sources-only,” uncertainty UX, conflict handling, escalation, auditing.
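
As a preview of the chunking work, here is a bare-bones sketch of what “size, overlap, metadata” can mean in practice. The character-window sizes are placeholders, and real pipelines usually split on sentence or section boundaries, but the metadata-per-chunk idea carries over.

```python
def chunk_document(doc_id: str, text: str, size: int = 800, overlap: int = 200) -> list[dict]:
    """Split one document into overlapping character windows, keeping metadata.

    `size` and `overlap` are placeholder values, not recommendations.
    """
    step = max(size - overlap, 1)      # how far the window advances each time
    chunks = []
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + size]
        if not piece.strip():
            continue
        chunks.append({
            "doc_id": doc_id,          # which document the chunk came from
            "start": start,            # character offset, handy for citations
            "text": piece,
        })
    return chunks
```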

The “grounded system” mental model

Think in layers. Each layer has its own failure modes and tests:

  1. Corpus: what documents exist? are they current? are they accessible?
  2. Chunking: how you cut docs into retrievable units (precision vs context).
  3. Retrieval: how you select candidate chunks for a query (recall vs speed).
  4. Ranking/reranking: how you pick the best chunks from candidates.
  5. Prompt composition: how you present sources and rules to the model.
  6. Answer format: structured output + per-claim citations + “not found” (see the sketch below).
  7. UX guardrails: confidence, conflict, refusal, escalation, transparency.
  8. Observability: logs that tie each answer to sources and versions.

Most RAG problems are not model problems. They are pipeline problems.
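
To make layers 6 and 8 concrete, here is one possible answer shape with per-claim citations and a “not found” flag, plus the kind of cheap structural check and audit record you can run on every response. The field names are one reasonable choice, not a standard, and the example values are invented.

```python
import json

# One possible answer shape (layer 6): per-claim citations plus a "found" flag.
EXAMPLE_ANSWER = {
    "found": True,
    "claims": [
        {"text": "Refunds are issued within 14 days.", "sources": ["policy.md#refunds"]},
    ],
}

def check_answer(answer: dict, allowed_sources: set[str]) -> list[str]:
    """Return structural problems: claims with no citation, or citations of
    sources that were never retrieved for this question."""
    problems = []
    if not answer.get("found"):
        return problems                # a clean "not found" is a valid outcome
    for i, claim in enumerate(answer.get("claims", [])):
        cited = claim.get("sources", [])
        if not cited:
            problems.append(f"claim {i} has no citation")
        for src in cited:
            if src not in allowed_sources:
                problems.append(f"claim {i} cites unknown source {src!r}")
    return problems

def audit_record(question: str, answer: dict, source_ids: list[str]) -> str:
    """Layer 8: one log line tying the answer to the exact sources it saw."""
    return json.dumps({"question": question, "sources": source_ids, "answer": answer})
```

Checks like this catch missing or unknown citations mechanically; they do not prove faithfulness, which still needs evaluation and spot-checks.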

Part VIII map (Sections 24–26)

Where to go next