24.1 When you need RAG (and when you don't)

Goal: decide when RAG is worth the complexity

RAG adds infrastructure and failure modes: chunking, embedding, storage, retrieval, ranking, evaluation, and logging.

The goal of this page is to help you make a clean decision:

Build RAG when it materially improves correctness and user trust. Don’t build it when a simpler workflow does the job.

Rule of thumb

If the answer needs to be grounded in a corpus that is too large to paste into the prompt reliably, you’re in RAG territory.
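
To make the rule of thumb concrete, here is a rough sizing check — a minimal sketch assuming a local folder of text/Markdown docs and tiktoken for token counting; the folder path, extensions, and budget are placeholders, not a prescribed setup.

```python
# Rough corpus-size check: could we just paste everything into the prompt?
# CONTEXT_BUDGET and CORPUS_DIR are placeholders; adjust to your model and docs.
from pathlib import Path
import tiktoken

CONTEXT_BUDGET = 100_000   # model context window minus room for instructions and the answer
CORPUS_DIR = Path("docs")  # placeholder path to your corpus

enc = tiktoken.get_encoding("cl100k_base")

total_tokens = 0
for path in CORPUS_DIR.rglob("*"):
    if path.suffix in {".md", ".txt"}:
        total_tokens += len(enc.encode(path.read_text(encoding="utf-8", errors="ignore")))

print(f"Corpus is ~{total_tokens} tokens against a budget of {CONTEXT_BUDGET}.")
if total_tokens <= CONTEXT_BUDGET:
    print("The material may fit directly in context; RAG is probably premature.")
else:
    print("The corpus won't fit reliably; you're in RAG territory.")
```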

What RAG is good at solving

RAG is high leverage when:

  • Your knowledge lives in documents: policies, manuals, specs, tickets, runbooks, wikis.
  • The corpus is large or frequently updated: copying everything into prompts doesn’t scale.
  • Correctness matters: wrong answers are costly, risky, or reputationally damaging.
  • You need traceability: users ask “where did you get that?” and you must answer.
  • You need “not found” behavior: the system should refuse to guess when the sources don’t support the claim (a minimal grounding prompt is sketched after the examples below).
  • You want to reduce hallucination: by narrowing the model’s allowed evidence.

Typical “yes, build RAG” situations:

  • Internal Q&A: answer questions about your company docs, with citations and access control.
  • Support enablement: suggest troubleshooting steps grounded in known articles and policies.
  • Developer docs assistants: answer questions about API behavior from versioned docs.
  • Compliance and policy guidance: answer “what does our policy say?” with direct quotes.
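
The “not found” and citation behaviors come down to how you frame the evidence. Below is a minimal grounding-prompt sketch; the template wording, the chunk fields, and the build_prompt helper are illustrative assumptions, not a fixed recipe.

```python
# A minimal grounding prompt that demands citations and an explicit "not found" answer.
# Template wording and the chunk fields ("id", "text") are assumptions for illustration.
GROUNDED_ANSWER_PROMPT = """\
Answer the question using ONLY the sources below.
- Cite every claim with the source id in brackets, e.g. [doc-1].
- If the sources do not contain the answer, reply exactly: "Not found in the provided sources."

Sources:
{sources}

Question: {question}
"""

def build_prompt(question, chunks):
    """Format retrieved chunks as labeled sources and fill in the template."""
    sources = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return GROUNDED_ANSWER_PROMPT.format(sources=sources, question=question)

print(build_prompt(
    "What is the refund window?",
    [{"id": "doc-1", "text": "Refunds are issued within 14 days of purchase."}],
))
```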

When RAG is a bad idea (or premature)

RAG is often the wrong tool when:

  • The task is not knowledge retrieval: e.g., code refactors, planning, brainstorming, UI scaffolding.
  • The corpus is tiny and stable: a single short doc can just be included directly in context.
  • You can tolerate approximation: e.g., creative writing, ideation, rough drafts.
  • You don’t need citations: users don’t require traceability.
  • You don’t have a maintenance plan: stale indexes and broken permissions will create outages and trust loss.

The most common “premature RAG” smell

You haven’t written a spec or an eval set yet, but you want to pick a vector database. Start with requirements and test questions.

The “complexity ladder” (start simple)

Don’t jump straight to embeddings. Use a ladder and climb only as needed:

  1. Manual paste: paste the relevant excerpt and demand citations. Great for prototyping.
  2. Curated context file: maintain a small “source of truth” doc for the system prompt.
  3. Chunk index + keyword search: split docs, then retrieve chunks with basic search (see the sketch after this list).
  4. Embeddings retrieval: semantic search over chunks with filters.
  5. Hybrid search: combine keyword + embeddings for better recall and precision.
  6. Reranking: use a stronger model/reranker on top candidates.
  7. Evaluation + monitoring: continuous quality checks as the corpus evolves.
  8. Access control + auditing: required for real systems handling sensitive docs.

Many teams stop successfully at “chunk index + keyword search” for months.
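
As a reference point, here is a minimal sketch of step 3 (chunk index + keyword search). The chunk size, overlap, and overlap-count scoring are illustrative assumptions; in practice you might use BM25 or an existing search engine instead.

```python
# Ladder step 3, sketched: split docs into overlapping chunks, then rank chunks
# by how many query terms they contain. No embeddings, no external services.
import re

def chunk(doc_id, text, size=800, overlap=100):
    """Split text into overlapping character windows, each tagged with an id."""
    chunks, start, i = [], 0, 0
    while start < len(text):
        chunks.append({"id": f"{doc_id}#{i}", "text": text[start:start + size]})
        start += size - overlap
        i += 1
    return chunks

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(question, chunks, top_k=3):
    """Score each chunk by keyword overlap with the question, highest first."""
    q = terms(question)
    scored = sorted(chunks, key=lambda c: len(q & terms(c["text"])), reverse=True)
    return [c for c in scored[:top_k] if q & terms(c["text"])]

docs = {"refund-policy.md": "Refunds are issued within 14 days of purchase..."}  # placeholder corpus
index = [c for doc_id, text in docs.items() for c in chunk(doc_id, text)]
print(search("How long do customers have to request a refund?", index))
```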

A practical decision checklist

Answer these questions:

  • Corpus size: can you fit the relevant info in the context window consistently?
  • Update rate: do docs change weekly/daily? do answers need to reflect the latest version?
  • Correctness cost: what happens when the model is wrong?
  • Traceability: do you need citations/quotes? do users need to audit the answer?
  • Repetition: will you answer many questions over the same corpus (worth indexing)?
  • Permissions: do different users have access to different docs?
  • Ambiguity tolerance: can you say “not found” often, or must you always answer?

If most of your answers point in the same direction, RAG is probably worth it:

  • a large corpus, frequent changes, a high cost of wrong answers, a need for citations, repeated questions over the same docs, and per-user permissions.

Ship points

  • Ship point 1: you have a small eval set of real questions (see the sketch after this list).
  • Ship point 2: retrieval returns obviously relevant chunks for those questions.
  • Ship point 3: answers include citations and “not found” works reliably.
  • Ship point 4: you log which chunks were used for each answer.
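
A minimal sketch that ties ship points 1, 2, and 4 together. The stand-in search() function, file names, and record fields are placeholders; swap in whatever retriever you actually use.

```python
# Retrieval eval + answer log in one loop (ship points 1, 2, and 4).
import json

def search(question, top_k=3):
    """Placeholder retriever: returns chunk dicts with an id and text."""
    return [{"id": "refund-policy.md#0",
             "text": "Refunds are issued within 14 days of purchase."}]

# Ship point 1: a small eval set of real questions plus the source that should answer each.
EVAL_SET = [
    {"question": "How long do customers have to request a refund?",
     "expected_source": "refund-policy.md"},
    {"question": "Who approves production database access?",
     "expected_source": "access-runbook.md"},
]

hits = 0
with open("retrieval_eval.jsonl", "a", encoding="utf-8") as log:
    for case in EVAL_SET:
        retrieved = search(case["question"])
        hit = any(c["id"].startswith(case["expected_source"]) for c in retrieved)
        hits += hit
        # Ship point 4: log which chunks were used for each question.
        log.write(json.dumps({
            "question": case["question"],
            "chunk_ids": [c["id"] for c in retrieved],
            "hit": hit,
        }) + "\n")

# Ship point 2: retrieval should surface obviously relevant chunks for these questions.
print(f"Retrieval hit rate: {hits}/{len(EVAL_SET)}")
```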

Copy-paste prompts

Prompt: should we build RAG?

We are considering building a RAG system. Help us decide.

Context:
- Users: [who will use it?]
- Corpus: [doc types, size, update rate]
- Required behavior: [citations? not-found? permissions?]
- Risk of wrong answers: [low/medium/high with examples]

Task:
1) Decide whether RAG is justified now, later, or not needed.
2) If justified later, propose a simpler “ladder step” we should do first.
3) List the top 5 failure modes we must plan for.
4) Propose 25 eval questions that represent real usage (no fluff).

Output as a checklist + decision summary.
