24.1 When you need RAG (and when you don't)

Goal: decide when RAG is worth the complexity

RAG adds infrastructure and failure modes: chunking, embedding, storage, retrieval, ranking, evaluation, and logging.

The goal of this page is to help you make a clean decision:

Build RAG when it materially improves correctness and user trust. Don’t build it when a simpler workflow does the job.

Rule of thumb

If the answer needs to be grounded in a corpus that is too large to paste into the prompt reliably, you’re in RAG territory.
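
To make the rule of thumb concrete, here is a rough sizing check — a minimal sketch assuming a local folder of text/Markdown docs and tiktoken for token counting; the folder path, extensions, and budget are placeholders, not a prescribed setup.

```python
# Rough corpus-size check: could we just paste everything into the prompt?
# CONTEXT_BUDGET and CORPUS_DIR are placeholders; adjust to your model and docs.
from pathlib import Path
import tiktoken

CONTEXT_BUDGET = 100_000   # model context window minus room for instructions and the answer
CORPUS_DIR = Path("docs")  # placeholder path to your corpus

enc = tiktoken.get_encoding("cl100k_base")

total_tokens = 0
for path in CORPUS_DIR.rglob("*"):
    if path.suffix in {".md", ".txt"}:
        total_tokens += len(enc.encode(path.read_text(encoding="utf-8", errors="ignore")))

print(f"Corpus is ~{total_tokens} tokens against a budget of {CONTEXT_BUDGET}.")
if total_tokens <= CONTEXT_BUDGET:
    print("The material may fit directly in context; RAG is probably premature.")
else:
    print("The corpus won't fit reliably; you're in RAG territory.")
```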

What RAG is good at solving

RAG is high leverage when:

  • Your knowledge lives in documents: policies, manuals, specs, tickets, runbooks, wikis.
  • The corpus is large or frequently updated: copying everything into prompts doesn’t scale.
  • Correctness matters: wrong answers are costly, risky, or reputationally damaging.
  • You need traceability: users ask “where did you get that?” and you must answer.
  • You need “not found” behavior: the system should refuse to guess when the sources don’t support the claim (a minimal grounding prompt is sketched after the examples below).
  • You want to reduce hallucination: by narrowing the model’s allowed evidence.

Typical “yes, build RAG” situations:

  • Internal Q&A: answer questions about your company docs, with citations and access control.
  • Support enablement: suggest troubleshooting steps grounded in known articles and policies.
  • Developer docs assistants: answer questions about API behavior from versioned docs.
  • Compliance and policy guidance: answer “what does our policy say?” with direct quotes.
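
The “not found” and citation behaviors come down to how you frame the evidence. Below is a minimal grounding-prompt sketch; the template wording, the chunk fields, and the build_prompt helper are illustrative assumptions, not a fixed recipe.

```python
# A minimal grounding prompt that demands citations and an explicit "not found" answer.
# Template wording and the chunk fields ("id", "text") are assumptions for illustration.
GROUNDED_ANSWER_PROMPT = """\
Answer the question using ONLY the sources below.
- Cite every claim with the source id in brackets, e.g. [doc-1].
- If the sources do not contain the answer, reply exactly: "Not found in the provided sources."

Sources:
{sources}

Question: {question}
"""

def build_prompt(question, chunks):
    """Format retrieved chunks as labeled sources and fill in the template."""
    sources = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return GROUNDED_ANSWER_PROMPT.format(sources=sources, question=question)

print(build_prompt(
    "What is the refund window?",
    [{"id": "doc-1", "text": "Refunds are issued within 14 days of purchase."}],
))
```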

When RAG is a bad idea (or premature)

RAG is often the wrong tool when:

  • The task is not knowledge retrieval: e.g., code refactors, planning, brainstorming, UI scaffolding.
  • The corpus is tiny and stable: a single short doc can just be included directly in context.
  • You can tolerate approximation: e.g., creative writing, ideation, rough drafts.
  • You don’t need citations: users don’t require traceability.
  • You don’t have a maintenance plan: stale indexes and broken permissions will create outages and trust loss.

The most common “premature RAG” smell

You haven’t written a spec or an eval set yet, but you want to pick a vector database. Start with requirements and test questions.

The “complexity ladder” (start simple)

Don’t jump straight to embeddings. Use a ladder and climb only as needed:

  1. Manual paste: paste the relevant excerpt and demand citations. Great for prototyping.
  2. Curated context file: maintain a small “source of truth” doc for the system prompt.
  3. Chunk index + keyword search: split docs, then retrieve chunks with basic search (see the sketch after this list).
  4. Embeddings retrieval: semantic search over chunks with filters.
  5. Hybrid search: combine keyword + embeddings for better recall and precision.
  6. Reranking: use a stronger model/reranker on top candidates.
  7. Evaluation + monitoring: continuous quality checks as the corpus evolves.
  8. Access control + auditing: required for real systems handling sensitive docs.

Many teams stop successfully at “chunk index + keyword search” for months.
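
As a reference point, here is a minimal sketch of step 3 (chunk index + keyword search). The chunk size, overlap, and overlap-count scoring are illustrative assumptions; in practice you might use BM25 or an existing search engine instead.

```python
# Ladder step 3, sketched: split docs into overlapping chunks, then rank chunks
# by how many query terms they contain. No embeddings, no external services.
import re

def chunk(doc_id, text, size=800, overlap=100):
    """Split text into overlapping character windows, each tagged with an id."""
    chunks, start, i = [], 0, 0
    while start < len(text):
        chunks.append({"id": f"{doc_id}#{i}", "text": text[start:start + size]})
        start += size - overlap
        i += 1
    return chunks

def terms(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(question, chunks, top_k=3):
    """Score each chunk by keyword overlap with the question, highest first."""
    q = terms(question)
    scored = sorted(chunks, key=lambda c: len(q & terms(c["text"])), reverse=True)
    return [c for c in scored[:top_k] if q & terms(c["text"])]

docs = {"refund-policy.md": "Refunds are issued within 14 days of purchase..."}  # placeholder corpus
index = [c for doc_id, text in docs.items() for c in chunk(doc_id, text)]
print(search("How long do customers have to request a refund?", index))
```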

A practical decision checklist

Answer these questions:

  • Corpus size: can you fit the relevant info in the context window consistently?
  • Update rate: do docs change weekly/daily? do answers need to reflect the latest version?
  • Correctness cost: what happens when the model is wrong?
  • Traceability: do you need citations/quotes? do users need to audit the answer?
  • Repetition: will you answer many questions over the same corpus (worth indexing)?
  • Permissions: do different users have access to different docs?
  • Ambiguity tolerance: can you say “not found” often, or must you always answer?

If most of your answers point in the same direction, RAG is probably worth it:

  • a large corpus, frequent changes, a high cost of wrong answers, a need for citations, repeated questions over the same docs, and per-user permissions.

Ship points

  • Ship point 1: you have a small eval set of real questions (see the sketch after this list).
  • Ship point 2: retrieval returns obviously relevant chunks for those questions.
  • Ship point 3: answers include citations and “not found” works reliably.
  • Ship point 4: you log which chunks were used for each answer.
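
A minimal sketch that ties ship points 1, 2, and 4 together. The stand-in search() function, file names, and record fields are placeholders; swap in whatever retriever you actually use.

```python
# Retrieval eval + answer log in one loop (ship points 1, 2, and 4).
import json

def search(question, top_k=3):
    """Placeholder retriever: returns chunk dicts with an id and text."""
    return [{"id": "refund-policy.md#0",
             "text": "Refunds are issued within 14 days of purchase."}]

# Ship point 1: a small eval set of real questions plus the source that should answer each.
EVAL_SET = [
    {"question": "How long do customers have to request a refund?",
     "expected_source": "refund-policy.md"},
    {"question": "Who approves production database access?",
     "expected_source": "access-runbook.md"},
]

hits = 0
with open("retrieval_eval.jsonl", "a", encoding="utf-8") as log:
    for case in EVAL_SET:
        retrieved = search(case["question"])
        hit = any(c["id"].startswith(case["expected_source"]) for c in retrieved)
        hits += hit
        # Ship point 4: log which chunks were used for each question.
        log.write(json.dumps({
            "question": case["question"],
            "chunk_ids": [c["id"] for c in retrieved],
            "hit": hit,
        }) + "\n")

# Ship point 2: retrieval should surface obviously relevant chunks for these questions.
print(f"Retrieval hit rate: {hits}/{len(EVAL_SET)}")
```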

Copy-paste prompts

Prompt: should we build RAG?

We are considering building a RAG system. Help us decide.

Context:
- Users: [who will use it?]
- Corpus: [doc types, size, update rate]
- Required behavior: [citations? not-found? permissions?]
- Risk of wrong answers: [low/medium/high with examples]

Task:
1) Decide whether RAG is justified now, later, or not needed.
2) If justified later, propose a simpler “ladder step” we should do first.
3) List the top 5 failure modes we must plan for.
4) Propose 25 eval questions that represent real usage (no fluff).

Output as a checklist + decision summary.
