24.1 When you need RAG (and when you don't)
Goal: decide when RAG is worth the complexity
RAG adds infrastructure and failure modes: chunking, embedding, storage, retrieval, ranking, evaluation, and logging.
The goal of this page is to help you make a clean decision:
Build RAG when it materially improves correctness and user trust. Don’t build it when a simpler workflow does the job.
If the answer needs to be grounded in a corpus that is too large to paste into the prompt reliably, you’re in RAG territory.
What RAG is good at solving
RAG is high leverage when:
- Your knowledge lives in documents: policies, manuals, specs, tickets, runbooks, wikis.
- The corpus is large or frequently updated: copying everything into prompts doesn’t scale.
- Correctness matters: wrong answers are costly, risky, or reputationally damaging.
- You need traceability: users ask “where did you get that?” and you must answer.
- You need “not found” behavior: the system should refuse to guess when sources don’t support the claim.
- You want to reduce hallucination: restricting the model to retrieved evidence narrows what it can claim.
Typical “yes, build RAG” situations:
- Internal Q&A: answer questions about your company docs, with citations and access control.
- Support enablement: suggest troubleshooting steps grounded in known articles and policies.
- Developer docs assistants: answer questions about API behavior from versioned docs.
- Compliance and policy guidance: answer “what does our policy say?” with direct quotes.
When RAG is a bad idea (or premature)
RAG is often the wrong tool when:
- The task is not knowledge retrieval: e.g., code refactors, planning, brainstorming, UI scaffolding.
- The corpus is tiny and stable: a single short doc can just be included directly in context.
- You can tolerate approximation: e.g., creative writing, ideation, rough drafts.
- You don’t need citations: users don’t require traceability.
- You don’t have a maintenance plan: stale indexes and broken permissions will create outages and trust loss.
- You're choosing tools before requirements: you haven’t written a spec or an eval set yet, but you want to pick a vector database. Start with requirements and test questions instead.
The “complexity ladder” (start simple)
Don’t jump straight to embeddings. Use a ladder and climb only as needed:
- Manual paste: paste the relevant excerpt and demand citations. Great for prototyping.
- Curated context file: maintain a small “source of truth” doc for the system prompt.
- Chunk index + keyword search: split docs, then retrieve chunks with basic search.
- Embeddings retrieval: semantic search over chunks with filters.
- Hybrid search: combine keyword + embeddings for better recall and precision.
- Reranking: use a stronger model/reranker on top candidates.
- Evaluation + monitoring: continuous quality checks as the corpus evolves.
- Access control + auditing: required for real systems handling sensitive docs.
Many teams stop successfully at “chunk index + keyword search” for months.
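To make that rung concrete, here is a minimal sketch of "chunk index + keyword search" in plain Python: fixed-size word chunks plus term-overlap scoring as a stand-in for a real keyword index such as BM25. The function names, the 200-word chunk size, and the scoring are illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch of the "chunk index + keyword search" ladder step.
# Pure Python, no external dependencies; names and scoring are illustrative.
import re
from collections import Counter

def chunk(text: str, max_words: int = 200) -> list[str]:
    """Split a document into fixed-size word chunks (chunk size is an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query: str, chunks: list[str], k: int = 3) -> list[tuple[int, str]]:
    """Rank chunks by term overlap with the query (a crude stand-in for BM25)."""
    q_terms = Counter(tokenize(query))
    scored = []
    for c in chunks:
        c_terms = Counter(tokenize(c))
        score = sum(min(q_terms[t], c_terms[t]) for t in q_terms)
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored[:k] if s > 0]

# Usage: index your docs once, then paste the top chunks into the prompt.
docs = ["... your policy or runbook text ...", "... another document ..."]
index = [c for d in docs for c in chunk(d)]
for score, c in search("what is the refund policy?", index):
    print(score, c[:80])
```

Swapping this scoring function for embeddings or a hybrid of both is the next rung up; the chunking and the "return top-k with evidence" shape stay the same.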
A practical decision checklist
Answer these questions:
- Corpus size: can you fit the relevant info in the context window consistently?
- Update rate: do docs change weekly/daily? do answers need to reflect the latest version?
- Correctness cost: what happens when the model is wrong?
- Traceability: do you need citations/quotes? do users need to audit the answer?
- Repetition: will you answer many questions over the same corpus (worth indexing)?
- Permissions: do different users have access to different docs?
- Ambiguity tolerance: can you say “not found” often, or must you always answer?
If most of the following describe your situation, RAG is probably worth it:
- a large corpus, frequent changes, high cost of wrong answers, a need for citations, repeated use over the same corpus, and per-user permissions.
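If you want this checklist in a form you can drop into a planning script, here is a purely illustrative sketch; the factor names and the "four or more" threshold are assumptions, not a validated rubric.

```python
# Illustrative only: turn the decision checklist into a rough go/no-go signal.
# Factor names and the >= 4 threshold are assumptions, not a validated rubric.
CHECKLIST = {
    "corpus_too_large_for_context": True,
    "docs_change_weekly_or_faster": True,
    "wrong_answers_are_costly": True,
    "citations_required": True,
    "many_questions_over_same_corpus": True,
    "per_user_permissions_needed": False,
}

def recommend(answers: dict[str, bool]) -> str:
    yes = sum(answers.values())
    if yes >= 4:
        return "RAG is probably worth it; start with an eval set and the keyword-search rung."
    if yes >= 2:
        return "Stay lower on the ladder (curated context or keyword search) and revisit."
    return "Skip RAG for now; a simpler prompt workflow likely does the job."

print(recommend(CHECKLIST))
```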
Ship points
- Ship point 1: you have a small eval set of real questions.
- Ship point 2: retrieval returns obviously relevant chunks for those questions.
- Ship point 3: answers include citations and “not found” works reliably.
- Ship point 4: you log which chunks were used for each answer.
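Ship point 4 can be as small as an append-only JSONL log. A minimal sketch follows, assuming a local file; the record fields, file name, and chunk ID format are placeholders to adapt to your own pipeline.

```python
# Sketch of ship point 4: record which chunks each answer was grounded in.
# Field names and the JSONL file path are assumptions, not a required schema.
import json
import time
import uuid

def log_answer(question: str, chunk_ids: list[str], answer: str,
               cited: bool, path: str = "rag_answers.jsonl") -> None:
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "question": question,
        "chunk_ids": chunk_ids,  # which retrieved chunks were shown to the model
        "cited": cited,          # did the answer include citations or a clean "not found"?
        "answer": answer,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Usage after each answered question:
log_answer(
    "What is the refund window?",
    ["refund-policy.md#chunk-12"],
    "30 days, per the refund policy.",
    cited=True,
)
```

A log like this is also the raw material for the eval set in ship point 1: real questions, the chunks retrieved, and whether the answer was grounded.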
Copy-paste prompts
Prompt: should we build RAG?
We are considering building a RAG system. Help us decide.
Context:
- Users: [who will use it?]
- Corpus: [doc types, size, update rate]
- Required behavior: [citations? not-found? permissions?]
- Risk of wrong answers: [low/medium/high with examples]
Task:
1) Decide whether RAG is justified now, later, or not needed.
2) If justified later, propose a simpler “ladder step” we should do first.
3) List the top 5 failure modes we must plan for.
4) Propose 25 eval questions that represent real usage (no fluff).
Output as a checklist + decision summary.