25.5 Maintenance: updating the index and handling deletions
Goal: keep the system correct as docs change
RAG systems degrade over time if you don’t maintain the index.
Maintenance is not optional because:
- documents change and old chunks become wrong,
- permissions change and access control must stay correct,
- embedding models and retrieval logic evolve,
- users notice stale answers quickly and lose trust.
Stale indexes create “confidently wrong with citations”
A stale index still returns chunks, so the system cites outdated text and the answer looks trustworthy even though it is wrong. Because nothing fails visibly, maintenance needs an explicit process and monitoring.
Updating documents and re-indexing
Use hashes and versioning to drive incremental updates:
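- doc_hash: a hash of the document content; a changed hash means the document changed.
- chunk_hash: a hash of each chunk's text; it shows which chunks changed inside a changed document.
- embedding_version: the embedding model (and settings) used for a chunk; a mismatch means the chunk needs re-embedding.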
Incremental update flow:
- ingest and compute doc_hash,
- if unchanged: skip,
- if changed: re-chunk and compute chunk_hashes,
- embed only new/changed chunks,
- delete embeddings for removed chunks,
- update metadata and permissions tags.
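The flow above can be sketched as a small planning function. This is a minimal illustration, not a definitive implementation: `plan_update`, `DocRecord`, `UpdatePlan`, `chunk_text`, and the `EMBEDDING_VERSION` label are hypothetical names, the chunker is a placeholder that splits on blank lines, and the state is an in-memory dict standing in for whatever metadata store you use.

```python
import hashlib
from dataclasses import dataclass, field

EMBEDDING_VERSION = "embed-v1"  # hypothetical label for the current embedding model


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_text(text: str) -> list[str]:
    # Placeholder chunker (assumption): one chunk per blank-line-separated paragraph.
    return [p.strip() for p in text.split("\n\n") if p.strip()]


@dataclass
class DocRecord:
    doc_hash: str
    chunk_hashes: set[str] = field(default_factory=set)
    embedding_version: str = EMBEDDING_VERSION


@dataclass
class UpdatePlan:
    to_embed: list[str]  # chunk texts that need (re-)embedding
    to_delete: set[str]  # chunk hashes whose embeddings should be removed


def plan_update(doc_id: str, text: str, state: dict[str, DocRecord]) -> UpdatePlan:
    doc_hash = sha256_text(text)
    old = state.get(doc_id)
    same_version = old is not None and old.embedding_version == EMBEDDING_VERSION

    # Unchanged document embedded with the current model: skip.
    if same_version and old.doc_hash == doc_hash:
        return UpdatePlan(to_embed=[], to_delete=set())

    # Changed document: re-chunk and hash every chunk.
    new_chunks = {sha256_text(c): c for c in chunk_text(text)}
    old_hashes = old.chunk_hashes if old is not None else set()

    # Existing embeddings are reusable only if they came from the current model.
    reusable = old_hashes if same_version else set()
    to_embed = [c for h, c in new_chunks.items() if h not in reusable]
    to_delete = old_hashes - set(new_chunks)

    # Record the new state, assuming the caller goes on to apply the plan.
    state[doc_id] = DocRecord(doc_hash, set(new_chunks), EMBEDDING_VERSION)
    return UpdatePlan(to_embed=to_embed, to_delete=to_delete)
```

Calling `plan_update` twice with the same text yields an empty plan the second time; editing one paragraph yields one chunk to embed and one chunk hash to delete. Metadata and permission tags would be updated in the same step, which this sketch leaves out.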
Handling deletions safely
Deletions matter for correctness, privacy, and compliance.
Decisions to make:
- Hard delete vs tombstone: do you remove chunk text entirely or keep a minimal record?
- Propagation: how quickly must deletions stop influencing answers?
- Audit: do you need to prove deletion happened?
Practical safeguards:
- delete from vector index and chunk store (or mark inactive),
- ensure retrieval filters exclude inactive chunks,
- invalidate caches that might re-serve deleted content,
- run a “deletion smoke test” that queries for the deleted doc and confirms it is not retrievable.
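The safeguards above can be illustrated with a toy in-memory index. This is a sketch under stated assumptions: `ToyIndex`, `Chunk`, and `deletion_smoke_test` are hypothetical names, substring matching stands in for vector search, and a plain dict stands in for a query cache.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    active: bool = True


@dataclass
class ToyIndex:
    chunks: dict[str, Chunk] = field(default_factory=dict)
    cache: dict[str, list[str]] = field(default_factory=dict)  # query -> chunk ids

    def search(self, query: str) -> list[str]:
        if query in self.cache:
            return self.cache[query]
        # The retrieval filter must exclude inactive (soft-deleted) chunks.
        hits = [
            c.chunk_id
            for c in self.chunks.values()
            if c.active and query.lower() in c.text.lower()
        ]
        self.cache[query] = hits
        return hits

    def delete_doc(self, doc_id: str, hard: bool = False) -> None:
        for cid in [cid for cid, c in self.chunks.items() if c.doc_id == doc_id]:
            if hard:
                del self.chunks[cid]  # hard delete: remove the record entirely
            else:
                self.chunks[cid].active = False  # soft delete: mark inactive
        # Invalidate cached results that could re-serve deleted content.
        self.cache.clear()


def deletion_smoke_test(index: ToyIndex, doc_id: str, probe_queries: list[str]) -> bool:
    """Return True if no probe query retrieves a chunk from the deleted doc."""
    for query in probe_queries:
        for cid in index.search(query):
            chunk = index.chunks.get(cid)
            if chunk is not None and chunk.doc_id == doc_id:
                return False
    return True
```

The smoke test deliberately goes through the same `search` path that serves users, cache included, so a missed cache invalidation shows up as a failure rather than passing silently.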
Re-embedding strategy (model changes)
Embedding model changes are like schema migrations:
- Plan: decide when to migrate (cost, downtime, rollout strategy).
- Dual index: optionally build a new index alongside old, then switch.
- Backfill: embed all chunks with the new model, record version.
- Evaluate: run retrieval and answer evals before switching.
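Never mix embeddings from different models in the same index unless you know they're compatible: vectors from different models generally live in different spaces, so similarity scores between them are meaningless.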
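A dual-index migration with a version guard might look like the sketch below. All names are hypothetical (`VersionedIndex`, `backfill`, `switch_if_better`), the embedder is any callable you supply, and `evaluate` stands in for whatever retrieval eval you already run; it is assumed to return a score where higher is better.

```python
from dataclasses import dataclass, field
from typing import Callable

Embedder = Callable[[str], list[float]]


@dataclass
class VersionedIndex:
    embedding_version: str
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, chunk_id: str, vector: list[float], version: str) -> None:
        # Guard: refuse vectors produced by a different embedding model.
        if version != self.embedding_version:
            raise ValueError(
                f"index holds {self.embedding_version!r} vectors, got {version!r}"
            )
        self.vectors[chunk_id] = vector


def backfill(chunks: dict[str, str], embed: Embedder, version: str) -> VersionedIndex:
    """Build a fresh index alongside the old one, embedding every chunk."""
    new_index = VersionedIndex(embedding_version=version)
    for chunk_id, text in chunks.items():
        new_index.add(chunk_id, embed(text), version)
    return new_index


def switch_if_better(
    old: VersionedIndex,
    new: VersionedIndex,
    evaluate: Callable[[VersionedIndex], float],
    min_gain: float = 0.0,
) -> VersionedIndex:
    """Serve the new index only if it scores at least min_gain above the old one."""
    return new if evaluate(new) >= evaluate(old) + min_gain else old
```

Keeping the old index intact until the eval passes also gives you a rollback path: switching back is just pointing queries at the old index again.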
Monitoring and alerting
Monitor signals that indicate index health:
- retrieval empty rate: how often retrieval returns no chunks at all.
- not_found rate: how often the system answers that it found nothing; a sudden change can indicate broken retrieval or corpus drift.
- citation rate: how often answers include citations; if citations disappear, your prompt or validator broke.
- latency: rising latency may indicate reranker issues or index size changes.
- permission violations: run tests and audits that confirm access control still holds.
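Most of these signals can be computed from per-request logs. The sketch below assumes a hypothetical `RequestLog` record with the fields shown; your logging schema will differ. Citation rate is computed over answered requests only, which is one reasonable definition among several.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestLog:
    retrieved_chunks: int      # number of chunks retrieval returned
    answered_not_found: bool   # the system replied that it found nothing
    citations: int             # number of citations in the answer
    latency_ms: float


def index_health(logs: list[RequestLog]) -> dict[str, float]:
    n = len(logs)
    if n == 0:
        return {}
    answered = [log for log in logs if not log.answered_not_found]
    latencies = sorted(log.latency_ms for log in logs)
    # Nearest-rank 95th percentile.
    p95 = latencies[math.ceil(0.95 * n) - 1]
    return {
        "retrieval_empty_rate": sum(log.retrieved_chunks == 0 for log in logs) / n,
        "not_found_rate": sum(log.answered_not_found for log in logs) / n,
        "citation_rate": (
            sum(log.citations > 0 for log in answered) / len(answered)
            if answered
            else 0.0
        ),
        "latency_p95_ms": p95,
    }
```

Alerting is then a matter of comparing each value against a baseline from a known-good period. Permission violations are left out here because they are better caught by dedicated access-control tests than by aggregate rates.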
A maintenance runbook (checklist)
- Daily/weekly: ingest updates and run incremental indexing.
- After big doc changes: re-run eval set and spot-check citations.
- After embedding/ranking changes: run retrieval metrics + answer faithfulness review.
- On deletion request: delete chunks, invalidate caches, run deletion verification queries.
- Incident response: if answers are wrong, inspect the logs for the retrieved chunks, the index and embedding versions, and the prompts.