31. Practical Defenses You'll Actually Implement
Overview and links for this section of the guide.
On this page
What this section is for
Section 31 is a practical playbook: the defenses you will actually implement in real AI features.
These defenses are not “LLM magic.” They are normal engineering controls:
- input validation,
- schema enforcement,
- permissions checks,
- least privilege,
- secrets hygiene,
- testing and red-teaming.
The goal is to build a system that stays safe even when the model is confused, manipulated, or wrong.
Most defenses here are small, repeatable patterns. Once you build them once (validators, allowlists, logging policy), you reuse them across features.
Defense philosophy: deterministic controls around probabilistic systems
LLMs are probabilistic. Your defenses should be deterministic.
That means:
- don’t rely on the model to refuse, enforce refusal in code when needed;
- don’t rely on the model to sanitize, sanitize and validate deterministically;
- don’t rely on the model to obey permissions, enforce permissions before retrieval/tool calls;
- don’t rely on the model to keep secrets, keep secrets out of prompts and logs.
Defense layers (input → retrieval → model → tools → logs)
Think in layers so you don’t bet everything on one control:
- Input layer: sanitize, cap size, allowlist permitted actions.
- Retrieval layer: permissions filtering, corpus allowlists, authority weighting.
- Model layer: structured output, strict instructions, “sources are untrusted.”
- Tool layer: least privilege, schema validation, approvals, budgets.
- Logging layer: safe logging, redaction, retention, auditability.
Any single defense can fail: models ignore prompts, filters miss patterns, humans misconfigure access. Layers are how you make failures non-catastrophic.
How to implement defenses without slowing velocity
Practical approach:
- Start with a safe default template: structured outputs + validators + not-found behavior.
- Add a small allowlist layer: what tasks are allowed; what tools are allowed.
- Implement secrets hygiene early: prevent “oops we logged the key.”
- Write a red-team corpus: 25–100 adversarial cases you rerun on every change.
- Automate gates: schema, citations, permissions, budgets (Part IX style).
Section 31 map (31.1–31.5)
- 31.1 Input sanitization and allowlists
- 31.2 Output filtering and schema enforcement
- 31.3 Least-privilege tool design
- 31.4 Secrets handling: never in prompts, never in logs
- 31.5 Red-teaming your own prompts
Where to start
Explore next