31.2 Output filtering and schema enforcement
On this page
- Goal: treat model output as untrusted input
- Schema enforcement: your strongest “no hallucination” tool
- Validation pipeline (parse → validate → sanitize)
- Output filtering: what to block or redact
- Citation integrity checks (for RAG)
- Retries and fallbacks (fail closed)
- Template: safe output contract
- Where to go next
Goal: treat model output as untrusted input
Model output is not “trusted application state.” It is untrusted data that must be validated before it can:
- be rendered to users,
- be written to a database,
- be used to call tools,
- be used to make product decisions.
Your job is to enforce a contract that makes unsafe outputs harmless.
If the model output is used as code, SQL, or shell commands, you must add strong validation and sandboxing. Prefer “proposal-only” patterns where humans approve actions.
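A proposal-only flow can be sketched as follows; the ProposedAction type, the tool allowlist, and execute_if_approved are hypothetical names for illustration, not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProposedAction:
    # Hypothetical type: the model proposes an action, it never executes one.
    tool: str
    arguments: dict
    rationale: str

ALLOWED_TOOLS = {"create_ticket", "draft_email"}  # illustrative allowlist

def execute_if_approved(proposal: ProposedAction, approver: Optional[str]) -> str:
    # Fail closed: no explicit human approval record, no execution.
    if approver is None:
        return "pending_approval"
    # Even approved proposals only run tools from the allowlist.
    if proposal.tool not in ALLOWED_TOOLS:
        return "rejected: tool not on allowlist"
    return f"executed {proposal.tool} (approved by {approver})"
```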
Schema enforcement: your strongest “no hallucination” tool
Schema enforcement doesn’t stop hallucination of content, but it stops hallucination of structure and makes the system testable.
Practical benefits:
- your app can parse outputs reliably,
- you can validate required fields,
- you can reject unknown keys or dangerous fields,
- you can enforce “not found,” “conflict,” and “refused” states explicitly.
Good schema design rules:
- Make abstention explicit: include status and not_found fields.
- Use enums: constrain modes and statuses.
- Allow nulls for unknowns: don’t force guessing.
- Keep it small: giant schemas are brittle.
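A sketch of these rules using the jsonschema library; the schema and field names are illustrative, not a required shape:

```python
from jsonschema import Draft202012Validator

# Illustrative schema: explicit abstention, enums, nullable unknowns, no extra keys.
OUTPUT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,  # reject unknown keys
    "required": ["status", "answer"],
    "properties": {
        "status": {"enum": ["answered", "not_found", "refused"]},  # abstention is explicit
        "answer": {"type": ["string", "null"]},  # null is allowed instead of a forced guess
    },
}

validator = Draft202012Validator(OUTPUT_SCHEMA)
errors = list(validator.iter_errors({"status": "not_found", "answer": None}))
assert not errors  # a valid abstention passes
```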
Validation pipeline (parse → validate → sanitize)
A safe output pipeline usually looks like:
- Parse: strict JSON parse (or strict structured format).
- Validate schema: required fields, types, enums.
- Validate policy: no forbidden content; no disallowed tool requests.
- Sanitize for rendering: escape HTML; prevent script injection; truncate overly long fields.
- Apply business rules: reject unsafe scopes; require approvals; enforce permissions.
If any step fails: retry with stricter instructions, or fall back to a safe refusal/not_found state.
Prompting helps, but validation is enforceable. Put critical constraints in code so they can’t be bypassed by injection.
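A minimal sketch of such a pipeline in code, reusing the illustrative OUTPUT_SCHEMA above; the policy check and size budget are placeholders for your own rules:

```python
import html
import json

from jsonschema import ValidationError, validate

MAX_FIELD_CHARS = 2000  # illustrative size budget

def process_output(raw: str) -> dict:
    # 1. Parse: strict JSON only.
    data = json.loads(raw)
    # 2. Validate schema: required fields, types, enums, no unknown keys.
    validate(instance=data, schema=OUTPUT_SCHEMA)
    # 3. Validate policy: reject forbidden content (illustrative check).
    if "BEGIN PRIVATE KEY" in raw:
        raise ValueError("forbidden content in output")
    # 4. Sanitize for rendering: escape HTML, truncate overly long fields.
    if isinstance(data.get("answer"), str):
        data["answer"] = html.escape(data["answer"])[:MAX_FIELD_CHARS]
    # 5. Business rules (permissions, approvals, scopes) would go here.
    return data

def safe_process(raw: str) -> dict:
    try:
        return process_output(raw)
    except (json.JSONDecodeError, ValidationError, ValueError):
        # Fail closed rather than shipping a best effort.
        return {"status": "refused", "answer": None}
```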
Output filtering: what to block or redact
Filtering rules depend on your product, but common categories include:
- Secrets and tokens: block or redact if patterns are detected.
- PII: redact unless explicitly allowed and necessary.
- Policy violations: disallowed content or instructions.
- Unsafe instructions: attempts to call tools or request privileged operations.
- Excessive output: truncate or reject outputs beyond size budgets.
Filtering is not perfect, but it is a strong “last line of defense” against accidental leaks and obvious injection outcomes.
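A sketch of pattern-based redaction and size budgeting; the regexes below are illustrative and nowhere near a complete secret or PII detector:

```python
import re

SIZE_BUDGET = 8000  # illustrative character budget

# Illustrative patterns only; real deployments need broader, tested detectors.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                  # API-key-like tokens
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),   # private key headers
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def filter_output(text: str) -> str:
    # Redact secret-like tokens.
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED_SECRET]", text)
    # Redact PII unless a feature explicitly needs it.
    text = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
    # Enforce the size budget rather than rendering unbounded output.
    if len(text) > SIZE_BUDGET:
        text = text[:SIZE_BUDGET] + "\n[TRUNCATED]"
    return text
```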
Citation integrity checks (for RAG)
For grounded systems, citations are a contract. Validate them:
- Chunk id validity: cited ids must be among retrieved sources.
- Quote containment: quote text must appear in the chunk text.
- Per-claim citations: every claim must have at least one citation.
This prevents “invented citations” and makes injection harder to hide.
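A sketch of these three checks, assuming each claim carries a list of citations with a chunk_id and a quote (the shapes are illustrative):

```python
def validate_citations(claims: list[dict], retrieved_chunks: dict[str, str]) -> list[str]:
    """Return a list of violations; an empty list means the citations pass."""
    violations = []
    for i, claim in enumerate(claims):
        citations = claim.get("citations", [])
        # Per-claim citations: every claim needs at least one.
        if not citations:
            violations.append(f"claim {i}: no citations")
        for cite in citations:
            chunk_id = cite.get("chunk_id")
            # Chunk id validity: cited ids must be among retrieved sources.
            if chunk_id not in retrieved_chunks:
                violations.append(f"claim {i}: unknown chunk_id {chunk_id!r}")
                continue
            # Quote containment: the quoted text must appear in the chunk text.
            if cite.get("quote", "") not in retrieved_chunks[chunk_id]:
                violations.append(f"claim {i}: quote not found in {chunk_id!r}")
    return violations
```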
Retries and fallbacks (fail closed)
When output validation fails, you should not “ship the best effort.”
Practical retry strategy:
- Retry 1: remind “JSON only,” repeat schema, reduce verbosity.
- Retry 2: reduce context, reduce number of tasks, switch to a more deterministic model.
- Fallback: return status="not_found" or "refused" with next steps.
Hard cap retries. Otherwise validation failures become cost bombs.
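A sketch of a capped retry loop with a fail-closed fallback; call_model and shorten_context are placeholders for your own model client and context-trimming logic, and process_output is the pipeline sketch above:

```python
import json

from jsonschema import ValidationError

MAX_ATTEMPTS = 3  # hard cap so validation failures cannot become cost bombs

def generate_with_fallback(prompt: str) -> dict:
    current_prompt = prompt
    for attempt in range(MAX_ATTEMPTS):
        raw = call_model(current_prompt)   # placeholder for your model client
        try:
            return process_output(raw)     # pipeline sketch from above
        except (json.JSONDecodeError, ValidationError, ValueError):
            if attempt == 0:
                # Retry 1: restate "JSON only", repeat the schema, reduce verbosity.
                current_prompt = prompt + "\nReturn JSON only, matching the schema exactly."
            else:
                # Retry 2: shrink context / task list; optionally switch models here.
                current_prompt = shorten_context(prompt)   # placeholder helper
    # Fallback: fail closed with an explicit abstention state.
    return {"status": "not_found", "answer": None}
```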
Template: safe output contract
A minimal safe contract for many features:
{
"status": "answered" | "not_found" | "needs_clarification" | "conflict" | "refused",
"answer": { "summary": string, "bullets": string[] } | null,
"missing_info": string[],
"follow_up_question": string | null
}
For RAG, extend bullets into per-claim citations (see Part VIII citation schemas).
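One way this contract could be enforced in code, sketched here with Pydantic; the model names mirror the template above, and a refusal is the fail-closed default:

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):
    summary: str
    bullets: list[str]

class SafeOutput(BaseModel):
    status: Literal["answered", "not_found", "needs_clarification", "conflict", "refused"]
    answer: Optional[Answer] = None
    missing_info: list[str] = Field(default_factory=list)
    follow_up_question: Optional[str] = None

def parse_contract(raw: str) -> SafeOutput:
    try:
        return SafeOutput.model_validate_json(raw)  # Pydantic v2 API
    except ValidationError:
        # Fail closed: invalid output collapses into an explicit refusal.
        return SafeOutput(status="refused")
```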