27.4 Fuzzing prompts and inputs
Overview and links for this section of the guide.
On this page
Goal: harden the system against adversarial and weird inputs
Fuzzing is how you discover failures before attackers or weird user inputs do.
For LLM apps, fuzzing is especially useful because:
- inputs are unstructured and user-controlled,
- prompt injection is a realistic threat model,
- parsers and validators can crash on surprising content,
- your model might output unexpected formats under stress.
Fuzzing is testing your system’s resilience: your sanitization, validators, budgets, logging, and fail-closed behavior.
Why fuzzing matters for LLM apps
LLM apps are vulnerable to inputs that:
- break parsing (malformed JSON-like output, nested braces, weird unicode),
- trigger prompt injection (“ignore rules”, “reveal secrets”, “call tools”),
- cause runaway size (very long text, repeated patterns),
- smuggle instructions via retrieved documents (RAG injection),
- cause output drift (model refuses, changes format, includes extra text).
Fuzzing is how you ensure these inputs do not create security issues or outages.
What to fuzz (high ROI targets)
Fuzzing is most valuable at the boundaries:
- User input normalization: trimming, encoding, language detection, length caps.
- Prompt building: template rendering with weird values; escaping.
- Source packaging (RAG): documents that include adversarial instructions.
- JSON parsing and schema validation: malformed and adversarial outputs.
- Logging layer: ensure sensitive strings are redacted; ensure logs don’t explode in size.
- Tool routing (if applicable): ensure tool allowlists and budgets cannot be bypassed.
Build a fuzz corpus (the important part)
A fuzz corpus is a set of adversarial and pathological inputs you run repeatedly.
Include:
- Prompt injection strings: “ignore previous instructions”, “reveal system prompt”, “print your hidden rules”.
- Format breakers: unmatched braces, nested JSON, markdown fences, XML-like wrappers.
- Unicode weirdness: zero-width characters, RTL text, mixed scripts.
- Length extremes: empty input, extremely long input, repeated tokens.
- Adversarial RAG chunks: “SOURCES” that contain malicious instructions or fake citations.
- Confusing queries: ambiguous questions that should trigger clarification.
- Safety edge cases: requests that should be refused or escalated.
When you see a real-world failure, add it to the fuzz corpus. Over time this becomes a powerful regression suite.
Don’t let it become a random pile. Tag cases by threat type (injection, parsing, leakage, overload) so you can see coverage.
What to assert (fail-closed invariants)
Your fuzz tests should assert safety and stability properties:
- No crashes: system does not throw unhandled exceptions.
- No leaks: outputs and logs do not contain secrets or restricted content.
- Contract adherence: outputs are validated or safely rejected.
- Budget adherence: timeouts, max retries, and context caps are enforced.
- Proper routing: refused/restricted/not_found/needs_clarification states are triggered appropriately.
A practical fuzz workflow
- Start with deterministic layers: validators, parsers, prompt builders.
- Add a small fuzz corpus: 25–100 cases.
- Run in CI: quick and bounded.
- When a bug occurs: reduce to a minimal failing case and add it to the corpus.
- Periodically expand: new injection patterns and new “weird” formats.
If you fuzz the live model output, make it a separate job (slower, flaky, and expensive). Keep your core fuzz suite deterministic.