27.4 Fuzzing prompts and inputs

On this page

Goal: harden the system against adversarial and weird inputs

Fuzzing is how you discover failures before attackers or weird user inputs do.

For LLM apps, fuzzing is especially useful because:

Fuzzing is not “testing the model”

Fuzzing is testing your system’s resilience: your sanitization, validators, budgets, logging, and fail-closed behavior.

LLM apps are vulnerable to inputs that:

Fuzzing is how you ensure these inputs do not create security issues or outages.

Fuzzing is most valuable at the boundaries:

User input normalization: trimming, encoding, language detection, length caps.
Prompt building: template rendering with weird values; escaping.
Source packaging (RAG): documents that include adversarial instructions.
JSON parsing and schema validation: malformed and adversarial outputs.
Logging layer: ensure sensitive strings are redacted; ensure logs don’t explode in size.
Tool routing (if applicable): ensure tool allowlists and budgets cannot be bypassed.

A fuzz corpus is a set of adversarial and pathological inputs you run repeatedly.

Include:

Prompt injection strings: “ignore previous instructions”, “reveal system prompt”, “print your hidden rules”.
Format breakers: unmatched braces, nested JSON, markdown fences, XML-like wrappers.
Unicode weirdness: zero-width characters, RTL text, mixed scripts.
Length extremes: empty input, extremely long input, repeated tokens.
Adversarial RAG chunks: “SOURCES” that contain malicious instructions or fake citations.
Confusing queries: ambiguous questions that should trigger clarification.
Safety edge cases: requests that should be refused or escalated.

When you see a real-world failure, add it to the fuzz corpus. Over time this becomes a powerful regression suite.

Treat fuzz corpus like a security artifact

Don’t let it become a random pile. Tag cases by threat type (injection, parsing, leakage, overload) so you can see coverage.

Your fuzz tests should assert safety and stability properties:

No crashes: system does not throw unhandled exceptions.
No leaks: outputs and logs do not contain secrets or restricted content.
Contract adherence: outputs are validated or safely rejected.
Budget adherence: timeouts, max retries, and context caps are enforced.
Proper routing: refused/restricted/not_found/needs_clarification states are triggered appropriately.

Start with deterministic layers: validators, parsers, prompt builders.
Add a small fuzz corpus: 25–100 cases.
Run in CI: quick and bounded.
When a bug occurs: reduce to a minimal failing case and add it to the corpus.
Periodically expand: new injection patterns and new “weird” formats.

If you fuzz the live model output, make it a separate job (slower, flaky, and expensive). Keep your core fuzz suite deterministic.