19.2 Reproducing bugs with minimal test cases
Goal: one-command reproduction
Your goal is to produce a reproduction that is:
- reliable (fails consistently),
- small (minimal inputs),
- fast (runs quickly),
- portable (a teammate can run it).
Ideally it is one command or one test.
Without a repro, you can’t know if a fix worked. With a repro, you can iterate safely and quickly.
Why reproduction is the highest leverage step
Reproduction converts an incident from “mysterious behavior” into “a failing check.” Once you have a failing check:
- hypotheses become testable,
- fixes become verifiable,
- regressions become preventable.
The minimal reproducible error (MRE) procedure
- Start from a failing case: one request id / one input sample.
- Make it local: reproduce it outside production, in dev or staging, if possible.
- Reduce variables: disable concurrency, retries, and randomness.
- Pin versions: runtime version, dependency versions, prompt versions.
- Minimize input: remove irrelevant parts while keeping failure.
- Write it down: one command with expected vs actual output.
Once this is done, you can ask the model for fixes and check each proposed fix against the repro instead of guessing.
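As a rough illustration of where the procedure ends up, here is a minimal sketch of a repro script in Python. The function `normalize_whitespace`, its bug, and the input `"a  b"` are hypothetical stand-ins for whatever code and failing sample you are investigating. The seed and the printed runtime version correspond to the "reduce variables" and "pin versions" steps.

```python
import platform
import random

SEED = 1234  # assumption: any randomness in the code path is driven by `random`


def normalize_whitespace(text: str) -> str:
    """Hypothetical buggy function standing in for the code under investigation."""
    # Bug: split(" ") keeps empty strings, so runs of spaces survive the join.
    return " ".join(text.strip().split(" "))


def run_repro() -> int:
    """Run the minimized case. Returns 1 if the bug reproduces, 0 otherwise."""
    random.seed(SEED)                      # remove randomness between runs
    minimal_input = "a  b"                 # hypothetical minimized input
    expected = "a b"
    actual = normalize_whitespace(minimal_input)
    print(f"python:   {platform.python_version()}")
    print(f"input:    {minimal_input!r}")
    print(f"expected: {expected!r}")
    print(f"actual:   {actual!r}")
    return 0 if actual == expected else 1


status = run_repro()
print("BUG REPRODUCED" if status else "no failure")
```

Saved to a file and run with the interpreter, this is the "one command": it prints expected versus actual output for a single input, and the return value of `run_repro()` could be passed to `sys.exit` so that scripts and CI can detect the failure.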
Shrinking a repro (delta debugging mindset)
When inputs are large (documents, payloads), use a shrink loop:
- remove half the input, rerun
- if it still fails, keep shrinking
- if it stops failing, add back the last removed chunk and try a different removal
This is a practical way to isolate the minimal trigger.
The model can propose which parts of an input are likely irrelevant. You still validate by rerunning the repro.
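The shrink loop above can be sketched in a few lines of Python. This is a simplified, greedy variant in the spirit of delta debugging, not a full implementation of any published algorithm, and it is not guaranteed to find the smallest possible input. The names `shrink` and `still_fails` and the sample data are assumptions made for illustration; `still_fails` is whatever check reruns your repro on a candidate input.

```python
from typing import Callable, List, Sequence


def shrink(items: Sequence[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of `items` while `still_fails` keeps returning True."""
    current = list(items)
    chunk = max(1, len(current) // 2)      # start by removing about half
    while True:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate        # still fails: keep the smaller input
            else:
                i += chunk                 # failure vanished: keep this chunk, move on
        if chunk == 1:
            return current
        chunk = max(1, chunk // 2)         # retry with finer-grained removals


def contains_nul(lines: List[str]) -> bool:
    """Hypothetical failure check: the bug triggers when any line has a NUL byte."""
    return any("\x00" in line for line in lines)


sample = ["header", "ok row", "bad\x00row", "ok row 2", "footer"]
minimal = shrink(sample, contains_nul)
print(minimal)
```

Each call to `still_fails` is one rerun of the repro, so a fast repro makes the loop cheap. If the model suggests parts of the input that look irrelevant, those suggestions can be tried as the first removals, with the predicate still deciding what stays.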
Turn repro into a failing test
Once you have a one-command repro, convert it into a test:
- a unit test if the bug is in pure logic
- an integration test if the bug is in wiring/IO
- a golden test if output shape must remain stable
The test is your regression lock.
Copy-paste prompts
Prompt: create an MRE plan
Help me create a minimal reproducible error (MRE) for this bug.
Evidence:
- Expected: ...
- Actual: ...
- Logs/output: ...
Task:
1) Propose a step-by-step plan to reproduce it reliably.
2) Propose how to shrink the input to a minimal case.
3) Propose what a regression test should look like (unit vs integration).
Stop after the plan.
Prompt: write the regression test only
Write a regression test for this bug. Do NOT change implementation yet.
Repro:
- Input: ...
- Expected: ...
- Actual: ...
Output: diff-only changes (tests only)