19.2 Reproducing bugs with minimal test cases

Goal: one-command reproduction

Your goal is to produce a reproduction that is:

reliable (fails consistently),
small (minimal inputs),
fast (runs quickly),
portable (a teammate can run it).

Ideally it is one command or one test.
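
One way to check the first property, reliability, is to rerun the repro several times and count how often it fails. The sketch below is minimal and assumes the repro is wrapped in a hypothetical zero-argument callable that returns True when the bug shows up:

```python
import itertools
from typing import Callable


def failure_rate(repro: Callable[[], bool], runs: int = 20) -> float:
    """Run `repro` `runs` times and return the fraction of runs that failed.

    `repro()` must return True when the bug shows up. A reliable repro
    scores 1.0. A lower score means something (timing, ordering,
    randomness, shared state) is still uncontrolled.
    """
    failures = sum(1 for _ in range(runs) if repro())
    return failures / runs


# Example: a repro that fails every other run is flaky (0.5), not reliable.
alternating = itertools.cycle([True, False])
print(failure_rate(lambda: next(alternating), runs=10))  # → 0.5
print(failure_rate(lambda: True))                       # → 1.0
```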

Repro is how you stop guessing

Without a repro, you can’t know if a fix worked. With a repro, you can iterate safely and quickly.

Why reproduction is the highest leverage step

Reproduction converts an incident from “mysterious behavior” into “a failing check.” Once you have a failing check:

hypotheses become testable,
fixes become verifiable,
regressions become preventable.

The minimal reproducible error (MRE) procedure

Start from a failing case: a single request ID or a single input sample.
Make it local: reproduce it on your machine or in a dev/staging environment if possible.
Reduce variables: disable concurrency, retries, and randomness.
Pin versions: runtime version, dependency versions, prompt versions.
Minimize input: remove irrelevant parts while keeping failure.
Write it down: one command with expected vs actual output.
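
The procedure above can be sketched as a single script. Everything in it is hypothetical: `split_fields` stands in for the code under investigation, and `INPUT`/`EXPECTED` stand in for the real failing case. It pins randomness, records the runtime version, and prints expected vs actual output:

```python
"""repro_bug.py: one-command reproduction sketch (run: python repro_bug.py).

Hypothetical throughout: `split_fields` stands in for the code under
investigation, and INPUT/EXPECTED stand in for the real failing case.
"""
import platform
import random

# Reduce variables: pin randomness so every run follows the same path.
random.seed(0)


def split_fields(line):
    # Hypothetical buggy implementation: rstrip(",") silently drops
    # trailing empty fields, so "a,b," loses its third field.
    return line.rstrip(",").split(",")


# Minimized input copied from the original failing sample (placeholder).
INPUT = "a,b,"
EXPECTED = ["a", "b", ""]


def reproduce():
    """Run the failing case once and return (expected, actual)."""
    return EXPECTED, split_fields(INPUT)


def main():
    expected, actual = reproduce()
    # Record the runtime version alongside the result (pinned versions).
    print(f"python:   {platform.python_version()}")
    print(f"input:    {INPUT!r}")
    print(f"expected: {expected!r}")
    print(f"actual:   {actual!r}")
    reproduced = actual != expected
    print("BUG REPRODUCED" if reproduced else "not reproduced")
    return reproduced


if __name__ == "__main__":
    main()
```

Running `python repro_bug.py` prints both values and whether the bug reproduced. To drive it from CI or `git bisect run`, you could make it exit non-zero while the bug is present (for example `sys.exit(1 if main() else 0)`), since `git bisect run` treats exit status 0 as good and 1 to 127 (except 125, which means skip) as bad.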

Once this is done, you can ask the model for fixes with confidence, because every proposed fix can be checked against the repro.

Shrinking a repro (delta debugging mindset)

When inputs are large (documents, payloads), use a shrink loop:

remove half the input, then rerun
if it still fails, keep the smaller input and keep shrinking
if it stops failing, restore the chunk you just removed and try removing a different (or smaller) chunk
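
The shrink loop above can be written as a small greedy function. This is a simplified variant of delta debugging (it only tries removing chunks; it is not the full ddmin algorithm), and `still_fails` is an assumed callback that reruns your repro on a candidate input and returns True if it still fails:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(items: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedy chunk-removal shrinker (simplified delta debugging).

    `still_fails(candidate)` must rerun the repro on `candidate` and return
    True if the bug still shows up.
    """
    current = list(items)
    assert still_fails(current), "start from an input that actually fails"
    chunk = max(1, len(current) // 2)  # first try removing half
    while True:
        removed_any = False
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if candidate and still_fails(candidate):
                current = candidate  # still fails: keep the smaller input
                removed_any = True
            else:
                start += chunk  # stopped failing: keep this chunk, try the next
        if not removed_any:
            if chunk == 1:
                return current  # no single item can be removed
            chunk = max(1, chunk // 2)  # try finer-grained removals


# Example: the failure needs both 3 and 7 to be present.
print(shrink(list(range(10)), lambda xs: 3 in xs and 7 in xs))  # → [3, 7]
```

The result is minimal in the sense that removing any single remaining item makes the failure disappear; it is not guaranteed to be the smallest failing input overall.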

This is a practical way to isolate the minimal trigger.

Ask the model to help shrink

The model can propose which parts of an input are likely irrelevant. You still validate each suggestion by rerunning the repro.
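
One way to keep the model's suggestions honest is to treat each one as a candidate removal and accept it only if the repro still fails. A minimal sketch, assuming the input is a list of lines, the model's suggestions have been turned into hypothetical index ranges over the original lines, and `still_fails` reruns the repro:

```python
from typing import Callable, List, Sequence, Tuple


def apply_proposals(
    lines: Sequence[str],
    proposals: Sequence[range],
    still_fails: Callable[[List[str]], bool],
) -> Tuple[List[str], List[range]]:
    """Apply model-proposed removals one at a time, keeping only verified ones.

    Returns the reduced input and the proposals that were rejected.
    """
    kept = list(range(len(lines)))  # indices into the ORIGINAL input
    rejected: List[range] = []
    for span in proposals:
        candidate = [i for i in kept if i not in span]
        if candidate != kept and still_fails([lines[i] for i in candidate]):
            kept = candidate  # rerun confirmed: the span was irrelevant
        else:
            rejected.append(span)  # removal made the bug vanish (or did nothing)
    return [lines[i] for i in kept], rejected


# Example: only the "TRIGGER" line matters.
doc = ["header", "noise1", "noise2", "TRIGGER", "footer"]
reduced, rejected = apply_proposals(
    doc,
    proposals=[range(1, 3), range(3, 4), range(4, 5)],
    still_fails=lambda ls: "TRIGGER" in ls,
)
print(reduced)   # → ['header', 'TRIGGER']
print(rejected)  # → [range(3, 4)]
```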

Turn repro into a failing test

Once you have a one-command repro, convert it into a test:

a unit test if the bug is in pure logic
an integration test if the bug is in wiring/IO
a golden test if output shape must remain stable
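
A hedged sketch of the unit-test case using Python's standard `unittest` module. `split_fields` and its inputs are hypothetical stand-ins for the code and minimized input from your repro. Marking the test with `unittest.expectedFailure` keeps the suite green while the bug is still open; that is one convention, not the only one:

```python
# test_split_fields.py (run: python -m unittest test_split_fields)
import unittest


def split_fields(line):
    # Hypothetical code under test, still buggy: drops trailing empty fields.
    return line.rstrip(",").split(",")


class SplitFieldsRegressionTest(unittest.TestCase):
    # Remove @expectedFailure once the fix lands; from then on this test
    # guards against the bug coming back.
    @unittest.expectedFailure
    def test_trailing_empty_field_is_kept(self):
        # Minimized input from the repro (hypothetical values).
        self.assertEqual(split_fields("a,b,"), ["a", "b", ""])

    def test_plain_line_still_splits(self):
        # Pin behavior that already works so the fix cannot break it.
        self.assertEqual(split_fields("a,b"), ["a", "b"])
```

If the fix lands while the decorator is still present, `unittest` reports an "unexpected success" and the run is no longer counted as successful, which is a useful reminder to remove it.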

The test is your regression lock.

Copy-paste prompts

Prompt: create an MRE plan

Help me create a minimal reproducible error (MRE) for this bug.

Evidence:
- Expected: ...
- Actual: ...
- Logs/output: ...

Task:
1) Propose a step-by-step plan to reproduce it reliably.
2) Propose how to shrink the input to a minimal case.
3) Propose what a regression test should look like (unit vs integration).
Stop after the plan.

Prompt: write the regression test only

Write a regression test for this bug. Do NOT change implementation yet.

Repro:
- Input: ...
- Expected: ...
- Actual: ...

Output: diff-only changes (tests only)
