7.3 The "write tests first" vibe pattern
The core idea
In vibe coding, the model can generate code faster than you can judge it. Tests flip that power balance: they give you a mechanical way to decide whether the output is correct.
The pattern is simple:
- Write tests first from acceptance criteria.
- Confirm tests match intent (this is a spec review).
- Implement the smallest change to make tests pass.
- Run tests after every diff and iterate.
It’s the fastest way to keep AI output honest. A failing test is better feedback than 20 paragraphs of explanation.
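As a concrete sketch (Python with pytest assumed; the function and criterion are made up for illustration), one acceptance criterion becomes one test, written before any implementation exists:

```python
# Acceptance criterion (hypothetical): slugify() lowercases the title and
# collapses runs of whitespace into a single hyphen.
# The test is written first, so it fails until the implementation meets the
# criterion; that failing run is your mechanical check.

from myapp.text import slugify  # hypothetical module, not yet implemented


def test_slugify_lowercases_and_hyphenates_whitespace():
    assert slugify("Hello   World") == "hello-world"
```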
Why tests-first speeds up vibe coding
It speeds you up because it reduces two expensive activities:
- Argument debugging: “it should work” back-and-forth without evidence.
- Regression hunting: discovering later that you broke something earlier.
With tests, your loop becomes:
- Change → run tests → see failure → fix → repeat.
- Not: change → hope → discover the break later → panic.
Ask the model to propose test cases from the acceptance criteria. You still review them, but it can generate the scaffolding and the edge-case set quickly.
When to use tests-first
Tests-first is a great default when:
- you are fixing a bug (regression test first; see the sketch below),
- you are refactoring (lock behavior before moving code),
- the behavior is tricky (parsing, validation, serialization),
- the output must be stable (structured output / schemas),
- you are worried about breaking existing behavior.
You can skip tests-first when the change is truly trivial (but even then, at least run existing tests).
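For the bug-fix case above, here is a minimal sketch of a regression test written before the fix (Python with pytest assumed; the function and bug are hypothetical):

```python
# Hypothetical bug: parse_price("1,299.00") raises ValueError instead of
# returning 1299.0. The regression test pins the correct behavior before
# the fix exists, so it fails now and will fail again if the bug returns.

from myapp.pricing import parse_price  # hypothetical module


def test_parse_price_accepts_thousands_separator():
    assert parse_price("1,299.00") == 1299.0
```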
A practical workflow (tests → implementation → verify)
Step 0: define “done”
Tests-first starts with acceptance criteria (Section 6.3). If “done” is fuzzy, tests will be fuzzy too.
Step 1: ask the model for tests only
Your prompt should explicitly forbid implementation. You are creating a spec artifact, not the implementation.
Step 2: review the tests like a code review
Tests are part of your product contract. Review for:
- coverage of the acceptance criteria,
- edge cases and failure behavior,
- determinism (no time/network flakiness),
- minimal coupling to implementation details.
Step 3: make tests fail for the right reason
Run the tests before implementing. If tests already pass, they’re not testing the new behavior (or they’re too weak).
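Here is a sketch of what "failing for the right reason" looks like (Python with pytest assumed; the function is hypothetical): the failure should point at the missing behavior, not at a broken test.

```python
# Run this before implementing. A good failure points at the missing
# behavior: an AssertionError showing the wrong return value, or a
# NotImplementedError from a stub. A NameError, a bad import path, or a
# wrong fixture means the test itself is broken, which is the wrong reason.

from myapp.retry import retry_delays  # hypothetical; may exist only as a stub


def test_retry_delays_double_on_each_attempt():
    assert retry_delays(attempts=4, base=1.0) == [1.0, 2.0, 4.0, 8.0]
```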
Step 4: implement the smallest change to pass
Ask for diff-only changes. One small diff per failing test cluster is usually ideal.
Step 5: iterate and lock in
Repeat until green, then commit. This becomes your stable base for the next feature.
If the model makes tests pass by deleting the test, weakening assertions, or changing the acceptance criteria, that’s not success. That’s cheating. Keep the contract stable.
What good tests look like (for AI-generated code)
Good tests have three traits:
- They test behavior, not structure: outputs, errors, exit codes, schemas.
- They are easy to read: a test is documentation for “what we mean.”
- They fail clearly: when broken, you can tell what changed.
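A short illustration of those three traits (Python with pytest assumed; the validator is hypothetical): the tests name the behavior, assert on public outputs and errors, and fail with a readable message.

```python
import pytest

from myapp.validation import validate_email  # hypothetical public function


def test_valid_address_is_accepted():
    # Behavior, not structure: only the public result matters.
    assert validate_email("ada@example.com") is True


def test_missing_at_sign_is_rejected_with_clear_error():
    # Failure behavior is part of the contract too.
    with pytest.raises(ValueError, match="@"):
        validate_email("ada.example.com")
```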
Example: CLI behavior test
For CLI tools, prefer tests that call the entrypoint function and capture stdout/stderr, rather than shelling out to a subprocess (unless you need a true integration test).
If your CLI reads directly from real stdin/stdout, refactor to allow injecting streams. This single design choice makes tests-first much easier.
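A minimal sketch of that design (Python assumed; names are hypothetical): the entrypoint takes its streams as parameters, so a test can drive it with in-memory buffers instead of a subprocess.

```python
import io
import sys


def main(argv, stdin=None, stdout=None):
    # Injectable streams; the defaults keep normal command-line usage working.
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    text = stdin.read()
    if argv and argv[0] == "--upper":
        text = text.upper()
    stdout.write(text)
    return 0


def test_upper_flag_uppercases_stdin():
    out = io.StringIO()
    exit_code = main(["--upper"], stdin=io.StringIO("hello\n"), stdout=out)
    assert exit_code == 0
    assert out.getvalue() == "HELLO\n"
```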
Copy-paste prompt sequence
Prompt A: tests only
We are using the “write tests first” pattern.
Task:
[Describe the change/feature.]
Acceptance criteria:
- [...]
Constraints:
- Language/runtime: [...]
- Dependencies: [...]
- Do NOT implement the feature yet
Output:
- Diff-only changes that add/modify tests ONLY
- After the diff, explain how each test maps to an acceptance criterion
Prompt B: implement to pass tests
Now implement the feature to make the new tests pass.
Constraints:
- Keep the public API stable unless the tests require a change
- Do not weaken or delete the new tests
- Keep diffs small and focused
Output:
- Diff-only changes
Prompt C: fix one failing test minimally
Here is the failing test output:
(paste)
Fix the failure with the smallest change possible.
Constraints:
- Do not change tests unless the test is wrong (explain if so)
- Diff-only changes
Common pitfalls (and fixes)
Pitfall: tests are too tied to implementation details
Fix: rewrite tests to focus on public behavior (function outputs, schemas, CLI output), not internal helper functions.
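A sketch of the difference (Python with pytest assumed; names are hypothetical):

```python
# Too coupled: breaks on any refactor, even when behavior is unchanged.
def test_internal_helper_formats_row():
    from myapp.report import _format_row  # private helper = implementation detail
    assert _format_row(["a", "b"]) == "a | b"


# Behavior-focused: survives refactors as long as the public output holds.
def test_report_renders_rows_as_pipe_separated_lines():
    from myapp.report import render_report  # public entry point
    assert render_report([["a", "b"]]) == "a | b\n"
```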
Pitfall: tests don’t fail before implementation
Fix: add assertions or cases that clearly require the new behavior. A test that always passes is just noise.
Pitfall: tests are flaky
Fix: remove time/network randomness; use fixed inputs; avoid relying on ordering unless defined.
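One common fix is to make time an explicit input so tests never touch the real clock (Python assumed; the token check is hypothetical):

```python
from datetime import datetime, timezone


def is_expired(issued_at, now=None):
    # Hypothetical rule: tokens older than 3600 seconds are expired.
    # `now` is injectable so tests can pass a fixed timestamp.
    now = now or datetime.now(timezone.utc)
    return (now - issued_at).total_seconds() > 3600


def test_token_just_past_one_hour_is_expired():
    issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    fixed_now = datetime(2024, 1, 1, 13, 0, 1, tzinfo=timezone.utc)
    assert is_expired(issued, now=fixed_now)
```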
Pitfall: the model “fixes” by weakening the contract
Fix: restate acceptance criteria and instruct: “Do not change expected behavior. Only change implementation.”
Ship points for tests-first
- SP1: tests added and reviewed; they fail for the right reason.
- SP2: implementation passes tests with minimal diff.
- SP3: refactor/cleanup (optional) with tests still green; commit.