6.4 Use examples as "mini tests"
Why examples are so effective
Examples are the simplest way to make behavior concrete. They work because they:
- reduce ambiguity (“this is what I mean”),
- anchor edge cases without long explanations,
- act like lightweight tests the model can reason about,
- give you immediate verification targets after generation.
When you include input/output pairs, you’re not asking the model to “be correct.” You’re defining correctness.
How to choose high-value examples
Good examples are small and diagnostic. Aim for 5–10 total examples per feature.
1) Happy path examples
Include 2–4 simple examples that prove the main use case works.
2) Edge case examples
Include 2–4 that stress boundaries:
- empty/whitespace input,
- invalid format,
- unexpected characters,
- large inputs,
- ambiguous cases that usually break parsers.
3) Error behavior examples
Include 1–2 examples that specify error outputs and exit codes. This prevents raw stack traces from leaking to users and keeps failure behavior consistent.
Models often get the happy path right. It’s the failure modes that make apps brittle.
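For example, a single error-behavior pair for the calculator might look like this (the exit code and message are illustrative, not prescribed; the point is that you pin them down rather than leaving the choice to the model):
Input: "2+*3"
Error: exit 2, stderr contains "invalid syntax" (and no traceback)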
How to format examples in prompts
Pick a format that is unambiguous and easy to copy/paste into tests. Common formats:
Table format
Examples:
Input    | Expected result
"2+2"    | "4"
"1/2"    | "0.5"
"2+*3"   | error: exit 2, stderr contains "invalid syntax"
JSON-ish format
examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2*(3+4)", "output": "14"},
    {"input": "2+*3", "error": {"exit_code": 2}},
]
Choose one format and keep it consistent across the guide and your projects.
Examples that catch edge cases
Here are the kinds of examples that catch real bugs:
- Whitespace: " 2 + 2 "
- Unary operators: "-3 + 5"
- Parentheses: "2*(3+4)"
- Invalid syntax: "2+*3"
- Division by zero: define expected behavior explicitly
- Empty input: define expected behavior explicitly
If you don’t define what happens on “division by zero” or “empty input,” the model will choose something. That choice will become your behavior unless you correct it later.
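For example, you could pin those two cases down explicitly (the exit codes here are illustrative assumptions; any consistent choice works):
Input: "1/0"
Error: exit 1, stderr contains "division by zero"
Input: ""
Error: exit 2, stderr contains "empty expression"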
Building a tiny “example suite”
Over time, keep a small example suite for your project:
- It’s a test set you can rerun mentally.
- It’s a regression set you can paste into prompts.
- It reveals when a prompt change breaks prior behavior.
This is the seed of evaluation harnesses later in the guide—just scaled down to something you can do today.
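As a minimal sketch of that idea: if your project ships the calculator as a command-line program, the JSON-ish list above can be executed directly with pytest. The calc command name is a placeholder assumption; substitute your project's real entry point.
# Minimal example-suite runner (pytest + subprocess).
# Assumes a hypothetical CLI named "calc" that takes one expression
# argument, prints the result to stdout, and reports errors via exit codes.
import subprocess

import pytest

EXAMPLES = [
    {"input": "2+2", "output": "4"},
    {"input": "2*(3+4)", "output": "14"},
    {"input": "2+*3", "error": {"exit_code": 2}},
]

@pytest.mark.parametrize("case", EXAMPLES, ids=lambda c: c["input"])
def test_example(case):
    result = subprocess.run(["calc", case["input"]], capture_output=True, text=True)
    if "output" in case:
        # Happy-path example: clean exit and exact stdout match.
        assert result.returncode == 0
        assert result.stdout.strip() == case["output"]
    else:
        # Error example: only the exit code is pinned down here.
        assert result.returncode == case["error"]["exit_code"]
Rerunning this after every prompt-driven change tells you immediately when prior behavior breaks.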
A prompt template using mini tests
Task:
[Describe the change.]
Mini tests (examples):
- Input: [...]
Expected: [...]
- Input: [...]
Expected: [...]
- Input: [...]
Expected: [error behavior]
Constraints:
- Language/runtime: [...]
- Dependencies: [...]
Output:
- Diff-only changes
- After the diff, restate how each mini test is satisfied
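Filled in for the calculator, the template might read like this (values are illustrative):
Task:
Add parentheses support to the expression parser.
Mini tests (examples):
- Input: "2*(3+4)"
  Expected: "14"
- Input: "(1+2)*(3+4)"
  Expected: "21"
- Input: "(2+3"
  Expected: exit 2, stderr contains "invalid syntax"
Constraints:
- Language/runtime: Python 3.12
- Dependencies: standard library only
Output:
- Diff-only changes
- After the diff, restate how each mini test is satisfied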