9.4 Controlling determinism for repeatable builds
Why determinism matters for builders
As soon as your AI output affects real software behavior, you need repeatability. Determinism matters because it lets you:
- reproduce a “good” generation,
- debug regressions caused by prompt/model changes,
- build evaluation sets and compare outputs,
- avoid “it worked yesterday” randomness.
Determinism is not a single switch; it comes from a combination of model choice, sampling settings, prompt stability, context stability, and output constraints.
Where variability comes from
Common sources of “why is it different this time?”
- Sampling randomness: temperature/top-p settings intentionally vary token choices between runs.
- Context drift: different chat history changes the output.
- Instruction drift: prompts evolve without being versioned.
- Model updates: underlying model behavior changes over time.
- Non-deterministic tools: external APIs/data that change.
- Ambiguity: vague prompts produce multiple “reasonable” answers.
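To make sampling randomness concrete, here is a toy sketch (not a real model API): a weighted pick over candidate outputs behaves like sampled generation, and a fixed seed makes it repeatable while an unseeded pick may differ across runs.

```python
import random

# Toy stand-in for sampled generation. CANDIDATES and the weights are
# illustrative; real models sample over tokens, not whole answers.
CANDIDATES = ["refactor", "rewrite", "patch"]

def sample_output(seed=None):
    """Pick one candidate; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)  # seed=None -> nondeterministic across runs
    return rng.choices(CANDIDATES, weights=[0.5, 0.3, 0.2])[0]

# Seeded calls always agree; unseeded calls are free to disagree.
assert sample_output(seed=42) == sample_output(seed=42)
```

The same principle applies to model APIs that expose a seed parameter: seeding removes one source of variability, but not context or prompt drift.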
Knobs that affect determinism
Practical controls you can use:
- Lower randomness: reduce temperature; prefer constrained output formats.
- Stabilize prompts: version your prompts and reuse templates.
- Stabilize context: summarize state into a stable “context block” instead of relying on chat history.
- Constrain output: schemas, checklists, strict formats.
- Use tests: deterministic tests convert variability into pass/fail.
If your prompt leaves decisions open, you will get different outputs even at low temperature. Make the decisions explicit.
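As a sketch of "make the decisions explicit," you can bundle pinned sampling settings with a list of already-made decisions. `build_request` and its field names are hypothetical, not a real SDK; adapt them to whatever client you use.

```python
def build_request(task: str, decisions: dict) -> dict:
    """Bundle a task with explicit decisions and low-randomness settings.

    Illustrative only: field names mimic common model-API parameters.
    """
    decided = "\n".join(f"- {k}: {v}" for k, v in sorted(decisions.items()))
    return {
        "prompt": f"{task}\n\nDecisions already made (do not revisit):\n{decided}",
        "temperature": 0.0,          # minimize sampling randomness
        "top_p": 1.0,                # no nucleus-truncation surprises
        "response_format": "json",   # constrained output format
    }

req = build_request(
    "Generate a config loader.",
    {"language": "Python", "style": "dataclasses", "errors": "raise"},
)
```

Sorting the decisions keeps the rendered prompt byte-stable even if the dict is built in a different order, which matters once you start diffing prompts.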
A repeatable-build workflow
- Write prompts as files (not just chat), with versions.
- Pin constraints (runtime, deps, style, file scope).
- Use plan-first for non-trivial work.
- Use diff-only for code changes.
- Run tests after each step.
- Record the settings (model + key parameters) used for successful runs.
This turns “AI output” into “build artifact.”
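Recording the settings of a successful run can be as simple as emitting a small manifest. This is a minimal sketch, assuming you store prompts as files; the field names are illustrative.

```python
import hashlib
import json

def run_manifest(prompt_text: str, model: str, params: dict) -> str:
    """Return a JSON manifest identifying prompt version, model, and settings."""
    # Hash the exact prompt text so "which prompt version?" is unambiguous.
    prompt_hash = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]
    return json.dumps(
        {"prompt_sha256": prompt_hash, "model": model, "params": params},
        sort_keys=True,  # stable key order -> diffable, comparable manifests
    )

manifest = run_manifest(
    "You are a code generator...", "example-model-v1",
    {"temperature": 0.0, "top_p": 1.0},
)
```

Commit the manifest next to the generated artifact: reproducing the run later means replaying the same prompt hash, model, and parameters.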
Structured outputs as a determinism tool
Structured output reduces degrees of freedom:
- JSON schema outputs are less “creative” than free text.
- Checklists force the model to follow steps consistently.
- Explicit fields reduce omission and drift.
You’ll go deeper on structured output later in the guide, but the key idea here is: structure improves repeatability.
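A minimal sketch of schema-style checking, using only the standard library: parse the model's output as JSON and enforce required fields and types, converting free-text drift into a hard pass/fail. `REQUIRED_FIELDS` is an assumed example schema, not a standard.

```python
import json

# Example schema: every output must describe one file change.
REQUIRED_FIELDS = {"file": str, "change_type": str, "diff": str}

def validate_output(raw: str) -> dict:
    """Parse model output as JSON and enforce required fields and types."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    for field, typ in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"missing or wrong-typed field: {field}")
    return data
```

In practice you would use a full schema validator, but even this much turns "the output looked different" into a reproducible error message.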
Common mistakes (and fixes)
Mistake: trying to solve repeatability with temperature alone
Fix: stabilize prompts and context; constrain outputs; use tests.
Mistake: relying on long chat history as “the spec”
Fix: periodically reset into a stable summary block and treat that as the authoritative context.
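One way to sketch that reset: render the agreed-upon facts as a versioned context block and treat it, not the chat transcript, as the spec. The function and its header text are illustrative; in practice you would draft the summary (by hand or with the model), review it, and commit it like code.

```python
def context_block(version: str, facts: list[str]) -> str:
    """Render agreed-upon facts as the single authoritative context."""
    lines = [f"CONTEXT v{version} (authoritative; ignore older history)"]
    lines += [f"- {fact}" for fact in sorted(facts)]  # sorted -> stable text
    return "\n".join(lines)

block = context_block("3", ["target runtime: Python 3.11", "style: PEP 8"])
```

Because the facts are sorted, the same set of facts always renders to the same bytes, so the block can be diffed and versioned like any other file.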
Mistake: not versioning prompts
Fix: treat prompts like code. Changes to prompts are behavior changes.