15.2 Designing schemas that are hard to break
Goal: schemas that enforce reliability
A good schema is not a “documentation file.” It’s an enforcement tool. Your schema should:
- make valid outputs easy,
- make invalid outputs obvious,
- constrain output length and ambiguity,
- support versioning as your app evolves.
The schema defines what your app can reliably do. If it’s too loose, you’ll parse chaos. If it’s too strict, you’ll reject too many outputs. Aim for “strict enough.”
Schema principles (what makes them “hard to break”)
- Small: fewer fields = fewer ways to fail.
- Explicit: required fields, clear types, clear bounds.
- Bounded: max items, max length, enums.
- Stable: versioned; don’t overwrite v1.
- Validator-friendly: avoid exotic features until you need them.
Minimize degrees of freedom
The model will fill any space you leave open. Reduce degrees of freedom by:
- limiting bullet count (minItems/maxItems),
- limiting string length (maxLength),
- using enums for categories,
- keeping nesting shallow.
When you need flexibility, add it deliberately.
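As a rough illustration of the bounds listed above, the sketch below compares a loose and a tight spec for the same array field. The field contents, the limits, and the helper name `bound_violations` are assumptions made up for this example; a real project would normally rely on a full JSON Schema validator.

```python
# Hypothetical specs for one array field; the limits are illustrative only.
LOOSE = {"type": "array", "items": {"type": "string"}}
TIGHT = {
    "type": "array",
    "minItems": 1,
    "maxItems": 3,
    "items": {
        "type": "string",
        "maxLength": 40,
        "enum": ["bug", "feature", "question"],
    },
}


def bound_violations(value, spec):
    """Return human-readable violations of the bounds declared in spec."""
    if not isinstance(value, list):
        return ["expected an array"]
    problems = []
    if len(value) < spec.get("minItems", 0):
        problems.append(f"too few items: {len(value)}")
    if "maxItems" in spec and len(value) > spec["maxItems"]:
        problems.append(f"too many items: {len(value)}")
    item_spec = spec.get("items", {})
    for i, item in enumerate(value):
        if not isinstance(item, str):
            problems.append(f"item {i}: expected a string")
            continue
        if "maxLength" in item_spec and len(item) > item_spec["maxLength"]:
            problems.append(f"item {i}: longer than {item_spec['maxLength']} characters")
        if "enum" in item_spec and item not in item_spec["enum"]:
            problems.append(f"item {i}: {item!r} is not an allowed value")
    return problems


# The same model output passes the loose spec but is caught by the tight one.
output = ["bug", "urgent!!", "feature", "question"]
print(bound_violations(output, LOOSE))
print(bound_violations(output, TIGHT))
```

The loose spec accepts anything shaped like a list of strings, so an invented category and an extra item slip through; the tight spec reports both.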
Be strict where it matters
For many tasks, strictness buys reliability:
- require fields even if arrays can be empty,
- disallow unknown keys (in JSON Schema: additionalProperties: false),
- use clear types (string, number, array),
- avoid “any type” fields.
If you allow arbitrary keys, the model may invent new ones. Your app will silently ignore them, so you won’t notice when outputs drift away from what you intended. Prefer strict schemas.
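To make the unknown-key problem concrete, here is a minimal sketch of the check that `additionalProperties: false` implies. The helper names `unknown_keys` and `check_keys` and the `confidence` key are hypothetical; a JSON Schema validator performs this check for you once the schema is strict.

```python
def unknown_keys(output, schema):
    """Return the keys in output that the schema does not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(key for key in output if key not in declared)


def check_keys(output, schema):
    """Raise if the schema is strict and the output contains invented keys."""
    extra = unknown_keys(output, schema)
    # JSON Schema allows additional properties unless this is set to false.
    if extra and schema.get("additionalProperties", True) is False:
        raise ValueError(f"unknown keys not allowed: {extra}")
    return extra


strict = {"additionalProperties": False, "properties": {"title": {"type": "string"}}}
lenient = {"properties": {"title": {"type": "string"}}}

# "confidence" is a key the model invented; it is not part of either schema.
output = {"title": "Quarterly report", "confidence": 0.9}

print(check_keys(output, lenient))  # accepted; the extra key is only reported
try:
    check_keys(output, strict)
except ValueError as err:
    print(err)
```

With the lenient schema the invented key is accepted and easy to overlook; with the strict schema it becomes a visible failure.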
Prefer nulls over missing fields
Missing fields create ambiguous behavior: is it missing because it’s unknown, or because the model forgot?
A practical strategy:
- make fields required,
- allow null as a value when unknown,
- use empty arrays for “none” rather than missing arrays.
This makes validation and UI rendering simpler.
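One way to picture the difference is a small sketch that separates the three states a field can be in. The function names and the sample payload are assumptions for illustration only.

```python
import json


def field_state(output, key):
    """Classify a key as 'missing', 'unknown' (explicit null), or 'present'."""
    if key not in output:
        return "missing"   # the key was omitted: treat this as a failure
    if output[key] is None:
        return "unknown"   # the model explicitly reported that it has no value
    return "present"       # includes empty arrays, which mean "none"


def missing_required(output, required):
    """Return the required keys that the output omitted entirely."""
    return [key for key in required if key not in output]


# JSON null becomes None in Python; an empty array stays an empty list.
output = json.loads('{"title": null, "caveats": []}')

for key in ("title", "caveats", "claims"):
    print(key, field_state(output, key))
print(missing_required(output, ["title", "caveats", "claims"]))
```

Because every field is required, "missing" can only mean the model made a mistake, while null and the empty array remain valid, unambiguous answers.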
Example schema for Project 1
This example shows the shape and strictness you want (not tied to a specific JSON Schema draft). Use it as a design reference:
{
  "type": "object",
  "additionalProperties": false,
  "required": ["title", "summary_bullets", "key_entities", "claims", "caveats"],
  "properties": {
    "title": {"type": ["string", "null"], "maxLength": 200},
    "summary_bullets": {
      "type": "array",
      "minItems": 5,
      "maxItems": 10,
      "items": {"type": "string", "maxLength": 220}
    },
    "key_entities": {
      "type": "array",
      "maxItems": 12,
      "items": {"type": "string", "maxLength": 80}
    },
    "claims": {
      "type": "array",
      "maxItems": 8,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["claim", "support"],
        "properties": {
          "claim": {"type": "string", "maxLength": 240},
          "support": {"type": ["string", "null"], "maxLength": 240}
        }
      }
    },
    "caveats": {
      "type": "array",
      "maxItems": 6,
      "items": {"type": "string", "maxLength": 220}
    }
  }
}
Notice the pattern: strict keys, bounded arrays, bounded strings, and required fields.
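To show how such a schema rejects a sloppy output, the sketch below hand-rolls a tiny checker for only the keywords used above (type, required, properties, additionalProperties, items, minItems, maxItems, maxLength). It is an illustration using the standard library only, not a substitute for a real JSON Schema validator; the `validate` function and the sample output are assumptions.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "null": lambda v: v is None,
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    allowed = schema.get("type", [])
    if isinstance(allowed, str):
        allowed = [allowed]
    if allowed and not any(TYPE_CHECKS[t](value) for t in allowed):
        return [f"{path}: expected {' or '.join(allowed)}"]
    errors = []
    if isinstance(value, str) and len(value) > schema.get("maxLength", len(value)):
        errors.append(f"{path}: longer than {schema['maxLength']} characters")
    if isinstance(value, list):
        if len(value) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        if len(value) > schema.get("maxItems", len(value)):
            errors.append(f"{path}: more than {schema['maxItems']} items")
        for i, item in enumerate(value):
            errors += validate(item, schema.get("items", {}), f"{path}[{i}]")
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        for key, item in value.items():
            if key in props:
                errors += validate(item, props[key], f"{path}.{key}")
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}: unknown key {key!r}")
    return errors


# The Project 1 schema from above, written as a Python dict.
SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["title", "summary_bullets", "key_entities", "claims", "caveats"],
    "properties": {
        "title": {"type": ["string", "null"], "maxLength": 200},
        "summary_bullets": {"type": "array", "minItems": 5, "maxItems": 10,
                            "items": {"type": "string", "maxLength": 220}},
        "key_entities": {"type": "array", "maxItems": 12,
                         "items": {"type": "string", "maxLength": 80}},
        "claims": {"type": "array", "maxItems": 8, "items": {
            "type": "object",
            "additionalProperties": False,
            "required": ["claim", "support"],
            "properties": {
                "claim": {"type": "string", "maxLength": 240},
                "support": {"type": ["string", "null"], "maxLength": 240},
            },
        }},
        "caveats": {"type": "array", "maxItems": 6,
                    "items": {"type": "string", "maxLength": 220}},
    },
}

# A made-up model output with four problems: too few bullets, a claim without
# "support", a missing "caveats" array, and an invented "confidence" key.
bad_output = {
    "title": None,
    "summary_bullets": ["First point", "Second point"],
    "key_entities": [],
    "claims": [{"claim": "Revenue grew"}],
    "confidence": 0.9,
}

for error in validate(bad_output, SCHEMA):
    print(error)
```

Each problem surfaces as its own error with a path, which is what makes invalid outputs obvious rather than silently accepted.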
Schema review checklist
- Is every field you rely on required?
- Are arrays bounded with maxItems (and minItems where appropriate)?
- Are strings bounded with maxLength?
- Are unknown keys disallowed?
- Do you have a way to represent “unknown” (null) without omitting fields?
- Is the schema small enough to be reliable?
- Is the schema versioned (v1, v2) instead of overwritten?
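For the versioning item, one possible arrangement is to keep every published schema side by side and select one per output. This is a sketch under assumptions: the `schema_version` field, the `SCHEMAS` registry, and the `schema_for` helper are hypothetical names, not part of the Project 1 schema.

```python
# Hypothetical registry: v1 stays untouched once published; changes go in v2.
SCHEMAS = {
    "v1": {
        "type": "object",
        "additionalProperties": False,
        "required": ["schema_version", "title"],
        "properties": {
            "schema_version": {"type": "string", "enum": ["v1"]},
            "title": {"type": ["string", "null"], "maxLength": 200},
        },
    },
    "v2": {
        "type": "object",
        "additionalProperties": False,
        "required": ["schema_version", "title", "caveats"],
        "properties": {
            "schema_version": {"type": "string", "enum": ["v2"]},
            "title": {"type": ["string", "null"], "maxLength": 200},
            "caveats": {"type": "array", "maxItems": 6,
                        "items": {"type": "string", "maxLength": 220}},
        },
    },
}


def schema_for(output, default="v1"):
    """Pick the schema matching the output's declared version."""
    version = output.get("schema_version", default)
    if version not in SCHEMAS:
        raise KeyError(f"unknown schema version: {version!r}")
    return version, SCHEMAS[version]


version, schema = schema_for({"schema_version": "v2", "title": None, "caveats": []})
print(version, schema["required"])
```

Outputs stored under v1 stay valid against v1, while new outputs are checked against v2.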