15.3 Handling partial/invalid JSON gracefully


Why invalid JSON is normal (and how to stay calm)

Even with good prompts, models sometimes produce:

  • extra text before/after JSON,
  • invalid JSON syntax (trailing commas),
  • almost-correct JSON with missing quotes,
  • valid JSON that doesn’t match your schema.

This is not a catastrophe. It’s an expected failure mode.

Treat invalid output as a category

When you treat invalid output as a normal outcome (invalid_output), your app stays stable and your debugging becomes straightforward.

A robust handling flow (parse → validate → repair → fallback)

A safe default flow in your LLM wrapper:

  1. Parse: attempt JSON parse.
  2. Validate: validate against schema.
  3. If parse fails: optional extraction/repair attempt (bounded).
  4. If schema fails: optional “repair” retry with strict instructions.
  5. If still failing: return invalid_output with safe diagnostic info.

The key is bounded attempts. One repair retry is usually enough.

Pseudocode sketch

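data = parse(raw output)
if parse failed:
  data = parse(substring from first { to last })   (one attempt only)
if data parsed and schema valid: return ok(data)
if no repair attempt has been used:
  send one repair prompt, then parse and validate the reply
  if the reply is valid: return ok(data)
return invalid_output

Do not loop forever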

Unbounded retries create cost spikes and rate limit storms. Always cap repair attempts.
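The flow and the retry cap above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: `handle_output`, `validate`, and `repair` are hypothetical names, `validate` is assumed to return a list of error strings (empty when valid), and `repair` is assumed to be a callback that sends a repair prompt to your model and returns its new raw text.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

MAX_REPAIR_ATTEMPTS = 1  # hard cap: one repair retry, never an open-ended loop


def parse_json(text: str) -> Tuple[bool, Any]:
    """Return (True, value) on success and (False, None) on a syntax error."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


def handle_output(
    raw: str,
    validate: Callable[[Any], List[str]],
    repair: Optional[Callable[[str, List[str]], str]] = None,
) -> Dict[str, Any]:
    """Parse, validate, optionally repair once, then fall back to invalid_output."""
    text = raw
    errors: List[str] = []
    for attempt in range(MAX_REPAIR_ATTEMPTS + 1):
        ok, data = parse_json(text)
        if not ok:
            # Single bounded extraction attempt: first "{" to last "}".
            start, end = text.find("{"), text.rfind("}")
            if 0 <= start < end:
                ok, data = parse_json(text[start:end + 1])
        if ok:
            errors = validate(data)
            if not errors:
                return {"status": "ok", "result": data}
        else:
            errors = ["output is not valid JSON"]
        # Stop when there is no repair callback or the cap is reached.
        if repair is None or attempt == MAX_REPAIR_ATTEMPTS:
            break
        text = repair(text, errors)
    return {"status": "invalid_output", "details": errors}
```

Because the loop is driven by `range(MAX_REPAIR_ATTEMPTS + 1)`, the repair callback can run at most once per request, no matter what the model returns.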

Repair strategies (safe and practical)

Strategy 1: “JSON-only repair” prompt (recommended)

If you got non-JSON or schema-invalid JSON, you can ask the model to repair it:

  • paste the schema,
  • paste the model’s previous output,
  • instruct: “return corrected JSON only; no commentary.”

Strategy 2: Extract JSON substring (careful)

Sometimes the model outputs prose + JSON. A pragmatic approach is to:

  • find the first { and the last },
  • attempt to parse that substring.

This often works, but it can fail when braces also appear in the surrounding prose or when the output contains more than one JSON object. Use it as a single bounded attempt, not a general solution.
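A minimal sketch of that extraction step, using only the standard `json` module. The function name `extract_json_object` is hypothetical. It makes one attempt and returns `None` on any failure, so the caller can fall through to invalid_output.

```python
import json
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Try once to parse the span from the first "{" to the last "}"."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None  # no candidate span at all
    try:
        # The span starts with "{", so a successful parse is always an object.
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None  # do not try to be clever; report failure instead


print(extract_json_object('Here you go: {"a": 1} Thanks!'))
print(extract_json_object('Use {braces} like this: {"a": 1}'))
```

The first call recovers the object. The second returns `None` because the prose contains its own braces, which is the kind of case where this shortcut breaks down.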

Strategy 3: Tighten the prompt and schema

Long-term, the best “repair” is prevention:

  • simplify the schema,
  • reduce output length,
  • add examples of valid JSON,
  • reduce temperature,
  • use enums and bounded arrays (15.4).

Repair is a symptom

If you need repairs frequently, your prompt/schema is too loose or too complex. Fix the contract.
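As an illustration of a tighter contract, here is a small schema written with standard JSON Schema keywords (`enum`, `minItems`, `maxItems`, `maxLength`, `additionalProperties`). The field `sentiment` and all the limits are invented for this example; `summary_bullets` is borrowed from the error template later in this section.

```python
import json

# Hypothetical output contract: every field is bounded or enumerated.
SUMMARY_SCHEMA = {
    "type": "object",
    "additionalProperties": False,  # unknown keys are rejected
    "required": ["summary_bullets", "sentiment"],
    "properties": {
        "summary_bullets": {
            "type": "array",
            "minItems": 1,
            "maxItems": 5,  # bounded array keeps the output short
            "items": {"type": "string", "maxLength": 200},
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],  # closed set of values
        },
    },
}

# Text form, ready to paste into a prompt or a repair prompt.
schema_text = json.dumps(SUMMARY_SCHEMA, indent=2)
```

A schema like this leaves the model fewer ways to be wrong, which tends to reduce how often the repair path runs at all.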

Streaming and partial JSON

If you stream model output, you may receive incomplete JSON chunks. Practical guidance:

  • do not parse until the stream is complete,
  • buffer the stream and parse at the end,
  • if you need progressive rendering, stream discrete events instead of one JSON document (an advanced topic, covered later).

For most beginner/intermediate projects, keep it simple: buffer full output, then parse/validate.
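The buffer-then-parse approach can be sketched like this. It assumes your client library exposes the stream as an iterable of text chunks; `collect_and_parse` is a hypothetical helper name.

```python
import json
from typing import Any, Iterable, List


def collect_and_parse(chunks: Iterable[str]) -> Any:
    """Buffer every chunk, then parse once after the stream has ended."""
    buffer: List[str] = []
    for chunk in chunks:
        buffer.append(chunk)  # no parsing here: a partial chunk is not valid JSON
    # Raises json.JSONDecodeError if the stream was cut off or malformed;
    # the caller should map that to the invalid_output category.
    return json.loads("".join(buffer))


# Simulated stream: the JSON is split at arbitrary points.
stream = ['{"summary_bul', 'lets": ["a", ', '"b"]}']
print(collect_and_parse(stream))
```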

Logging invalid output safely

When debugging invalid outputs, don’t dump raw prompts and raw user content into logs by default.

Safer logging pattern:

  • log outcome category (invalid_output)
  • log schema version + prompt version
  • log a short validation error summary (“missing field X”)
  • log token sizes and request id
  • store raw output only in a restricted debug mode (if needed)

Raw outputs can contain sensitive data

For summarization apps, outputs often contain user-provided text. Treat raw output logs as sensitive storage.
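One way to apply the safer logging pattern is sketched below. It is a sketch under assumptions: the logger name, the field names, the truncation limits, and the `debug_raw` flag are all hypothetical, and the token count is assumed to come from your API response.

```python
import json
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("llm_wrapper")  # assumed logger name

MAX_ERRORS = 5          # assumed cap on how many validation errors are logged
MAX_ERROR_CHARS = 120   # assumed cap on the length of each error summary


def build_log_record(
    request_id: str,
    prompt_version: str,
    schema_version: str,
    errors: List[str],
    output_tokens: int,
    raw_output: Optional[str] = None,
    debug_raw: bool = False,
) -> Dict[str, Any]:
    """Build a log record that omits raw model output unless debug_raw is set."""
    record: Dict[str, Any] = {
        "outcome": "invalid_output",
        "request_id": request_id,
        "prompt_version": prompt_version,
        "schema_version": schema_version,
        "error_summary": [e[:MAX_ERROR_CHARS] for e in errors[:MAX_ERRORS]],
        "output_tokens": output_tokens,
    }
    if debug_raw and raw_output is not None:
        record["raw_output"] = raw_output  # restricted debug mode only
    return record


def log_invalid_output(**fields: Any) -> None:
    """Emit the record as one JSON line at WARNING level."""
    logger.warning("llm_outcome %s", json.dumps(build_log_record(**fields)))
```

With the default `debug_raw=False`, the raw text never reaches the log line even if a caller passes it in.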

Anti-patterns (what not to do)

  • Regex “parse” of JSON: brittle and unsafe; use a real parser.
  • Silent best-effort parsing: hides correctness issues; fail loudly and categorically.
  • Unlimited retries: creates retry storms and cost spikes.
  • Accepting unknown keys: creates drift and makes outputs unreviewable.
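To make "fail loudly and categorically" concrete, here is a small hand-written validator that rejects unknown keys and returns explicit error strings instead of silently accepting whatever parsed. The single-field contract and the name `validate_summary` are assumptions for illustration; in a real project a schema validation library would normally do this work.

```python
from typing import Any, List

ALLOWED_KEYS = {"summary_bullets"}  # hypothetical contract with one field


def validate_summary(data: Any) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors: List[str] = []
    unknown = sorted(set(data) - ALLOWED_KEYS)
    if unknown:
        # Unknown keys are an error, not something to ignore.
        errors.append("unknown keys: " + ", ".join(unknown))
    if "summary_bullets" not in data:
        errors.append("missing field summary_bullets")
    elif not isinstance(data["summary_bullets"], list):
        errors.append("summary_bullets must be an array")
    return errors


print(validate_summary({"summary_bullets": ["a"], "mood": "happy"}))
```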

Copy-paste templates

Template: repair prompt

You previously attempted to output JSON but it was invalid.

Rules:
- Output ONLY valid JSON.
- Do not include any extra text.
- The JSON MUST match this schema:
(paste schema)

Here is the invalid output to repair:
```text
...
```

Now output corrected JSON only.
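If you build this prompt in code, a helper along these lines fills in the two blanks. It is a sketch: `build_repair_prompt` and the character cap are assumptions, and the cap exists only so a runaway output cannot inflate the repair request.

```python
import json

FENCE = "`" * 3            # built at runtime so the fence characters nest cleanly here
MAX_INVALID_CHARS = 4000   # assumed cap on how much invalid output is pasted back


def build_repair_prompt(schema: dict, invalid_output: str) -> str:
    """Fill the repair template with the schema and the clipped invalid output."""
    clipped = invalid_output[:MAX_INVALID_CHARS]
    return "\n".join([
        "You previously attempted to output JSON but it was invalid.",
        "",
        "Rules:",
        "- Output ONLY valid JSON.",
        "- Do not include any extra text.",
        "- The JSON MUST match this schema:",
        json.dumps(schema, indent=2),
        "",
        "Here is the invalid output to repair:",
        FENCE + "text",
        clipped,
        FENCE,
        "",
        "Now output corrected JSON only.",
    ])


print(build_repair_prompt({"type": "object"}, '{"a": 1,}'))
```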

Template: invalid output error object

{
  "status": "invalid_output",
  "message": "Model output did not match the required schema.",
  "request_id": "...",
  "prompt_version": "...",
  "schema_version": "...",
  "details": ["missing field summary_bullets", "summary_bullets must be an array"]
}
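A small helper can keep every invalid_output response in the same shape as the template above. The class name `InvalidOutputError` and its method names are hypothetical; the field names come from the template.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class InvalidOutputError:
    """Typed container for the invalid_output error object."""
    request_id: str
    prompt_version: str
    schema_version: str
    details: List[str] = field(default_factory=list)
    message: str = "Model output did not match the required schema."

    def to_dict(self) -> Dict[str, Any]:
        # Keys are listed explicitly to match the template's order.
        return {
            "status": "invalid_output",
            "message": self.message,
            "request_id": self.request_id,
            "prompt_version": self.prompt_version,
            "schema_version": self.schema_version,
            "details": list(self.details),
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)


err = InvalidOutputError(
    request_id="req-123",          # placeholder values for illustration
    prompt_version="v1",
    schema_version="v1",
    details=["missing field summary_bullets"],
)
print(err.to_json())
```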
