15.3 Handling partial/invalid JSON gracefully

On this page

Why invalid JSON is normal (and how to stay calm)
A robust handling flow (parse → validate → repair → fallback)
Repair strategies (safe and practical)
Streaming and partial JSON
Logging invalid output safely
Anti-patterns (what not to do)
Copy-paste templates
Where to go next

Why invalid JSON is normal (and how to stay calm)

Even with good prompts, models sometimes produce:

extra text before/after JSON,
invalid JSON syntax (for example, trailing commas),
almost-correct JSON with missing quotes,
valid JSON that doesn’t match your schema.

This is not a catastrophe. It’s an expected failure mode.

Treat invalid output as a category

When you treat invalid output as a normal outcome (invalid_output), your app stays stable and your debugging becomes straightforward.
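One way to make the category concrete is a small result type that every call to the model returns. This is a minimal sketch, not a prescribed design: the names `Outcome`, `LLMResult`, and `describe` are hypothetical, and only the two outcomes mentioned in this section are modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class Outcome(Enum):
    """Outcome categories for one model call (only two are sketched here)."""
    OK = "ok"
    INVALID_OUTPUT = "invalid_output"


@dataclass
class LLMResult:
    outcome: Outcome
    result: Optional[Any] = None                       # parsed data when outcome is OK
    details: list[str] = field(default_factory=list)   # short error summaries otherwise


def describe(res: LLMResult) -> str:
    """Render a result as a short string suitable for logs or a status line."""
    if res.outcome is Outcome.OK:
        return Outcome.OK.value
    return f"{res.outcome.value}: " + "; ".join(res.details)


failed = LLMResult(Outcome.INVALID_OUTPUT, details=["missing field summary_bullets"])
print(describe(failed))
```

Callers can then branch on `outcome` instead of catching parser exceptions in several places.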

A robust handling flow (parse → validate → repair → fallback)

A safe default flow in your LLM wrapper:

Parse: attempt JSON parse.
Validate: validate against schema.
If parse fails: optional extraction/repair attempt (bounded).
If schema fails: optional “repair” retry with strict instructions.
If still failing: return invalid_output with safe diagnostic info.

The key is bounded attempts. One repair retry is usually enough.

Pseudocode sketch

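try to parse the output as JSON
if parse fails:
  optionally extract the JSON substring and parse once more
  if that also fails: return invalid_output
if schema valid: return ok(result)
optionally run one repair retry, then parse and validate the repaired output
if repaired output is valid: return ok(result)
return invalid_output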
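The flow above can be sketched in Python using only the standard library. Everything named here is an assumption made for illustration: `validate` stands in for a real schema check, `handle_output` is a hypothetical wrapper function, `repair` is a caller-supplied callback that would re-prompt the model, and the `summary_bullets` field is borrowed from the error-object template in this section.

```python
import json
from typing import Any, Callable, Optional


def validate(data: Any) -> list[str]:
    """Hypothetical schema check: returns error summaries (empty list means valid)."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    if "summary_bullets" not in data:
        return ["missing field summary_bullets"]
    if not isinstance(data["summary_bullets"], list):
        return ["summary_bullets must be an array"]
    return []


def extract_object(text: str) -> Optional[str]:
    """Return the span from the first '{' to the last '}', or None if absent."""
    start, end = text.find("{"), text.rfind("}")
    return text[start:end + 1] if 0 <= start < end else None


def handle_output(
    raw: str, repair: Optional[Callable[[str, list[str]], str]] = None
) -> dict:
    """Parse, validate, repair at most once, then fall back to invalid_output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        candidate = extract_object(raw)          # single extraction attempt
        if candidate is None:
            return {"status": "invalid_output", "details": ["no JSON object found"]}
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            return {"status": "invalid_output", "details": ["output is not valid JSON"]}

    errors = validate(data)
    if errors and repair is not None:            # single repair attempt, never a loop
        try:
            data = json.loads(repair(raw, errors))
            errors = validate(data)
        except json.JSONDecodeError:
            errors = ["repaired output is not valid JSON"]

    if errors:
        return {"status": "invalid_output", "details": errors}
    return {"status": "ok", "result": data}
```

Because neither the extraction nor the repair step is wrapped in a loop, the worst case is one extra parse and one extra model call per request.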

Do not loop forever

Unbounded retries create cost spikes and rate-limit storms. Always cap repair attempts.

Repair strategies (safe and practical)

Strategy 1: “JSON-only repair” prompt (recommended)

If you got non-JSON or schema-invalid JSON, you can ask the model to repair it:

paste the schema,
paste the model’s previous output,
instruct: “return corrected JSON only; no commentary.”
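A repair prompt like this can be assembled by a small helper. The sketch below reuses the wording of the repair prompt template from the copy-paste templates in this section; the function name `build_repair_prompt` and the choice to serialize the schema with `json.dumps` are assumptions, not a required format.

```python
import json


def build_repair_prompt(schema: dict, invalid_output: str) -> str:
    """Assemble a one-shot repair prompt from the schema and the previous output."""
    fence = "`" * 3  # built at runtime so this example does not nest code fences
    lines = [
        "You previously attempted to output JSON but it was invalid.",
        "",
        "Rules:",
        "- Output ONLY valid JSON.",
        "- Do not include any extra text.",
        "- The JSON MUST match this schema:",
        json.dumps(schema, indent=2),
        "",
        "Here is the invalid output to repair:",
        fence + "text",
        invalid_output,
        fence,
        "",
        "Now output corrected JSON only.",
    ]
    return "\n".join(lines)


prompt = build_repair_prompt({"type": "object"}, '{"summary_bullets": ["a",]}')
```

Send the resulting prompt once; if the reply still fails validation, return invalid_output rather than asking again.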

Strategy 2: Extract JSON substring (careful)

Sometimes the model outputs prose + JSON. A pragmatic approach is to:

find the first { and the last },
attempt to parse that substring.

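This often works, but it fails when the surrounding prose contains braces, when the output holds more than one JSON object, or when the top-level value is an array. Use it as a single bounded attempt, not a general solution.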
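A minimal sketch of this extraction, assuming the expected top-level value is a JSON object. The function name `extract_json_object` is hypothetical. It makes exactly one attempt and returns `None` on any failure so the caller can report invalid_output.

```python
import json
from typing import Any, Optional


def extract_json_object(text: str) -> Optional[Any]:
    """Single bounded attempt: parse the span from the first '{' to the last '}'."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None  # no plausible object in the text
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None  # do not try other spans; let the caller report invalid_output


print(extract_json_object('Here you go: {"a": 1} Thanks!'))
print(extract_json_object('Use {braces} like this: {"a": 1}'))
```

The second call returns `None` because the stray brace in the prose becomes the start of the extracted span, which is one reason to keep this as a fallback rather than the main parsing path.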

Strategy 3: Tighten the prompt and schema

Long-term, the best “repair” is prevention:

simplify the schema,
reduce output length,
add examples of valid JSON,
reduce temperature,
use enums and bounded arrays (15.4).

Repair is a symptom

If you need repairs frequently, your prompt/schema is too loose or too complex. Fix the contract.

Streaming and partial JSON

If you stream model output, you may receive incomplete JSON chunks. Practical guidance:

do not parse until the stream is complete,
buffer the stream and parse at the end,
if you need progressive rendering, stream events instead of JSON (advanced; later).

For most beginner/intermediate projects, keep it simple: buffer full output, then parse/validate.
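The buffer-then-parse approach can be sketched as follows. The `chunks` argument is assumed to be any iterable of text fragments, which is how a streaming client would typically expose the output; the function name `parse_streamed_json` is hypothetical.

```python
import json
from typing import Any, Iterable


def parse_streamed_json(chunks: Iterable[str]) -> Any:
    """Buffer every chunk, then parse once after the stream has ended."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)  # a progress indicator could update here; no parsing yet
    # Raises json.JSONDecodeError if the stream was cut off or the output is invalid.
    return json.loads("".join(buffer))


data = parse_streamed_json(['{"summary_', 'bullets": ["a", ', '"b"]}'])
```

A truncated stream surfaces as a normal parse error at the end, so it can go through the same invalid_output handling as any other bad response.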

Logging invalid output safely

When debugging invalid outputs, don’t dump raw prompts and raw user content into logs by default.

Safer logging pattern:

log outcome category (invalid_output)
log schema version + prompt version
log a short validation error summary (“missing field X”)
log token counts and the request ID
store raw output only in a restricted debug mode (if needed)
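The pattern above might look like this with Python's standard `logging` module. The logger name, the function name `log_invalid_output`, and the field names are assumptions. Character count is used as a stand-in for token size because no tokenizer is assumed here.

```python
import json
import logging

logger = logging.getLogger("llm_wrapper")  # hypothetical logger name


def log_invalid_output(request_id, prompt_version, schema_version, errors, raw_output):
    """Log a compact record without the raw text and return it for inspection."""
    record = {
        "outcome": "invalid_output",
        "request_id": request_id,
        "prompt_version": prompt_version,
        "schema_version": schema_version,
        # Short summaries only, capped so one bad response cannot flood the log.
        "errors": list(errors)[:3],
        # Size only: the raw output itself is never written to the log.
        "output_chars": len(raw_output),
    }
    logger.warning("llm_invalid_output %s", json.dumps(record, sort_keys=True))
    return record


rec = log_invalid_output(
    "req-1", "p3", "s2", ["missing field summary_bullets"], "secret user text"
)
```

If raw output is ever needed, a separate, access-controlled debug path is a safer home for it than the general application log.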

Raw outputs can contain sensitive data

For summarization apps, outputs often contain user-provided text. Treat raw output logs as sensitive storage.

Anti-patterns (what not to do)

Regex “parse” of JSON: brittle and unsafe; use a real parser.
Silent best-effort parsing: hides correctness issues; fail loudly and categorically.
Unlimited retries: creates retry storms and cost spikes.
Accepting unknown keys: creates drift and makes outputs unreviewable.
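As a counterexample to the last two anti-patterns, here is a sketch of a strict check that reports every problem and rejects unknown keys. The schema (a single required `summary_bullets` array of strings) and the function name `validate_strict` are hypothetical; a schema validation library would normally do this work.

```python
from typing import Any

REQUIRED_KEYS = {"summary_bullets"}  # hypothetical schema with one required field


def validate_strict(data: Any) -> list[str]:
    """Return every problem found; an empty list means the output is acceptable."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    # Unknown keys are errors, not something to ignore silently.
    errors = [f"unknown key {k}" for k in sorted(set(data) - REQUIRED_KEYS)]
    errors += [f"missing field {k}" for k in sorted(REQUIRED_KEYS - set(data))]
    bullets = data.get("summary_bullets")
    if "summary_bullets" in data and not (
        isinstance(bullets, list) and all(isinstance(b, str) for b in bullets)
    ):
        errors.append("summary_bullets must be an array of strings")
    return errors


print(validate_strict({"summary_bullets": ["a"], "notes": "x"}))
```

The returned summaries are short and free of user content, so they can be used directly as the `details` of an invalid_output error.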

Copy-paste templates

Template: repair prompt

You previously attempted to output JSON but it was invalid.

Rules:
- Output ONLY valid JSON.
- Do not include any extra text.
- The JSON MUST match this schema:
(paste schema)

Here is the invalid output to repair:
```text
...
```

Now output corrected JSON only.

Template: invalid output error object

{
  "status": "invalid_output",
  "message": "Model output did not match the required schema.",
  "request_id": "...",
  "prompt_version": "...",
  "schema_version": "...",
  "details": ["missing field summary_bullets", "summary_bullets must be an array"]
}
