21.2 Extracting structured data from images (carefully)

On this page

Goal: extraction you can trust
Why image extraction fails in predictable ways
Define an extraction contract (schema + rules)
Tables, charts, and dense layouts
Verification loop and quality checks
Copy-paste prompts
Where to go next

Goal: extraction you can trust

Extracting data from images is tempting because it feels “direct”: upload a screenshot of a receipt, get structured fields back.

The reality: image extraction is a lossy process. The only way to make it reliable is to treat it like engineering:

define a schema,
allow unknowns,
capture confidence and evidence,
verify with sampling and automated checks.

Never force the model to guess missing fields

In extraction, guessing is corruption. Prefer null + a reason.

Why image extraction fails in predictable ways

Most failures fall into a few buckets:

Legibility: low resolution, blur, compression artifacts, small fonts.
Ambiguity: similar-looking characters (0/O, 1/l), unclear decimals, cut-off text.
Layout confusion: multi-column documents, tables, dense receipts, wrapped lines.
Missing context: currency/unit not visible, date format ambiguous, partial screenshot.
Invented structure: the model outputs “the kind of fields this document usually has,” not what’s actually shown.

Your prompts should directly counter these failure modes.

Define an extraction contract (schema + rules)

An extraction contract has two parts:

Schema: exact JSON fields, types, and allowed values.
Rules: what to do when information is missing or unclear.

Good contract rules include:

No guessing: use null when not visible.
Capture uncertainty: per-field confidence (e.g., high|medium|low).
Capture evidence: include the exact text snippet you read (or a short quote).
Normalize carefully: preserve original formatting in raw fields when in doubt.
Include an “unparsed” bucket: leftover text that didn’t fit the schema.

Two-pass extraction is often better

Pass 1: identify document type and what fields are present. Pass 2: extract with the exact schema for that document type.

Tables, charts, and dense layouts

For tables and dense receipts, you want to prevent “helpful reformatting”:

Preserve ordering: represent rows as arrays; keep row order as seen.
Separate raw vs parsed: store the exact string and a normalized numeric value (if possible).
Validate arithmetic: totals should equal the sum of line items (when applicable).
Explicitly define units: currency, time zone, measurement units.

For charts, be especially careful: pixel-based reading of axis labels is error-prone. If the task is important, prefer a tool-based approach (OCR + chart parsing) and use the model to validate or interpret, not to extract every tick value.

Verification loop and quality checks

To make extraction production-worthy, add checks:

Sampling review: manually inspect a small set of extractions per batch.
Schema validation: reject malformed JSON and retry with a stricter prompt.
Consistency checks: totals, date formats, currency formats.
Confidence gating: route low-confidence fields for human review.
Golden set: maintain 25–100 labeled examples and track regression over time.

Copy-paste prompts

Prompt: strict JSON extraction with evidence

Extract data from the attached image. Output MUST be valid JSON.

Rules:
- Do not guess. If a value is not clearly visible, use null.
- For each extracted field, include a confidence: "high" | "medium" | "low".
- Include the exact raw text snippet you used as evidence.

Return JSON with this schema:
{
  "document_type": string,
  "fields": {
    "date": { "value": string|null, "confidence": string, "evidence": string|null },
    "total": { "value": number|null, "confidence": string, "evidence": string|null, "raw": string|null },
    "currency": { "value": string|null, "confidence": string, "evidence": string|null }
  },
  "line_items": [{
    "description": { "value": string|null, "confidence": string, "evidence": string|null },
    "amount": { "value": number|null, "confidence": string, "evidence": string|null, "raw": string|null }
  }],
  "unparsed_text": string[]
}

Prompt: arithmetic validation

You extracted line items and a total from the image.

Task:
1) Check whether sum(line_items.amount) matches total (within 0.01).
2) If it doesn’t match, list the most likely causes (missing item, tax, unreadable amount).
3) Propose the smallest follow-up question or re-crop needed to resolve.
Return a short checklist.

21.2 Extracting structured data from images (carefully)

Goal: extraction you can trust

Why image extraction fails in predictable ways

Define an extraction contract (schema + rules)

Tables, charts, and dense layouts

Verification loop and quality checks

Copy-paste prompts

Prompt: strict JSON extraction with evidence

Prompt: arithmetic validation

Where to go next