21. Working With Images
Overview and links for this section of the guide.
What images are useful for
Images are best when you’re trying to communicate something that is expensive to describe in text:
- UI debugging: “this button is misaligned on mobile,” “the modal overflows,” “the table header overlaps.”
- UX critique: hierarchy, clarity, affordances, accessibility signals, and copy issues.
- Data extraction: receipts, forms, screenshots of dashboards, charts, tables, or handwritten notes (with care).
- Test generation: enumerate UI states and edge cases from mockups/snapshots.
Images are not magic: the model sees pixels, not your DOM, not your data, and not your users. Your prompt and constraints determine whether its output is helpful or just confidently noisy.
Use images to reduce ambiguity, then immediately translate insights into verifiable steps: inspect a CSS property, add a test, or extract a schema.
Build an “image pack” (high-signal inputs)
An image pack is the minimum set of visuals and context that makes the task solvable:
- The image(s): one clear screenshot is better than five blurry ones.
- Environment: device, viewport size, browser, OS, theme (light/dark), zoom level.
- Expected vs observed: a one-sentence statement of the intended behavior, plus what actually happens instead.
- Scope: which page/component/state the screenshot represents.
- Constraints: CSS framework, design system rules, “don’t change markup,” “no new dependencies,” etc.
If you can, include a “control” screenshot: what it looks like when it’s correct (or a mockup).
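One way to keep image packs consistent is to treat the checklist above as a small data structure. The sketch below is only an illustration: the `ImagePack` class, its field names, and the example values are all hypothetical, not part of any tool or API. It checks that the required context is present and renders it as text to send alongside the screenshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImagePack:
    """Illustrative container for one image task (all names are hypothetical)."""
    image_paths: List[str]
    environment: str   # device, viewport, browser, OS, theme, zoom
    expected: str      # intended behavior, one sentence
    observed: str      # what actually happens
    scope: str         # page / component / state shown
    constraints: List[str] = field(default_factory=list)
    control_image_path: Optional[str] = None  # screenshot of the correct state

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are empty."""
        required = {
            "image_paths": self.image_paths,
            "environment": self.environment.strip(),
            "expected": self.expected.strip(),
            "observed": self.observed.strip(),
            "scope": self.scope.strip(),
        }
        return [name for name, value in required.items() if not value]

    def to_prompt_context(self) -> str:
        """Render the non-image context as plain text for the prompt."""
        lines = [
            f"Environment: {self.environment}",
            f"Scope: {self.scope}",
            f"Expected: {self.expected}",
            f"Observed: {self.observed}",
        ]
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        if self.control_image_path:
            lines.append("A control screenshot of the correct state is attached.")
        return "\n".join(lines)


# Example values are made up for illustration.
pack = ImagePack(
    image_paths=["modal-overflow.png"],
    environment="mobile viewport 390x844, dark theme, 100% zoom",
    expected="The modal fits inside the viewport.",
    observed="The modal overflows the right edge.",
    scope="Checkout page, payment modal, open state",
    constraints=["don't change markup", "no new dependencies"],
)
```

Calling `pack.missing_fields()` before sending anything is a cheap guard: if it returns a non-empty list, the pack is probably not yet solvable.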
How to constrain image-based work
Image tasks go wrong when the model is allowed to improvise. Constrain aggressively:
- Ask for a diagnosis plan first: “List likely causes + how to confirm in DevTools.”
- Demand minimal diffs: “Change at most 10 lines of CSS; explain why each matters.”
- Use structured outputs: JSON for extraction; ranked lists for critique; tests as a checklist.
- Force evidence: “Point to what in the screenshot suggests that.”
- Allow unknowns: “If you can’t tell, ask for a clearer screenshot or the CSS.”
The model cannot see your runtime layout engine, computed styles, or the code that produced the UI. Use the screenshot to narrow the search, then verify with real tooling.
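The constraints above can also be written once and reused rather than retyped per task. The following is a minimal sketch, assuming a UI debugging task; the function name `build_debug_prompt` and its parameters are hypothetical, and the rule wording is adapted from the list above.

```python
def build_debug_prompt(problem: str, max_css_lines: int = 10) -> str:
    """Compose a constrained prompt for a screenshot-based UI bug (illustrative)."""
    rules = [
        # Diagnosis plan first
        "First, list likely causes and how to confirm each in DevTools.",
        # Minimal diffs
        f"Then propose a fix that changes at most {max_css_lines} lines of CSS; "
        "explain why each line matters.",
        # Force evidence
        "For every claim, point to what in the screenshot suggests it.",
        # Allow unknowns
        "If you can't tell, ask for a clearer screenshot or the CSS instead of guessing.",
    ]
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"Problem: {problem}\n\nRules:\n{numbered}"


prompt = build_debug_prompt("The table header overlaps the first row on mobile.")
```

Keeping the limit as a parameter makes it easy to tighten the diff budget for small fixes without rewriting the rest of the prompt.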
A default workflow for image tasks
- Clarify: what decision should come out of this? (patch, extraction, critique, tests)
- Constrain: state constraints and output format.
- Diagnose/extract: get a first pass that includes uncertainty.
- Verify: confirm one thing (a CSS rule, a field, a claim).
- Iterate: ask for the smallest next step or minimal diff.
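For extraction tasks, the "include uncertainty" and "verify" steps can be partly mechanized. The sketch below assumes you asked the model to return JSON in a particular shape, where each field has a `value` and a self-reported `confidence` of `"high"` or `"low"`. That shape is an assumption made for this example, not a standard format, and `fields_to_verify` is a hypothetical helper. It lists the fields a person should check against the original image.

```python
import json
from typing import List


def fields_to_verify(raw_json: str) -> List[str]:
    """Return field names whose value is missing or not marked high-confidence.

    Assumes the (hypothetical) response shape:
    {"fields": {"<name>": {"value": ..., "confidence": "high" | "low"}}}
    """
    data = json.loads(raw_json)
    flagged = []
    for name, entry in data["fields"].items():
        if entry.get("value") is None or entry.get("confidence") != "high":
            flagged.append(name)
    return sorted(flagged)


# Made-up model response for a receipt screenshot.
response = """
{
  "fields": {
    "merchant": {"value": "Corner Cafe", "confidence": "high"},
    "total": {"value": "12.50", "confidence": "low"},
    "date": {"value": null, "confidence": "low"}
  }
}
"""

to_check = fields_to_verify(response)
```

Self-reported confidence is only a triage signal; fields marked high-confidence still deserve spot checks when the data matters.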
Section 21 map (21.1–21.5)
- 21.1 Image inputs for debugging UI issues
- 21.2 Extracting structured data from images (carefully)
- 21.3 Image-based UX critique prompts
- 21.4 Visual test case generation
- 21.5 Safety and privacy with user images