12. Safety, Policies, and Guardrails

Overview and links for this section of the guide.

What this section is for

As soon as your AI feature touches real users, safety becomes an engineering requirement, not a vibe.

This section teaches you how to build systems that behave predictably when:

  • the model refuses or content is blocked,
  • users submit ambiguous, adversarial, or sensitive inputs,
  • outputs need to be safe and policy-aware,
  • you need auditability without leaking private data.

Safety is a product feature

Users don’t care whether a failure was caused by a “safety filter” or a bug. They care that the product is predictable, respectful, and usable.

The mental model: safety is multi-layer

Safety is not one setting. It’s multiple layers working together:

  • Model behavior: built-in refusal patterns and boundaries.
  • Platform safety settings: category thresholds and filtering.
  • App-level guardrails: your prompts, schemas, validation, UX, logging, and access controls.

Platform safety helps, but it cannot replace app design.
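Concretely, the app layer is where you decide what happens when either of the lower layers says no. Below is a minimal sketch, assuming a generic model-call signature; the blockedByPlatform field, the refusal heuristic, and the Summary schema are illustrative stand-ins, not any specific platform's API.

```ts
// Sketch of the three layers around one model call.
// Layer 1: model behavior (it may refuse).
// Layer 2: platform safety settings (the call may be filtered).
// Layer 3: app-level guardrails (input handling, output validation, typed outcomes).

interface ModelResponse {
  text: string;
  blockedByPlatform: boolean; // layer 2: did a platform safety filter trip?
}

type ModelCall = (prompt: string) => Promise<ModelResponse>;

interface Summary {
  title: string;
  bullets: string[];
}

type GuardedResult =
  | { kind: "ok"; data: Summary }
  | { kind: "blocked"; reason: "platform_filter" | "model_refusal" | "invalid_output" };

async function guardedSummarize(
  callModel: ModelCall,
  userText: string
): Promise<GuardedResult> {
  // Layer 3: untrusted input is wrapped as data, never pasted in as instructions.
  const prompt =
    'Summarize the text between the markers as JSON {"title": string, "bullets": string[]}.\n' +
    `<user_text>\n${userText}\n</user_text>`;

  // Layers 1 and 2: the call itself may refuse or be filtered.
  const response = await callModel(prompt);
  if (response.blockedByPlatform) {
    return { kind: "blocked", reason: "platform_filter" };
  }
  if (/i can.t help with/i.test(response.text)) {
    // Crude refusal check, purely illustrative.
    return { kind: "blocked", reason: "model_refusal" };
  }

  // Layer 3: validate the output shape before the rest of the app trusts it.
  try {
    const parsed = JSON.parse(response.text);
    if (typeof parsed.title === "string" && Array.isArray(parsed.bullets)) {
      return { kind: "ok", data: parsed as Summary };
    }
  } catch {
    // Fall through to invalid_output.
  }
  return { kind: "blocked", reason: "invalid_output" };
}
```

The design choice worth copying is that every path returns a typed outcome, so the UI never has to guess why a result is missing.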

The builder’s job (what you control)

As a builder, your job is to make safety behavior:

  • predictable: clear outcomes and clear UX states,
  • recoverable: users can rephrase or choose safe alternatives,
  • auditable: you can understand what happened without storing dangerous data,
  • bounded: untrusted input doesn’t become instructions, and tools don’t run wild.
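
One way to make those properties concrete is to log what happened without logging the raw input, and to map every outcome to a user-facing state with a next step. A minimal sketch, assuming Node's built-in crypto module; the field names and user-facing copy are illustrative, not a prescribed policy.

```ts
import { createHash } from "node:crypto";

// Auditable: record what happened without storing the user's text.
interface AuditEvent {
  timestamp: string;
  outcome: string;     // e.g. "ok", "platform_filter", "model_refusal", "invalid_output"
  inputDigest: string; // short hash so related events can be correlated later
  inputLength: number;
}

function auditEvent(outcome: string, userText: string): AuditEvent {
  return {
    timestamp: new Date().toISOString(),
    outcome,
    inputDigest: createHash("sha256").update(userText).digest("hex").slice(0, 16),
    inputLength: userText.length,
  };
}

// Predictable and recoverable: every outcome maps to a clear state and a next step.
function userMessage(outcome: string): string {
  switch (outcome) {
    case "platform_filter":
    case "model_refusal":
      return "We couldn't help with that request. Try rephrasing it or removing sensitive details.";
    case "invalid_output":
      return "Something went wrong on our side. Please try again.";
    default:
      return "Done.";
  }
}
```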

Safety is not optional for “prototypes” that ship

Most real incidents come from “just a prototype” quietly becoming a real product. Design for safety early, especially around secrets and sensitive data.

Section 12 map (12.1–12.5)

Where to go next