30. Threat Modeling for AI Features

Overview and links for this section of the guide.

What this section is for

Threat modeling is how you decide what could go wrong before it goes wrong.

For AI features, threat modeling helps you avoid two common mistakes:

  • Over-trusting the model: “it will follow instructions” (it won’t, reliably).
  • Under-scoping the risk: “it’s just text” (it becomes actions and data access quickly).

This section gives you a concrete threat model tailored to LLM apps: RAG, tool calling, logging, and generated code.

Threat modeling is a product skill

The goal is not perfect security. The goal is to prioritize controls for the highest-impact risks and ship safely.

A simple threat modeling model for AI features

Use four questions:

  1. What are the assets? (secrets, PII, money, production access, proprietary docs)
  2. Who are the attackers? (anonymous users, insiders, compromised accounts, malicious documents)
  3. What are the entry points? (user prompts, file uploads, retrieved docs, tool inputs, logs)
  4. What are the worst outcomes? (data exfiltration, unsafe actions, policy violations, outages)

Then design defenses that break the attacker’s path.

What you’re protecting (assets)

Most AI features touch at least one of these assets:

  • Secrets: API keys, tokens, service credentials.
  • PII: names, emails, addresses, account ids, recordings, screenshots.
  • Proprietary data: internal docs, code, roadmap, incident details.
  • Tooling access: ability to call APIs, execute actions, create tickets, modify data.
  • Availability: quotas, rate limits, latency budgets.

Threat modeling starts by listing which of these your feature touches.

Trust boundaries in LLM systems

LLM systems have multiple trust boundaries:

  • User input boundary: untrusted prompts and files.
  • Retrieval boundary: untrusted documents fetched from corpora.
  • Model boundary: model output is untrusted and must be validated.
  • Tool boundary: any external API call or action is high risk.
  • Logging boundary: logs can leak sensitive content if not controlled.
The “confused deputy” problem

Your app has permissions. Attackers try to trick your app (via the model) into using those permissions for them. Tool design and permissions are your real security boundary.

Threat modeling workflow (fast and practical)

Use this workflow per feature:

  1. Diagram the pipeline: input → retrieval → prompt → model → validation → tools → output → logs.
  2. List assets: secrets, PII, proprietary data, actions.
  3. List entry points: user input, docs, tool params, logs.
  4. Enumerate attacker goals: exfiltrate data, trigger actions, degrade service.
  5. Pick top risks: highest impact + highest likelihood.
  6. Add controls: least privilege, allowlists, schema enforcement, budgets, audit logs.
  7. Define tests: fuzz cases, red-team prompts, permission tests, logging checks.

Section 30 map (30.1–30.5)

Where to start