30. Threat Modeling for AI Features
Overview and links for this section of the guide.
What this section is for
Threat modeling is how you decide what could go wrong before it goes wrong.
For AI features, threat modeling helps you avoid two common mistakes:
- Over-trusting the model: “it will follow instructions” (it won’t, not reliably).
- Under-scoping the risk: “it’s just text” (it becomes actions and data access quickly).
This section gives you a concrete threat model tailored to LLM apps: RAG, tool calling, logging, and generated code.
The goal is not perfect security. The goal is to prioritize controls for the highest-impact risks and ship safely.
A simple threat-modeling approach for AI features
Use four questions:
- What are the assets? (secrets, PII, money, production access, proprietary docs)
- Who are the attackers? (anonymous users, insiders, compromised accounts, malicious documents)
- What are the entry points? (user prompts, file uploads, retrieved docs, tool inputs, logs)
- What are the worst outcomes? (data exfiltration, unsafe actions, policy violations, outages)
Then design defenses that break the attacker’s path.
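A lightweight way to keep the answers to these four questions next to the feature they describe is a small record like the Python sketch below. The `ThreatModel` class and the example values are illustrative assumptions, not part of any framework:

```python
from dataclasses import dataclass, field

@dataclass
class ThreatModel:
    """Answers to the four questions, stored next to the feature's code."""
    feature: str
    assets: list[str] = field(default_factory=list)          # what you protect
    attackers: list[str] = field(default_factory=list)       # who you defend against
    entry_points: list[str] = field(default_factory=list)    # where untrusted data enters
    worst_outcomes: list[str] = field(default_factory=list)  # what "gone wrong" looks like

# Hypothetical example: a RAG-backed support assistant
support_assistant = ThreatModel(
    feature="support-assistant",
    assets=["customer PII", "internal runbooks", "ticketing API token"],
    attackers=["anonymous users", "malicious uploaded documents"],
    entry_points=["chat prompt", "file upload", "retrieved KB articles"],
    worst_outcomes=["PII exfiltration in answers", "tickets created without consent"],
)
```

Reviewing this record in the same pull request that changes the feature keeps the threat model from going stale.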
What you’re protecting (assets)
Most AI features touch at least one of these assets:
- Secrets: API keys, tokens, service credentials.
- PII: names, emails, addresses, account IDs, recordings, screenshots.
- Proprietary data: internal docs, code, roadmap, incident details.
- Tooling access: ability to call APIs, execute actions, create tickets, modify data.
- Availability: quotas, rate limits, latency budgets.
Threat modeling starts by listing which of these your feature touches.
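Secrets are the asset most often leaked by accident, because they tend to ride along in config and tool results. As a minimal sketch (the patterns below are illustrative, not a complete detector; a maintained secret scanner is a better fit in practice), you can fail closed before anything credential-shaped reaches model context:

```python
import re

# Illustrative patterns only; real deployments should rely on a maintained
# secret scanner rather than hand-rolled regexes.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                 # AWS access key ID shape
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),              # common "sk-" API token shape
]

def assert_no_secrets(text: str, source: str) -> str:
    """Refuse to place credential-shaped strings into model context
    (prompts, retrieved documents, tool results)."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            raise ValueError(f"possible secret detected in {source}; refusing to build prompt")
    return text
```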
Trust boundaries in LLM systems
LLM systems have multiple trust boundaries:
- User input boundary: untrusted prompts and files.
- Retrieval boundary: untrusted documents fetched from corpora.
- Model boundary: model output is untrusted and must be validated.
- Tool boundary: any external API call or action is high risk.
- Logging boundary: logs can leak sensitive content if not controlled.
Your app has permissions. Attackers try to trick your app (via the model) into using those permissions for them. Tool design and permissions are your real security boundary.
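The tool boundary is where that trick pays off, so it is the natural place to enforce an allowlist and validate arguments before anything runs. A minimal sketch, assuming the model returns a JSON object like {"tool": ..., "args": {...}} (the tool names and limits here are made up):

```python
import json

# Hypothetical allowlist: the only actions the model may request.
# Permissions live here, in application code, never in the prompt.
ALLOWED_TOOLS = {
    "create_ticket": {"required": {"title", "body"}, "max_body_len": 4000},
    "lookup_order": {"required": {"order_id"}},
}

def dispatch_tool_call(raw_model_output: str) -> tuple[str, dict]:
    """Validate a model-proposed tool call before anything executes.
    Model output is untrusted input crossing the tool boundary."""
    call = json.loads(raw_model_output)  # malformed output fails here, loudly
    name, args = call.get("tool"), call.get("args", {})

    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    missing = spec["required"] - set(args)
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    if len(str(args.get("body", ""))) > spec.get("max_body_len", float("inf")):
        raise ValueError("argument exceeds size budget")

    return name, args  # safe to hand to the real, narrowly scoped implementation
```

Logging the validated call at this point (tool name and argument keys, not values) also gives you an audit trail without pushing sensitive content across the logging boundary.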
Threat modeling workflow (fast and practical)
Use this workflow per feature:
- Diagram the pipeline: input → retrieval → prompt → model → validation → tools → output → logs.
- List assets: secrets, PII, proprietary data, actions.
- List entry points: user input, docs, tool params, logs.
- Enumerate attacker goals: exfiltrate data, trigger actions, degrade service.
- Pick top risks: highest impact + highest likelihood.
- Add controls: least privilege, allowlists, schema enforcement, budgets, audit logs (see the budget sketch after this list).
- Define tests: fuzz cases, red-team prompts, permission tests, logging checks.
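Budgets are the control that is easiest to skip and cheapest to add. A minimal per-request guard, with placeholder limits rather than recommendations, might look like this:

```python
import time
from dataclasses import dataclass, field

@dataclass
class RequestBudget:
    """Per-request limits so one conversation cannot loop tools forever
    or drain quotas. The numbers are placeholders, not recommendations."""
    max_tool_calls: int = 3
    max_seconds: float = 20.0
    _started_at: float = field(default_factory=time.monotonic)
    _tool_calls_used: int = 0

    def charge_tool_call(self) -> None:
        self._tool_calls_used += 1
        if self._tool_calls_used > self.max_tool_calls:
            raise RuntimeError("tool-call budget exceeded")
        if time.monotonic() - self._started_at > self.max_seconds:
            raise RuntimeError("latency budget exceeded")

# Usage: create one budget per incoming request and charge it before every
# tool dispatch; a raised error ends the request instead of draining the quota.
budget = RequestBudget()
budget.charge_tool_call()
```

The red-team prompts you collect for the test step double as regression inputs for these guards.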
Section 30 map (30.1–30.5)
- 30.1 What attackers want from your AI app
- 30.2 Data exfiltration risks (secrets, PII, proprietary data)
- 30.3 Indirect prompt injection (documents as attackers)
- 30.4 Tool misuse (dangerous function calls)
- 30.5 Supply chain risks (dependencies + generated code)