30.2 Data exfiltration risks (secrets, PII, proprietary data)
Goal: prevent the most expensive class of failure
Data exfiltration is the highest-impact failure mode for many AI products. It’s expensive because it can trigger:
- customer trust loss,
- regulatory obligations,
- incident response costs,
- mandatory notifications and legal exposure,
- and long-term product risk.
The good news: most exfiltration risks are preventable with a small number of strong system controls.
Treat prompts, model outputs, and logs as potential leak surfaces. If sensitive data touches those surfaces, assume it can be exposed unless you actively prevent it.
What “data exfiltration” means in LLM apps
In LLM apps, exfiltration is any path where sensitive information leaves its intended boundary.
That includes:
- Direct leakage: the model outputs sensitive data to the user.
- Indirect leakage: sensitive data appears in logs, analytics events, traces, error reports, or caches.
- Tool-mediated leakage: the model calls a tool that returns sensitive data and includes it in the answer.
- Cross-tenant leakage: one user sees another user’s data due to retrieval/filter/cache bugs.
Categories: secrets vs PII vs proprietary data
Not all sensitive data is the same. The categories below call for different controls, so treat them separately.
Secrets
Secrets are credentials and tokens that enable access:
- API keys, OAuth tokens, session cookies, service account keys
- database credentials, webhook secrets, signing keys
Secrets are uniquely dangerous: one leak can enable further compromise.
PII (and sensitive personal data)
PII is information that identifies (or can be linked to) a person:
- names, emails, phone numbers, addresses
- account ids, order ids (often indirectly identifying)
- screenshots/recordings containing personal information
PII risk is often compliance-driven and requires minimization, consent, retention, and access control.
Proprietary / confidential business data
This includes internal docs, code, incident details, roadmaps, customer lists, and business metrics. It’s often under contractual obligations and can be strategically damaging.
Secrets should never appear in prompts or logs. PII and proprietary data may be processed if you have policy, consent, and strong controls in place, but default to minimization.
Common leak paths (where data escapes)
These are the leak paths you should assume exist unless you’ve designed against them:
- User prompt contains sensitive data: users paste secrets, logs, or customer records.
- App includes sensitive context: you include internal documents or account data in context without strict filtering.
- Model “helpfully repeats” data: outputs include the exact sensitive content you provided.
- Tool results are echoed: tool call returns raw records; model paraphrases or dumps them.
- Logs capture payloads: you log prompts/responses for debugging and forget to redact.
- Analytics captures text: product analytics events store raw user inputs/outputs.
- Caches mix tenants: cached answers are keyed incorrectly and served to the wrong user.
- RAG retrieval ignores permissions: retrieval fetches chunks outside the user’s scope.
Controls that actually work
Defenses that reliably reduce exfiltration risk:
1) Minimize what enters prompts
- Don’t include entire documents when only a few fields are needed.
- Prefer ids and summaries over raw records.
- Limit “system context” to what is strictly necessary for the task.
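A minimal sketch of field-level minimization. The record, field names, and `ALLOWED_FIELDS` set are illustrative assumptions; the point is that only the fields the task needs ever reach the prompt.

```python
# Hypothetical order record; the email and address are not needed for the task.
order = {
    "order_id": "ORD-1042",
    "status": "delayed",
    "customer_email": "jane@example.com",   # excluded below
    "shipping_address": "221B Baker St",    # excluded below
    "eta_days": 3,
}

# Allow-list the fields the task actually requires.
ALLOWED_FIELDS = {"order_id", "status", "eta_days"}

minimal_context = {k: v for k, v in order.items() if k in ALLOWED_FIELDS}
prompt_context = f"Order context: {minimal_context}"
```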
2) Redact by default
Redact sensitive fields before they enter:
- prompts,
- retrieved chunks,
- logs/traces,
- error reports.
Redaction should be a deterministic preprocessing step, not something you “ask the model to do.”
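A minimal sketch of such a deterministic step, assuming regex-based matching of common secret and PII formats. The pattern list is illustrative, not exhaustive, and real deployments need a broader, tested set.

```python
import re

# Illustrative patterns only; extend and test against your own data.
REDACTION_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),       # API-key-like strings
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),          # AWS access key ids
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"),
     "[REDACTED_PRIVATE_KEY]"),                                       # PEM private keys
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),    # email addresses
]

def redact(text: str) -> str:
    """Deterministically replace known sensitive patterns before the text
    reaches prompts, retrieved-chunk assembly, logs, or error reports."""
    for pattern, placeholder in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Run this on user input, retrieved chunks, and anything headed for logs; the model never sees the unredacted spans.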
3) Validate outputs and filter sensitive patterns
- Use schema enforcement and output validation (Section 31.2).
- Block obviously sensitive patterns (tokens, keys) from appearing in responses.
- Fail closed: if output violates policy, return a safe refusal or redacted response.
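A sketch of the fail-closed output check, to be combined with the schema validation from Section 31.2. The pattern list and refusal text are placeholders.

```python
import re

SENSITIVE_OUTPUT_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),      # API-key-like strings
    re.compile(r"AKIA[0-9A-Z]{16}"),         # AWS access key ids
    re.compile(r"eyJ[A-Za-z0-9_-]{10,}\."),  # JWT-like tokens
]

SAFE_REFUSAL = "I can't include that information in a response."

def enforce_output_policy(model_output: str) -> str:
    """Fail closed: if the output matches a sensitive pattern, return a
    safe refusal instead of the raw response."""
    for pattern in SENSITIVE_OUTPUT_PATTERNS:
        if pattern.search(model_output):
            return SAFE_REFUSAL
    return model_output
```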
4) Isolate by tenant and enforce permissions early
- Permissions must be enforced in retrieval before the model sees content.
- Partition indexes and caches by tenant/user context.
- Do not rely on “the model will only use what it should.”
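A sketch of permission enforcement at retrieval time. `index.search` with a metadata filter is a stand-in for whatever filtered-search API your vector store exposes, and `Chunk` is an assumed record shape; the point is that the tenant filter is applied before any text can enter the prompt, then re-checked as defense in depth.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    tenant_id: str
    text: str

def retrieve_for_user(query: str, tenant_id: str, index) -> list[Chunk]:
    """Filter by tenant inside the retrieval call, before any chunk text
    can reach the prompt."""
    results = index.search(query, filter={"tenant_id": tenant_id}, top_k=5)
    # Defense in depth: re-check the tenant on the returned chunks.
    return [c for c in results if c.tenant_id == tenant_id]
```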
5) Audit and detect
- Log chunk ids used, not raw chunk text.
- Detect suspicious patterns: repeated “show me secrets” attempts, unusual query volume, repeated failures.
- Have a deletion workflow for logs and cached artifacts.
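One simple way to detect repeated policy-violating attempts is a sliding-window counter per user. The window length and threshold below are illustrative assumptions, not recommendations.

```python
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 300          # illustrative
MAX_POLICY_VIOLATIONS = 5     # illustrative

_violations: dict[str, deque] = defaultdict(deque)

def record_policy_violation(user_id: str, now: float | None = None) -> bool:
    """Record a blocked/refused request and return True when the user
    exceeds the violation threshold within the window (flag for review)."""
    now = now if now is not None else time.time()
    events = _violations[user_id]
    events.append(now)
    while events and now - events[0] > WINDOW_SECONDS:
        events.popleft()
    return len(events) > MAX_POLICY_VIOLATIONS
```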
Even if you don’t store prompts intentionally, prompts can leak via traces, analytics, debug dumps, crash reports, and browser logs. Assume text can be stored unless you design against it.
Logging and caching: the quiet leak vector
Many data leaks come from well-intentioned debugging. Decide in advance what is safe to log, and make that the default.
Practical safe defaults:
- Log metadata: prompt_version, model, token counts, latency, request ids.
- Log references: retrieved chunk ids, doc ids, versions, scores.
- Avoid raw payload logging: don’t log full prompts/outputs by default.
- Redact aggressively: if you must log text, redact secrets and PII.
- Partition by environment: production logs should be stricter than dev logs.
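A sketch of what a metadata-and-references-only log call can look like, using the standard library logger. Field names are illustrative; the important property is that no prompt or output text is passed in.

```python
import logging

logger = logging.getLogger("llm_app")

def log_request(*, request_id: str, prompt_version: str, model: str,
                input_tokens: int, output_tokens: int, latency_ms: float,
                chunk_ids: list[str]) -> None:
    """Log metadata and references only; never raw prompts or outputs."""
    logger.info(
        "llm_request",
        extra={
            "request_id": request_id,
            "prompt_version": prompt_version,
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "latency_ms": latency_ms,
            "chunk_ids": chunk_ids,  # references, not raw chunk text
        },
    )
```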
Caching safety rules:
- Key on user context: tenant/role must be part of cache key.
- Key on versions: prompt/model/corpus versions prevent stale leaks.
- Prefer caching ids: cache retrieval results (chunk ids), not full text.
- Set retention: short TTLs for sensitive caches.
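A sketch of a cache key that encodes both user context and versions, so an entry can never be served across tenants or survive a prompt, model, or corpus change. The field set is an assumption to adapt to your app.

```python
import hashlib
import json

def cache_key(*, tenant_id: str, role: str, prompt_version: str,
              model: str, corpus_version: str, normalized_query: str) -> str:
    """Build a cache key from user context plus versions; any change in
    either produces a different key."""
    payload = json.dumps(
        {
            "tenant_id": tenant_id,
            "role": role,
            "prompt_version": prompt_version,
            "model": model,
            "corpus_version": corpus_version,
            "query": normalized_query,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Store chunk ids (not full text) under this key, with a short TTL for sensitive data.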
RAG-specific exfiltration risks
RAG adds two special risks:
- Permission leakage: retrieval returns restricted chunks.
- Indirect injection: retrieved docs contain instructions that attempt to override your rules.
Mitigations:
- enforce permissions before retrieval results enter the prompt,
- treat retrieved text as untrusted data,
- require citations and quotes per claim,
- validate that citations reference only retrieved ids.
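The last check is mechanical: accept an answer only if every cited id was actually retrieved for this request. A minimal sketch, assuming citation ids have already been parsed out of the answer:

```python
def validate_citations(cited_ids: list[str], retrieved_ids: list[str]) -> bool:
    """Return True only if every citation refers to a chunk retrieved for
    this request; otherwise the caller should fail closed."""
    allowed = set(retrieved_ids)
    return all(cid in allowed for cid in cited_ids)
```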
Copy-paste prompts (redaction and policy)
Prompt: define what may enter prompts
We are building an AI feature. Help us define a data policy for prompts and logs.
Context:
- Data types handled: [PII/proprietary/support tickets/etc]
- Environments: [dev/staging/prod]
- Users: [internal/external]
Task:
1) Define what data is allowed in prompts (and what is forbidden).
2) Define what data is allowed in logs/traces/analytics (and what is forbidden).
3) Propose redaction rules and retention windows.
4) List the top 5 exfiltration risks and mitigations.
Prompt: redact before analysis (model-assisted, not authoritative)
Scan the text below for potentially sensitive information (secrets, PII, proprietary identifiers).
Output:
1) A checklist of what to redact (categories only).
2) A redacted version of the text where sensitive spans are replaced with [REDACTED].
Do not repeat the sensitive spans in your analysis.
Text:
```text
...
```
Model-assisted redaction can miss things. Use deterministic redaction for known secret patterns and treat model redaction as a helper, not a guarantee.
Practical checklist
- Minimize: include the smallest possible context in prompts.
- Secrets: never include secrets in prompts or logs.
- PII: redact by default; include only with consent/policy and strict retention.
- Permissions: enforce before retrieval and before tool calls.
- Logs: log ids and versions; avoid raw text by default.
- Caches: partition by tenant; key on versions; TTL sensitive data.
- Validation: block outputs that contain sensitive patterns.