30.2 Data exfiltration risks (secrets, PII, proprietary data)
Goal: prevent the most expensive class of failure
Data exfiltration is the highest-impact failure mode for many AI products. It’s expensive because it can trigger:
- customer trust loss,
- regulatory obligations,
- incident response costs,
- mandatory notifications and legal exposure,
- and long-term product risk.
The good news: most exfiltration risks are preventable with a small number of strong system controls.
Treat prompts, model outputs, and logs as potential leak surfaces. If sensitive data touches those surfaces, assume it can be exposed unless you actively prevent it.
What “data exfiltration” means in LLM apps
In LLM apps, exfiltration is any path where sensitive information leaves its intended boundary.
That includes:
- Direct leakage: the model outputs sensitive data to the user.
- Indirect leakage: sensitive data appears in logs, analytics events, traces, error reports, or caches.
- Tool-mediated leakage: the model calls a tool that returns sensitive data and includes it in the answer.
- Cross-tenant leakage: one user sees another user’s data due to retrieval/filter/cache bugs.
Categories: secrets vs PII vs proprietary data
Not all sensitive data is the same. The categories below call for different controls, so treat them separately.
Secrets
Secrets are credentials and tokens that enable access:
- API keys, OAuth tokens, session cookies, service account keys
- database credentials, webhook secrets, signing keys
Secrets are uniquely dangerous: one leak can enable further compromise.
PII (and sensitive personal data)
PII is information that identifies (or can be linked to) a person:
- names, emails, phone numbers, addresses
- account ids, order ids (often indirectly identifying)
- screenshots/recordings containing personal information
PII risk is often compliance-driven and requires minimization, consent, retention, and access control.
Proprietary / confidential business data
This includes internal docs, code, incident details, roadmaps, customer lists, and business metrics. It’s often under contractual obligations and can be strategically damaging.
Secrets should never appear in prompts or logs. PII and proprietary data may be processed if you have policy, consent, and strong controls in place, but default to minimization.
Common leak paths (where data escapes)
These are the leak paths you should assume exist unless you’ve designed against them:
- User prompt contains sensitive data: users paste secrets, logs, or customer records.
- App includes sensitive context: you include internal documents or account data in context without strict filtering.
- Model “helpfully repeats” data: outputs include the exact sensitive content you provided.
- Tool results are echoed: tool call returns raw records; model paraphrases or dumps them.
- Logs capture payloads: you log prompts/responses for debugging and forget to redact.
- Analytics captures text: product analytics events store raw user inputs/outputs.
- Caches mix tenants: cached answers are keyed incorrectly and served to the wrong user.
- RAG retrieval ignores permissions: retrieval fetches chunks outside the user’s scope.
Controls that actually work
Defenses that reliably reduce exfiltration risk:
1) Minimize what enters prompts
- Don’t include entire documents when only a few fields are needed.
- Prefer ids and summaries over raw records.
- Limit “system context” to what is strictly necessary for the task.
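A minimal sketch of field-level minimization. The record, field names, and `ALLOWED_FIELDS` set are illustrative assumptions; the point is that only the fields the task needs ever reach the prompt.

```python
# Hypothetical order record; the email and address are not needed for the task.
order = {
    "order_id": "ORD-1042",
    "status": "delayed",
    "customer_email": "jane@example.com",   # excluded below
    "shipping_address": "221B Baker St",    # excluded below
    "eta_days": 3,
}

# Allow-list the fields the task actually requires.
ALLOWED_FIELDS = {"order_id", "status", "eta_days"}

minimal_context = {k: v for k, v in order.items() if k in ALLOWED_FIELDS}
prompt_context = f"Order context: {minimal_context}"
```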
2) Redact by default
Redact sensitive fields before they enter:
- prompts,
- retrieved chunks,
- logs/traces,
- error reports.
Redaction should be a deterministic preprocessing step, not something you “ask the model to do.”
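A minimal sketch of such a deterministic step, assuming regex-based matching of common secret and PII formats. The pattern list is illustrative, not exhaustive, and real deployments need a broader, tested set.

```python
import re

# Illustrative patterns only; extend and test against your own data.
REDACTION_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),       # API-key-like strings
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED_AWS_KEY]"),          # AWS access key ids
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"),
     "[REDACTED_PRIVATE_KEY]"),                                       # PEM private keys
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),    # email addresses
]

def redact(text: str) -> str:
    """Deterministically replace known sensitive patterns before the text
    reaches prompts, retrieved-chunk assembly, logs, or error reports."""
    for pattern, placeholder in REDACTION_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Run this on user input, retrieved chunks, and anything headed for logs; the model never sees the unredacted spans.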
3) Validate outputs and filter sensitive patterns
- Use schema enforcement and output validation (Section 31.2).
- Block obviously sensitive patterns (tokens, keys) from appearing in responses.
- Fail closed: if output violates policy, return a safe refusal or redacted response.
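A sketch of the fail-closed output check, to be combined with the schema validation from Section 31.2. The pattern list and refusal text are placeholders.

```python
import re

SENSITIVE_OUTPUT_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),      # API-key-like strings
    re.compile(r"AKIA[0-9A-Z]{16}"),         # AWS access key ids
    re.compile(r"eyJ[A-Za-z0-9_-]{10,}\."),  # JWT-like tokens
]

SAFE_REFUSAL = "I can't include that information in a response."

def enforce_output_policy(model_output: str) -> str:
    """Fail closed: if the output matches a sensitive pattern, return a
    safe refusal instead of the raw response."""
    for pattern in SENSITIVE_OUTPUT_PATTERNS:
        if pattern.search(model_output):
            return SAFE_REFUSAL
    return model_output
```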
4) Isolate by tenant and enforce permissions early
- Permissions must be enforced in retrieval before the model sees content.
- Partition indexes and caches by tenant/user context.
- Do not rely on “the model will only use what it should.”
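A sketch of permission enforcement at retrieval time. `index.search` with a metadata filter is a stand-in for whatever filtered-search API your vector store exposes, and `Chunk` is an assumed record shape; the point is that the tenant filter is applied before any text can enter the prompt, then re-checked as defense in depth.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    tenant_id: str
    text: str

def retrieve_for_user(query: str, tenant_id: str, index) -> list[Chunk]:
    """Filter by tenant inside the retrieval call, before any chunk text
    can reach the prompt."""
    results = index.search(query, filter={"tenant_id": tenant_id}, top_k=5)
    # Defense in depth: re-check the tenant on the returned chunks.
    return [c for c in results if c.tenant_id == tenant_id]
```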
5) Audit and detect
- Log chunk ids used, not raw chunk text.
- Detect suspicious patterns: repeated “show me secrets” attempts, unusual query volume, repeated failures.
- Have a deletion workflow for logs and cached artifacts.
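One simple way to detect repeated policy-violating attempts is a sliding-window counter per user. The window length and threshold below are illustrative assumptions, not recommendations.

```python
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 300          # illustrative
MAX_POLICY_VIOLATIONS = 5     # illustrative

_violations: dict[str, deque] = defaultdict(deque)

def record_policy_violation(user_id: str, now: float | None = None) -> bool:
    """Record a blocked/refused request and return True when the user
    exceeds the violation threshold within the window (flag for review)."""
    now = now if now is not None else time.time()
    events = _violations[user_id]
    events.append(now)
    while events and now - events[0] > WINDOW_SECONDS:
        events.popleft()
    return len(events) > MAX_POLICY_VIOLATIONS
```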
Even if you don’t store prompts intentionally, prompts can leak via traces, analytics, debug dumps, crash reports, and browser logs. Assume text can be stored unless you design against it.
Logging and caching: the quiet leak vector
Many data leaks come from well-intentioned debugging. Decide in advance what is safe to log, and make that the default.
Practical safe defaults:
- Log metadata: prompt_version, model, token counts, latency, request ids.
- Log references: retrieved chunk ids, doc ids, versions, scores.
- Avoid raw payload logging: don’t log full prompts/outputs by default.
- Redact aggressively: if you must log text, redact secrets and PII.
- Partition by environment: production logs should be stricter than dev logs.
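A sketch of what a metadata-and-references-only log call can look like, using the standard library logger. Field names are illustrative; the important property is that no prompt or output text is passed in.

```python
import logging

logger = logging.getLogger("llm_app")

def log_request(*, request_id: str, prompt_version: str, model: str,
                input_tokens: int, output_tokens: int, latency_ms: float,
                chunk_ids: list[str]) -> None:
    """Log metadata and references only; never raw prompts or outputs."""
    logger.info(
        "llm_request",
        extra={
            "request_id": request_id,
            "prompt_version": prompt_version,
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "latency_ms": latency_ms,
            "chunk_ids": chunk_ids,  # references, not raw chunk text
        },
    )
```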
Caching safety rules:
- Key on user context: tenant/role must be part of cache key.
- Key on versions: prompt/model/corpus versions prevent stale leaks.
- Prefer caching ids: cache retrieval results (chunk ids), not full text.
- Set retention: short TTLs for sensitive caches.
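A sketch of a cache key that encodes both user context and versions, so an entry can never be served across tenants or survive a prompt, model, or corpus change. The field set is an assumption to adapt to your app.

```python
import hashlib
import json

def cache_key(*, tenant_id: str, role: str, prompt_version: str,
              model: str, corpus_version: str, normalized_query: str) -> str:
    """Build a cache key from user context plus versions; any change in
    either produces a different key."""
    payload = json.dumps(
        {
            "tenant_id": tenant_id,
            "role": role,
            "prompt_version": prompt_version,
            "model": model,
            "corpus_version": corpus_version,
            "query": normalized_query,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Store chunk ids (not full text) under this key, with a short TTL for sensitive data.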
RAG-specific exfiltration risks
RAG adds two special risks:
- Permission leakage: retrieval returns restricted chunks.
- Indirect injection: retrieved docs contain instructions that attempt to override your rules.
Mitigations:
- enforce permissions before retrieval results enter the prompt,
- treat retrieved text as untrusted data,
- require citations and quotes per claim,
- validate that citations reference only retrieved ids.
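The last check is mechanical: accept an answer only if every cited id was actually retrieved for this request. A minimal sketch, assuming citation ids have already been parsed out of the answer:

```python
def validate_citations(cited_ids: list[str], retrieved_ids: list[str]) -> bool:
    """Return True only if every citation refers to a chunk retrieved for
    this request; otherwise the caller should fail closed."""
    allowed = set(retrieved_ids)
    return all(cid in allowed for cid in cited_ids)
```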
Copy-paste prompts (redaction and policy)
Prompt: define what may enter prompts
We are building an AI feature. Help us define a data policy for prompts and logs.
Context:
- Data types handled: [PII/proprietary/support tickets/etc]
- Environments: [dev/staging/prod]
- Users: [internal/external]
Task:
1) Define what data is allowed in prompts (and what is forbidden).
2) Define what data is allowed in logs/traces/analytics (and what is forbidden).
3) Propose redaction rules and retention windows.
4) List the top 5 exfiltration risks and mitigations.
Prompt: redact before analysis (model-assisted, not authoritative)
Scan the text below for potentially sensitive information (secrets, PII, proprietary identifiers).
Output:
1) A checklist of what to redact (categories only).
2) A redacted version of the text where sensitive spans are replaced with [REDACTED].
Do not repeat the sensitive spans in your analysis.
Text:
```text
...
```
Model-assisted redaction can miss things. Use deterministic redaction for known secret patterns and treat model redaction as a helper, not a guarantee.
Practical checklist
- Minimize: include the smallest possible context in prompts.
- Secrets: never include secrets in prompts or logs.
- PII: redact by default; include only with consent/policy and strict retention.
- Permissions: enforce before retrieval and before tool calls.
- Logs: log ids and versions; avoid raw text by default.
- Caches: partition by tenant; key on versions; TTL sensitive data.
- Validation: block outputs that contain sensitive patterns.