16.2 Designing tool interfaces (inputs/outputs) cleanly
On this page
- Goal: tools that are reliable and safe
- Tool interface principles
- Input design (make it hard to misuse)
- Output design (make it easy to use)
- Error design (machine-readable)
- Idempotency and side effects
- Security controls (least privilege)
- Copy-paste tool spec templates
- Tool design checklist
- Where to go next
Goal: tools that are reliable and safe
A good tool interface makes the “right thing” easy and the “wrong thing” hard.
In practice, that means:
- inputs are explicit and validated,
- outputs are structured and stable,
- errors are categorized and predictable,
- side effects are gated and idempotent,
- sensitive data is minimized and protected.
Tools are internal APIs that an LLM calls. Treat them like public APIs: version them, validate them, and keep them small.
Tool interface principles
- Small surface area: fewer tools and fewer parameters per tool.
- Explicit names: get_order_by_id beats get_order if ambiguity exists.
- Typed inputs: schema-validated args; avoid free-form "query" strings where possible.
- Structured outputs: stable JSON objects, not prose.
- Deterministic where possible: tools should return the same output for the same input.
- Clear errors: categorize failures so retries are safe.
Input design (make it hard to misuse)
Common input design patterns:
- Use IDs, not names: “order_id=123” is safer than “order=John’s last order.”
- Bounded strings: limit lengths; forbid newlines if not needed.
- Enums: restrict modes to known options.
- Allowlists: only allow known fields/sorts/filters.
- Validation: reject invalid inputs before calling external systems.
A tool like sql(query: string) is extremely dangerous: it hands the model arbitrary database access in one string. Prefer narrow tools like get_order_by_id with validated fields.
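The patterns above can be enforced before any backend call. A minimal sketch in Python, assuming a hypothetical get_order_by_id tool; the pattern, mode names, and function name are illustrative, not a real library API:

```python
import re

# Bounded, typed inputs for a hypothetical get_order_by_id tool.
ORDER_ID_PATTERN = re.compile(r"^[A-Z0-9-]{6,32}$")   # IDs, not names
ALLOWED_MODES = {"summary", "full"}                    # enum, not free text

def validate_get_order_args(order_id: str, mode: str = "summary") -> dict:
    """Reject invalid inputs before calling any external system."""
    if not ORDER_ID_PATTERN.fullmatch(order_id):
        raise ValueError("invalid_input: order_id must match ^[A-Z0-9-]{6,32}$")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"invalid_input: mode must be one of {sorted(ALLOWED_MODES)}")
    return {"order_id": order_id, "mode": mode}
```

Validation failures here map naturally onto the invalid_input error category described below.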
Output design (make it easy to use)
Tool outputs should be:
- structured: JSON with stable keys
- minimal: only the fields the model needs
- safe: avoid returning sensitive data unless required
- versioned: include a version id if the shape may evolve
If you return huge blobs, you waste context budget and increase leakage risk.
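One cheap way to get all four properties is to project raw backend records through a field allowlist before they reach the model. A sketch, assuming illustrative field names:

```python
# Allowlist of fields the model actually needs; everything else
# (PII, internal IDs, payment data) is dropped at the boundary.
SAFE_FIELDS = ("order_id", "status", "created_at")

def to_tool_output(record: dict) -> dict:
    """Project a raw record down to a minimal, versioned tool output."""
    return {
        "version": "1",  # bump when the shape evolves
        **{k: record[k] for k in SAFE_FIELDS if k in record},
    }
```

Projecting at the tool boundary means a leaky backend query cannot silently widen what the model sees.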
Error design (machine-readable)
Tools should return errors in a structured way so the model (and your system) can respond correctly.
A practical error envelope:
{
  "ok": false,
  "error": {
    "category": "not_found" | "invalid_input" | "auth" | "rate_limit" | "timeout" | "transient" | "unknown",
    "message": "string",
    "retryable": true | false
  }
}
For success:
{
  "ok": true,
  "data": { ... }
}
If you want safe retries, the tool should tell you if retrying is safe. Don’t make the model guess.
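The envelope above can be produced by one wrapper around every tool call, so retryability is decided in code rather than guessed by the model. A sketch, assuming a hypothetical ToolError type that carries a category:

```python
# Categories from the envelope above that are safe to retry.
RETRYABLE = {"rate_limit", "timeout", "transient"}

class ToolError(Exception):
    """Hypothetical error type: a failure with a known category."""
    def __init__(self, category: str, message: str):
        super().__init__(message)
        self.category = category

def run_tool(fn, *args, **kwargs) -> dict:
    """Wrap a tool call in the success/error envelope."""
    try:
        return {"ok": True, "data": fn(*args, **kwargs)}
    except ToolError as e:
        return {"ok": False, "error": {
            "category": e.category,
            "message": str(e),
            "retryable": e.category in RETRYABLE,
        }}
    except Exception as e:  # anything uncategorized is not retried
        return {"ok": False, "error": {
            "category": "unknown", "message": str(e), "retryable": False}}
```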
Idempotency and side effects
Tools that write (create, update, delete) require special care:
- Idempotency keys: repeated calls should not duplicate actions.
- Explicit confirmation: require a human-approved “execute” step.
- Dry-run mode: preview changes before applying.
- Audit logs: record who/what requested the action and what happened.
A safe pattern is: model proposes a change → human confirms → tool executes.
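Idempotency-key handling can be sketched in a few lines: a repeated call with the same key replays the first result instead of acting twice. The in-memory dict below stands in for a durable store; names are illustrative:

```python
# Completed results keyed by idempotency key. In production this must
# be a durable store shared across workers, not a process-local dict.
_results: dict = {}

def execute_once(idempotency_key: str, action, *args):
    """Run action at most once per key; replay the stored result after."""
    if idempotency_key in _results:
        return _results[idempotency_key]  # replay, no duplicate side effect
    result = action(*args)
    _results[idempotency_key] = result
    return result
```

If the model retries after a timeout, the worst case is a replayed result, not a duplicate refund or email.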
Security controls (least privilege)
Tool security should include:
- allowlisted tools: only expose needed tools to the model
- scoped permissions: tools use credentials with least privilege
- parameter validation: strict schema validation
- output filtering: redact sensitive fields before returning to the model
- budgets: max tool calls per request and per minute
If a tool can access sensitive data, the model can be tricked into requesting it. Minimize and redact what tools can return.
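Two of the cheapest controls on this list, allowlist-plus-budget checks before dispatch and field redaction after, fit in a few lines. A sketch with illustrative tool and field names:

```python
# Only these tools are exposed to the model at all.
ALLOWED_TOOLS = {"get_order_by_id"}
MAX_CALLS_PER_REQUEST = 5
SENSITIVE_FIELDS = {"email", "address", "card_number"}

def check_call(tool_name: str, calls_so_far: int) -> None:
    """Refuse the call before dispatch if it breaks policy."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {tool_name}")
    if calls_so_far >= MAX_CALLS_PER_REQUEST:
        raise RuntimeError("tool-call budget exceeded for this request")

def redact(output: dict) -> dict:
    """Drop sensitive fields before the result reaches the model."""
    return {k: v for k, v in output.items() if k not in SENSITIVE_FIELDS}
```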
Copy-paste tool spec templates
Template: tool spec
Tool name: get_order_by_id
Purpose:
- Fetch order details needed to answer a user question.
Inputs (schema):
- order_id: string (pattern: ^[A-Z0-9-]{6,32}$)
Outputs:
- ok: boolean
- data (if ok): { order_id, status, created_at, items: [...] }
- error (if !ok): { category, message, retryable }
Security:
- Read-only
- Redact PII fields before returning
Notes:
- Do not return payment details or addresses
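The template above can also be kept machine-readable, so the same spec that documents the tool also validates its arguments. A sketch as a Python dict in a JSON-Schema-like shape; the structure resembles common function-calling formats, but the exact wire format depends on your model provider:

```python
import re

# The get_order_by_id spec as data. Structure is a sketch, not any
# specific provider's schema.
GET_ORDER_BY_ID_SPEC = {
    "name": "get_order_by_id",
    "description": "Fetch order details needed to answer a user question. Read-only.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "pattern": "^[A-Z0-9-]{6,32}$"},
        },
        "required": ["order_id"],
        "additionalProperties": False,
    },
}

def check_args(spec: dict, args: dict) -> bool:
    """Minimal check: no unknown fields, required fields present, patterns match."""
    params = spec["parameters"]
    if set(args) - set(params["properties"]):
        return False
    for name in params["required"]:
        if name not in args:
            return False
    for name, rule in params["properties"].items():
        if name in args and "pattern" in rule:
            if not re.fullmatch(rule["pattern"], args[name]):
                return False
    return True
```

In production a real JSON Schema validator is the better choice; the point is that the spec is data, versionable and testable like any other API contract.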
Template: tool calling contract for the model
Tool calling rules:
- Call tools only when needed to answer the question.
- Never request sensitive data unless explicitly required by the task.
- Treat tool outputs as the source of truth.
- If a tool returns an error, stop and explain what you need from the user.
Tool design checklist
- Is the tool’s purpose narrow and explicit?
- Are inputs schema-validated and bounded?
- Are outputs structured and minimal?
- Are errors categorized with explicit retryability?
- Is sensitive data minimized/redacted?
- For write tools: idempotency + approval + audit logs?
- Are budgets and allowlists enforced?