# AI Tool Use Security: Securing LLM Function Calling
When an LLM can call tools — shell commands, HTTP requests, database queries, file writes — the attack surface explodes. A single prompt injection can pivot through an unsecured tool to the host, the internal network, or sensitive data. This guide covers risk classification and concrete defenses for seven common AI agent tool types.
## Tool Risk Matrix
| Tool | Risk | Attack Vector | Defense |
|---|---|---|---|
| Shell / Code Execution | CRITICAL | Prompt injection → arbitrary command execution on host | Run in --read-only container with --cap-drop=ALL. Allowlist permitted commands. 30s hard timeout. Never run as root. |
| HTTP / Web Requests | HIGH | SSRF → internal network access, metadata endpoint, cloud credentials | Allowlist permitted domains/IPs. Block RFC-1918 ranges and link-local (169.254.x.x). Validate URLs before fetch. Log all requests. |
| File System Read | HIGH | Path traversal → read /etc/passwd, ~/.ssh/id_rsa, .env files | Restrict to declared workspace directory. Validate resolved path against workspace root. Block symlink traversal. |
| File System Write | CRITICAL | Overwrite config files, inject malicious code, modify agent behavior | Require human confirmation for all writes. Scope to temp directory only. Audit all write operations. |
| Database Queries | HIGH | SQL injection via LLM-generated queries, data exfiltration | Use parameterized queries only — never string-interpolated SQL. Read-only credentials for read operations. Scope to minimal required tables. |
| Email / Notifications | HIGH | Data exfiltration via email, spam/phishing via LLM-drafted content | Require human approval for all external sends. Allowlist recipients. Content review before send. Rate limit: max 10 emails/hour. |
| Calendar / Scheduling | MEDIUM | Unwanted calendar events, social engineering via agent-created meetings | Human-in-the-loop for all external calendar invites. Scope to own calendar only by default. |
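The file-system defenses in the table above (workspace scoping, resolved-path validation, symlink blocking) can be sketched as a single path check. This is a minimal sketch, not a specific framework's API; `WORKSPACE_ROOT` and `resolve_in_workspace` are illustrative names:

```python
import os

WORKSPACE_ROOT = "/tmp/agent-sandbox"  # example scoped workspace

def resolve_in_workspace(requested_path: str) -> str:
    """Resolve a path and refuse anything that escapes the workspace root.

    os.path.realpath collapses '..' components and follows symlinks, so a
    symlink pointing outside the workspace is rejected the same way a
    plain '../' traversal is.
    """
    root = os.path.realpath(WORKSPACE_ROOT)
    full = os.path.realpath(os.path.join(root, requested_path))
    if os.path.commonpath([root, full]) != root:
        raise PermissionError(f"path escapes workspace: {requested_path!r}")
    return full
```

The key detail is validating the *resolved* path, not the string the model supplied: checking the raw string for `..` misses symlinks and normalization tricks.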
## Principle of Least Tool
Start with zero tools. Add back only what the specific task requires. A summarization agent needs no tools at all. A research agent needs HTTP read only. A coding agent needs file read + write in a scoped temp directory only.
```python
# BAD: register all tools "just in case"
agent = Agent(tools=[ShellTool(), FileTool(), HTTPTool(),
                     EmailTool(), DBTool(), CalendarTool()])

# GOOD: minimum required for the specific task
summarizer = Agent(tools=[])  # No tools needed
researcher = Agent(tools=[HTTPTool(allowlist=["arxiv.org", "pubmed.ncbi.nlm.nih.gov"])])
coder = Agent(tools=[
    FileTool(workspace="/tmp/agent-sandbox", mode="rw"),
    # Shell removed — use isolated subprocess instead
])
```

## Frequently Asked Questions
### What is the biggest security risk of LLM function calling?
Unscoped tool access combined with prompt injection. An LLM with access to a shell tool and no sandboxing can be prompted to execute arbitrary commands. The fix: give every tool a declared scope, run it in an isolated container, and require human confirmation or an allowlist for dangerous tools (shell, file write, HTTP).
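The allowlist half of that defense for a shell tool can be sketched as follows. This is an illustrative sketch (the names `ALLOWED_COMMANDS` and `run_scoped_command` are made up for this example); in production it would run inside the read-only container described in the risk matrix, not directly on the host:

```python
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep", "echo"}  # example allowlist

def run_scoped_command(command: str, timeout_s: int = 30) -> str:
    """Execute a command only if its binary is on the allowlist.

    shell=False plus shlex.split prevents shell metacharacters
    (pipes, ;, &&) from smuggling in extra commands, and the hard
    timeout bounds runaway processes.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not in allowlist: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout_s, shell=False)
    return result.stdout
```

Matching on `argv[0]` rather than substring-searching the raw string matters: a substring check would accept `rm-wrapper` or `ls; rm -rf /`.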
### How do I implement human-in-the-loop for AI tool use?
For high-risk tools: before execution, present the proposed tool call (tool name + parameters) to a human operator via a review interface. Only execute after explicit approval. Log: approver identity, approval timestamp, original LLM reasoning. Implement a timeout — if no approval within X minutes, cancel the action.
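The approve-then-execute flow can be sketched as a gate function. This is a minimal sketch under assumptions: `approver` stands in for a real review interface, and all names (`ToolCallProposal`, `execute_with_approval`) are illustrative:

```python
import time
from dataclasses import dataclass

@dataclass
class ToolCallProposal:
    tool_name: str
    parameters: dict
    llm_reasoning: str

def execute_with_approval(proposal, approver, executor, timeout_s=300):
    """Gate a tool call behind an explicit human decision.

    `approver` presents the proposal to a human and returns
    (approved: bool, approver_id: str), or raises TimeoutError if no
    decision arrives within timeout_s. The returned record doubles as
    the audit log entry: who approved, when, and why the LLM asked.
    """
    try:
        approved, approver_id = approver(proposal, timeout_s)
    except TimeoutError:
        return {"status": "cancelled", "reason": "approval timeout"}
    if not approved:
        return {"status": "rejected", "approver": approver_id}
    return {
        "status": "executed",
        "approver": approver_id,
        "approved_at": time.time(),
        "llm_reasoning": proposal.llm_reasoning,
        "result": executor(proposal.tool_name, proposal.parameters),
    }
```

Note that the executor is only reachable through the approval path; there is no code path that runs the tool call without a decision record.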
### Can I trust tool outputs fed back to the LLM?
Never unconditionally. Tool outputs can contain adversarial content (e.g., a web page with injected instructions). Sanitize all tool outputs before feeding back to the LLM: strip HTML, extract structured data only, apply the same injection detection as user inputs. Treat tool output as untrusted data, not as trusted system context.
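A minimal sketch of that sanitization step, using only the standard library (the `SUSPICIOUS` marker list is a toy placeholder; real injection detection needs a proper classifier, not substring matching):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only text nodes, dropping all tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

SUSPICIOUS = ["ignore previous instructions", "system prompt"]  # toy examples

def sanitize_tool_output(raw_html: str, max_len: int = 4000) -> str:
    parser = _TextExtractor()
    parser.feed(raw_html)
    # Strip markup, collapse whitespace, and cap length before the
    # text ever reaches the LLM context.
    text = " ".join(" ".join(parser.chunks).split())[:max_len]
    for marker in SUSPICIOUS:
        if marker in text.lower():
            return "[WARNING: possible injected instructions]\n" + text
    return text
```

Even with sanitization, the output should still be framed to the LLM as untrusted data (e.g. wrapped in delimiters and labeled as a tool result), never concatenated into the system prompt.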
### How do I prevent SSRF via AI HTTP tools?
1. Allowlist permitted domains — reject everything else.
2. Resolve the URL and check that the IP is not RFC-1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) or link-local (169.254.0.0/16).
3. Follow redirects, but re-validate each redirect target before fetching it.
4. Block cloud metadata endpoints: 169.254.169.254 (AWS), metadata.google.internal (GCP).
5. Log all HTTP tool calls with URL, response code, and response size.
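The allowlist and IP checks can be combined into a single URL gate. A minimal sketch, assuming an example `ALLOWED_DOMAINS` set; a full implementation would re-run this check on every redirect target, per the list above:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"arxiv.org", "pubmed.ncbi.nlm.nih.gov"}  # example allowlist

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # no file://, gopher://, etc.
    host = parsed.hostname or ""
    if host not in ALLOWED_DOMAINS:
        return False
    try:
        # Resolve and check every returned address: an allowlisted name
        # must not resolve to a private, link-local, or loopback IP.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_link_local or ip.is_loopback:
            return False
    return True
```

Checking the *resolved* addresses (not just the hostname string) is what stops DNS-rebinding-style tricks where an allowlisted name points at 169.254.169.254; for complete protection, the connection should also pin the validated IP rather than resolving a second time at fetch.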