# AI Tool Use Security: Securing LLM Function Calling
When an LLM can call tools — shell commands, HTTP requests, database queries, file writes — the attack surface explodes. A single prompt injection can pivot through an unsecured tool to the host, the internal network, or sensitive data. This guide covers risk classification and concrete defenses for seven common AI agent tool types.
## Tool Risk Matrix
| Tool | Risk | Attack Vector | Defense |
|---|---|---|---|
| Shell / Code Execution | CRITICAL | Prompt injection → arbitrary command execution on host | Run in --read-only container with --cap-drop=ALL. Allowlist permitted commands. 30s hard timeout. Never run as root. |
| HTTP / Web Requests | HIGH | SSRF → internal network access, metadata endpoint, cloud credentials | Allowlist permitted domains/IPs. Block RFC-1918 ranges and link-local (169.254.x.x). Validate URLs before fetch. Log all requests. |
| File System Read | HIGH | Path traversal → read /etc/passwd, ~/.ssh/id_rsa, .env files | Restrict to declared workspace directory. Validate resolved path against workspace root. Block symlink traversal. |
| File System Write | CRITICAL | Overwrite config files, inject malicious code, modify agent behavior | Require human confirmation for all writes. Scope to temp directory only. Audit all write operations. |
| Database Queries | HIGH | SQL injection via LLM-generated queries, data exfiltration | Use parameterized queries only — never string-interpolated SQL. Read-only credentials for read operations. Scope to minimal required tables. |
| Email / Notifications | HIGH | Data exfiltration via email, spam/phishing via LLM-drafted content | Require human approval for all external sends. Allowlist recipients. Content review before send. Rate limit: max 10 emails/hour. |
| Calendar / Scheduling | MEDIUM | Unwanted calendar events, social engineering via agent-created meetings | Human-in-the-loop for all external calendar invites. Scope to own calendar only by default. |
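The file-system defenses in the table above (workspace scoping, resolved-path validation, symlink blocking) can be sketched as a single path check. This is a minimal sketch, not a specific framework's API; `WORKSPACE_ROOT` and `resolve_in_workspace` are illustrative names:

```python
import os

WORKSPACE_ROOT = "/tmp/agent-sandbox"  # example scoped workspace

def resolve_in_workspace(requested_path: str) -> str:
    """Resolve a path and refuse anything that escapes the workspace root.

    os.path.realpath collapses '..' components and follows symlinks, so a
    symlink pointing outside the workspace is rejected the same way a
    plain '../' traversal is.
    """
    root = os.path.realpath(WORKSPACE_ROOT)
    full = os.path.realpath(os.path.join(root, requested_path))
    if os.path.commonpath([root, full]) != root:
        raise PermissionError(f"path escapes workspace: {requested_path!r}")
    return full
```

The key detail is validating the *resolved* path, not the string the model supplied: checking the raw string for `..` misses symlinks and normalization tricks.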
## Principle of Least Tool
Start with zero tools. Add back only what the specific task requires. A summarization agent needs no tools at all. A research agent needs HTTP read only. A coding agent needs file read + write in a scoped temp directory only.
```python
# BAD: register all tools "just in case"
agent = Agent(tools=[ShellTool(), FileTool(), HTTPTool(),
                     EmailTool(), DBTool(), CalendarTool()])

# GOOD: minimum required for the specific task
summarizer = Agent(tools=[])  # No tools needed
researcher = Agent(tools=[HTTPTool(allowlist=["arxiv.org", "pubmed.ncbi.nlm.nih.gov"])])
coder = Agent(tools=[
    FileTool(workspace="/tmp/agent-sandbox", mode="rw"),
    # Shell removed — use isolated subprocess instead
])
```

## Frequently Asked Questions
### What is the biggest security risk of LLM function calling?
Unscoped tool access combined with prompt injection. An LLM with access to a shell tool and no sandboxing can be prompted to execute arbitrary commands. The fix: give every tool a declared scope, run it in an isolated container, and require human confirmation or an allowlist for dangerous tools (shell, file write, HTTP).
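The allowlist half of that defense for a shell tool can be sketched as follows. This is an illustrative sketch (the names `ALLOWED_COMMANDS` and `run_scoped_command` are made up for this example); in production it would run inside the read-only container described in the risk matrix, not directly on the host:

```python
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep", "echo"}  # example allowlist

def run_scoped_command(command: str, timeout_s: int = 30) -> str:
    """Execute a command only if its binary is on the allowlist.

    shell=False plus shlex.split prevents shell metacharacters
    (pipes, ;, &&) from smuggling in extra commands, and the hard
    timeout bounds runaway processes.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not in allowlist: {command!r}")
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout_s, shell=False)
    return result.stdout
```

Matching on `argv[0]` rather than substring-searching the raw string matters: a substring check would accept `rm-wrapper` or `ls; rm -rf /`.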
### How do I implement human-in-the-loop for AI tool use?
For high-risk tools: before execution, present the proposed tool call (tool name + parameters) to a human operator via a review interface. Only execute after explicit approval. Log: approver identity, approval timestamp, original LLM reasoning. Implement a timeout — if no approval within X minutes, cancel the action.
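The approve-then-execute flow can be sketched as a gate function. This is a minimal sketch under assumptions: `approver` stands in for a real review interface, and all names (`ToolCallProposal`, `execute_with_approval`) are illustrative:

```python
import time
from dataclasses import dataclass

@dataclass
class ToolCallProposal:
    tool_name: str
    parameters: dict
    llm_reasoning: str

def execute_with_approval(proposal, approver, executor, timeout_s=300):
    """Gate a tool call behind an explicit human decision.

    `approver` presents the proposal to a human and returns
    (approved: bool, approver_id: str), or raises TimeoutError if no
    decision arrives within timeout_s. The returned record doubles as
    the audit log entry: who approved, when, and why the LLM asked.
    """
    try:
        approved, approver_id = approver(proposal, timeout_s)
    except TimeoutError:
        return {"status": "cancelled", "reason": "approval timeout"}
    if not approved:
        return {"status": "rejected", "approver": approver_id}
    return {
        "status": "executed",
        "approver": approver_id,
        "approved_at": time.time(),
        "llm_reasoning": proposal.llm_reasoning,
        "result": executor(proposal.tool_name, proposal.parameters),
    }
```

Note that the executor is only reachable through the approval path; there is no code path that runs the tool call without a decision record.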
### Can I trust tool outputs fed back to the LLM?
Never unconditionally. Tool outputs can contain adversarial content (e.g., a web page with injected instructions). Sanitize all tool outputs before feeding back to the LLM: strip HTML, extract structured data only, apply the same injection detection as user inputs. Treat tool output as untrusted data, not as trusted system context.
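A minimal sketch of that sanitization step, using only the standard library (the `SUSPICIOUS` marker list is a toy placeholder; real injection detection needs a proper classifier, not substring matching):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only text nodes, dropping all tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

SUSPICIOUS = ["ignore previous instructions", "system prompt"]  # toy examples

def sanitize_tool_output(raw_html: str, max_len: int = 4000) -> str:
    parser = _TextExtractor()
    parser.feed(raw_html)
    # Strip markup, collapse whitespace, and cap length before the
    # text ever reaches the LLM context.
    text = " ".join(" ".join(parser.chunks).split())[:max_len]
    for marker in SUSPICIOUS:
        if marker in text.lower():
            return "[WARNING: possible injected instructions]\n" + text
    return text
```

Even with sanitization, the output should still be framed to the LLM as untrusted data (e.g. wrapped in delimiters and labeled as a tool result), never concatenated into the system prompt.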
### How do I prevent SSRF via AI HTTP tools?
1. Allowlist permitted domains — reject everything else.
2. Resolve the URL and check that the IP is not RFC-1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) or link-local (169.254.0.0/16).
3. Follow redirects, but re-validate each redirect target before fetching it.
4. Block cloud metadata endpoints: 169.254.169.254 (AWS), metadata.google.internal (GCP).
5. Log all HTTP tool calls with URL, response code, and response size.
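The allowlist and IP checks can be combined into a single URL gate. A minimal sketch, assuming an example `ALLOWED_DOMAINS` set; a full implementation would re-run this check on every redirect target, per the list above:

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"arxiv.org", "pubmed.ncbi.nlm.nih.gov"}  # example allowlist

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # no file://, gopher://, etc.
    host = parsed.hostname or ""
    if host not in ALLOWED_DOMAINS:
        return False
    try:
        # Resolve and check every returned address: an allowlisted name
        # must not resolve to a private, link-local, or loopback IP.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_link_local or ip.is_loopback:
            return False
    return True
```

Checking the *resolved* addresses (not just the hostname string) is what stops DNS-rebinding-style tricks where an allowlisted name points at 169.254.169.254; for complete protection, the connection should also pin the validated IP rather than resolving a second time at fetch.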