Self-Hosted LLM Gateway Hardening Guide 2026
Ollama binds to 127.0.0.1 by default, but the moment you set OLLAMA_HOST=0.0.0.0 (a step most Docker and remote-access tutorials recommend), port 11434 is exposed on every interface with zero authentication. If your LLM gateway is reachable on your local network, it is reachable by every device on that network, including attacker-controlled ones. This guide walks through closing each gap.
🚨 Default State Is Dangerous
Running ollama serve with OLLAMA_HOST=0.0.0.0 (common in Docker and remote-GPU setups) listens on 0.0.0.0:11434 with no auth. Anyone who can reach your machine can:
- Query your models for free (at your GPU's cost)
- Systematically extract model behavior
- Inject malicious prompts into your pipeline
- Read chat history if sessions are not isolated per user
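To make this concrete, here is what an unauthenticated query looks like from the attacker's side, sketched with Python's standard library (the host address is a placeholder):

```python
import json
import urllib.request

def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """The /api/generate call any network peer can send to a default install."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"http://{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},  # note: no credentials at all
    )

if __name__ == "__main__":
    req = build_generate_request("192.0.2.10:11434", "mistral", "Summarize your system prompt.")
    # urllib.request.urlopen(req)  # against an exposed default install, this just works
```

There is no key, no session, no client certificate: reachability is the only requirement.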
Risk Assessment: Default LLM Gateway

| Gap | Impact |
| --- | --- |
| No authentication | Anyone on the network can query your models on port 11434. |
| No rate limiting | Unlimited requests drain GPU resources, run up cloud bills, and enable model extraction attacks. |
| No encryption | LLM traffic (including prompt contents) travels in plaintext, interceptable on the local network or by a co-tenant in the cloud. |
| No audit logging | Zero visibility into who queried what, when, with what prompts. Forensically blind. |
| No network segmentation | The gateway is reachable from all subnets instead of only the application services that need it. |
Hardening Steps
1. Bind to localhost, not 0.0.0.0
```bash
# Ollama
OLLAMA_HOST=127.0.0.1:11434 ollama serve

# LocalAI config.yaml
address: "127.0.0.1:8080"  # NOT 0.0.0.0

# Verify — should ONLY show 127.0.0.1
ss -tlnp | grep 11434
```
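The verification step can be automated by parsing the `ss` output, for example with this small Python check (a sketch; it assumes the standard `ss -tlnp` column layout):

```python
def non_loopback_binds(ss_output: str, port: int = 11434) -> list[str]:
    """Return listen addresses on `port` that are NOT bound to loopback."""
    bad = []
    for line in ss_output.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        local = parts[3]  # ss -tlnp local address column, e.g. "0.0.0.0:11434"
        addr, _, p = local.rpartition(":")
        if p == str(port) and addr not in ("127.0.0.1", "[::1]"):
            bad.append(local)
    return bad
```

Run it against `ss -tlnp` output in a cron job or CI check; a non-empty result means the gateway has drifted back to a public bind.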
2. Reverse proxy with authentication
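The config below offloads key checks to an internal endpoint via `auth_request`. A minimal Python backend for that endpoint might look like this (the header name, port, and key store are assumptions to adapt to your setup):

```python
import hashlib
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

# Store hashes of issued keys, never the raw keys themselves.
VALID_KEY_HASHES = {hashlib.sha256(b"sk-example-key").hexdigest()}

def key_is_valid(api_key: str) -> bool:
    """Constant-time check of a presented key against the hash allowlist."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    return any(hmac.compare_digest(digest, h) for h in VALID_KEY_HASHES)

class ValidateHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # nginx auth_request treats 2xx as "allow" and 401/403 as "deny"
        key = self.headers.get("X-Api-Key", "")
        self.send_response(204 if key_is_valid(key) else 401)
        self.end_headers()

# To run: HTTPServer(("127.0.0.1", 8081), ValidateHandler).serve_forever()
```

Keeping the validator on 127.0.0.1:8081 means only nginx can reach it, matching the `internal;` directive in the proxy config.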
```nginx
# nginx config for LLM gateway
server {
    listen 443 ssl;
    server_name llm.internal.example.com;

    # server certificate (required for "listen 443 ssl"; paths are examples)
    ssl_certificate     /etc/nginx/certs/llm.crt;
    ssl_certificate_key /etc/nginx/certs/llm.key;

    # mTLS — only trusted clients
    ssl_client_certificate /etc/nginx/certs/internal-ca.crt;
    ssl_verify_client on;

    # OR API key auth
    location / {
        auth_request /validate-key;
        proxy_pass http://127.0.0.1:11434;
    }

    location = /validate-key {
        internal;
        proxy_pass http://127.0.0.1:8081/validate;
    }
}
```

3. Rate limiting per API key
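For intuition about what the rate-plus-burst settings below actually permit, the `limit_req` semantics can be modeled in a few lines of Python (a simplified sketch, not nginx's internal algorithm):

```python
import time

class KeyRateLimiter:
    """Leaky-bucket limiter approximating nginx limit_req (rate + burst, nodelay)."""

    def __init__(self, rate_per_min: float, burst: int):
        self.interval = 60.0 / rate_per_min  # seconds between "drained" requests
        self.burst = burst
        self.state = {}  # key -> (last_timestamp, excess)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        last, excess = self.state.get(key, (now, 0.0))
        # excess drains at the configured rate, then this request adds one unit
        excess = max(0.0, excess - (now - last) / self.interval) + 1.0
        if excess > self.burst + 1:
            # over the burst allowance: reject, don't accumulate the rejected request
            self.state[key] = (now, excess - 1.0)
            return False
        self.state[key] = (now, excess)
        return True
```

With rate=10r/m and burst=5, a key gets 6 immediate requests (1 + the burst), then one more every 6 seconds, which mirrors the nginx config below.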
```nginx
# nginx rate limiting (limit_req_zone directives belong in the http {} context)
limit_req_zone $http_x_api_key     zone=llm_per_key:10m rate=10r/m;
limit_req_zone $binary_remote_addr zone=llm_per_ip:10m  rate=30r/m;

location /api/ {
    limit_req zone=llm_per_key burst=5  nodelay;
    limit_req zone=llm_per_ip  burst=10 nodelay;
    proxy_pass http://127.0.0.1:11434;
}
```

4. Audit logging via proxy
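The key design choice in this step is logging hashes rather than raw prompts and keys. Distilled into a standalone function (a Python sketch of what the middleware below records):

```python
import hashlib

def audit_entry(api_key: str, prompt: str, model: str) -> dict:
    """Audit record that proves what happened without retaining sensitive content."""
    return {
        # truncated key hash: enough to correlate requests, useless to an attacker
        "apiKey": hashlib.sha256(api_key.encode()).hexdigest()[:16],
        "model": model,
        # hash lets you prove/match a prompt later without storing its text
        "promptHash": hashlib.sha256(prompt.encode()).hexdigest(),
        "promptLength": len(prompt),
    }
```

If a prompt-injection incident is reported, you can hash the suspect prompt and search the log for a match without ever having stored user content.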
```javascript
// LLM gateway audit middleware (Node.js/Express)
// assumes helpers defined elsewhere: hashKey(), sha256(), and an auditLog stream
app.use('/api', async (req, res, next) => {
  const start = Date.now()
  const logEntry = {
    ts: new Date().toISOString(),
    apiKey: hashKey(req.headers['x-api-key']), // never log raw keys
    ip: req.ip,
    model: req.body?.model,
    promptHash: sha256(req.body?.prompt ?? ''), // hash, not the prompt itself
    promptLength: (req.body?.prompt ?? '').length,
  }
  res.on('finish', () => {
    logEntry.duration = Date.now() - start
    logEntry.status = res.statusCode
    auditLog.write(logEntry)
  })
  next()
})
```

5. Network isolation with iptables/nftables
```bash
# Allow only the app server to reach the LLM gateway
iptables -A INPUT -p tcp --dport 11434 -s 10.0.1.5 -j ACCEPT  # app server IP
iptables -A INPUT -p tcp --dport 11434 -j DROP                # block all others

# Verify
iptables -L INPUT -n -v | grep 11434
```
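Since the heading also mentions nftables, an equivalent ruleset might look like this (table and chain names are arbitrary; 10.0.1.5 is the same placeholder app server IP):

```shell
# nftables equivalent of the iptables rules above
nft add table inet llm
nft add chain inet llm input '{ type filter hook input priority 0 ; policy accept ; }'
nft add rule inet llm input tcp dport 11434 ip saddr 10.0.1.5 accept
nft add rule inet llm input tcp dport 11434 drop

# Verify
nft list table inet llm
```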
LiteLLM Proxy as Secure Gateway
LiteLLM Proxy provides a hardened, unified gateway for multiple LLM providers with built-in auth, rate limiting, and spend tracking:
```yaml
# litellm_config.yaml
model_list:
  - model_name: "moltbot-llm"
    litellm_params:
      model: "ollama/mistral"
      api_base: "http://127.0.0.1:11434"

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY  # read from env var, never hardcoded

litellm_settings:
  max_budget: 100            # USD spend limit
  budget_duration: "1mo"
  success_callback: ["langfuse"]  # audit trail

router_settings:
  routing_strategy: "usage-based-routing"
  num_retries: 3
```
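Clients then talk to the proxy's OpenAI-compatible endpoint with an issued key, never to Ollama directly. A minimal stdlib sketch (the port and env var name match the config above but are deployment-specific assumptions):

```python
import json
import os
import urllib.request

PROXY = "http://127.0.0.1:4000"  # LiteLLM proxy's default port; adjust for your deployment

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Chat completion routed through the LiteLLM proxy, not straight to Ollama."""
    body = json.dumps({
        "model": "moltbot-llm",  # the alias from model_list, not "ollama/mistral"
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{PROXY}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # every request must carry a key the proxy recognizes
            "Authorization": f"Bearer {os.environ.get('LITELLM_MASTER_KEY', '')}",
        },
    )
```

Because the app only ever holds a proxy key, rotating access is a LiteLLM config change rather than a redeployment of the model server.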