LLM Observability: Monitoring & Tracing for AI Agents
LLMs are non-deterministic, and classic APM tools fall short. Moltbot provides full observability: prompt traces, quality metrics, security events, and cost tracking, all fully self-hosted.
What Is LLM Observability? A Simple Explanation
LLM observability monitors non-deterministic AI systems. Performance metrics such as P50/P95/P99 latency, time to first token, and tokens per second measure model throughput. Cost metrics such as token usage, cost per request, and cache hit rate keep budgets under control. Quality and security metrics such as hallucination rate, refusal rate, injection detection rate, and PII exposure rate protect against quality degradation and attacks. Prompt traces build a complete causal chain from user request to response. Without observability, AI systems are black boxes with no insight into their behavior.
Key Metrics
Performance
Latency (P50/P95/P99): End-to-end response time per model and agent. Alert on regressions.
Time to First Token: Streaming latency, measured as the time until the first token arrives at the client.
Tokens per Second: Model throughput. Critical for capacity planning and SLAs.
Context Window Utilization: % of the context window used per request. Alert at 80%+, where quality degrades.
Cost
Token Usage: Input + output tokens per request, agent, user, and time period.
Cost per Request: Calculated cost based on model pricing. Budget alerts per team/project.
Cost per Task: Business-level metric: cost per successful task completion.
Cache Hit Rate: Semantic cache hits. Higher means lower cost. Track per prompt template.
Quality & Security
Hallucination Rate: % of responses flagged by the factual consistency checker. Track per model version.
Refusal Rate: % of requests refused by the model. A spike signals a prompt-engineering issue or an injection attempt.
Injection Detection Rate: % of inputs flagged as potential prompt injection. A spike signals an active attack.
PII Exposure Rate: % of responses containing PII before redaction. Must be 0% in production.
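The cost metrics above reduce to simple arithmetic over token counts. A minimal sketch in Python; the `PRICING` table and model names are illustrative assumptions, not Moltbot's actual rates:

```python
# Illustrative per-1M-token prices in USD; real provider pricing varies.
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}

def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the USD cost of a single LLM request from its token counts."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# Example: 1,200 input + 300 output tokens on gpt-4o
print(cost_per_request("gpt-4o", 1200, 300))  # prints 0.006
```

Aggregating this per agent, user, and time period yields the token-usage and budget-alert views described above.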
Prometheus Integration
# Moltbot exposes Prometheus metrics at /metrics
# prometheus.yml scrape config:
scrape_configs:
  - job_name: moltbot_llm
    metrics_path: /metrics
    static_configs:
      - targets: ['moltbot:9090']
# Key metrics exposed:
# moltbot_llm_request_duration_seconds{model, agent, status}
# moltbot_llm_tokens_total{model, type} # type: input|output
# moltbot_llm_cost_usd_total{model, agent}
# moltbot_security_injections_detected_total
# moltbot_security_pii_redactions_total
# moltbot_agent_tool_calls_total{tool, agent, status}
# moltbot_hitl_pending_approvals
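The counters and histograms above translate directly into dashboard queries. A PromQL sketch, assuming `moltbot_llm_request_duration_seconds` is exported as a Prometheus histogram (i.e. with `_bucket` series):

```promql
# P95 end-to-end latency per model over the last 5 minutes
histogram_quantile(0.95,
  sum by (model, le) (rate(moltbot_llm_request_duration_seconds_bucket[5m])))

# Cost burn rate in USD per hour, per agent
sum by (agent) (rate(moltbot_llm_cost_usd_total[5m])) * 3600

# Prompt injections detected in the last 15 minutes (alert on any spike)
increase(moltbot_security_injections_detected_total[15m])
```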
# Grafana dashboard import:
# ClawGuru LLM Dashboard ID: 21847 (grafana.com/dashboards)

Frequently Asked Questions
Why is LLM observability different from traditional APM?
Traditional APM measures deterministic systems: same input → same output → same latency. LLMs are stochastic: same input can produce different outputs with different quality levels. This requires new metrics: hallucination rate (did the model make up facts?), refusal rate (is the model refusing valid requests?), semantic similarity (is the output meaningfully different from last week?). Traditional APM tools miss all of these. Moltbot's LLM observability layer was built specifically for probabilistic AI systems.
How does prompt tracing work?
Every LLM call is recorded with: input prompt (hashed + optionally stored), system message, model parameters (temperature, top_p, max_tokens), output tokens generated, latency breakdown (time to first token, generation time), tool calls made, security scan results, and a unique trace ID that links parent agent calls to child LLM calls. This creates a complete causal trace from user request → agent decision → LLM call → tool execution → response.
How do I detect LLM quality regressions?
Moltbot supports three quality regression detection methods: 1) Automated evals — run a fixed test set against every model/prompt change, compare output similarity to golden set. 2) Statistical process control — flag when hallucination rate or refusal rate exceeds 2-sigma from baseline. 3) User feedback correlation — link thumbs down / escalations to specific prompt versions and model settings. Any of these triggers a regression alert in your monitoring dashboard.
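Method 2 above, statistical process control, fits in a few lines. A sketch with illustrative baseline numbers:

```python
from statistics import mean, stdev

def two_sigma_alert(baseline: list[float], current: float) -> bool:
    """Flag a quality regression when the current rate exceeds
    the baseline mean by more than two standard deviations."""
    mu, sigma = mean(baseline), stdev(baseline)
    return current > mu + 2 * sigma

# Daily hallucination rates over a baseline window (illustrative values)
baseline = [0.021, 0.019, 0.023, 0.020, 0.022]
print(two_sigma_alert(baseline, 0.030))  # prints True: well above 2 sigma
```

The same check applies unchanged to refusal rate or any other rate-type quality metric.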
Can I run LLM observability without sending data to the cloud?
Yes — this is Moltbot's primary value proposition. All traces, metrics and logs are stored locally in your infrastructure (ClickHouse or PostgreSQL). The observability dashboard runs as a self-hosted web app. No data leaves your network. For air-gapped or high-security environments, Moltbot supports offline mode where even model calls go to local Ollama/LocalAI — full observability with zero external dependencies.