Backup & Disaster Recovery · Production-Ready Guide

Moltbot Backup & Disaster Recovery — You Have No Backup, No RTO/RPO, No DR Test. Database Crash, Ransomware, Data Center Outage. 72h Downtime, Data Loss, Your CEO Fired the CISO.

Without automated backups, defined RTO/RPO targets, and a tested DR process, a database crash, ransomware attack, or data center outage can mean 72h of downtime, data loss, and a CEO who fires the CISO. Here's how to prevent it.

"Not a Pentest" trust anchor: This guide is about securing your own systems with reliable backup and recovery strategies. It contains no attack tools.

What is Disaster Recovery? Simply explained.

Think of disaster recovery as insurance for your infrastructure: when everything goes wrong (database crash, ransomware, data center outage), you have a plan. For Moltbot, this means automated backups, defined RTO/RPO targets, geo-redundancy, and tested failover processes. Good DR keeps data loss and downtime within the limits you have defined.

RTO/RPO Tiers for Moltbot

Tier	Service	RTO	RPO	Backup Frequency
T1	Auth Service	5 min	1 min	Continuous
T1	Database (Primary)	15 min	5 min	WAL streaming
T2	API Gateway	30 min	15 min	Hourly
T2	Redis Cache	30 min	0 (rebuild)	Daily
T3	File Storage	4 h	1 h	Hourly
T3	Analytics DB	24 h	24 h	Daily
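
To make the RPO column actionable, the age of the newest backup can be compared against a service's RPO. The following is a minimal sketch under stated assumptions: the function name `rpo_status` and its output format are illustrative and not part of any Moltbot tooling.

```shell
#!/usr/bin/env bash
# Sketch: check whether the newest backup is still within a service's RPO.
set -euo pipefail

# rpo_status NOW_EPOCH LAST_BACKUP_EPOCH RPO_MINUTES
# Prints "OK ..." if the backup age is within the RPO, otherwise "VIOLATION ...".
# rpo_status is a hypothetical helper name used for illustration only.
rpo_status() {
  local now=$1 last=$2 rpo_min=$3
  local age_min=$(( (now - last) / 60 ))   # whole minutes since the last backup
  if (( age_min <= rpo_min )); then
    echo "OK age=${age_min}min rpo=${rpo_min}min"
  else
    echo "VIOLATION age=${age_min}min rpo=${rpo_min}min"
  fi
}
```

For the T1 database row (RPO 5 min), a check might look like `rpo_status "$(date +%s)" "$(stat -c %Y "$latest_backup")" 5`, where `latest_backup` is a placeholder for the newest backup file and `stat -c %Y` is GNU coreutils syntax.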

Automated PostgreSQL Backup

#!/bin/bash
# moltbot-backup.sh: automated PostgreSQL backup

set -euo pipefail

BACKUP_DIR="/backups/postgres"
DB_URL="$DATABASE_URL"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/moltbot_$TIMESTAMP.sql.gz"
RETENTION_DAYS=30

# 1. Create the backup (make sure the target directory exists first)
mkdir -p "$BACKUP_DIR"
echo "[INFO] Starting backup: $BACKUP_FILE"
pg_dump "$DB_URL" | gzip > "$BACKUP_FILE"

# 2. Verify integrity of the compressed file
gunzip -t "$BACKUP_FILE" || { echo "[ERROR] Backup corrupt!"; exit 1; }
echo "[INFO] Backup integrity OK ($(du -h "$BACKUP_FILE" | cut -f1))"

# 3. Upload to cloud storage, encrypted with KMS
aws s3 cp "$BACKUP_FILE" \
  "s3://moltbot-backups/postgres/$TIMESTAMP/" \
  --server-side-encryption aws:kms \
  --sse-kms-key-id "$AWS_KMS_KEY_ID"

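# 4. Delete old backups (retention)
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
# Each upload lives under a timestamp prefix, so compare prefixes against a cutoff.
# Note: "date -d" is GNU date syntax.
CUTOFF=$(date -d "-$RETENTION_DAYS days" +%Y%m%d_%H%M%S)
aws s3 ls "s3://moltbot-backups/postgres/" | \
  awk '$1 == "PRE" {print $2}' | \
  while read -r prefix; do
    if [[ "${prefix%/}" < "$CUTOFF" ]]; then
      aws s3 rm --recursive "s3://moltbot-backups/postgres/$prefix"
    fi
  done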

echo "[SUCCESS] Backup completed: $BACKUP_FILE"

Real-World Scars: Production Incidents

SCAR #1: No Backup Before Deployment (CRITICAL)

A schema change was deployed without a prior backup. The change broke the database and no rollback was possible, resulting in 24h of downtime. Fix: take a pre-deployment backup and roll back automatically on failure.

Root cause: no pre-deployment backup. Lesson: enable an automatic backup before every deployment.
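
One way to enforce this fix is a small wrapper that refuses to deploy without a fresh backup and rolls back when the deployment fails. This is a sketch only: `deploy_with_rollback` and the three commands passed to it are hypothetical names, not an existing Moltbot interface.

```shell
#!/usr/bin/env bash
set -euo pipefail

# deploy_with_rollback BACKUP_CMD DEPLOY_CMD ROLLBACK_CMD
# Runs the backup first, deploys only if the backup succeeded, and calls the
# rollback command if the deployment fails. All three arguments are
# caller-supplied commands (hypothetical scripts in the usage note below).
deploy_with_rollback() {
  local backup_cmd=$1 deploy_cmd=$2 rollback_cmd=$3

  if ! "$backup_cmd"; then
    echo "[ERROR] Pre-deployment backup failed, aborting deployment"
    return 1
  fi

  if "$deploy_cmd"; then
    echo "[SUCCESS] Deployment completed"
    return 0
  fi

  echo "[WARN] Deployment failed, rolling back"
  "$rollback_cmd"
  return 1
}
```

A call could look like `deploy_with_rollback ./moltbot-backup.sh ./migrate.sh ./restore-latest.sh`, where `migrate.sh` and `restore-latest.sh` are placeholders for your own migration and restore scripts.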

SCAR #2: Backup Test Never Run (HIGH)

The backups had never been tested. During an incident the restore failed because the backup was corrupt, resulting in 48h of downtime. Fix: run a monthly restore test with an integrity check.

Root cause: no backup test. Lesson: enable a monthly restore test with an integrity check.
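
A restore test can be as simple as replaying the dump into a throwaway database and checking that tables actually arrive. The sketch below assumes the backup is a gzip-compressed plain-SQL `pg_dump` file (as produced by the backup script in this guide) and that connection settings come from the standard `PG*` environment variables; the function name and the scratch database name are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# restore_test BACKUP_FILE
# Restores a gzip-compressed pg_dump file into a throwaway database,
# counts the restored tables, then drops the database again.
restore_test() {
  local backup_file=$1
  local scratch_db="restore_test_$$"   # hypothetical name; $$ is this shell's PID
  local status=0 tables=0

  gunzip -t "$backup_file"             # cheap integrity check before restoring
  createdb "$scratch_db"

  # Replay the dump; ON_ERROR_STOP makes psql fail on the first SQL error.
  gunzip -c "$backup_file" \
    | psql --quiet --set=ON_ERROR_STOP=1 --dbname="$scratch_db" >/dev/null \
    || status=$?

  if (( status == 0 )); then
    # Count user tables (everything outside the system schemas).
    tables=$(psql --tuples-only --no-align --dbname="$scratch_db" \
      --command="SELECT count(*) FROM information_schema.tables WHERE table_schema NOT IN ('pg_catalog', 'information_schema')")
  fi

  dropdb --if-exists "$scratch_db"

  if (( status != 0 || tables == 0 )); then
    echo "[ERROR] Restore test failed for $backup_file"
    return 1
  fi
  echo "[INFO] Restore test OK: $tables tables restored"
}
```

Counting tables is only a smoke test. A stricter check would compare row counts or checksums of key tables against known values.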

Immediate Actions: What to do today?

Define RTO/RPO

Define RTO/RPO for all services. Classify by criticality.

Enable automated backups

Enable automated backups for PostgreSQL with WAL streaming.
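
As a rough sketch of what WAL streaming can look like with PostgreSQL's own tools, the functions below wrap `pg_basebackup` for the baseline and `pg_receivewal` for continuous WAL capture. The directory paths, the slot name, and the `DRY_RUN` switch are assumptions for illustration. The connecting role needs the REPLICATION privilege.

```shell
#!/usr/bin/env bash
set -euo pipefail

WAL_DIR="${WAL_DIR:-/backups/postgres/wal}"      # hypothetical path
BASE_DIR="${BASE_DIR:-/backups/postgres/base}"   # hypothetical path
SLOT_NAME="${SLOT_NAME:-moltbot_wal}"            # hypothetical replication slot name

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# take_base_backup LABEL
# Baseline: compressed tar base backup, including the WAL needed to make it consistent.
take_base_backup() {
  run pg_basebackup --dbname="$DATABASE_URL" --pgdata="$BASE_DIR/$1" \
    --format=tar --gzip --wal-method=stream
}

# start_wal_streaming
# Creates the replication slot if needed, then streams WAL into WAL_DIR until stopped.
start_wal_streaming() {
  run pg_receivewal --dbname="$DATABASE_URL" --directory="$WAL_DIR" \
    --slot="$SLOT_NAME" --create-slot --if-not-exists
  run pg_receivewal --dbname="$DATABASE_URL" --directory="$WAL_DIR" --slot="$SLOT_NAME"
}
```

Running with `DRY_RUN=1` prints the commands without contacting the server, which is useful for reviewing them first. The replication slot keeps WAL on the primary until it has been received, so a stalled receiver should be monitored to avoid filling the primary's disk.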

Configure geo-redundancy

Enable geo-redundancy with automatic failover.
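
The decision step of an automatic failover can be sketched as a health check that only fires after several consecutive failures. This is an illustration under assumptions, not a complete failover solution: the function name, retry count, and delay are hypothetical, and a production setup also has to fence the old primary to avoid split brain.

```shell
#!/usr/bin/env bash
set -euo pipefail

# primary_is_down HOST PORT RETRIES
# Returns 0 only if pg_isready fails RETRIES times in a row, so a single
# dropped health check does not trigger a failover.
primary_is_down() {
  local host=$1 port=$2 retries=$3 i
  for (( i = 0; i < retries; i++ )); do
    if pg_isready --host="$host" --port="$port" --timeout=3 >/dev/null 2>&1; then
      return 1   # primary answered: no failover needed
    fi
    sleep "${RETRY_DELAY:-5}"   # seconds between checks (assumed default)
  done
  return 0
}

# Intended to run on the standby (hostname and PGDATA are placeholders):
#   if primary_is_down pg-primary.example.internal 5432 3; then
#     pg_ctl promote -D "$PGDATA"
#   fi
```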

Interactive DR Checklist

DR Maturity Score Calculator

Have you defined RTO/RPO?

Are automated backups active?

Is geo-redundancy configured?

Has a DR test been run?

Industry Average: 19/100

Frequently Asked Questions

What is RTO vs. RPO?

RTO (Recovery Time Objective): the maximum acceptable time to restore a service after an outage. RPO (Recovery Point Objective): the maximum acceptable data loss, measured as the time since the last backup. For critical systems: RTO < 15 min, RPO < 5 min.

How often should I back up PostgreSQL?

For production databases: at least hourly, with WAL streaming for point-in-time recovery. Take a daily full backup as the baseline. Keep 30 days of retention in encrypted cloud storage. Test recovery monthly.

What is geo-redundancy?

Geo-redundancy means your infrastructure runs in at least two geographically separate data centers. If one data center fails (fire, network, natural disaster), the other takes over automatically. For Moltbot: primary in EU-West, secondary in EU-Central with automatic failover.

How do I test disaster recovery?

DR test schedule: Monthly: backup integrity check and restore test. Quarterly: full failover test with traffic switch. Yearly: full disaster recovery simulation with a ransomware scenario. Document all results and lessons learned.
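
The schedule above can be encoded as a small helper that lists which tests are due in a given month. This is a sketch with assumptions: the function name is illustrative, quarterly tests are placed in January, April, July, and October, and the yearly simulation in January.

```shell
#!/usr/bin/env bash
set -euo pipefail

# dr_tests_due MONTH (1-12, leading zero allowed)
# Prints one line per DR test that is due in the given month.
# Assumption: quarterly tests run in Jan/Apr/Jul/Oct, the yearly simulation in January.
dr_tests_due() {
  local month=$((10#$1))   # force base 10 so "08" and "09" are not parsed as octal
  echo "monthly: backup integrity check and restore test"
  if (( month % 3 == 1 )); then
    echo "quarterly: full failover test with traffic switch"
  fi
  if (( month == 1 )); then
    echo "yearly: full DR simulation with ransomware scenario"
  fi
}
```

Calling `dr_tests_due "$(date +%m)"` from a monthly scheduled job gives the list for the current month, which can feed a ticket or a reminder.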

R. Schwertfechter

✓ Verified

Principal Ops-Engineer & Security Architect

📅 Published: 01.05.2026 · 🔄 Last reviewed: 01.05.2026

15+ years experience as Ops-Engineer, Incident Responder and Security Architect. Expert in disaster recovery, backup strategies, RTO/RPO and failover.

Further Resources