211 lines
5.9 KiB
Markdown
211 lines
5.9 KiB
Markdown
# Production DB Recovery and Safety Runbook
|
|
|
|
Last updated: March 10, 2026
|
|
|
|
## Purpose
|
|
|
|
Use this runbook when production data appears lost, reset, or unexpectedly empty.
|
|
|
|
## Safety Rules (Do Not Bypass)
|
|
|
|
1. Never run destructive Prisma commands in production:
|
|
- `prisma migrate reset`
|
|
- `prisma migrate dev`
|
|
- `prisma db push --accept-data-loss`
|
|
2. Never run `docker compose down -v` (or `docker-compose down -v`) against production.
|
|
3. Always restore into an isolated database first.
|
|
4. Always take a fresh backup before migration/cutover actions.
|
|
|
|
## 1) Determine recoverability
|
|
|
|
### 1.1 Identify active Postgres container + volume
|
|
|
|
```bash
|
|
docker ps --format '{{.Names}}'
|
|
docker inspect <postgres-container> --format '{{json .Mounts}}'
|
|
```
|
|
|
|
Record:
|
|
- container name
|
|
- mounted volume name
|
|
- data mount path
|
|
|
|
### 1.2 Enumerate candidate old volumes
|
|
|
|
```bash
|
|
docker volume ls | grep -E 'pgdata|skymoney|postgres'
|
|
```
|
|
|
|
### 1.3 Inspect candidate volumes for PostgreSQL cluster files
|
|
|
|
```bash
|
|
for v in $(docker volume ls --format '{{.Name}}' | grep -E 'pgdata|skymoney|postgres'); do
|
|
echo "== $v =="
|
|
docker run --rm -v "${v}:/var/lib/postgresql/data" alpine sh -lc \
|
|
"ls -la /var/lib/postgresql/data | head -n 30"
|
|
done
|
|
```
|
|
|
|
Look for:
|
|
- `PG_VERSION`
|
|
- `base/`
|
|
- `global/`
|
|
- `pg_wal/`
|
|
|
|
### 1.4 Check filesystem backups and automation history
|
|
|
|
```bash
|
|
ls -la /opt/skymoney/backups
|
|
ls -la /var/backups
|
|
systemctl list-timers --all | grep -Ei 'backup|postgres|skymoney'
|
|
grep -R \"backup.sh\" /etc/cron* /opt/skymoney 2>/dev/null
|
|
```
|
|
|
|
Decision:
|
|
- If valid dump/volume exists -> continue to Section 2.
|
|
- If none exists -> mark irrecoverable and continue to Section 3.
|
|
|
|
## 2) Recover from artifact
|
|
|
|
### 2.1 Restore into isolated recovery DB
|
|
|
|
```bash
|
|
export RECOVERY_DB="skymoney_recovery_$(date +%Y%m%d%H%M)"
|
|
export BACKUP_FILE="/opt/skymoney/backups/<chosen-file>.dump"
|
|
export DATABASE_URL="postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/skymoney"
|
|
export RESTORE_DATABASE_URL="postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/${RECOVERY_DB}"
|
|
export RESTORE_DB="$RECOVERY_DB"
|
|
|
|
cd /opt/skymoney
|
|
./scripts/restore.sh
|
|
```
|
|
|
|
### 2.2 Validate restored data
|
|
|
|
```bash
|
|
psql "$RESTORE_DATABASE_URL" -c 'SELECT COUNT(*) FROM "User";'
|
|
psql "$RESTORE_DATABASE_URL" -c 'SELECT COUNT(*) FROM "Transaction";'
|
|
psql "$RESTORE_DATABASE_URL" -c 'SELECT COUNT(*) FROM "EmailToken";'
|
|
```
|
|
|
|
### 2.3 Cutover (if recovery DB is valid)
|
|
|
|
1. Put app in brief maintenance mode (or stop writes).
|
|
2. Take a fresh backup of current prod DB.
|
|
3. Restore/copy recovered data into production DB.
|
|
4. Run:
|
|
```bash
|
|
docker-compose exec -T api npx prisma migrate deploy
|
|
```
|
|
5. Remove maintenance mode.
|
|
|
|
### 2.4 Post-recovery validation
|
|
|
|
1. Login flow works.
|
|
2. Dashboard and critical routes return expected results.
|
|
3. Security events/logging continue to emit.
|
|
|
|
## 3) Re-create admin/operator access (if no recovery)
|
|
|
|
Note: current schema does not include a built-in `isAdmin` role field. This step restores operational access user credentials only.
|
|
|
|
### 3.1 Create verified operator user
|
|
|
|
```bash
|
|
cd /opt/skymoney/api
|
|
node -e '
|
|
const { PrismaClient } = require("@prisma/client");
|
|
const argon2 = require("argon2");
|
|
(async () => {
|
|
const prisma = new PrismaClient();
|
|
const email = process.env.BOOTSTRAP_EMAIL;
|
|
const password = process.env.BOOTSTRAP_PASSWORD;
|
|
if (!email || !password) throw new Error("BOOTSTRAP_EMAIL and BOOTSTRAP_PASSWORD are required");
|
|
const passwordHash = await argon2.hash(password);
|
|
await prisma.user.upsert({
|
|
where: { email },
|
|
update: { passwordHash, emailVerified: true },
|
|
create: { email, passwordHash, emailVerified: true },
|
|
});
|
|
await prisma.$disconnect();
|
|
console.log("Bootstrap user ready:", email);
|
|
})().catch((e) => { console.error(e); process.exit(1); });
|
|
'
|
|
```
|
|
|
|
### 3.2 Credential handling
|
|
|
|
1. Generate random password in terminal.
|
|
2. Store in password manager.
|
|
3. Rotate immediately after first successful login.
|
|
|
|
## 4) Backup/restore validation against live VPS
|
|
|
|
### 4.1 Run real backup
|
|
|
|
```bash
|
|
cd /opt/skymoney
|
|
BACKUP_ENFORCE_TARGET_CHECK=1 \
|
|
EXPECTED_PROD_DB_HOST=postgres \
|
|
EXPECTED_PROD_DB_NAME=skymoney \
|
|
BACKUP_DIR=/opt/skymoney/backups \
|
|
./scripts/backup.sh
|
|
```
|
|
|
|
### 4.2 Verify latest checksum
|
|
|
|
```bash
|
|
LATEST_DUMP="$(ls -1t /opt/skymoney/backups/*.dump | head -n 1)"
|
|
sha256sum -c "${LATEST_DUMP}.sha256"
|
|
```
|
|
|
|
### 4.3 Restore drill into non-prod DB
|
|
|
|
```bash
|
|
RESTORE_DB="skymoney_restore_test_$(date +%Y%m%d%H%M)"
|
|
RESTORE_DATABASE_URL="postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/${RESTORE_DB}" \
|
|
DATABASE_URL="postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/skymoney" \
|
|
BACKUP_FILE="$LATEST_DUMP" \
|
|
RESTORE_DB="$RESTORE_DB" \
|
|
./scripts/restore.sh
|
|
```
|
|
|
|
### 4.4 Validate restore drill
|
|
|
|
```bash
|
|
psql "postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/${RESTORE_DB}" -c 'SELECT COUNT(*) FROM "User";'
|
|
```
|
|
|
|
### 4.5 Cleanup drill DB
|
|
|
|
```bash
|
|
psql "postgres://<admin-user>:<admin-pass>@127.0.0.1:5432/skymoney" \
|
|
-c "DROP DATABASE IF EXISTS \"${RESTORE_DB}\";"
|
|
```
|
|
|
|
## 5) Deploy hardening controls
|
|
|
|
1. `docker-compose.yml` pins volume name: `skymoney_pgdata`.
|
|
2. Deploy workflow sets `COMPOSE_PROJECT_NAME=skymoney`.
|
|
3. Deploy workflow runs `scripts/validate-prod-db-target.sh`.
|
|
4. Deploy workflow runs `scripts/guard-prod-volume.sh` and blocks deploy when prod volume is missing/empty.
|
|
5. Deploy workflow runs pre-migration `scripts/backup.sh`.
|
|
6. Deploy workflow uses `prisma migrate deploy` only.
|
|
7. Security DB workflow runs `scripts/validate-test-db-target.sh` and refuses protected DB names (`skymoney`, `postgres`, `template*`).
|
|
8. DB-backed test runtime (`api/tests/setup.ts`) refuses protected DB targets before any `deleteMany` cleanup runs.
|
|
|
|
### Intentional rebuild override
|
|
|
|
If you intentionally need to initialize an empty production volume (rare), set:
|
|
|
|
```bash
|
|
ALLOW_EMPTY_PROD_VOLUME=1
|
|
```
|
|
|
|
for one deploy run only, then reset it back to `0`.
|
|
|
|
## Quarterly drill requirement
|
|
|
|
Run a full backup + restore drill every quarter and record evidence in:
|
|
- `tests-results-for-OWASP/evidence-log-template.md`
|