codingstairs
NotesEDULifeContact
⌕Search⌘K
koen

Navigation

  • Intro
  • Blog
  • Life

Get in touch

Send without signing in. Add your email if you'd like a reply.

  • Leave a message anonymously →
  • ✉ warragon112@gmail.com
  • KakaoTalk Open Chat ↗

© 2026 codingstairs

  • Notes
  • EDU
  • Search
  • Life
  • Contact
  • Legal
  • RSS
  • GitHub
Notes›data

Backup and Restore

Published 2026-04-28· Updated 2026-05-18·0 views

Backup and Restore

Making data is not a backup. Restoring is. The difference only shows up after an incident. Here is a summary of PostgreSQL's tools, policies, and rehearsals.

1. Types of backups

PostgreSQL backups split into two branches.

Type Format Tools Characteristics
Logical SQL or custom archive pg_dump, pg_dumpall, pg_restore Possible at table or schema level. Portable across versions and architectures. Slow on large DBs.
Physical Data directory file copy pg_basebackup, file system snapshot + WAL Fast restore. Compatible only on the same major version and same architecture.

Logical is the basic operational backup. Physical comes up often as the base for PITR.

2. pg_dump, pg_dumpall, pg_restore

pg_dump produces a logical backup of one DB. It can output as SQL text, custom archive (-Fc), or directory format (-Fd).

pg_dump -Fc -d mydb -f mydb.dump
pg_restore -d mydb_new mydb.dump

-Fc enables compression, parallel restore, and selective restore, and is the format most often recommended in production.

pg_dumpall bundles the entire cluster. Roles, tablespaces, and all DBs come together, but data comes out as SQL text only. pg_restore restores pg_dump's custom and directory formats, with parallel restore via -j.

3. WAL archiving

PostgreSQL writes every change to the transaction log (WAL) first. Continuously moving this log to separate storage allows reconstructing any point in time from a full backup plus accumulated WAL.

The configuration core is three settings.

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

archive_library can replace it as well. The archived WAL is the input to PITR.

4. PITR — Point-In-Time Recovery

The PITR flow has 6 steps.

① Periodically create base backups (pg_basebackup)
② Keep the WAL archive without gaps
③ On incident: restore the base backup into an empty data directory
④ Create the recovery.signal file and configure restore_command
⑤ Specify the target with recovery_target_time (or _lsn or _xid)
⑥ Start Postgres → replay WAL → stop on reaching the target

Recovery starts from the base backup's point in time, so the further back the base, the longer the replay. The base backup cadence and WAL retention policy decide RPO and RTO.

  • RPO (Recovery Point Objective) — acceptable data loss. Influenced by WAL transfer cadence and delay.
  • RTO (Recovery Time Objective) — acceptable time to restore. Influenced by base cadence, hardware, and WAL volume.

PITR alone does not bring RPO close to zero. Some places additionally need a synchronous replication standby.

5. Operational tools

Rather than wiring archive_command directly, using a tool is more common.

pgBackRest — written in C. Supports parallel compression, incremental backups, encryption, and S3-compatible storage directly. The most often recommended path for production.

Barman — backed by EnterpriseDB. Python-based. Supports full and incremental backups and WAL streaming.

WAL-G — started by Citus and Yandex. WAL compression, encryption, and friendly support for cloud storage (S3, GCS, Azure Blob). Small operational surface is its strength.

Cloud managed — Amazon RDS, Aurora, Cloud SQL, and Azure Database provide automated backups and PITR managed. The operator focuses on retention period and restore procedure. Limits are that the restored result becomes a new instance and platform lock-in.

6. The 3-2-1 retention rule

A traditional backup retention guideline.

  • 3 copies
  • 2 types of media (local disk + object storage, etc.)
  • 1 offsite (different region, different cloud, different physical location)

The meaning of an offsite copy shows up in broad incidents like data center failures and ransomware.

Retention windows commonly run 1430 days daily, 812 weeks weekly, 12 months monthly, and 3~7 years yearly (per regulation and contract). Accumulated backups cost money, so it is safer to automate cleanup with code-driven policies.

7. Encryption and key management

  • In transit: TLS.
  • At rest: server-side encryption or client-side GPG, age.
  • Key management: KMS, Vault, cloud Key Service.

Inspect key access permissions during restore rehearsals. Lose the key and an encrypted backup cannot be restored.

8. Restore rehearsals

The most common incident is "the backup existed but the restore did not work." Periodic rehearsals prevent it.

① PITR with base backup + WAL into a separate environment
② Verify row counts and checksums on the resulting DB
③ Application smoke tests
④ Record elapsed time → validate RTO

Operational examples that automate the rehearsal itself are common.

9. Replication is not a backup

Streaming replication is a failover tool. Logical replication is a major-version and schema-migration tool. Neither is a backup.

If a user deletes data deliberately or by mistake, the replica deletes too. PITR holds the point-in-time recovery capability. Disk snapshots (EBS, ZFS) are fast for backup and restore but consistent snapshots require the pg_backup_start / pg_backup_stop flow (PostgreSQL 15+).

10. Common pitfalls

Settling for pg_dump only — as the DB grows, dump and restore stretch into hours. Unsuitable for PITR.

Gaps in WAL archive — temporary archive_command failures break PITR. Monitoring is mandatory.

Trusting only the backup-completion alarm — alarms may merely say "a file was created." Complement with integrity checks and restore rehearsals.

Version and platform compatibility — physical backups restore only on the same major version and same architecture. Logical backups loosen that assumption.

Extension dependencies — if the restore environment lacks the same extensions, it fails. State them in the environment build step.

Permission void at incident time — when normal-day backup access is closed off from people, the restore can be blocked when an incident hits. Person/role separation and an emergency runbook are needed.

Closing thoughts

Backups are harder to restore than to make. Spend more time on rehearsals than on picking tools. Whether to go all the way to PITR or to settle for daily dumps is decided by RPO.

Next

  • multi-pg-pool-orchestration
  • postgres-deep

References: PostgreSQL Backup and Restore, Continuous Archiving and PITR, pgBackRest official, WAL-G GitHub.

More in data

All in this category →
  • Keep DB seed sources outside the code tree
  • Supabase Storage — File Upload and Permissions
  • Kafka in Practice — Topic Design and Message Flow
  • Orchestrating multiple PostgreSQL pools
  • Image Pipeline
  • Push Notifications — FCM and Web Push