Backup and Restore

Making data is not a backup. Restoring is. The difference only shows up after an incident. Here is a summary of PostgreSQL's tools, policies, and rehearsals.

1. Types of backups

PostgreSQL backups split into two branches.

Type	Format	Tools	Characteristics
Logical	SQL or custom archive	`pg_dump`, `pg_dumpall`, `pg_restore`	Possible at table or schema level. Portable across versions and architectures. Slow on large DBs.
Physical	Data directory file copy	`pg_basebackup`, file system snapshot + WAL	Fast restore. Compatible only on the same major version and same architecture.

Logical is the basic operational backup. Physical comes up often as the base for PITR.

2. pg_dump, pg_dumpall, pg_restore

pg_dump produces a logical backup of one DB. It can output as SQL text, custom archive (-Fc), or directory format (-Fd).

pg_dump -Fc -d mydb -f mydb.dump
pg_restore -d mydb_new mydb.dump

-Fc enables compression, parallel restore, and selective restore, and is the format most often recommended in production.

pg_dumpall bundles the entire cluster. Roles, tablespaces, and all DBs come together, but data comes out as SQL text only. pg_restore restores pg_dump's custom and directory formats, with parallel restore via -j.

3. WAL archiving

PostgreSQL writes every change to the transaction log (WAL) first. Continuously moving this log to separate storage allows reconstructing any point in time from a full backup plus accumulated WAL.

The configuration core is three settings.

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

archive_library can replace it as well. The archived WAL is the input to PITR.

4. PITR — Point-In-Time Recovery

The PITR flow has 6 steps.

① Periodically create base backups (pg_basebackup)
② Keep the WAL archive without gaps
③ On incident: restore the base backup into an empty data directory
④ Create the recovery.signal file and configure restore_command
⑤ Specify the target with recovery_target_time (or _lsn or _xid)
⑥ Start Postgres → replay WAL → stop on reaching the target

Recovery starts from the base backup's point in time, so the further back the base, the longer the replay. The base backup cadence and WAL retention policy decide RPO and RTO.

RPO (Recovery Point Objective) — acceptable data loss. Influenced by WAL transfer cadence and delay.
RTO (Recovery Time Objective) — acceptable time to restore. Influenced by base cadence, hardware, and WAL volume.

PITR alone does not bring RPO close to zero. Some places additionally need a synchronous replication standby.

5. Operational tools

Rather than wiring archive_command directly, using a tool is more common.

pgBackRest — written in C. Supports parallel compression, incremental backups, encryption, and S3-compatible storage directly. The most often recommended path for production.

Barman — backed by EnterpriseDB. Python-based. Supports full and incremental backups and WAL streaming.

WAL-G — started by Citus and Yandex. WAL compression, encryption, and friendly support for cloud storage (S3, GCS, Azure Blob). Small operational surface is its strength.

Cloud managed — Amazon RDS, Aurora, Cloud SQL, and Azure Database provide automated backups and PITR managed. The operator focuses on retention period and restore procedure. Limits are that the restored result becomes a new instance and platform lock-in.

6. The 3-2-1 retention rule

A traditional backup retention guideline.

3 copies
2 types of media (local disk + object storage, etc.)
1 offsite (different region, different cloud, different physical location)

The meaning of an offsite copy shows up in broad incidents like data center failures and ransomware.

Retention windows commonly run 14~~30 days daily, 8~~12 weeks weekly, 12 months monthly, and 3~7 years yearly (per regulation and contract). Accumulated backups cost money, so it is safer to automate cleanup with code-driven policies.

7. Encryption and key management

In transit: TLS.
At rest: server-side encryption or client-side GPG, age.
Key management: KMS, Vault, cloud Key Service.

Inspect key access permissions during restore rehearsals. Lose the key and an encrypted backup cannot be restored.

8. Restore rehearsals

The most common incident is "the backup existed but the restore did not work." Periodic rehearsals prevent it.

① PITR with base backup + WAL into a separate environment
② Verify row counts and checksums on the resulting DB
③ Application smoke tests
④ Record elapsed time → validate RTO

Operational examples that automate the rehearsal itself are common.

9. Replication is not a backup

Streaming replication is a failover tool. Logical replication is a major-version and schema-migration tool. Neither is a backup.

If a user deletes data deliberately or by mistake, the replica deletes too. PITR holds the point-in-time recovery capability. Disk snapshots (EBS, ZFS) are fast for backup and restore but consistent snapshots require the pg_backup_start / pg_backup_stop flow (PostgreSQL 15+).

10. Common pitfalls

Settling for pg_dump only — as the DB grows, dump and restore stretch into hours. Unsuitable for PITR.

Gaps in WAL archive — temporary archive_command failures break PITR. Monitoring is mandatory.

Trusting only the backup-completion alarm — alarms may merely say "a file was created." Complement with integrity checks and restore rehearsals.

Version and platform compatibility — physical backups restore only on the same major version and same architecture. Logical backups loosen that assumption.

Extension dependencies — if the restore environment lacks the same extensions, it fails. State them in the environment build step.

Permission void at incident time — when normal-day backup access is closed off from people, the restore can be blocked when an incident hits. Person/role separation and an emergency runbook are needed.

Closing thoughts

Backups are harder to restore than to make. Spend more time on rehearsals than on picking tools. Whether to go all the way to PITR or to settle for daily dumps is decided by RPO.

multi-pg-pool-orchestration
postgres-deep

References: PostgreSQL Backup and Restore, Continuous Archiving and PITR, pgBackRest official, WAL-G GitHub.

Backup and Restore

Making data is not a backup. Restoring is. The difference only shows up after an incident. Here is a summary of PostgreSQL's tools, policies, and rehearsals.

1. Types of backups

PostgreSQL backups split into two branches.

Type	Format	Tools	Characteristics
Logical	SQL or custom archive	`pg_dump`, `pg_dumpall`, `pg_restore`	Possible at table or schema level. Portable across versions and architectures. Slow on large DBs.
Physical	Data directory file copy	`pg_basebackup`, file system snapshot + WAL	Fast restore. Compatible only on the same major version and same architecture.

Logical is the basic operational backup. Physical comes up often as the base for PITR.

2. pg_dump, pg_dumpall, pg_restore

pg_dump produces a logical backup of one DB. It can output as SQL text, custom archive (-Fc), or directory format (-Fd).

pg_dump -Fc -d mydb -f mydb.dump
pg_restore -d mydb_new mydb.dump

-Fc enables compression, parallel restore, and selective restore, and is the format most often recommended in production.

3. WAL archiving

PostgreSQL writes every change to the transaction log (WAL) first. Continuously moving this log to separate storage allows reconstructing any point in time from a full backup plus accumulated WAL.

The configuration core is three settings.

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

archive_library can replace it as well. The archived WAL is the input to PITR.

4. PITR — Point-In-Time Recovery

The PITR flow has 6 steps.

① Periodically create base backups (pg_basebackup)
② Keep the WAL archive without gaps
③ On incident: restore the base backup into an empty data directory
④ Create the recovery.signal file and configure restore_command
⑤ Specify the target with recovery_target_time (or _lsn or _xid)
⑥ Start Postgres → replay WAL → stop on reaching the target

Recovery starts from the base backup's point in time, so the further back the base, the longer the replay. The base backup cadence and WAL retention policy decide RPO and RTO.

RPO (Recovery Point Objective) — acceptable data loss. Influenced by WAL transfer cadence and delay.
RTO (Recovery Time Objective) — acceptable time to restore. Influenced by base cadence, hardware, and WAL volume.

PITR alone does not bring RPO close to zero. Some places additionally need a synchronous replication standby.

5. Operational tools

Rather than wiring archive_command directly, using a tool is more common.

pgBackRest — written in C. Supports parallel compression, incremental backups, encryption, and S3-compatible storage directly. The most often recommended path for production.

Barman — backed by EnterpriseDB. Python-based. Supports full and incremental backups and WAL streaming.

WAL-G — started by Citus and Yandex. WAL compression, encryption, and friendly support for cloud storage (S3, GCS, Azure Blob). Small operational surface is its strength.

6. The 3-2-1 retention rule

A traditional backup retention guideline.

3 copies
2 types of media (local disk + object storage, etc.)
1 offsite (different region, different cloud, different physical location)

The meaning of an offsite copy shows up in broad incidents like data center failures and ransomware.

7. Encryption and key management

In transit: TLS.
At rest: server-side encryption or client-side GPG, age.
Key management: KMS, Vault, cloud Key Service.

Inspect key access permissions during restore rehearsals. Lose the key and an encrypted backup cannot be restored.

8. Restore rehearsals

The most common incident is "the backup existed but the restore did not work." Periodic rehearsals prevent it.

① PITR with base backup + WAL into a separate environment
② Verify row counts and checksums on the resulting DB
③ Application smoke tests
④ Record elapsed time → validate RTO

Operational examples that automate the rehearsal itself are common.

9. Replication is not a backup

Streaming replication is a failover tool. Logical replication is a major-version and schema-migration tool. Neither is a backup.

10. Common pitfalls

Settling for pg_dump only — as the DB grows, dump and restore stretch into hours. Unsuitable for PITR.

Gaps in WAL archive — temporary archive_command failures break PITR. Monitoring is mandatory.

Trusting only the backup-completion alarm — alarms may merely say "a file was created." Complement with integrity checks and restore rehearsals.

Version and platform compatibility — physical backups restore only on the same major version and same architecture. Logical backups loosen that assumption.

Extension dependencies — if the restore environment lacks the same extensions, it fails. State them in the environment build step.

Closing thoughts

Backups are harder to restore than to make. Spend more time on rehearsals than on picking tools. Whether to go all the way to PITR or to settle for daily dumps is decided by RPO.

multi-pg-pool-orchestration
postgres-deep

References: PostgreSQL Backup and Restore, Continuous Archiving and PITR, pgBackRest official, WAL-G GitHub.

Backup and Restore

Backup and Restore

1. Types of backups

2. pg_dump, pg_dumpall, pg_restore

3. WAL archiving

4. PITR — Point-In-Time Recovery

5. Operational tools

6. The 3-2-1 retention rule

7. Encryption and key management

8. Restore rehearsals

9. Replication is not a backup

10. Common pitfalls

Closing thoughts

Next

More in data

Backup and Restore

Backup and Restore

1. Types of backups

2. pg_dump, pg_dumpall, pg_restore

3. WAL archiving

4. PITR — Point-In-Time Recovery

5. Operational tools

6. The 3-2-1 retention rule

7. Encryption and key management

8. Restore rehearsals

9. Replication is not a backup

10. Common pitfalls

Closing thoughts

Next

More in data