SQL as the single source of truth

Where should the schema's truth live? ORM models, migration files, DDL SQL — there are several candidates. We compare the cumulative-migration model with the single-SQL-file + CREATE IF NOT EXISTS model.

1. The place of migration tools

Representative migration tools:

Tool	First appeared	Language/ecosystem	Model
Liquibase	2006	JVM	XML/YAML/SQL changesets accumulated
Flyway	2010	JVM	Versioned SQL files accumulated (`V1__init.sql`)
Alembic	early 2010s	Python (Mike Bayer, SQLAlchemy)	Python scripts accumulated
Django migrations	2014 (1.7)	Python/Django	Models → auto-generated + accumulated
Rails migrations	2005	Ruby	Ruby DSL accumulated
TypeORM migrations	2016	TS/JS	TS classes accumulated
Prisma Migrate	2020	TS/JS	Schema file + diff-based SQL generation
Drizzle Kit	2023	TS	Schema code + diff-based SQL generation

These all share the same view of truth: "the order of accumulated changes." A meta table tracks which migrations have been applied to a DB (e.g. Flyway's flyway_schema_history).

The other model is the declarative single SQL file. All tables of a domain are written in one SQL file with CREATE TABLE IF NOT EXISTS · CREATE INDEX IF NOT EXISTS, and changes are made by editing the same file. Application is idempotent.

2. The cumulative ALTER model

db/migrations/
├── V1__init.sql           # CREATE TABLE users(...)
├── V2__add_email.sql      # ALTER TABLE users ADD COLUMN email TEXT
└── V3__index_email.sql    # CREATE INDEX idx_users_email ...

Tools record hashes and versions to ensure each file is applied only once. The promise that any environment (dev/staging/prod) reaches the same result through the same order is the strength. The downside is that as files accumulate, bootstrapping a new environment takes longer, and to know a table's current shape we need to merge several files.

3. Declarative single file + IF NOT EXISTS

CREATE TABLE IF NOT EXISTS users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);

Strengths:

One file shows the current schema.
Bootstrap is short (run the file once).
Intent shows up directly in code review.

Limits:

Column changes/deletions do not follow automatically. Changes that need ALTER require a separate procedure.
Cross-environment auto-sync is weak — there is no meta table to track changes already applied to a production DB.
Data migrations (DML) do not fit in the same file.

4. Separating DDL and DML

DDL (schema) and DML (data) are commonly separated.

DDL is easy to make idempotent (IF NOT EXISTS).
DML is hard to make idempotent (the same INSERT twice gives a different result), and transaction/rollback policy differs.
Data seeding usually lives in a separate script (seed.sql) or fixture.

5. Tradeoffs of migration tools

Reproducibility — the cumulative model wins. Any environment at any point in time can be brought to the same state.
Rollback — most cumulative tools support down migrations, but in production a "no down, only forward" policy is common. After data has already been mutated, down is not safe.
Multi-environment sync — meta-table tracking is the strength.
Operational complexity — needs tool, CLI, and CI integration.

6. The diff-generation model

Tools like Prisma · Drizzle · atlas look at the difference between "desired state (schema file)" and "current DB" and generate SQL. It is a form that combines the convenience of the declarative model with part of cumulative tracking. The auto-generated ALTERs need review.

7. Where each model fits

Where the declarative single file + IF NOT EXISTS fits:

Small services in early stages.
Projects where the schema changes often but environment counts are small.
"Regenerable" data such as caches and aggregate tables for public data or external APIs.

Where cumulative migrations fit:

Many production environments (blue/green, region split).
Places where data loss is a corporate risk.
Several teams editing the same schema concurrently.

In larger organizations, a compromise where production schemas are managed with cumulative migrations while initial SQL for new domains starts declarative and is moved to the migration tool once it stabilizes is also common.

8. Common pitfalls

Silence of IF NOT EXISTS — if a table already exists, a new definition is ignored even if the column definitions differ. To catch definition drift, separate verification (a schema comparison tool) is needed.

Index name collisions — in PostgreSQL, index names must be unique within the schema. The combination of IF NOT EXISTS and a same-name policy can leave behind an index that differs from intent.

Corruption of a migration tool's meta table — manual edits to flyway_schema_history are dangerous.

Mixing DML with DDL migrations — lock-hold time and transaction semantics differ and can lead to operational incidents.

9. A case — column-level drift

In the declarative single-file model, the most frightening state is "the table name matches but the columns are off." One project nearly let this exact mistake reach production.

The crawler code called INSERT INTO stores (store_id, name, address, tel, region, area, last_crawled_at, ...) while the SSOT for that domain was stores (store_id, store_nm, addr, tel_no, region_cd, region_nm, area_cd, area_nm, updated_at, ...). Even the table names differed, and 6 column mappings were misaligned. It went unnoticed because the route is rarely called in production.

There are two ways to catch this.

① Table-level comparison
   Diff between SSOT CREATE TABLE names and pg_tables names
   → Even matching 73:73 counts is not enough (definition diffs are not caught)

② Column-level comparison
   Regex-extract columns from raw SQL in code → diff against information_schema.columns
   → Each mismatch is a latent BUG

The second approach is simple but powerful. One project's scripts/sql_column_audit.py is about 200 lines and does:

Regex out INSERT/UPDATE/SELECT statements and normalize column tokens.
One query against information_schema.columns to build a {table: {col1, col2, ...}} map.
For each SQL statement, print any used column not in the map.

This single tool caught the product_crawler's phantom table and the order_tracking's undefined table at the same time. This is a kind of drift the cumulative ALTER model would not catch — when environments where ALTER ran and where it did not coexist, the migration meta table cannot guarantee truth either.

The core lesson is this. The declarative SSOT model does not guarantee consistency by definition alone. There are two strands both claiming to be truth — code and SSOT — and the moment they diverge, which is truth depends on who checks. There needs to be a column-level automated audit for it to mean anything.

10. A case — collision between FK constraints and TRUNCATE

In the declarative SSOT model, adding a new FK is common. A few days after one project added the product_prices.store_id → stores.store_id FK, we found that the data_crawler normalization had been silently failing every hour.

ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "product_prices" references "stores".
HINT: Truncate table "product_prices" at the same time, or use TRUNCATE ... CASCADE.

refresh_normalized_tables() had been written as TRUNCATE + INSERT to recompute stores from 1.1M raw_items. The pattern broke the moment the FK was introduced. The POST response still returned 200, but the normalization result froze on yesterday's date — the BUG is invisible without checking the error log.

Three fix candidates:

① TRUNCATE ... CASCADE
   Wipes children too → unsuitable for accumulating data like product_prices

② Make FK NULLABLE + SET NULL
   Preserves child rows / loses parent reference meaning

③ DELETE WHERE NOT IN(current keys) + INSERT ON CONFLICT DO UPDATE
   Cleans only parents that disappeared (children clean up via ON DELETE CASCADE)
   Surviving parents UPSERT — FK respected

The third is two lines of single SQL and preserves accumulated child data.

Lesson — the moment a master-child relationship is created, a TRUNCATE of the master is an immediately backfillable BUG. Whenever a FK is added to the SSOT, search the codebase for "who wipes the master?".

Closing thoughts

The declarative SSOT model is the simplest starting point for a small team. But by definition alone it is not safe. Putting a single column-audit tool next to it cuts down operational incidents significantly.

api-handler-pattern
jobs-apscheduler

See Flyway · Liquibase · Alembic · Prisma Migrate · Drizzle Kit · Atlas · PostgreSQL DDL.

SQL as the single source of truth

1. The place of migration tools

Representative migration tools:

Tool	First appeared	Language/ecosystem	Model
Liquibase	2006	JVM	XML/YAML/SQL changesets accumulated
Flyway	2010	JVM	Versioned SQL files accumulated (`V1__init.sql`)
Alembic	early 2010s	Python (Mike Bayer, SQLAlchemy)	Python scripts accumulated
Django migrations	2014 (1.7)	Python/Django	Models → auto-generated + accumulated
Rails migrations	2005	Ruby	Ruby DSL accumulated
TypeORM migrations	2016	TS/JS	TS classes accumulated
Prisma Migrate	2020	TS/JS	Schema file + diff-based SQL generation
Drizzle Kit	2023	TS	Schema code + diff-based SQL generation

These all share the same view of truth: "the order of accumulated changes." A meta table tracks which migrations have been applied to a DB (e.g. Flyway's flyway_schema_history).

2. The cumulative ALTER model

db/migrations/
├── V1__init.sql           # CREATE TABLE users(...)
├── V2__add_email.sql      # ALTER TABLE users ADD COLUMN email TEXT
└── V3__index_email.sql    # CREATE INDEX idx_users_email ...

3. Declarative single file + IF NOT EXISTS

CREATE TABLE IF NOT EXISTS users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);

Strengths:

One file shows the current schema.
Bootstrap is short (run the file once).
Intent shows up directly in code review.

Limits:

Column changes/deletions do not follow automatically. Changes that need ALTER require a separate procedure.
Cross-environment auto-sync is weak — there is no meta table to track changes already applied to a production DB.
Data migrations (DML) do not fit in the same file.

4. Separating DDL and DML

DDL (schema) and DML (data) are commonly separated.

DDL is easy to make idempotent (IF NOT EXISTS).
DML is hard to make idempotent (the same INSERT twice gives a different result), and transaction/rollback policy differs.
Data seeding usually lives in a separate script (seed.sql) or fixture.

5. Tradeoffs of migration tools

Reproducibility — the cumulative model wins. Any environment at any point in time can be brought to the same state.
Rollback — most cumulative tools support down migrations, but in production a "no down, only forward" policy is common. After data has already been mutated, down is not safe.
Multi-environment sync — meta-table tracking is the strength.
Operational complexity — needs tool, CLI, and CI integration.

6. The diff-generation model

7. Where each model fits

Where the declarative single file + IF NOT EXISTS fits:

Small services in early stages.
Projects where the schema changes often but environment counts are small.
"Regenerable" data such as caches and aggregate tables for public data or external APIs.

Where cumulative migrations fit:

Many production environments (blue/green, region split).
Places where data loss is a corporate risk.
Several teams editing the same schema concurrently.

8. Common pitfalls

Index name collisions — in PostgreSQL, index names must be unique within the schema. The combination of IF NOT EXISTS and a same-name policy can leave behind an index that differs from intent.

Corruption of a migration tool's meta table — manual edits to flyway_schema_history are dangerous.

Mixing DML with DDL migrations — lock-hold time and transaction semantics differ and can lead to operational incidents.

9. A case — column-level drift

In the declarative single-file model, the most frightening state is "the table name matches but the columns are off." One project nearly let this exact mistake reach production.

There are two ways to catch this.

① Table-level comparison
   Diff between SSOT CREATE TABLE names and pg_tables names
   → Even matching 73:73 counts is not enough (definition diffs are not caught)

② Column-level comparison
   Regex-extract columns from raw SQL in code → diff against information_schema.columns
   → Each mismatch is a latent BUG

The second approach is simple but powerful. One project's scripts/sql_column_audit.py is about 200 lines and does:

Regex out INSERT/UPDATE/SELECT statements and normalize column tokens.
One query against information_schema.columns to build a {table: {col1, col2, ...}} map.
For each SQL statement, print any used column not in the map.

10. A case — collision between FK constraints and TRUNCATE

ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "product_prices" references "stores".
HINT: Truncate table "product_prices" at the same time, or use TRUNCATE ... CASCADE.

Three fix candidates:

① TRUNCATE ... CASCADE
   Wipes children too → unsuitable for accumulating data like product_prices

② Make FK NULLABLE + SET NULL
   Preserves child rows / loses parent reference meaning

③ DELETE WHERE NOT IN(current keys) + INSERT ON CONFLICT DO UPDATE
   Cleans only parents that disappeared (children clean up via ON DELETE CASCADE)
   Surviving parents UPSERT — FK respected

The third is two lines of single SQL and preserves accumulated child data.

Closing thoughts

api-handler-pattern
jobs-apscheduler

See Flyway · Liquibase · Alembic · Prisma Migrate · Drizzle Kit · Atlas · PostgreSQL DDL.

SQL as the single source of truth

SQL as the single source of truth

1. The place of migration tools

2. The cumulative ALTER model

3. Declarative single file + IF NOT EXISTS

4. Separating DDL and DML

5. Tradeoffs of migration tools

6. The diff-generation model

7. Where each model fits

8. Common pitfalls

9. A case — column-level drift

10. A case — collision between FK constraints and TRUNCATE

Closing thoughts

Next

More in backend

SQL as the single source of truth

SQL as the single source of truth

1. The place of migration tools

2. The cumulative ALTER model

3. Declarative single file + IF NOT EXISTS

4. Separating DDL and DML

5. Tradeoffs of migration tools

6. The diff-generation model

7. Where each model fits

8. Common pitfalls

9. A case — column-level drift

10. A case — collision between FK constraints and TRUNCATE

Closing thoughts

Next

More in backend