
fix(db): serialize concurrent migrations with a Postgres advisory lock #4939

Merged
waleedlatif1 merged 3 commits into staging from fix/migration
Jun 10, 2026
Conversation

@waleedlatif1

Collaborator

Summary

  • Wrap drizzle migrate() in a session-level pg_advisory_lock so concurrent migration sidecars serialize instead of racing
  • Root cause: a deploy starts N app replicas at once, each with a migration sidecar; drizzle has no cross-process lock, so all N see the same migration pending and apply it concurrently. One wins; the losers run the same DDL against already-mutated state and exit 1 (e.g. DROP TABLE "form" -> table "form" does not exist / TaskFailedToStart)
  • With the lock, the winner migrates while the losers block, then re-read __drizzle_migrations, find nothing pending, and exit cleanly. Session locks auto-release on disconnect, so a crashed runner never wedges the lock
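
The serialize-then-recheck behavior described above can be illustrated with a small in-memory simulation. This is only a sketch, not the code in this PR: `FakeAdvisoryLock` stands in for `pg_advisory_lock`, the `applied` set stands in for `__drizzle_migrations`, and the migration name is hypothetical.

```typescript
// In-memory stand-in for pg_advisory_lock: one holder at a time, waiters queue up.
class FakeAdvisoryLock {
  private held = false;
  private waiters: Array<() => void> = [];

  async acquire(): Promise<void> {
    if (!this.held) {
      this.held = true;
      return;
    }
    // Ownership is handed over directly by release(), so `held` stays true.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.held = false;
  }
}

// Stand-in for the __drizzle_migrations table.
const applied = new Set<string>();
const allMigrations = ["0001_drop_form"]; // hypothetical migration name

async function runMigrations(lock: FakeAdvisoryLock): Promise<string[]> {
  await lock.acquire();
  try {
    // The pending list is computed only after the lock is held.
    const pending = allMigrations.filter((m) => !applied.has(m));
    for (const m of pending) {
      await Promise.resolve(); // simulated round trip to the database
      if (applied.has(m)) throw new Error(`${m} already applied`);
      applied.add(m);
    }
    return pending;
  } finally {
    lock.release();
  }
}
```

Starting two runners with `Promise.all([runMigrations(lock), runMigrations(lock)])` lets the first apply the migration while the second waits and then finds nothing pending. Without the `acquire()` call, both would compute the same pending list and the second would throw.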

Type of Change

  • Bug fix

Testing

Tested manually — typecheck + lint pass. The current prod DB already has the migration applied (the race self-resolved on retry); this prevents recurrence on the next migration-bearing deploy.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Deployments start N app replicas at once, each with a migration sidecar.
drizzle migrate() has no cross-process lock, so all N read
__drizzle_migrations, all see the same migration pending, and all apply it
concurrently — one wins, the losers run the same DDL against already-mutated
state and exit 1 (e.g. DROP TABLE "form" -> table does not exist /
TaskFailedToStart). Wrap migrate() in a session-level pg_advisory_lock so
runners serialize: the winner migrates, the losers block, then re-read and
find nothing pending. Session locks auto-release on disconnect, so a crashed
runner never wedges the lock.
@vercel

vercel Bot commented Jun 10, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Jun 10, 2026 4:39am |


@cursor

cursor Bot commented Jun 10, 2026


PR Summary

Medium Risk
Changes deploy-time DDL execution and can block parallel migration runners until the lock holder finishes, but scope is limited to the migration script and reduces race-induced deploy failures.

Overview
Fixes concurrent migration sidecars on multi-replica deploys racing the same pending Drizzle migration and failing with duplicate DDL errors (e.g. dropping a table that another runner already removed).

The migrate script now acquires a session-level pg_advisory_lock with a fixed app-wide key before migrate(), runs migrations inside a try/finally, and releases the lock via releaseMigrationLock() so only one process applies pending SQL while others block, then see nothing left to run and exit cleanly. Unlock failures are logged as warnings and do not fail the process, so a successful migration is not reported as failed if unlock fails after commit; session locks still drop on disconnect.

Reviewed by Cursor Bugbot for commit 34980b0. Configure here.

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Contributor

Greptile Summary

This PR wraps drizzle's migrate() in a PostgreSQL session-level advisory lock (pg_advisory_lock) to prevent concurrent migration sidecars from racing on the same pending DDL during a rolling deploy. The nested try/finally structure ensures the lock is released after every migration outcome, and unlock failures are swallowed (with a warning log) since the session-level lock auto-releases on disconnect anyway — correctly preventing a failed unlock from corrupting the exit code of an otherwise successful migration.

  • Adds MIGRATION_LOCK_KEY = 4_961_002_270n as a stable bigint constant; all replicas contend on this single key, so the first acquirer migrates while the rest block, then each subsequent runner reacquires, finds no pending migrations, and exits cleanly.
  • releaseMigrationLock() is isolated in its own try/catch so a thrown unlock (e.g. connection dropped post-commit) can never propagate to the outer catch and exit 1 after a successful migration.
  • The lock acquisition itself has no statement or lock timeout (inheriting statement_timeout = 0), which is intentional — losers should wait rather than fail — but there is no log line emitted while blocked, leaving stuck runners silent in the deployment log.
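
One way the silent-wait gap noted above could be closed is to try the non-blocking `pg_try_advisory_lock` first and log before falling back to the blocking call. The sketch below is an assumption, not part of this PR: `QueryFn`, `acquireWithNotice`, and the log messages are hypothetical names, and the key is passed as a string cast to `bigint` in SQL.

```typescript
// Hypothetical one-statement query function bound to a single connection.
type QueryFn = (
  text: string,
  params: readonly unknown[],
) => Promise<Array<Record<string, unknown>>>;

async function acquireWithNotice(
  query: QueryFn,
  key: string,
  log: (msg: string) => void,
): Promise<void> {
  // pg_try_advisory_lock returns true or false immediately instead of blocking.
  const rows = await query(
    "SELECT pg_try_advisory_lock($1::bigint) AS acquired",
    [key],
  );
  if (rows.length > 0 && rows[0].acquired === true) return;

  log("Another runner holds the migration lock; waiting for it to finish...");
  // Fall back to the blocking call so the runner still waits rather than fails.
  await query("SELECT pg_advisory_lock($1::bigint)", [key]);
  log("Migration lock acquired.");
}
```

Both statements must run on the same connection, since a session-level lock belongs to the session that acquired it.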

Confidence Score: 5/5

Safe to merge — the advisory lock correctly serializes concurrent migration sidecars, unlock failures are safely swallowed, and the session-level lock auto-releases on disconnect.

The change is a focused, single-file fix. The lock acquisition and release are correctly scoped, the nested try/finally prevents unlock errors from corrupting the exit code, and the BigInt key is within PostgreSQL's bigint range. No data-correctness or reliability issues were identified.
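
The range claim can be checked directly. A minimal sketch, assuming only the key value quoted in the summary above; `fitsSigned` is a hypothetical helper:

```typescript
// Same value as the 4_961_002_270n constant quoted in the summary above.
const MIGRATION_LOCK_KEY = BigInt("4961002270");

// True when `value` survives a round trip through an N-bit signed integer.
function fitsSigned(bits: number, value: bigint): boolean {
  return BigInt.asIntN(bits, value) === value;
}

// Fits the single-argument pg_advisory_lock(bigint) form.
const fitsBigint = fitsSigned(64, MIGRATION_LOCK_KEY);
// Does not fit a 32-bit int, so it could not be used in the (int, int) form.
const fitsInt4 = fitsSigned(32, MIGRATION_LOCK_KEY);
```

The key is larger than 2^32, so it needs the `bigint` form of the function rather than the two-argument `(int, int)` form.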

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/db/scripts/migrate.ts | Wraps drizzle migrate() in a session-level pg_advisory_lock to serialize concurrent migration sidecars; unlock is safely swallowed in a nested finally so a failed release cannot falsely exit 1 after a successful migration |

Sequence Diagram

sequenceDiagram
    participant R1 as Runner 1 (winner)
    participant R2 as Runner 2 (loser)
    participant PG as PostgreSQL

    par Concurrent start
        R1->>PG: "SET statement_timeout = 0"
        R2->>PG: "SET statement_timeout = 0"
    end

    R1->>PG: pg_advisory_lock(MIGRATION_LOCK_KEY)
    PG-->>R1: acquired ✓

    R2->>PG: pg_advisory_lock(MIGRATION_LOCK_KEY)
    Note over R2,PG: blocks — waits for R1

    R1->>PG: migrate() — applies pending DDL
    PG-->>R1: committed ✓

    R1->>PG: pg_advisory_unlock(MIGRATION_LOCK_KEY)
    PG-->>R1: released ✓
    R1->>R1: process.exit(0)

    PG-->>R2: lock acquired ✓
    R2->>PG: migrate() — reads __drizzle_migrations, nothing pending
    PG-->>R2: no-op ✓
    R2->>PG: pg_advisory_unlock(MIGRATION_LOCK_KEY)
    R2->>R2: process.exit(0)

Reviews (2): Last reviewed commit: "refactor(db): move unlock-guard rational..."

Comment thread packages/db/scripts/migrate.ts
…ation

If the explicit unlock throws (e.g. connection drops in the window after
migrate() commits), the exception bubbled to the outer catch and exited 1 —
falsely reporting a failed migration to the deploy orchestrator. The session
lock auto-releases on disconnect anyway, so swallow and log instead.
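
The guard described in this commit message might look roughly like the following. It is a sketch under stated assumptions rather than the actual contents of migrate.ts: the `Query` type, `releaseLockQuietly`, `migrateUnderLock`, and the warning text are hypothetical, and the real script's client and function names may differ.

```typescript
// Hypothetical single-connection query function; not the project's actual client.
type Query = (text: string, params: readonly unknown[]) => Promise<unknown>;

async function releaseLockQuietly(
  query: Query,
  key: string,
  warn: (msg: string) => void,
): Promise<void> {
  try {
    await query("SELECT pg_advisory_unlock($1::bigint)", [key]);
  } catch (err) {
    // Swallow: the session lock is dropped when the connection closes anyway.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`Failed to release migration lock: ${reason}`);
  }
}

async function migrateUnderLock(
  query: Query,
  key: string,
  migrate: () => Promise<void>,
  warn: (msg: string) => void,
): Promise<void> {
  await query("SELECT pg_advisory_lock($1::bigint)", [key]);
  try {
    await migrate();
  } finally {
    // Never throws, so it cannot mask the outcome of migrate().
    await releaseLockQuietly(query, key, warn);
  }
}
```

Because the release helper never rejects, a failed unlock after a successful `migrate()` produces a warning only, while a failure inside `migrate()` still propagates to the caller unchanged.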
@waleedlatif1

Collaborator Author

@greptile

@waleedlatif1

Collaborator Author

@cursor review

@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 34980b0. Configure here.

@waleedlatif1 waleedlatif1 merged commit b2a485e into staging Jun 10, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/migration branch June 10, 2026 04:54