Speed up mtree table-diff by fixing CDC-drain overhead #126

Merged: mason-sharp merged 2 commits into main from ace-192 on Jun 25, 2026

@danolivo

Contributor

Problem

mtree table-diff was reported ~2x slower than plain table-diff on a 499,999-row, 3-node cluster. mtree table-diff first reconciles each node's cached Merkle tree with current data by draining that node's CDC (Change Data Capture) stream (UpdateMtree); table-diff reads the live tables directly and has no cache to refresh, so it never drains. The drain was paying two avoidable costs on every diff:

  1. Sequential per-node drain. Each node's CDC stream was drained one after another. With N nodes that multiplied the per-node drain latency N times.
  2. Idle-timeout termination. In non-continuous mode the drain stopped only when it reached the target LSN via a data message, or after a ~1s idle receive-timeout. Because the publication streams only the tracked table, any unrelated WAL activity advances the global flush LSN while the table itself sees no data messages — so the "reached target" path rarely fires and the drain falls through to the full ~1s idle wait. This was paid per node, on every diff, even when nothing relevant had changed.

On a 3-node cluster that is a ~3s fixed floor (3 × ~1s) on every mtree table-diff, independent of whether anything changed — which is exactly why a brute-force table-diff could finish first.

Changes

  1. Drain CDC streams concurrently across nodes (internal/consistency/mtree/merkle.go). Each UpdateFromCDC targets a distinct node over its own connection and writes only that node's CDC metadata, so the calls are independent. They now run concurrently; per-node errors are collected and the first (in node order) is returned, preserving prior semantics. Collapses N sequential drains into ~one drain's wall-clock.
  2. Stop the drain promptly at the target LSN (internal/infra/cdc/listen.go). PostgreSQL keepalive messages carry ServerWALEnd ("I have sent you everything up to here"). When ServerWALEnd >= targetFlushLSN we are caught up and stop immediately, instead of waiting out the ~1s idle timeout. This is the correct catch-up signal for a publication-filtered stream (where the target often can't be reached via table data messages). Applies to non-continuous drains only; the long-running listener is unchanged.

Safety of (2): replication messages arrive in LSN order, so a keepalive past the target arrives only after every data message ≤ target was already delivered; received changes are applied in goroutines that the post-loop wg.Wait()/metaWg.Wait() await before returning, so setting the stop flag cannot drop a received change.
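
The stop condition in (2) can be modelled without a database. The sketch below uses a simplified, hypothetical message type (`msg`) and a hypothetical `drain` function. It is not the code in listen.go, and it assumes a data message counts as reaching the target when its LSN is at or past the target.

```go
package main

import "fmt"

// LSN is a simplified stand-in for a PostgreSQL log sequence number.
type LSN uint64

// msg is a hypothetical replication event: a data message at walStart,
// a keepalive carrying serverWALEnd, or an idle receive timeout.
type msg struct {
	kind         string // "data", "keepalive", or "timeout"
	walStart     LSN
	serverWALEnd LSN
}

// drain consumes msgs in order and returns the LSNs of the data messages it
// accepted, plus the reason it stopped. Stop conditions apply only when
// continuous is false.
func drain(msgs []msg, target LSN, continuous bool) (applied []LSN, reason string) {
	for _, m := range msgs {
		switch m.kind {
		case "data":
			applied = append(applied, m.walStart)
			if !continuous && m.walStart >= target {
				return applied, "data reached target"
			}
		case "keepalive":
			// New condition: the server has sent everything up to the target.
			if !continuous && m.serverWALEnd >= target {
				return applied, "keepalive reached target"
			}
		case "timeout":
			if !continuous {
				return applied, "idle timeout"
			}
		}
	}
	return applied, "stream ended"
}

func main() {
	// The tracked table changed at LSN 100; unrelated WAL moved the target to 500.
	msgs := []msg{
		{kind: "data", walStart: 100},
		{kind: "keepalive", serverWALEnd: 500},
		{kind: "timeout"},
	}
	applied, reason := drain(msgs, 500, false)
	fmt.Println(applied, reason)
}
```

In this model the data message at LSN 100 is accepted before the keepalive is seen, which mirrors the ordering argument in the safety note.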

Correctness

The table-diff path is not touched. The full mtree + CDC integration suites, including the under-reporting guards (TestUpdateMtreeReflectsInserts/Deletes/Updates, bidirectional/3-node diffs) and the CDC/slot tests, pass under go test -race.

Not in this PR

A follow-up bounded-drain / --quick mode (cap the drain when the WAL backlog is large, report per-node staleness, and gate table-repair from acting on an incomplete diff) is proposed separately; it targets the widespread/large-backlog case this PR does not fully close.

Andrei Lepikhov added 2 commits June 25, 2026 11:17
mtree update/diff drained each node's CDC stream sequentially. Because
the non-continuous drain ends on a ~1s idle receive timeout, every
mtree diff paid that latency once per node — a fixed floor that, on a
3-node cluster, dominated wall-clock even when nothing had changed
(measured ~3.4s on identical data, vs ~37ms for a full table-diff scan).

Each UpdateFromCDC targets a distinct node over its own connection and
writes only that node's CDC metadata, so the calls are independent. Run
them concurrently and collect per-node errors, returning the first in
node order to preserve the prior semantics.

Measured on a 3-node cluster, 100k rows/node:
  identical data:    mtree diff 3.41s -> 1.19s
  widespread diffs:  mtree diff 4.15s -> 2.07s

The residual ~1s is a single node's idle-timeout drain; stopping the
drain promptly at the target LSN would attack that separately.

Full mtree integration suite passes under -race.

Author: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In non-continuous mode the drain previously stopped only when a data
message advanced lastLSN past targetFlushLSN, or — in the common case
where the tracked table did not change but other tables did — when
ReceiveMessage hit its 1-second idle timeout. Because the publication
streams only the tracked table, an unrelated change never produces a data
message at the target LSN, so the drain almost always ate the full 1s.

PostgreSQL keepalive messages carry ServerWALEnd, the server's "I have
sent you everything up to here" marker. Add a stop condition in the
keepalive branch: once ServerWALEnd >= targetFlushLSN we know the server
has delivered all WAL up to the target, so we can stop immediately
instead of waiting out the idle timeout.

Safety: replication messages arrive in LSN order, so by the time a
keepalive reports ServerWALEnd >= targetFlushLSN, every XLogData with
LSN <= targetFlushLSN has already been delivered and queued for apply.
Apply happens in goroutines tracked by wg; the post-loop wg.Wait() runs
after the receive loop breaks, so setting stopStreaming here cannot drop
already-received changes. The condition is guarded by !continuous, so the
long-running listener is unaffected.

Author: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
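
The safety argument in the commit message above rests on one ordering fact: the wait on the WaitGroup happens after the receive loop exits, so every change handed to a goroutine before the stop flag was set is still applied. A minimal sketch of that shape, with hypothetical names (`receiveAndApply`, `stopAfter`) and a sleep standing in for the real apply work:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// receiveAndApply hands each received change to its own goroutine, sets the
// stop flag after stopAfter changes, and waits for every in-flight apply
// before returning. It returns the sum of the applied change values.
func receiveAndApply(changes []int64, stopAfter int) int64 {
	var wg sync.WaitGroup
	var applied atomic.Int64
	stopStreaming := false
	for i := 0; i < len(changes) && !stopStreaming; i++ {
		wg.Add(1)
		go func(change int64) {
			defer wg.Done()
			time.Sleep(time.Millisecond) // simulated apply latency
			applied.Add(change)
		}(changes[i])
		if i+1 == stopAfter {
			stopStreaming = true // stop receiving; applies already started keep running
		}
	}
	wg.Wait() // post-loop wait: nothing received so far is dropped
	return applied.Load()
}

func main() {
	fmt.Println(receiveAndApply([]int64{1, 2, 3, 4}, 3))
}
```

Changes that were never received (here, the fourth one) are not applied, which is the intended effect of stopping at the target.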
@danolivo danolivo assigned danolivo and unassigned danolivo Jun 25, 2026
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

📝 Walkthrough

CDC replication now stops non-continuous streams when keepalive messages show the target flush LSN has been reached. UpdateMtree now starts per-node CDC updates concurrently, waits for all of them, and returns the first node error in node order.

Changes

CDC termination flow

| Layer / File(s) | Summary |
|---|---|
| Keepalive stop condition (internal/infra/cdc/listen.go) | processReplicationStream sets stopStreaming when a keepalive ServerWALEnd reaches targetFlushLSN during non-continuous runs. |
| Concurrent CDC drain (internal/consistency/mtree/merkle.go) | UpdateMtree launches cdc.UpdateFromCDC for each cluster node in parallel, waits for all goroutines, and returns the first non-nil error in node order. |

Poem

A rabbit hopped through LSN light,
and nodded when the stream felt right.
One node, then many, sprang in sync —
the burrow hummed with softer brink.
🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: reducing CDC-drain overhead to speed up mtree table-diff. |
| Description check | ✅ Passed | The description matches the changeset and explains the CDC-drain optimizations and their motivation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: 0 duplication

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/infra/cdc/listen.go`:
- Around line 332-335: Advance lastLSN before setting stopStreaming in the
keepalive catch-up path inside the replication loop. When pkm.ServerWALEnd has
reached targetFlushLSN in the non-continuous branch, make sure the value that
later feeds the final standby status and UpdateCDCMetadata is updated to the
target/observed WAL end before exiting, so the CDC checkpoint reflects the
drained position. Locate the fix near the keepalive catch-up check in listen.go
and adjust the replication-state variables used by the final status persistence.
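
The review comment above suggests advancing the tracked position before the stop flag is set. A minimal sketch of that suggestion, assuming a simplified state struct; `streamState` and `onKeepalive` are hypothetical names, the choice to advance to the observed WAL end rather than the target is an assumption, and this page does not show whether the suggestion was adopted:

```go
package main

import "fmt"

// LSN is a simplified stand-in for a PostgreSQL log sequence number.
type LSN uint64

// streamState is a hypothetical holder for the replication-loop variables
// named in the review comment.
type streamState struct {
	lastLSN       LSN
	stopStreaming bool
}

// onKeepalive applies the catch-up check for non-continuous drains. It
// advances lastLSN first so that whatever is persisted after the loop
// reflects the drained position, then sets the stop flag.
func (s *streamState) onKeepalive(serverWALEnd, target LSN, continuous bool) {
	if continuous || serverWALEnd < target {
		return
	}
	if serverWALEnd > s.lastLSN {
		s.lastLSN = serverWALEnd // assumption: advance to the observed WAL end
	}
	s.stopStreaming = true
}

func main() {
	s := &streamState{lastLSN: 100}
	s.onKeepalive(500, 450, false)
	fmt.Println(s.lastLSN, s.stopStreaming)
}
```

Without the advance, a drain that ends on a keepalive would leave lastLSN at the last data message it saw, which is the concern the comment raises about the persisted checkpoint.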
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c2c0905-430f-48f1-ae5f-0a79f549147b

📥 Commits

Reviewing files that changed from the base of the PR and between d460f9e and 67ac95d.

📒 Files selected for processing (2)
  • internal/consistency/mtree/merkle.go
  • internal/infra/cdc/listen.go

Comment thread internal/infra/cdc/listen.go
@mason-sharp

Member

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

@mason-sharp mason-sharp merged commit 5fb45b6 into main Jun 25, 2026
3 checks passed