Speed up mtree table-diff by fixing CDC-drain overhead #126
mtree update/diff drained each node's CDC stream sequentially. Because the non-continuous drain ends on a ~1s idle receive timeout, every mtree diff paid that latency once per node: a fixed floor that, on a 3-node cluster, dominated wall-clock even when nothing had changed (measured ~3.4s on identical data, vs ~37ms for a full table-diff scan).

Each UpdateFromCDC targets a distinct node over its own connection and writes only that node's CDC metadata, so the calls are independent. Run them concurrently and collect per-node errors, returning the first in node order to preserve the prior semantics.

Measured on a 3-node cluster, 100k rows/node:
- identical data: mtree diff 3.41s -> 1.19s
- widespread diffs: mtree diff 4.15s -> 2.07s

The residual ~1s is a single node's idle-timeout drain; stopping the drain promptly at the target LSN would attack that separately.

Full mtree integration suite passes under -race.

Author: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In non-continuous mode the drain previously stopped only when a data message advanced lastLSN past targetFlushLSN, or (in the common case where the tracked table did not change but other tables did) when ReceiveMessage hit its 1-second idle timeout. Because the publication streams only the tracked table, an unrelated change never produces a data message at the target LSN, so the drain almost always ate the full 1s.

PostgreSQL keepalive messages carry ServerWALEnd, the server's "I have sent you everything up to here" marker. Add a stop condition in the keepalive branch: once ServerWALEnd >= targetFlushLSN we know the server has delivered all WAL up to the target, so we can stop immediately instead of waiting out the idle timeout.

Safety: replication messages arrive in LSN order, so by the time a keepalive reports ServerWALEnd >= targetFlushLSN, every XLogData with LSN <= targetFlushLSN has already been delivered and queued for apply. Apply happens in goroutines tracked by wg; the post-loop wg.Wait() runs after the receive loop breaks, so setting stopStreaming here cannot drop already-received changes. The condition is guarded by !continuous, so the long-running listener is unaffected.

Author: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
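The stop condition itself is small enough to show in isolation. The sketch below assumes a simplified `LSN` type and a hypothetical helper name, `stopOnKeepalive`; in `listen.go` the check sits inline in the keepalive branch and sets `stopStreaming` rather than returning a value.

```go
package main

import "fmt"

// LSN is a simplified stand-in for the replication library's WAL position type.
type LSN uint64

// stopOnKeepalive reports whether a drain may stop after a keepalive.
// It is true only for non-continuous drains, and only once the server
// reports it has sent all WAL up to (or past) the target.
func stopOnKeepalive(continuous bool, serverWALEnd, targetFlushLSN LSN) bool {
	return !continuous && serverWALEnd >= targetFlushLSN
}

func main() {
	target := LSN(0x16B3748) // arbitrary example position

	fmt.Println(stopOnKeepalive(false, target-1, target)) // false: server not yet at target
	fmt.Println(stopOnKeepalive(false, target, target))   // true: caught up, stop now
	fmt.Println(stopOnKeepalive(true, target+10, target)) // false: continuous listener never stops here
}
```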
📝 Walkthrough
CDC replication now stops non-continuous streams when keepalive messages show the target flush LSN has been reached.
Changes: CDC termination flow
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Duplication | 0 |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/infra/cdc/listen.go`:
- Around line 332-335: Advance lastLSN before setting stopStreaming in the
keepalive catch-up path inside the replication loop. When pkm.ServerWALEnd has
reached targetFlushLSN in the non-continuous branch, make sure the value that
later feeds the final standby status and UpdateCDCMetadata is updated to the
target/observed WAL end before exiting, so the CDC checkpoint reflects the
drained position. Locate the fix near the keepalive catch-up check in listen.go
and adjust the replication-state variables used by the final status persistence.
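One way to read the suggestion above is sketched below. It is an assumption-laden illustration, not the fix that was applied: `LSN` and `onKeepalive` are hypothetical, and the choice to advance `lastLSN` to the target (rather than to the observed `ServerWALEnd`) is a conservative guess, since the comment allows either.

```go
package main

import "fmt"

// LSN is a simplified stand-in for the replication library's WAL position type.
type LSN uint64

// onKeepalive returns the lastLSN to persist and whether to stop streaming.
// When a non-continuous drain is caught up, lastLSN is first raised to the
// target so the final standby status and CDC metadata reflect the drained
// position even if no data message for the tracked table arrived.
func onKeepalive(continuous bool, lastLSN, serverWALEnd, targetFlushLSN LSN) (LSN, bool) {
	if continuous || serverWALEnd < targetFlushLSN {
		return lastLSN, false // keep streaming, checkpoint unchanged
	}
	if lastLSN < targetFlushLSN {
		lastLSN = targetFlushLSN // never move the checkpoint backwards
	}
	return lastLSN, true
}

func main() {
	last, stop := onKeepalive(false, 100, 250, 200)
	fmt.Println(last, stop) // 200 true
}
```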
📒 Files selected for processing (2)
- internal/consistency/mtree/merkle.go
- internal/infra/cdc/listen.go
Code review
No issues found. Checked for bugs and CLAUDE.md compliance.
🤖 Generated with Claude Code
Problem
`mtree table-diff` was reported ~2x slower than plain `table-diff` on a 499,999-row, 3-node cluster. `mtree table-diff` first reconciles each node's cached Merkle tree with current data by draining that node's CDC (Change Data Capture) stream (`UpdateMtree`); `table-diff` reads the live tables directly and has no cache to refresh, so it never drains. The drain was paying two avoidable costs on every diff:
1. Nodes were drained sequentially, one after another.
2. Each non-continuous drain ended only when it hit the ~1s idle receive timeout.

On a 3-node cluster that is a ~3s fixed floor (3 × ~1s) on every `mtree table-diff`, independent of whether anything changed, which is exactly why a brute-force `table-diff` could finish first.

Changes
1. Drain nodes concurrently (`internal/consistency/mtree/merkle.go`). Each `UpdateFromCDC` targets a distinct node over its own connection and writes only that node's CDC metadata, so the calls are independent. They now run concurrently; per-node errors are collected and the first (in node order) is returned, preserving prior semantics. This collapses N sequential drains into roughly one drain's wall-clock.
2. Stop the drain on keepalive catch-up (`internal/infra/cdc/listen.go`). PostgreSQL keepalive messages carry `ServerWALEnd` ("I have sent you everything up to here"). When `ServerWALEnd >= targetFlushLSN` we are caught up and stop immediately, instead of waiting out the ~1s idle timeout. This is the correct catch-up signal for a publication-filtered stream (where the target often can't be reached via table data messages). It applies to non-continuous drains only; the long-running listener is unchanged.

Safety of (2): replication messages arrive in LSN order, so a keepalive past the target arrives only after every data message ≤ target was already delivered; received changes are applied in goroutines that the post-loop `wg.Wait()`/`metaWg.Wait()` await before returning, so setting the stop flag cannot drop a received change.

Correctness
The `table-diff` path is not touched. The full mtree + CDC integration suites, including the under-reporting guards (`TestUpdateMtreeReflectsInserts/Deletes/Updates`, bidirectional/3-node diffs) and the CDC/slot tests, pass under `go test -race`.

Not in this PR
A follow-up bounded-drain / `--quick` mode (cap the drain when the WAL backlog is large, report per-node staleness, and gate `table-repair` from acting on an incomplete diff) is proposed separately; it targets the widespread/large-backlog case this PR does not fully close.