Cherry-picks from v5_STABLE (v5.0.10)#514
Conversation
The minimum-version check in check_spock_version_compatibility compared extversion strings with text operators, which order versions lexically. That makes '5.0.10' < '5.0.4' true, so a node running 5.0.10 was wrongly rejected as below the 5.0.4 minimum (e.g. test 009 failing). Add a spock.version_to_array() helper that parses a version string into an int[] (stripping any non-numeric suffix such as '-devel') and use it at all three minimum-version checks so comparison is numeric/component-wise.
check_spock_version_compatibility required every node to run a byte-identical Spock version, so adding a node built at a newer patch level (e.g. 5.0.10 joining a 5.0.9 cluster) was rejected even though the releases are compatible. Add a spock.version_major_minor() helper (built on version_to_array()) and compare nodes on major.minor only. Patch-level differences are now allowed; a true major.minor mismatch (e.g. 5.0.x vs 5.1.x) is still rejected. Use IS DISTINCT FROM so a malformed/unparseable version can't slip past the check. Also raise the minimum required version to 5.0.9.
Forward-port of the SQLSTATE message-quality work from v5_STABLE (9cc70cd, 0753b78, f56bfb4), reapplied onto main. main already carries the underlying "informative discard message" work (ed17523, b32ef95, 1f5738c) and discard_collateral_message()/get_exception_behaviour_name(), so only the SQLSTATE tagging is new here. Add errmsg_with_sqlstate(), which formats a captured ErrorData with a [SQLSTATE xxxxx] prefix so the root cause recorded in spock.exception_log (which has no sqlstate column) is unambiguous -- e.g. to tell a real constraint violation or deadlock from a transient conflict. The prefix is omitted for the uninformative ERRCODE_INTERNAL_ERROR (XX000): a bare elog(ERROR), such as spock's own "did not find row to update", defaults to XX000 and the prefix would be just noise. The formatted message is allocated in ApplyOperationContext (reset per row) rather than the long-lived TopTransactionContext current on the exception path, so logging every discarded row of a large transaction does not accumulate. Route the failing-row capture through errmsg_with_sqlstate() at every call site: handle_insert/update/delete, the TRUNCATE path, handle_sql_or_exception, and the "caught initial exception" log line plus the initial_error_message snapshot in stream_replay. Drop the now-removed Assert(error_message != NULL) in add_entry_to_exception_log (PR feedback from f56bfb4); the NULL-tolerant fallback to "unavailable" stays as defence in depth. Diagnostic change only: discard/disable/LSN behaviour is unchanged. Existing expected output is unaffected -- the spock-internal messages asserted in replication_set.out are XX000 and keep no prefix; the TAP tests (013, 020) match message text by substring and tolerate the prefix.
Reproduces a firewall-style replication outage: establish replication, block the provider's TCP port with iptables, commit a transaction during the outage (driven over the unix socket, which iptables does not touch), then restore the connection. Asserts the customer-relevant outcomes: rows committed during the outage do not reach the subscriber while blocked; in SUB_DISABLE mode the outage does not disable the subscription; after the block is lifted the subscription returns to replicating with no row loss; and the blip produces no spurious spock.exception_log entries. Whether iptables tears the connection down or merely stalls it is host dependent (over loopback it typically stalls), so the test reports which occurred as diag rather than asserting a specific teardown. Requires iptables usable as root or via passwordless sudo; skips cleanly otherwise. Not in the schedule -- run manually: PERL5LIB=t prove -v t/104_iptables_conn_block.pl
📝 WalkthroughWalkthroughThe PR updates ZODAN’s Spock version compatibility checks to use numeric parsing and major.minor matching, changes apply-time exception logging to include SQLSTATE-prefixed messages, and adds a TAP test that exercises replication across a TCP reset outage. ChangesZODAN version compatibility
Exception logging and connection-block coverage
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tap/t/104_iptables_conn_block.pl`:
- Around line 127-130: The fallback timeout in this test is being set on the
wrong side: `set_guc_n2('wal_sender_timeout = 5s')` only affects the sender, but
the blocked apply path is governed by the subscriber-side idle timeout in
`spock.apply_idle_timeout` from `src/spock_apply.c`. Update the test setup in
`tests/tap/t/104_iptables_conn_block.pl` to configure the subscriber-side
timeout via the appropriate `set_guc_n2` call so the stalled connection is
actually covered and the test does not flake on quiet stalls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3f35015c-f281-4d16-aabd-c017b4a87929
📒 Files selected for processing (4)
samples/Z0DAN/zodan.sqlsrc/spock_apply.csrc/spock_exception_handler.ctests/tap/t/104_iptables_conn_block.pl
Pulls in changes from the v5_STABLE line that are missing on main.
ZODAN version checking
Exception-log diagnostics
Test
Notes