fix(statesync): derive witness scheme from port (https for :443)#218
bdchatham wants to merge 1 commit into
The configure-state-sync task hardcoded http:// for every witness endpoint. Canonical syncers resolved by sei-k8s-controller to public Istio HTTPRoute hostnames on :443 are TLS, so the /status probe sent a plaintext request to a TLS listener and got an immediate EOF. That failed witness reachability and blocked every new K8s state-syncing node (e.g. arctic-1 rpc-node-1, stuck Initializing on "waiting for sidecar").

Scheme is now port-derived: https for :443, http otherwise. This unblocks the current :443 witnesses and stays correct for the in-cluster plaintext path (a syncer's internal Service on :26657).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
Every new state-syncing node on the K8s platform is blocked at `Initializing`: seid waits on the sidecar, whose `configure-state-sync` task loops forever.

sei-k8s-controller resolves the canonical syncers (`Status.ResolvedStateSyncers`) to the public Istio HTTPRoute hostnames on `:443` (TLS). The task probed and queried them over plaintext `http://...:443`. A plaintext request to a TLS listener returns an immediate EOF, so witness reachability fails and the task never completes. Proven in-cluster: `http://...:443/status` fails (EOF); `https://.../status` returns 200.

rpc-node-0 synced ~18d ago, before this syncer config existed in its current form, so only new nodes hit it.
Fix
`rpcClientForEndpoint` hardcoded `http://`. It now derives the scheme from the port: `https` for `:443`, `http` otherwise. This:

- unblocks the `:443` TLS witnesses immediately (no GitOps/ConfigMap change required for the arctic-1 cutover);
- stays correct for the in-cluster plaintext path (a syncer's internal Service on `:26657`), so a future move to internal endpoints needs no further code change;
- keeps the endpoint format as bare `host:port`, with the sidecar attaching the scheme; `rpc-servers` in `config.toml` is still written as bare `host:port` (seid attaches its own).

Blast radius
Not arctic-1-specific. The controller resolves canonical syncers chain-agnostically (`canonicalSyncers(chainID)` → verbatim pass-through), and the task hardcoded `http://` for all of them. Every chain whose syncer ConfigMap lists `:443` TLS hostnames was latently broken for its next state-syncing node; arctic-1's content surfaced it first. This fix covers all chains.

Tests
- `TestWitnessScheme`: table guard on the port→scheme mapping.
- `TestStateSyncConfigurer_TLSWitnessUsesHTTPS`: regression guard. A `:443` witness is probed and queried over `https`; the mock only answers the https URL.
- `TestStateSyncConfigurer_InternalWitnessUsesHTTP`: the `:26657` in-cluster path stays on http.

Full `./sidecar/...` suite passes; statesync files are lint-clean.

Cross-reference
Unblocks arctic-1 rpc-node-1 (platform PR #1241 follow-up) and the arctic-1 sei-infra→K8s cutover (Linear PLT-208).
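
For reviewers skimming without the diff, the port-to-scheme rule described under Fix can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: `witnessScheme` and `statusURL` are hypothetical helper names, the hostnames are placeholders, and falling back to `http` for an endpoint that does not parse as `host:port` is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// witnessScheme returns the URL scheme for a bare host:port witness endpoint.
// Port 443 is treated as TLS; any other port uses plaintext http.
// Assumption: input that does not parse as host:port also falls back to http.
func witnessScheme(endpoint string) string {
	_, port, err := net.SplitHostPort(endpoint)
	if err == nil && port == "443" {
		return "https"
	}
	return "http"
}

// statusURL (hypothetical helper) attaches the derived scheme and the
// /status path to a bare host:port endpoint.
func statusURL(endpoint string) string {
	return fmt.Sprintf("%s://%s/status", witnessScheme(endpoint), endpoint)
}

func main() {
	// Placeholder endpoints: a public TLS hostname and an in-cluster Service.
	for _, ep := range []string{"syncer.example.com:443", "syncer-internal:26657"} {
		fmt.Println(statusURL(ep))
	}
}
```

Run as-is, this prints `https://syncer.example.com:443/status` followed by `http://syncer-internal:26657/status`, matching the two paths the regression tests guard.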
🤖 Generated with Claude Code