fix(statesync): derive witness scheme from port (https for :443)#218

Open
bdchatham wants to merge 1 commit into main from fix/statesync-witness-tls-scheme

@bdchatham

Problem

Every NEW state-syncing node on the K8s platform is blocked at Initializing — seid waits on the sidecar, whose configure-state-sync task loops forever:

configure-state-sync: no reachable RPC witness among
  [archive-0-rpc.arctic-1.platform.sei.io:443, syncer-0-rpc.arctic-1.platform.sei.io:443]
  err: Get "http://archive-0-rpc.arctic-1.platform.sei.io:443/status": EOF

sei-k8s-controller resolves the canonical syncers (Status.ResolvedStateSyncers) to the public Istio HTTPRoute hostnames on :443 (TLS). The task probed/queried them over plaintext http://...:443 — a plaintext request to a TLS listener returns an immediate EOF, so witness reachability fails and the task never completes. Proven in-cluster: http://...:443/status fails (EOF); https://.../status returns 200.

rpc-node-0 synced ~18d ago before this syncer config existed in its current form, so only new nodes hit it.

Fix

rpcClientForEndpoint hardcoded http://. It now derives the scheme from the port: https for :443, http otherwise. This:

  • unblocks the current :443 TLS witnesses immediately (no GitOps/ConfigMap change required for the arctic-1 cutover);
  • stays correct for the in-cluster plaintext path (a syncer's internal Service on :26657), so a future move to internal endpoints needs no further code change;
  • preserves the controller's documented contract — syncers stay bare host:port, the sidecar attaches the scheme; rpc-servers in config.toml is still written as bare host:port (seid attaches its own).
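
The port-to-scheme rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the name witnessScheme comes from the PR summary, but its signature is assumed, and statusURL and the internal hostname are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// witnessScheme returns the URL scheme for a bare host:port witness.
// Assumption: only port 443 maps to https; anything else, including an
// endpoint that fails to parse, falls back to http.
func witnessScheme(endpoint string) string {
	_, port, err := net.SplitHostPort(endpoint)
	if err == nil && port == "443" {
		return "https"
	}
	return "http"
}

// statusURL is a hypothetical helper: the sidecar attaches the scheme only
// when it builds its own probe URL.
func statusURL(endpoint string) string {
	return witnessScheme(endpoint) + "://" + endpoint + "/status"
}

func main() {
	witnesses := []string{
		"archive-0-rpc.arctic-1.platform.sei.io:443",
		"syncer-0.example.internal:26657", // hypothetical in-cluster Service
	}
	for _, w := range witnesses {
		fmt.Println(statusURL(w))
	}
	// The value written to rpc-servers stays bare host:port.
	fmt.Println("rpc-servers =", strings.Join(witnesses, ","))
}
```

Keeping the scheme out of the stored endpoint means the same syncer list can feed both the sidecar's HTTP client and config.toml without any rewriting.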

Blast radius

Not arctic-1-specific. The controller resolves canonical syncers chain-agnostically (canonicalSyncers(chainID) → verbatim pass-through), and the task hardcoded http:// for all of them. Every chain whose syncer ConfigMap lists :443 TLS hostnames was latently broken for its next state-syncing node; arctic-1's content surfaced it first. This fix covers all chains.

Tests

  • TestWitnessScheme — table guard on the port→scheme mapping.
  • TestStateSyncConfigurer_TLSWitnessUsesHTTPS — regression guard: a :443 witness is probed and queried over https; the mock only answers the https URL.
  • TestStateSyncConfigurer_InternalWitnessUsesHTTP — the :26657 in-cluster path stays on http.

Full ./sidecar/... suite passes; statesync files are lint-clean.
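
For readers unfamiliar with the "mock only answers the https URL" pattern, here is one way such a guard could look. It is a sketch under assumptions, not the test code in this PR: onlyURL, probe, probeWithMock, and witness.example are all hypothetical, and witnessScheme is re-declared with an assumed signature so the block stands alone.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"strings"
)

// onlyURL is a hypothetical http.RoundTripper that answers exactly one URL
// and fails every other request with EOF, mimicking plaintext sent to a TLS
// listener.
type onlyURL struct {
	allowed string
}

func (m *onlyURL) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.URL.String() != m.allowed {
		return nil, io.EOF
	}
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("{}")),
		Request:    req,
	}, nil
}

// witnessScheme is assumed to map :443 to https and everything else to http.
func witnessScheme(endpoint string) string {
	if _, port, err := net.SplitHostPort(endpoint); err == nil && port == "443" {
		return "https"
	}
	return "http"
}

// probe is a hypothetical stand-in for the sidecar's /status reachability check.
func probe(c *http.Client, endpoint string) error {
	resp, err := c.Get(witnessScheme(endpoint) + "://" + endpoint + "/status")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

// probeWithMock wires the mock transport into a client and probes once.
func probeWithMock(allowedURL, endpoint string) error {
	client := &http.Client{Transport: &onlyURL{allowed: allowedURL}}
	return probe(client, endpoint)
}

func main() {
	// Passes only if the :443 witness is reached over https.
	fmt.Println("tls witness:", probeWithMock("https://witness.example:443/status", "witness.example:443"))
	// Passes only if the :26657 witness stays on http.
	fmt.Println("internal witness:", probeWithMock("http://witness.example:26657/status", "witness.example:26657"))
}
```

Because the mock rejects any URL other than the expected one, a regression back to a hardcoded http:// would surface as a probe error rather than passing silently.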

Cross-reference

Unblocks arctic-1 rpc-node-1 (platform PR #1241 follow-up) and the arctic-1 sei-infra→K8s cutover (Linear PLT-208).

🤖 Generated with Claude Code

@cursor

cursor Bot commented Jun 30, 2026

PR Summary

Medium Risk
Changes how all state-sync witness RPC probes and trust queries are reached across chains; wrong port→scheme logic would break new-node bootstrap, but the mapping is narrow (:443 vs. everything else) and is covered by regression tests.

Overview
Fixes state-sync witness probing when canonical syncers resolve to public :443 TLS gateways (Istio HTTPRoutes). configure-state-sync previously always used http:// for /status and trust-point RPC calls, so plaintext to a TLS listener failed with EOF and left new state-syncing nodes stuck with no reachable witness.

rpcClientForStatus now picks the URL scheme from the witness port via new witnessScheme: https for :443, http for all other ports (e.g. in-cluster :26657). Written rpc-servers in config.toml stay bare host:port; only the sidecar’s HTTP client URL changes.

Adds TestWitnessScheme, plus integration tests that :443 witnesses use HTTPS and :26657 internal witnesses stay HTTP.

Reviewed by Cursor Bugbot for commit 1d4a0e6.

The configure-state-sync task hardcoded http:// for every witness
endpoint. Canonical syncers resolved by sei-k8s-controller to public
Istio HTTPRoute hostnames on :443 are TLS, so the /status probe sent a
plaintext request to a TLS listener and got an immediate EOF — failing
witness reachability and blocking every new K8s state-syncing node
(e.g. arctic-1 rpc-node-1, stuck Initializing on "waiting for sidecar").

Scheme is now port-derived: https for :443, http otherwise. This
unblocks the current :443 witnesses and stays correct for the
in-cluster plaintext path (a syncer's internal Service on :26657).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
bdchatham force-pushed the fix/statesync-witness-tls-scheme branch from 8ec6bce to 1d4a0e6 on June 30, 2026 17:08