Make the test suite parallel-safe and speed up CI #1273
@executor-js/cli
@executor-js/config
@executor-js/execution
@executor-js/sdk
@executor-js/codemode-core
@executor-js/runtime-quickjs
@executor-js/plugin-file-secrets
@executor-js/plugin-graphql
@executor-js/plugin-keychain
@executor-js/plugin-mcp
@executor-js/plugin-onepassword
@executor-js/plugin-openapi
executor
Greptile Summary

This PR fixes a cluster of load-dependent CI flakes and adds caching to speed up runs. The root causes are each addressed with a targeted, minimal change: heavy app boots moved to
Confidence Score: 5/5

Safe to merge; all changes are scoped to test infrastructure, CI configuration, and one well-tested behavioral tweak to isAsyncResultLoading. Every fix addresses a documented, specific root cause with a matching test or structural guard. The db.ts and async-result.ts changes are the only production-code modifications: the former removes a fire-and-forget that was already wrapped in Effect.ignore, and the latter has explicit before/after test coverage for both branches of the new condition. No auth, data, or request-path logic is touched. e2e/cloud/auth-routing-flow.test.ts contains a page.waitForTimeout(250) sleep; otherwise no files require special attention.
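Why an un-awaited teardown breaks against a single-connection server can be shown with a toy model. This is a sketch under stated assumptions: the server accepts one connection at a time and frees its slot asynchronously. `SingleConnectionServer`, `Connection`, and `withScope` are hypothetical names, not the PR's `db.ts` code (which awaits `sql.end()` in an Effect finalizer) and not PGlite's API.

```typescript
// Hypothetical stand-in for a connection handle; only teardown matters here.
interface Connection {
  end(): Promise<void>;
}

// Hypothetical model of a single-connection socket server (not PGlite's API):
// only one client may hold the slot, and releasing it completes asynchronously,
// as a real socket close does.
class SingleConnectionServer {
  private busy = false;

  connect(): Connection {
    if (this.busy) {
      throw new Error("slot busy: previous connection still closing");
    }
    this.busy = true;
    return {
      end: () =>
        new Promise<void>((resolve) => {
          setTimeout(() => {
            this.busy = false;
            resolve();
          }, 5);
        }),
    };
  }
}

// Per-scope lifecycle: connect, run the body, tear down in a finalizer.
// `awaitTeardown` switches between the old fire-and-forget close and the fix.
async function withScope(
  server: SingleConnectionServer,
  awaitTeardown: boolean,
  body: (conn: Connection) => Promise<void>,
): Promise<void> {
  const conn = server.connect();
  try {
    await body(conn);
  } finally {
    if (awaitTeardown) {
      // Fixed: the next scope cannot connect until the slot is released.
      await conn.end();
    } else {
      // Old behaviour: close is started but not awaited, so it races the next connect.
      void conn.end();
    }
  }
}
```

Awaiting in the finalizer serializes close-then-connect, which is the only ordering a single-connection server can serve; the fire-and-forget variant lets the next scope's connect arrive while the previous close is still in flight.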
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph Before["Before (flaky)"]
B1["Module load: top-level await\nmakeSelfHostTestApp / makeSelfHostApiHandler\n× 6 files in parallel"]
B2["All 6 heavy boots run concurrently\nat module-evaluation time"]
B3["CPU starvation on 2-core runner\n→ in-flight requests stall → timeout"]
B1 --> B2 --> B3
end
subgraph After["After (stable)"]
A1["Module load: declare handler/dispose\nvariables only (no I/O)"]
A2["beforeAll: import + boot\none file at a time\n(fileParallelism: false, maxWorkers: 1)"]
A3["Tests run with 120s budget\nsized for loaded runner"]
A4["afterAll: dispose()"]
A1 --> A2 --> A3 --> A4
end
subgraph SupportingFixes["Supporting fixes"]
S1["makeTestHttpServer:\nTCP probe before returning\n(100 × 10ms retries)"]
S2["db.ts close():\nawait sql.end(); no more\nECONNRESET race"]
S3["GraphQL plugin tests:\nwaitForRecordedRequests polling\n(100 × 50ms) for eventual-\nconsistent Yoga recorder"]
S4["CI workflow:\nper-SHA concurrency group for push,\nbun + playwright cache,\nTURBO_TEST_CONCURRENCY=3"]
end
```
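The polling idea behind `waitForRecordedRequests` can be sketched generically, assuming the recorder is simply a readable value that fills in asynchronously. `waitForRecorded` and its parameters are hypothetical names, not the PR's helper; the defaults mirror the 100 × 50ms budget shown in the flowchart.

```typescript
// Hypothetical helper: re-read `read()` until `done` accepts the value,
// sleeping `intervalMs` between attempts and giving up after `attempts` reads.
async function waitForRecorded<T>(
  read: () => T,
  done: (value: T) => boolean,
  attempts = 100,
  intervalMs = 50,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const value = read();
    if (done(value)) return value;
    if (attempt >= attempts) {
      throw new Error(`condition not met after ${attempts} attempts`);
    }
    // Give the asynchronous recorder time to catch up before re-reading.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A test would then assert on the value this returns instead of on an immediate read of the recorder, so a request that is recorded a few milliseconds late no longer fails the assertion.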
Reviews (3): Last reviewed commit: "Match the new sign-in heading in the sel..."
```yaml
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-1.60.0

# Install from e2e so bunx resolves ITS pinned playwright (the version
# the tests run against) rather than floating to the latest.
- name: Install Playwright Chromium
```
The Playwright browser cache key is hardcoded to `1.60.0`. If the Playwright version is bumped in bun.lock without updating this string, CI will continue serving the old browser binaries from cache, which can cause subtle test failures (new features or fixed behaviour tied to the new version missing, or ABI mismatches). Deriving the key from the lockfile keeps it automatically in sync.
```diff
 - name: Cache Playwright browsers
   uses: actions/cache@v4
   with:
     path: ~/.cache/ms-playwright
-    key: ${{ runner.os }}-playwright-1.60.0
+    key: ${{ runner.os }}-playwright-${{ hashFiles('e2e/bun.lock', 'bun.lock') }}
 # Install from e2e so bunx resolves ITS pinned playwright (the version
 # the tests run against) rather than floating to the latest.
 - name: Install Playwright Chromium
```
```ts
yield* Effect.callback<void, TestHttpServerServeError>((resume) => {
  const socket = createConnection({ host: "127.0.0.1", port: address.port }, () => {
    socket.end();
    resume(Effect.void);
  });
  socket.on("error", (cause) => resume(Effect.fail(new TestHttpServerServeError({ cause }))));
}).pipe(Effect.retry(Schedule.both(Schedule.spaced("10 millis"), Schedule.recurs(100))));
```
TCP probe socket not cleaned up on interruption
Effect.callback without a returned cleanup function means if the outer fiber is interrupted mid-probe (e.g., a test times out while the server is still booting), the in-flight createConnection socket is not destroyed. On a loaded CI runner this leaves dangling half-open sockets for the duration of the connect timeout. The fix is to return a cleanup from the callback that calls socket.destroy().
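The cleanup the comment asks for can be sketched without Effect, using plain Promises and an `AbortSignal` (an assumption made to keep the example dependency-free; `probeOnce` and `waitForTcp` are hypothetical names). Aborting destroys the in-flight socket, and the retry loop mirrors `Schedule.spaced("10 millis")` combined with `Schedule.recurs(100)`: one initial attempt plus up to 100 retries.

```typescript
import { createConnection, type Socket } from "node:net";

// One connect attempt: resolves once the TCP handshake completes, rejects on
// a socket error, and destroys the socket if `signal` aborts mid-attempt.
function probeOnce(host: string, port: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const socket: Socket = createConnection({ host, port });
    // The cleanup: an interrupted probe must not leave the socket dangling.
    const onAbort = () => {
      socket.destroy();
      reject(signal.reason);
    };
    signal.addEventListener("abort", onAbort, { once: true });
    socket.once("connect", () => {
      signal.removeEventListener("abort", onAbort);
      socket.end(); // readiness confirmed; no bytes sent, so no request is recorded
      resolve();
    });
    socket.once("error", (cause) => {
      signal.removeEventListener("abort", onAbort);
      socket.destroy();
      reject(cause);
    });
  });
}

// Retry the probe every `intervalMs`, allowing up to `retries` retries after
// the first attempt; stop immediately once the caller aborts.
async function waitForTcp(
  host: string,
  port: number,
  signal: AbortSignal,
  retries = 100,
  intervalMs = 10,
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await probeOnce(host, port, signal);
      return;
    } catch (cause) {
      if (signal.aborted || attempt >= retries) throw cause;
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

Tying the socket's lifetime to the signal is the Promise analogue of returning a cleanup from the `Effect.callback` register function: whoever cancels the wait also owns tearing down the half-finished connect.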
Cloudflare preview: torn down (the PR is closed).
CI on main has been red on roughly half of recent runs, all from load-dependent flakes rather than product bugs. This reworks the test infrastructure so suites run reliably in parallel, and adds caching so runs are faster.
Flakes fixed
- `beforeAll`, the package's files run serially (`fileParallelism: false`), and the CI Test job caps turbo to `--concurrency=3` via `TURBO_TEST_CONCURRENCY` (local dev unaffected). The test budget is sized for a loaded runner, since scope-isolation fans out dozens of concurrent requests.
- `db.test.ts` ECONNRESET: the per-scope DB teardown called `sql.end()` without awaiting it, so an old connection's teardown raced the next test's connect against the single-connection PGlite socket server. Teardown is now awaited in the finalizer.
- `makeTestHttpServer` could hand back a server before the socket reliably accepted connections under load. It now probes readiness with a raw TCP connect (invisible to request-recording fixtures) before returning.
- `vite optimizeDeps` boot, keeping the boot-wait < test-timeout gap so the boot diagnostic still surfaces.

CI changes
Out of scope: the cloud dev-server SSE/OTel memory growth behind the e2e shard degradation (tracked separately) and shard rebalancing.
Verified with typecheck, lint, and repeated forced full-suite runs (turbo cache disabled); the suite is green back-to-back where it previously failed most forced runs.