Make the test suite parallel-safe and speed up CI by RhysSullivan · Pull Request #1273 · RhysSullivan/executor

RhysSullivan · 2026-07-02T19:01:15Z

CI on main has been red on roughly half of recent runs, all from load-dependent flakes rather than product bugs. This reworks the test infrastructure so suites run reliably in parallel, and adds caching so runs are faster.

Flakes fixed

host-selfhost cluster timeouts (the dominant failure): the six integration files booted a full app graph at module load while turbo ran every package's vitest concurrently, starving 2-core runners. Boots now happen in beforeAll, the package's files run serially (fileParallelism: false), and the CI Test job caps turbo at --concurrency=3 via TURBO_TEST_CONCURRENCY (local dev is unaffected). The test budget is sized for a loaded runner, since scope-isolation fans out dozens of concurrent requests.
cloud db.test.ts ECONNRESET: the per-scope DB teardown called sql.end() fire-and-forget, so an old connection's teardown raced the next test's connect against the single-connection PGlite socket server. Teardown is now awaited in the finalizer.
sdk oauth test-server races: makeTestHttpServer could hand back a server before the socket reliably accepted under load. It now probes readiness with a raw TCP connect (invisible to request-recording fixtures) before returning.
graphql plugin introspection assertions: the request recorder is eventually consistent with connect; tests now poll for the recorded introspection request instead of asserting immediately.
stdio-MCP e2e boot timeout: more headroom for the cold vite optimizeDeps boot, keeping the boot-wait < test-timeout gap so the boot diagnostic still surfaces.
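
To make the db.test.ts ECONNRESET item concrete, here is a minimal sketch of why an un-awaited teardown races the next connect. It is not the code in apps/cloud/src/db/db.ts: SingleConnectionServer, FakeSql, and connectAfter are hypothetical stand-ins for the PGlite socket server, the sql client, and two back-to-back tests.

```typescript
// Hypothetical stand-in for a server that accepts one connection at a time.
class SingleConnectionServer {
  private busy = false;

  connect(): void {
    if (this.busy) throw new Error("ECONNRESET: previous connection still open");
    this.busy = true;
  }

  async release(): Promise<void> {
    // Simulate an asynchronous socket shutdown.
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    this.busy = false;
  }
}

// Hypothetical client whose end() resolves once the connection is released.
class FakeSql {
  private readonly server: SingleConnectionServer;

  constructor(server: SingleConnectionServer) {
    this.server = server;
    server.connect();
  }

  end(): Promise<void> {
    return this.server.release();
  }
}

// Fire-and-forget teardown: returns before the old connection is released.
function closeFireAndForget(sql: FakeSql): void {
  void sql.end();
}

// Awaited teardown: resolves only after the old connection is released.
async function closeAwaited(sql: FakeSql): Promise<void> {
  await sql.end();
}

// Open a connection, tear it down with `close`, then connect again
// the way the next test would.
async function connectAfter(
  close: (sql: FakeSql) => void | Promise<void>,
): Promise<string> {
  const server = new SingleConnectionServer();
  const first = new FakeSql(server);
  await close(first);
  try {
    new FakeSql(server);
    return "connected";
  } catch (error) {
    return (error as Error).message;
  }
}
```

In this sketch, connectAfter(closeFireAndForget) hits the simulated reset because the second connect arrives while the first is still shutting down, while connectAfter(closeAwaited) connects cleanly.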

CI changes

Push-to-main runs get unique concurrency groups (PRs keep cancel-in-progress), so rapid merges no longer cancel main's verdict — this previously let a lint failure land unnoticed.
Caching: bun package cache in every job, Playwright browsers in e2e jobs, GHA layer cache for the self-host Docker image.

Out of scope: the cloud dev-server SSE/OTel memory growth behind the e2e shard degradation (tracked separately) and shard rebalancing.

Verified with typecheck, lint, and repeated forced full-suite runs (turbo cache disabled); the suite is green back-to-back where it previously failed most forced runs.

pkg-pr-new · 2026-07-02T19:07:09Z

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1273

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1273

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1273

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1273

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1273

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1273

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1273

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1273

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1273

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1273

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1273

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1273

executor

npm i https://pkg.pr.new/executor@1273

commit: b0ed1da

greptile-apps · 2026-07-02T19:11:24Z

Greptile Summary

This PR fixes a cluster of load-dependent CI flakes and adds caching to speed up runs. The root causes are each addressed with a targeted, minimal change: heavy app boots moved to beforeAll, DB teardown now awaited, HTTP test servers probed at the TCP level before returning, and GraphQL introspection assertions converted to polling.

Boot serialization: all six host-selfhost integration files now boot their app graph in beforeAll rather than at module load; the vitest config switches to fileParallelism: false + maxWorkers: 1 with 120s/60s test and hook budgets, and TURBO_TEST_CONCURRENCY=3 caps turbo concurrency on CI runners.
Infrastructure fixes: sql.end() is now awaited in the DB finalizer (fixes ECONNRESET), makeTestHttpServer probes TCP readiness before returning (fixes OAuth server races), and GraphQL introspection assertions poll via waitForRecordedRequests (fixes eventually-consistent Yoga recorder).
isAsyncResultLoading semantic change: waiting states that carry a stale value are no longer treated as loading, enabling stale-while-revalidate rendering; the existing test suite is updated to cover both branches.
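
A minimal sketch of the isAsyncResultLoading rule described above, assuming a simplified result shape. The AsyncResult type here is hypothetical; the real type in packages/react/src/lib/async-result.ts may carry more states or fields.

```typescript
// Hypothetical shape, for illustration only.
type AsyncResult<A> =
  | { readonly state: "initial" }
  | { readonly state: "waiting"; readonly value?: A }
  | { readonly state: "success"; readonly value: A };

function isAsyncResultLoading<A>(result: AsyncResult<A>): boolean {
  switch (result.state) {
    case "initial":
      // Nothing has been fetched yet.
      return true;
    case "waiting":
      // A waiting result that still carries a stale value is not loading,
      // so the UI can keep rendering the stale value during the refresh.
      return result.value === undefined;
    case "success":
      return false;
  }
}
```

Under this assumed shape, a waiting result with no value still reports loading, which matches the two branches the summary says the tests cover.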

Confidence Score: 5/5

Safe to merge; all changes are scoped to test infrastructure, CI configuration, and one well-tested behavioral tweak to isAsyncResultLoading.

Every fix addresses a documented, specific root cause with a matching test or structural guard. The db.ts and async-result.ts changes are the only production-code modifications: the former removes a fire-and-forget that was already wrapped in Effect.ignore, and the latter has explicit before/after test coverage for both branches of the new condition. No auth, data, or request-path logic is touched.

e2e/cloud/auth-routing-flow.test.ts contains a page.waitForTimeout(250) sleep; otherwise no files require special attention.

Important Files Changed

Filename	Overview
.github/workflows/ci.yml	Adds Bun package cache, Playwright browser cache, GHA Docker layer cache, per-SHA concurrency group for push-to-main, and TURBO_TEST_CONCURRENCY=3 cap for the Test job.
apps/cloud/src/db/db.ts	Removes the fire-and-forget fork of sql.end() and awaits it directly; fixes the ECONNRESET race between per-scope DB teardown and the next test's connection.
apps/host-selfhost/vitest.config.ts	Replaces maxForks:2 parallelism cap with fileParallelism:false + maxWorkers:1 (fully serial), raises testTimeout to 120s and hookTimeout to 60s to accommodate loaded-runner boots.
packages/core/sdk/src/testing.ts	Adds a TCP-level readiness probe (up to 100 retries at 10 ms) after the HTTP server binds, before returning the test server shape; prevents connect failures on loaded runners.
packages/plugins/graphql/src/testing/index.ts	Adds waitForRecordedRequests helper that polls the Ref-backed request log until a predicate matches or 100×50ms exhausts; addresses eventual consistency of Yoga's async captureRequest path.
packages/react/src/lib/async-result.ts	Changes isAsyncResultLoading so waiting+value is not loading (stale-while-revalidate); only waiting-without-value and initial states are treated as loading. Test coverage updated to match.
e2e/cloud/auth-routing-flow.test.ts	Adds networkidle wait + retry fill to guard against hydration clearing the org-name input; includes a page.waitForTimeout(250) sleep that violates the no-sleep policy.
apps/host-selfhost/src/boot.test.ts	Moves app-graph construction from module-level top-level await to beforeAll, preventing multiple concurrent heavy boots at module load time.
package.json	Threads TURBO_TEST_CONCURRENCY into the turbo --concurrency flag via ${VAR:+expansion}; unset locally leaves behavior unchanged, CI sets it to 3.
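
For readers less familiar with the shell's ${VAR:+word} form used in package.json, the same rule can be sketched in TypeScript: the flag is emitted only when the variable is set and non-empty. The function name turboTestArgs and the "run test" task name are assumptions for illustration; the actual script string in package.json is not shown in this PR page.

```typescript
// Mirrors ${TURBO_TEST_CONCURRENCY:+--concurrency=...}: add the flag only
// when the variable is set and non-empty, otherwise leave the args unchanged.
function turboTestArgs(env: Record<string, string | undefined>): string[] {
  const concurrency = env["TURBO_TEST_CONCURRENCY"];
  return [
    "run",
    "test", // assumed task name
    ...(concurrency ? [`--concurrency=${concurrency}`] : []),
  ];
}
```

With the variable unset (local dev) the args carry no concurrency flag; with it set to 3 (CI) they include --concurrency=3.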

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Before["Before (flaky)"]
        B1["Module load: top-level await\nmakeSelfHostTestApp / makeSelfHostApiHandler\n× 6 files in parallel"]
        B2["All 6 heavy boots run concurrently\nat module-evaluation time"]
        B3["CPU starvation on 2-core runner\n→ in-flight requests stall → timeout"]
        B1 --> B2 --> B3
    end

    subgraph After["After (stable)"]
        A1["Module load: declare handler/dispose\nvariables only (no I/O)"]
        A2["beforeAll: import + boot\none file at a time\n(fileParallelism: false, maxWorkers: 1)"]
        A3["Tests run with 120s budget\nsized for loaded runner"]
        A4["afterAll: dispose()"]
        A1 --> A2 --> A3 --> A4
    end

    subgraph SupportingFixes["Supporting fixes"]
        S1["makeTestHttpServer:\nTCP probe before returning\n(100 × 10ms retries)"]
        S2["db.ts close():\nawait sql.end() — no more\nECONNRESET race"]
        S3["GraphQL plugin tests:\nwaitForRecordedRequests polling\n(100 × 50ms) for eventual-\nconsistent Yoga recorder"]
        S4["CI workflow:\nper-SHA concurrency group for push,\nbun + playwright cache,\nTURBO_TEST_CONCURRENCY=3"]
    end
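
The polling fix listed under "Supporting fixes" can be sketched without Effect as a bounded poll over a request log. This is not the repo's waitForRecordedRequests (which reads a Ref-backed log); waitForMatch and its parameters are hypothetical, with defaults taken from the 100 × 50 ms budget quoted above.

```typescript
// Poll `read()` until some item satisfies `matches`, or give up after
// `attempts` tries spaced `intervalMs` apart.
async function waitForMatch<T>(
  read: () => readonly T[],
  matches: (item: T) => boolean,
  attempts = 100,
  intervalMs = 50,
): Promise<T> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const found = read().find(matches);
    if (found !== undefined) return found;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`no matching request after ${attempts} attempts`);
}
```

A test would call this instead of asserting on the log immediately after connect, so a recorder that lags the response no longer fails the assertion.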

Reviews (3): Last reviewed commit: "Match the new sign-in heading in the sel..."

greptile-apps · 2026-07-02T19:11:28Z

+      - name: Cache Playwright browsers
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/ms-playwright
+          key: ${{ runner.os }}-playwright-1.60.0
+
      # Install from e2e so bunx resolves ITS pinned playwright (the version
      # the tests run against) rather than floating to the latest.
      - name: Install Playwright Chromium


The Playwright browser cache key is hardcoded to 1.60.0. If the Playwright version is bumped in bun.lock without updating this string, CI will keep serving the old browser binaries from cache, which can cause subtle test failures (features or fixes tied to the new version missing, or ABI mismatches). Deriving the key from the lockfile keeps it in sync automatically.

Suggested change

       - name: Cache Playwright browsers
         uses: actions/cache@v4
         with:
           path: ~/.cache/ms-playwright
-          key: ${{ runner.os }}-playwright-1.60.0
+          key: ${{ runner.os }}-playwright-${{ hashFiles('e2e/bun.lock', 'bun.lock') }}

       # Install from e2e so bunx resolves ITS pinned playwright (the version
       # the tests run against) rather than floating to the latest.
       - name: Install Playwright Chromium

greptile-apps · 2026-07-02T19:11:29Z

+    yield* Effect.callback<void, TestHttpServerServeError>((resume) => {
+      const socket = createConnection({ host: "127.0.0.1", port: address.port }, () => {
+        socket.end();
+        resume(Effect.void);
+      });
+      socket.on("error", (cause) => resume(Effect.fail(new TestHttpServerServeError({ cause }))));
+    }).pipe(Effect.retry(Schedule.both(Schedule.spaced("10 millis"), Schedule.recurs(100))));


TCP probe socket not cleaned up on interruption

Effect.callback without a returned cleanup function means if the outer fiber is interrupted mid-probe (e.g., a test times out while the server is still booting), the in-flight createConnection socket is not destroyed. On a loaded CI runner this leaves dangling half-open sockets for the duration of the connect timeout. The fix is to return a cleanup from the callback that calls socket.destroy().
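
As a rough illustration of the cleanup being asked for, here is a plain Node sketch of a TCP readiness probe that destroys its socket when the caller gives up. It deliberately uses an AbortSignal rather than Effect's interruption, so it is not a drop-in for makeTestHttpServer; probeOnce, waitUntilAccepting, and demo are hypothetical names, and the retry budget copies the 100 × 10 ms from the diff above.

```typescript
import { createConnection, createServer, type AddressInfo } from "node:net";

// One connect attempt. If the caller aborts mid-probe, the in-flight
// socket is destroyed instead of being left half-open.
function probeOnce(port: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("probe aborted"));
      return;
    }
    const socket = createConnection({ host: "127.0.0.1", port });
    const onAbort = () => {
      socket.destroy(); // the cleanup step
      reject(new Error("probe aborted"));
    };
    signal.addEventListener("abort", onAbort, { once: true });
    socket.once("connect", () => {
      signal.removeEventListener("abort", onAbort);
      socket.end();
      resolve();
    });
    socket.once("error", (cause) => {
      signal.removeEventListener("abort", onAbort);
      socket.destroy();
      reject(cause);
    });
  });
}

// Retry the probe until the port accepts, the budget runs out, or the
// caller aborts.
async function waitUntilAccepting(
  port: number,
  signal: AbortSignal,
  attempts = 100,
  delayMs = 10,
): Promise<void> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      await probeOnce(port, signal);
      return;
    } catch (error) {
      if (signal.aborted) throw error;
      lastError = error;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage: bind a throwaway server on an ephemeral port, then probe it.
async function demo(): Promise<number> {
  const server = createServer();
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address() as AddressInfo;
  await waitUntilAccepting(port, new AbortController().signal);
  await new Promise<void>((resolve) => server.close(() => resolve()));
  return port;
}
```

In the Effect version, the equivalent would be returning a cleanup from the callback that calls socket.destroy(), as the comment suggests.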

github-actions · 2026-07-02T19:11:32Z

Cloudflare preview

Torn down — the PR is closed.

RhysSullivan added 9 commits July 2, 2026 11:24

Make self-host tests CI-safe

f88bf97

Await cloud db teardown

2b3a2ba

Give local e2e boots more headroom

cac1cf5

Keep main CI runs alive

79c8552

Cache CI dependencies and image layers

91d59f0

Wait for recorded introspection requests in graphql plugin tests

f67b122

Size self-host test budget for loaded runners

78d9918

Probe test http servers for readiness before use

e8df02c

Die on recorder wait timeout instead of typing a string error

5f8a25d

Type the recorder wait timeout as a tagged error

19e2389

greptile-apps Bot reviewed Jul 2, 2026

RhysSullivan added 5 commits July 2, 2026 13:29

Merge remote-tracking branch 'origin/main' into ci/parallel-safe-tests

e80d1b5

Update e2e locators for the integration copy renames

cc34a0d

Keep stale data rendered while a refresh is waiting

8841d31

Guard create-org input against hydration clobbering

fcd1b0a

Match the new sign-in heading in the selfhost callback test

b0ed1da

RhysSullivan merged commit 8652c99 into main Jul 2, 2026
23 checks passed

RhysSullivan mentioned this pull request Jul 3, 2026

Fix cloud dev-stack degradation under sustained MCP load #1280

Draft

RhysSullivan merged 15 commits into main from ci/parallel-safe-tests
