
Repair the self-host e2e suite and run it in CI #1239

Open
RhysSullivan wants to merge 2 commits into main from fix-selfhost-e2e-and-ci

Conversation

@RhysSullivan

Owner

Why

The self-host e2e project (vitest --project selfhost) has never run in CI: the test job filters e2e out (turbo run test --filter=!@executor-js/e2e) and the only e2e job runs a single stdio scenario on push. With nothing exercising it, the suite drifted red as the app moved on. A full run showed 9 of 56 files failing, none of them a real regression in the tested behavior. This repairs them and adds a CI gate so it stays honest.

Fixes

Stale selectors (the connect modal now uses an affixed credential field). The single-input bearer credential field renders an "Authorization: Bearer " affix with placeholder token, not the old paste the value / token. Updated the selectors and scoped them to the dialog: scenarios/connect-handoff.test.ts, scenarios/connect-handoff-session.test.ts, selfhost/auth-methods-ui.test.ts.

Racy action-bar position read. scenarios/openapi-add-integration-action-bar.test.ts measured the Cancel button "in flight", but the floating action bar unmounts the instant the router navigates, so the read blocked the full timeout. The single-node counts already cover the reported doubled/ghosted-button regression; the step now asserts the submit commits and lands on the integration.

Shared-admin connection count. scenarios/api-tools.test.ts asserted a fresh identity has zero connections. That holds only on isolated-identity targets; self-host shares the bootstrap admin, so other scenarios' connections legitimately appear. The list call still runs everywhere (proving the endpoint); the zero-count assertion is gated off self-host per e2e/AGENTS.md.
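
A minimal sketch of the gating described above, assuming a two-value target model. The `Target` type, `hasIsolatedIdentities`, and `checkFreshIdentityConnections` are hypothetical names for illustration; the suite's real target API is not shown in this PR.

```typescript
// Hypothetical target names; the real suite's target model is not shown here.
type Target = "selfhost" | "cloud";

interface ConnectionCheck {
  ok: boolean;
  reason: string;
}

// Self-host shares one bootstrap admin, so only isolated-identity targets
// can promise that a fresh identity starts with zero connections.
function hasIsolatedIdentities(target: Target): boolean {
  return target !== "selfhost";
}

// Called after the list endpoint has already responded on every target.
function checkFreshIdentityConnections(
  target: Target,
  connections: readonly unknown[],
): ConnectionCheck {
  if (!hasIsolatedIdentities(target)) {
    return { ok: true, reason: "zero-count assertion skipped on shared-admin target" };
  }
  return connections.length === 0
    ? { ok: true, reason: "fresh identity has no connections" }
    : { ok: false, reason: `expected 0 connections, found ${connections.length}` };
}
```

The list call stays outside the gate, so the endpoint is still exercised on self-host even though the count is not asserted there.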

Multi-tenant-only 404. scenarios/org-slug-routing.test.ts expected an unknown org slug to 404. That is a multi-tenant contract: self-host is single-tenant, /account/me always returns the instance org, and the slug is cosmetic. Gated the 404 step off self-host.

Cloud-shaped toolkit URL. selfhost/toolkits-mcp.test.ts connected to /e2e-org/mcp/toolkits/.... Self-host advertises the bare /mcp/toolkits/... (org prefixing is a cloud convention), and the server's RFC 9728 protected-resource doc reports the bare resource, which MCP SDK 1.29's stricter selectResourceURL requires the client URL to match. Connect to the URL self-host actually publishes.
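
To illustrate why the org-prefixed URL was rejected, here is a simplified stand-in for a protected-resource check. It is not the MCP SDK's code; the exact rule selectResourceURL applies is not shown in this PR, so the same-origin plus path-prefix comparison below is an assumption, as are the `resourceMatches` name, the localhost origin, and the `demo` toolkit slug.

```typescript
// Simplified, assumed rule: same origin, and the advertised resource path
// must be a prefix of the path the client dialed.
function resourceMatches(clientUrl: string, advertisedResource: string): boolean {
  const requested = new URL(clientUrl);
  const advertised = new URL(advertisedResource);
  if (requested.origin !== advertised.origin) return false;

  // Compare with trailing slashes so "/mcp" does not match "/mcp-other".
  const withSlash = (path: string): string => (path.endsWith("/") ? path : `${path}/`);
  return withSlash(requested.pathname).startsWith(withSlash(advertised.pathname));
}

// Hypothetical URLs for illustration only.
const advertised = "http://localhost:3000/mcp/toolkits/demo";
const cloudShaped = "http://localhost:3000/e2e-org/mcp/toolkits/demo";
const bare = "http://localhost:3000/mcp/toolkits/demo";

const cloudShapedAccepted = resourceMatches(cloudShaped, advertised);
const bareAccepted = resourceMatches(bare, advertised);
```

Under this assumed rule the org-prefixed path fails because it does not start with the advertised bare path, while the bare URL matches.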

Quarantined (tracked follow-up). scenarios/microsoft-emulator.test.ts and the Microsoft leg of scenarios/oauth-client-handoff.test.ts are skipped with a documented reason. microsoft.addGraph only accepts the canonical Microsoft Graph spec in the streamable block-YAML profile (it structurally splits the 37MB doc to avoid OOMing the 128MB Workers isolate) and hard-errors on anything else; the emulator serves a small spec outside that profile. Making these pass needs the emulator to serve a block-YAML Graph spec (or a non-Workers compile path), which is separate work. A skip option was added to the scenario helper so the quarantine stays visible in the report rather than silently deleted.
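
A rough sketch of how a skip option on a scenario helper can keep a quarantine visible. The real `scenario()` signature is not shown in this PR, so the injected registrar, the `TestRegistrar` shape, and the choice to append the reason to the test name are all assumptions made to keep the example self-contained.

```typescript
type TestBody = () => void | Promise<void>;

// Minimal shape of a test registrar such as a test runner's `it`, injected
// here so the sketch does not depend on any test framework.
interface TestRegistrar {
  (name: string, body: TestBody): void;
  skip(name: string, body: TestBody): void;
}

interface ScenarioOptions {
  // When set, the scenario is registered as skipped and the reason is kept.
  skip?: string;
}

// Hypothetical helper: registers the scenario, or registers it as skipped
// with the reason in its name so it still shows up in the report.
function scenario(
  it: TestRegistrar,
  name: string,
  options: ScenarioOptions,
  body: TestBody,
): void {
  if (options.skip !== undefined) {
    it.skip(`${name} (skipped: ${options.skip})`, body);
    return;
  }
  it(name, body);
}

// Recording registrar used only to demonstrate the behavior.
function createRecorder() {
  const registered: string[] = [];
  const skipped: string[] = [];
  const it: TestRegistrar = Object.assign(
    (name: string, _body: TestBody): void => {
      registered.push(name);
    },
    {
      skip: (name: string, _body: TestBody): void => {
        skipped.push(name);
      },
    },
  );
  return { it, registered, skipped };
}
```

Because the skipped scenario is still registered, the body stays in the source and the reason travels with it, which is the property the quarantine relies on.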

CI

New E2E (self-host) job in ci.yml: boots the self-host target via its globalsetup (no external infra), installs Playwright Chromium, and runs bun run test:selfhost on pull requests and push. If the browser scenarios prove flaky on CI, gate it to push-only like e2e-local, or add a retry.

New coverage

selfhost/oauth-popup-callback-org-state.test.ts: a browser-free regression test for the org-scoped OAuth popup callback (the bug fixed in #1235). It drives the org-context flow, asserts the provider state is wrapped, then does an authenticated GET of the callback and asserts it completes ("Connected") instead of "OAuth session expired or not found". No Playwright, so it is a good candidate to gate this class of regression cheaply.

Verification

  • Full self-host suite locally: 55 files pass, 1 skipped; 72 tests pass, 4 skipped, 0 fail.
  • oxlint --deny-warnings, oxfmt --check, and turbo run typecheck (42/42) pass.

Note: the cross-target scenarios also run on the cloud target, which I could not boot locally. The org-slug and api-tools changes branch on the target so cloud behavior is unchanged, and the Microsoft quarantine applies on both targets.

Relationship to #1235

The first commit cherry-picks #1235's OAuth popup fix (authorship preserved) so the OAuth-callback scenarios and the new test above are green and CI passes. If #1235 merges first, rebasing this branch drops the duplicate commit.

sethcarlton and others added 2 commits June 30, 2026 19:28
The self-host e2e project never ran in CI, so it drifted red while the app
moved on. Repair the failing scenarios (stale connect-modal selectors, a racy
action-bar position read, a shared-admin connection-count assertion, a
multi-tenant-only org-slug 404 step, and a cloud-shaped toolkit MCP URL), add a
documented `skip` affordance to the scenario helper, quarantine the two
Microsoft emulator scenarios that need a canonical block-YAML Graph spec
(tracked separately), and add a CI job that boots the self-host target and runs
the suite on pull requests and push.
@cloudflare-workers-and-pages


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! View logs | executor-marketing | 514bad6 | Commit Preview URL, Branch Preview URL | Jul 01 2026, 03:16 AM |

@cloudflare-workers-and-pages


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! View logs | executor-cloud | 514bad6 | Jul 01 2026, 03:17 AM |

@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Cloudflare preview

Console https://executor-preview-pr-1239.executor-e2e.workers.dev
MCP https://executor-preview-pr-1239.executor-e2e.workers.dev/mcp
Deployed commit 514bad6

Sign-in is Cloudflare Access (one-time PIN to an allowed email). The preview has its own database and encryption key; it is destroyed when this PR closes.

@greptile-apps

greptile-apps Bot commented Jul 1, 2026


Greptile Summary

This PR repairs the self-host e2e suite (9 of 56 files were failing) and adds a new CI gate so the suite stays honest on every PR. The changes fall into two categories: test fixes (stale selectors, racy position check, target-conditional assertions, wrong toolkit URL, quarantined Microsoft tests) and a functional bug fix in oauth-popup.ts that unwraps the org-slug-wrapped OAuth callback state before passing the raw session token to complete.

  • Core fix (packages/core/api/src/oauth-popup.ts): runOAuthCallback now calls decodeOAuthCallbackState on the URL state parameter and extracts the inner raw token before passing it to complete and embedding it as sessionId in the popup result; plain tokens (no wrapper) are unaffected via the null-coalescing fallback.
  • New selfhost regression test (e2e/selfhost/oauth-popup-callback-org-state.test.ts): a browser-free HTTP journey that drives the org-context OAuth popup flow end-to-end and asserts "Connected" is returned instead of "OAuth session expired or not found".
  • CI (ci.yml): new e2e-selfhost job runs on PRs and push, booting the dev server via globalsetup and exercising both scenarios/** and selfhost/**.
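
The unwrap-with-fallback behavior described in the first bullet can be sketched roughly as follows. This is not the code in oauth-popup.ts: `decodeWrappedState` is a hypothetical stand-in for decodeOAuthCallbackState, whose real signature is not shown here, and the base64url JSON envelope with `state` and `orgSlug` fields is an assumption based on the later commit notes in this thread.

```typescript
// Assumed envelope shape: base64url-encoded JSON carrying the raw token.
interface WrappedState {
  state: string;
  orgSlug: string;
}

// Helper for the demo: build a wrapped state the way a caller might.
function encodeWrappedState(payload: WrappedState): string {
  return btoa(JSON.stringify(payload))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Returns the envelope, or null when the value is not a wrapped state.
function decodeWrappedState(value: string): WrappedState | null {
  try {
    let base64 = value.replace(/-/g, "+").replace(/_/g, "/");
    while (base64.length % 4 !== 0) base64 += "=";
    const parsed: unknown = JSON.parse(atob(base64));
    if (typeof parsed !== "object" || parsed === null) return null;
    const record = parsed as Record<string, unknown>;
    return typeof record.state === "string" && typeof record.orgSlug === "string"
      ? { state: record.state, orgSlug: record.orgSlug }
      : null;
  } catch {
    // Invalid base64 or invalid JSON: treat the value as a plain token.
    return null;
  }
}

// Null-coalescing fallback: plain tokens pass through unchanged.
function resolveSessionToken(stateParam: string): string {
  return decodeWrappedState(stateParam)?.state ?? stateParam;
}
```

The fallback is what keeps the change backward-compatible: a callback that still carries a plain token resolves to that same token.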

Confidence Score: 4/5

Safe to merge; the oauth-popup.ts fix is narrow, backward-compatible, and covered by both a new unit test and a new end-to-end regression test.

All thirteen changed files are in good shape. The core oauth-popup.ts logic is straightforward — a null-coalescing unwrap with a unit test and an e2e regression test guarding it. The e2e suite fixes are mechanical selector/URL/assertion updates with clear rationale. The two observations (hardcoded credentials and the missing CI timeout) are quality notes, not correctness issues.

.github/workflows/ci.yml for the missing job timeout; e2e/selfhost/oauth-popup-callback-org-state.test.ts for the hardcoded OAuth test-server credentials.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/api/src/oauth-popup.ts | Core fix: decodes the org-slug-wrapped OAuth state to extract the raw session token before passing it to complete and embedding it in the popup result. Correct and well-tested by the accompanying unit test. |
| packages/core/api/src/oauth-popup.test.ts | Adds a unit test covering the wrapped-state unwrapping path; verifies both the complete callback receives the raw token and the popup HTML uses it as sessionId. |
| e2e/selfhost/oauth-popup-callback-org-state.test.ts | New black-box regression test that drives the full org-scoped OAuth callback round-trip; well-structured but relies on hardcoded alice:password credentials from the OAuth test server. |
| e2e/src/scenario.ts | Adds skip option to ScenarioOptions; uses it.skip correctly — callback is never run, and the reason string is self-documenting in the source. |
| .github/workflows/ci.yml | Adds e2e-selfhost CI job that runs on PRs and push; uses bun run bootstrap correctly for dev-server prerequisites, but lacks a timeout-minutes guard. |
| e2e/scenarios/api-tools.test.ts | Correctly gates the zero-connection assertion off selfhost, matching the isolation rule in AGENTS.md; the endpoint call itself still exercises the route on all targets. |
| e2e/scenarios/openapi-add-integration-action-bar.test.ts | Removes the racy in-flight position read; the existing single-node count assertions still cover the ghost-button regression. Change is justified and the regression coverage is preserved. |
| e2e/scenarios/microsoft-emulator.test.ts | Quarantines the blocked Microsoft scenario with a self-documenting skip reason; no logic changes to the test body. |
| e2e/scenarios/oauth-client-handoff.test.ts | Quarantines only the Microsoft handoff scenario with skip; the two non-Microsoft OAuth client scenarios in the same file are unaffected. |
| e2e/selfhost/toolkits-mcp.test.ts | Fixes the toolkit URL to use the bare /mcp/toolkits/… path self-host actually serves, matching RFC 9728 protected-resource metadata and MCP SDK 1.29's stricter URL validation. |
| e2e/scenarios/org-slug-routing.test.ts | Correctly gates the 404-on-unknown-slug step off selfhost; the behaviour difference (single-tenant canonicalization vs. multi-tenant 404) is accurately documented inline. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant App as Executor App (selfhost)
    participant Provider as OAuth Provider (test server)
    participant Callback as /oauth/callback

    App->>Provider: "GET /authorize?state=wrapped({state:"raw-token",orgSlug:"default"})"
    Note over App,Provider: State is org-slug-wrapped envelope

    Provider-->>App: 302 → consent page
    App->>Provider: POST /authorize (Basic auth)
    Provider-->>Callback: "302 → /oauth/callback?state=wrapped&code=code"
    Note over Callback: decodeOAuthCallbackState(wrapped) → {state:"raw-token"}

    Callback->>App: "complete({state: "raw-token", code})"
    Note over Callback: Before fix: passed wrapped state → session lookup failed
    App-->>Callback: OAuth result (access token)
    Callback-->>Provider: 200 HTML (Connected, sessionId: raw-token)

Reviews (1): Last reviewed commit: "test(e2e): repair self-host scenarios an..."

  method: "POST",
  redirect: "manual",
  headers: {
    authorization: `Basic ${Buffer.from("alice:password").toString("base64")}`,


P2 Hardcoded test-server credentials

The Basic auth header for the OAuth consent step hardcodes alice:password. If serveOAuthTestServer ever changes its built-in test users (or if a future implementation randomises them), this step will silently fail with a 401 and the assertion on consent.status will fire with a cryptic mismatch. The credentials should ideally come from a constant exported by the test server utility, or at minimum from a named constant at the top of the file to make the coupling visible.
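
One way to make that coupling visible, sketched under assumptions: `OAUTH_TEST_USER` and `basicAuthHeader` are hypothetical names, and whether the test server utility can export such a constant is not shown in this PR. The sketch uses `btoa` instead of `Buffer` only to stay free of Node-specific typings.

```typescript
// Hypothetical shared constant; ideally exported by the test server utility
// so the test and the server cannot drift apart silently.
const OAUTH_TEST_USER = { username: "alice", password: "password" } as const;

// Builds an HTTP Basic Authorization header value.
// Note: btoa only accepts Latin-1 input, which is enough for these test credentials.
function basicAuthHeader(user: { username: string; password: string }): string {
  return `Basic ${btoa(`${user.username}:${user.password}`)}`;
}

// Request options for the consent step, mirroring the excerpt under review.
const consentRequestInit = {
  method: "POST",
  redirect: "manual",
  headers: { authorization: basicAuthHeader(OAUTH_TEST_USER) },
} as const;
```

With the credentials named in one place, a change to the test server's built-in users shows up as a single constant to update rather than a 401 in the middle of the flow.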

Comment thread .github/workflows/ci.yml
Comment on lines +118 to +155
e2e-selfhost:
  name: E2E (self-host)
  # Runs on PRs and push: the self-host project boots its own dev server (no
  # external infra) and is the regression guard that PR CI was missing — the
  # org-scoped OAuth callback bug lived exactly here and shipped green because
  # nothing ran this suite. Browser scenarios are included; if they prove
  # flaky on CI, gate this to push-only (like e2e-local) or add a retry.
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4

    - uses: oven-sh/setup-bun@v2
      with:
        bun-version: 1.3.11

    # The self-host web app + emulator OAuth flows spawn Node, and some
    # scenarios drive a headless browser: pin Node 22 and install Chromium.
    - uses: actions/setup-node@v4
      with:
        node-version: 22

    # Full fresh-checkout setup: installs deps AND builds the vite-plugin
    # bundle + react console routes the web boot needs (a bare `bun install`
    # leaves those unbuilt). bootstrap also fetches Chromium, but without the
    # ubuntu system libs the headless shell needs — the step below adds
    # `--with-deps` and the headless-shell download.
    - run: bun run bootstrap

    - name: Install Playwright Chromium (with system deps)
      run: bunx playwright install --with-deps chromium chromium-headless-shell
      working-directory: e2e

    # Boots the self-host dev server via its globalsetup and runs the
    # cross-target `scenarios/**` plus the selfhost-only `selfhost/**` suite.
    - name: Run the self-host e2e suite
      run: bun run test:selfhost
      working-directory: e2e


P2 No timeout-minutes on the new job

The e2e-selfhost job boots a dev server via globalsetup before running the suite. If the server fails to start or hangs, the job will block until GitHub Actions' 6-hour default, holding up the queue and burning CI minutes. The existing e2e-local job has the same omission, but adding a timeout-minutes (e.g. 30) here would bound the blast radius for a new job that is now running on every PR.

@pkg-pr-new

pkg-pr-new Bot commented Jul 1, 2026


@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1239

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1239

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1239

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1239

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1239

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1239

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1239

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1239

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1239

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1239

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1239

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1239

executor

npm i https://pkg.pr.new/executor@1239

commit: 514bad6

RhysSullivan added a commit that referenced this pull request Jul 2, 2026
The self-host e2e project never ran in CI, so it drifted red while the app
moved on. Repair the failing scenarios (stale connect-modal selectors, a racy
action-bar position read, a shared-admin connection-count assertion, a
multi-tenant-only org-slug 404 step, and a cloud-shaped toolkit MCP URL), add a
documented skip affordance to the scenario helper, and quarantine the two
Microsoft emulator scenarios that need a canonical block-YAML Graph spec
(tracked separately).

Cherry-picked from origin/fix-selfhost-e2e-and-ci (PR #1239); its CI job is
superseded by the cloud+selfhost matrix job already on this branch.
RhysSullivan added a commit that referenced this pull request Jul 2, 2026
* e2e: fix stale docs, harden dev-CLI status, add cloud+selfhost CI jobs

- e2e/AGENTS.md: the anatomy example predated the service-yielding scenario()
  signature (no more needs/ctx); capability notes said browser was cloud-only
  and mcp-oauth selfhost-only, both wrong per targets/*.ts; file placement now
  lists cloudflare/, local/, cli/; document summary, motel, test:* scripts,
  the viewer/ SPA, pr-media, and the Windows desktop/cli VM targets.
- e2e dev CLI status: probe the app URL before reporting ready (a zombie
  runner with a dead server used to read as healthy), and only parse real
  state files in .dev/ (cloud.journey.json rendered as a garbage DEAD line).
- CI: run the cloud and selfhost e2e projects on every PR/push with failure
  artifacts (trace.zip, session.mp4, step screenshots) uploaded per target.

* Fix the MCP regressions and policy gaps the e2e suite caught

Cloud (hibernatable MCP DO rework fallout):
- server.ts no longer gates MCP dispatch behind the Axiom tracer install: with
  AXIOM_TOKEN unset (any dev boot without motel) every /mcp request fell
  through to the SPA router and 404ed.
- agent-handler mounts a second serve() on /mcp/toolkits/:slug — the agents
  SDK builds an exact-match URLPattern, so the single /mcp handler never saw
  toolkit paths.
- Restore the old envelope's transport contract: JSON-RPC 405 for verbs
  outside GET/POST/DELETE/OPTIONS (was a bare 404), 200 for session DELETE
  (agents SDK answers 204), and a reconnect-worded 404 for requests that
  race a condemned DO's abort.

Selfhost (org-scoped MCP OAuth discovery):
- The org-segment strip middleware now carries the original pathname in an
  internal header, and the protected-resource metadata echoes it, so a client
  that dialed /<org>/mcp/... passes the MCP SDK's RFC 9728 resource check.
  Bare paths are untouched; the header is stripped from unrewritten requests.

Microsoft Graph URL policy:
- microsoftHttpPlugin gains the hosts' local-network dev posture: selfhost,
  cloud, and the cloudflare host thread allowLocalNetwork into
  allowUnsafeUrlOverrides, and the override now also admits plain-http
  loopback URLs (local emulators). Production behavior is unchanged: the
  flag is unset there, and non-loopback http stays rejected even with it.

Stale e2e assertion refreshed for an intentional product change:
- tool-descriptions: the execute inventory is names-only since the skills
  tool slimming; drop the per-connection description assertions.

* test(e2e): repair self-host scenarios and gate the suite in CI

The self-host e2e project never ran in CI, so it drifted red while the app
moved on. Repair the failing scenarios (stale connect-modal selectors, a racy
action-bar position read, a shared-admin connection-count assertion, a
multi-tenant-only org-slug 404 step, and a cloud-shaped toolkit MCP URL), add a
documented skip affordance to the scenario helper, and quarantine the two
Microsoft emulator scenarios that need a canonical block-YAML Graph spec
(tracked separately).

Cherry-picked from origin/fix-selfhost-e2e-and-ci (PR #1239); its CI job is
superseded by the cloud+selfhost matrix job already on this branch.

* test(e2e): quarantine the two agents-SDK transport gaps

Both are real gaps in the hibernatable Agent bridge (standalone SSE
supersede never resolves; response routing scopes JSON-RPC ids per
session instead of per stream), not regressions on this branch. Skip
with reasons so the suite gates CI while the gaps stay visible;
fixing the bridge is tracked separately.

* test(e2e): repair or quarantine the cloud scenarios that drifted on main

The cloud e2e project never gated CI either, so ten scenarios rotted.
Refresh the four whose product behavior moved intentionally:
- connect-card-ssr-origin: install URLs are org-slug-scoped since the
  org-slug console URLs change (#974); accept the slug form.
- connection-owner-isolation: /api/auth/switch-organization was deleted
  with cookie-based org switching (#1000); switch orgs the way the web
  client does, via the x-executor-organization selector header.
- oauth-connections: the popup-state fix (#1235) envelopes the callback
  state as base64url JSON; decode it and assert the inner state + orgSlug.
- unauthenticated-skeleton: the 404 page shipped as a standalone page in
  the same commit as the shell-framed assertion (#986); assert the page
  it actually renders.

Quarantine the six that need product/harness work, each with a reason:
mcp-browser-approval-org-scope + the two browser-approval scenarios
(cloud-only: the mcporter browser-approval completion never lands),
cli-device-login (device-flow terminal never reaches the emulator), and
run-panel-auto-approve (autoApprove leaves the run paused; never green
since the feature landed in #1183).

* lint: suppress the adapter-boundary error checks in the MCP agent handler

The condemned-DO abort surfaces as a plain runtime Error thrown out of the
agents SDK's serve.fetch; its message string is the only signal. Narrow
suppressions with boundary reasons, per the typed-errors skill.

* test(e2e): quarantine the seat-limit scenario on the emulate 0.9.0 Autumn gap

emulate 0.9.0's Autumn customer balances omit the expanded feature object
autumn-js asserts, so useCustomer crashes the org page into the error
boundary. Fixed upstream in UsefulSoftwareCo/emulate#8 (0.9.1); unskip
once the publish lands and the e2e dependency is bumped.

* ci: retrigger

* ci: shard the cloud e2e job so each shard gets a fresh dev stack

A full-suite run against one long-lived cloud dev server degrades partway
through: sign-in starts refusing connections and everything after fails
with fetch errors (the same SSE/OTel memory growth being instrumented on
main). Four shards, each booting its own stack, stay under the threshold.
Re-merge into one job once the leak is fixed.

* ci: split the cloud e2e job into eight shards

Four shards still hit the dev-server degradation a few minutes in on
2-core runners; eight keeps each stack's lifetime under the threshold.

* ci: retry flaky browser scenarios twice on the same stack

The remaining shard failures are scattered single-test Playwright
waitFor timeouts on 2-core runners, not systemic stack death; vitest
--retry clears them without hiding real regressions (a consistent
failure still fails after 3 attempts).

* test(e2e): quarantine the Graph default-add scenario on CI runners

Compiling the Graph spec inside dev workerd 500s on 2-core GitHub
runners and takes the dev stack down for every scenario after it in the
shard (the auth-hint/org-slug/docs-link failures in the same shard were
all downstream of this). Local runs are unaffected; skip only under CI.

* selfhost: read the local-network posture from env in the plugins seam

plugins() runs per request; loadConfig() does filesystem work (data
dir, secret key resolution) that should not ride the request path. The
env read is the same computation loadConfig makes for the flag.

* e2e: bump @executor-js/emulate to 0.10.0, unskip the seat-limit scenario

0.10.0 ships the Autumn balances.feature expansion autumn-js asserts
(UsefulSoftwareCo/emulate#8), so the org page renders again and the
scenario passes.