Skip to content

Make the e2e suite green and gate cloud + self-host in CI#1258

Merged
RhysSullivan merged 14 commits into
mainfrom
e2e-ci-ready
Jul 2, 2026
Merged

Make the e2e suite green and gate cloud + self-host in CI#1258
RhysSullivan merged 14 commits into
mainfrom
e2e-ci-ready

Conversation

@RhysSullivan

@RhysSullivan RhysSullivan commented Jul 2, 2026

Copy link
Copy Markdown
Owner

The e2e suite had never gated CI, so it drifted red while the app moved on. This makes the default suite (cloud + selfhost) green, wires both targets into PR CI with failure artifacts, and repairs the drift it caught — including three real product regressions from the hibernatable MCP DO rework.

Product fixes

Cloud MCP (DO rework fallout)

  • MCP dispatch no longer hides behind the Axiom tracer install: with AXIOM_TOKEN unset (any dev boot without motel, including bun run cli up cloud), every /mcp request fell through to the SPA router and 404ed.
  • /mcp/toolkits/<slug> reaches the session DO again — the agents SDK's serve() builds an exact-match URLPattern, so the single /mcp mount never saw toolkit paths (all toolkit MCP scenarios were dead).
  • Restored the old envelope's transport contract: JSON-RPC 405 for verbs outside GET/POST/DELETE/OPTIONS, 200 for session DELETE, and a reconnect-worded 404 when a request races a condemned DO's abort.

Selfhost MCP OAuth discovery

  • The org-segment strip middleware now carries the original pathname in an internal header and the protected-resource metadata echoes it, so clients dialing /<org>/mcp/... pass the MCP SDK's RFC 9728 resource check. Bare paths unchanged; the header is stripped from unrewritten requests.

Microsoft Graph URL policy

  • The hosts thread their local-network dev posture (ALLOW_LOCAL_NETWORK / EXECUTOR_ALLOW_LOCAL_NETWORK) into microsoftHttpPlugin({ allowUnsafeUrlOverrides }), and the override admits plain-http loopback URLs so local emulators work. Production behavior is byte-identical (flag unset; non-loopback http still rejected).

Test repairs (intentional product changes the tests missed)

Quarantines (each skip: names its suspect)

  • Two agents-SDK transport gaps (standalone SSE supersede never resolves; JSON-RPC ids scoped per session instead of per stream, so colliding ids cross-wire).
  • Cloud browser-approval plumbing (three scenarios; mcporter approval completion never lands).
  • CLI device-login terminal flow.
  • run-panel autoApprove (never green since the feature landed in Run panel: auto-approve operator-invoked tools #1183).
  • Microsoft emulator scenarios blocked on a block-YAML-profile Graph spec (from Repair the self-host e2e suite and run it in CI #1239, which this branch includes; its CI job is superseded by the matrix job here).
  • The Graph default-add scenario is skipped ONLY under CI: compiling the ~37MB spec inside dev workerd exhausts 2-core runners and takes the dev stack down for the rest of the shard. Local runs still cover it.

Tooling and docs

  • e2e/AGENTS.md rewritten to match the real scenario() API (services are yielded, no needs/ctx), correct capability matrix, all scenario directories, and the undocumented scripts (summary, motel, test:*, viewer, pr-media).
  • Dev CLI status now HTTP-probes the app before reporting ready (zombie stacks read as healthy before) and ignores non-state JSON debris in .dev/.
  • CI: cloud + selfhost matrix job on every PR/push, 30-minute timeout, runs/ uploaded as artifacts on failure.

Emulator fix that rode along

The seat-limit billing scenario caught a real emulate regression: 0.9.0's Autumn emulator omitted the expanded feature object on customer balances, which autumn-js's useCustomer asserts, crashing the org page into the error boundary. Fixed upstream in UsefulSoftwareCo/emulate#8 and published (0.10.0); this PR bumps the e2e dependency and the scenario is green again.

Verification

  • format:check, lint, typecheck, test all green locally.
  • selfhost: 77 passed, 5 skipped (clean run).
  • cloud: 133 passed, 1 legit fail (seat-limit, now quarantined pending the emulate publish), 11 skipped.
  • The cloud dev stack degrades after a few minutes of sustained suite load (locally under parallel suites, and on 2-core CI runners even solo) — requests start failing and everything after dies with connection errors. This tracks the SSE/OTel memory growth being instrumented on main (Add per-isolate SSE + OTel memory counters to cloud worker #1255, Upgrade agents to 0.17.3 + patch MCP SSE forwarder against unbounded buffering #1257). Until that is fixed the CI cloud job runs as eight vitest shards, each on a fresh dev stack; selfhost runs unsharded and green.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jul 2, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
executor-marketing 6e7bd93 Commit Preview URL

Branch Preview URL
Jul 02 2026, 09:16 AM

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jul 2, 2026

Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
executor-cloud 6e7bd93 Jul 02 2026, 09:16 AM

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Copy link
Copy Markdown

Greptile Summary

This PR makes the e2e suite green and wires cloud + selfhost into PR CI, while fixing three real cloud MCP regressions (tracer-gated dispatch, toolkit path routing, transport contract), a selfhost RFC 9728 OAuth discovery regression, and the Microsoft Graph loopback URL policy for local emulators.

  • Cloud MCP: MCP dispatch is now classified before installTracerProvider(), a second serve() mount handles /mcp/toolkits/:slug paths (agents SDK builds exact-match URLPatterns), and the transport contract is restored (405 for unsupported verbs, 200 for DELETE, reconnect 404 for condemned DOs).
  • Selfhost OAuth discovery: the org-strip middleware now carries the original org-scoped pathname in x-executor-mcp-original-path so the protected-resource metadata can echo it back, satisfying the MCP SDK's RFC 9728 same-origin resource check.
  • CI: cloud sharded into 8 fresh-stack jobs + selfhost unsharded; runs/ artifacts uploaded on failure; --retry=2 for sporadic browser timeouts on 2-core runners.

Confidence Score: 5/5

Safe to merge — all three cloud MCP regressions are targeted fixes with matching e2e coverage, the selfhost header flow has multiple validation layers preventing spoofing, and the Microsoft loopback allowance is correctly scoped to allowUnsafeUrlOverrides === true.

The transport-contract restoration (405/200/reconnect-404), the dual-serve toolkit routing, and the org-path header are all well-understood point fixes with no fan-out risk to unrelated paths. The security posture of the new header (isRecognizedMcpOrgPath validation in auth.ts plus middleware stripping at both prod and dev boundaries) is correct and defended in depth.

No files require special attention beyond the minor notes above.

Important Files Changed

Filename Overview
apps/cloud/src/server.ts Rearranges MCP classification before the tracer guard so /mcp requests work without AXIOM_TOKEN; conditionally flushes telemetry only when tracing was installed.
apps/cloud/src/mcp/agent-handler.ts Adds a second serve() mount for toolkit paths, method-filters before auth, rewrites DELETE 204→200, and maps condemned-DO aborts to a reconnect 404.
apps/host-selfhost/src/mcp/org-path.ts Adds MCP_ORIGINAL_PATH_HEADER constant, isRecognizedMcpOrgPath validator, and mcpResourcePathFromOriginalPath extractor; all correct and well-guarded against spoofing.
apps/host-selfhost/src/mcp/auth.ts Recovers org-scoped path from the internal header to reflect it in PRM metadata and resource path; double-validates the header via isRecognizedMcpOrgPath before trusting it.
apps/host-selfhost/src/serve.ts Effect middleware now sets MCP_ORIGINAL_PATH_HEADER on rewrites and scrubs any client-supplied value on unrewritten requests to prevent header spoofing.
packages/plugins/microsoft/src/sdk/graph.ts Adds parseTrustedLoopbackHttpUrl so plain-http loopback URLs are accepted under allowUnsafeUrlOverrides; non-loopback http still rejected; new test coverage for all three branches.
.github/workflows/ci.yml Adds cloud (8 shards) + selfhost e2e matrix job with 30-min timeout and runs/ artifact upload on failure; --retry=2 covers sporadic browser timeouts on 2-core runners.
e2e/src/scenario.ts Adds skip option that registers it.skip with the reason, bypassing resolveTarget() so skipped scenarios appear in the report on all targets without running.
e2e/selfhost/oauth-popup-callback-org-state.test.ts New regression guard for org-wrapped OAuth callback state; exercises the full HTTP journey without a browser dependency; well-isolated with a unique slug prefix.
e2e/scripts/cli.ts Status command is now async with HTTP health probe and JSON debris skip; isInstanceState type guard added but does not validate status/urls fields used downstream (flagged in prior review).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B{browserTracesResponse?}
    B -- yes --> C[Return OTLP response]
    B -- no --> D[classifyMcpPath]
    D --> E[installTracerProvider]
    E --> F{mcpRoute?.kind === 'mcp'?}
    F -- yes --> G[mcpAgentHandler]
    G --> H{resource.kind === 'toolkit'?}
    H -- yes --> I[serveToolkit /mcp/toolkits/:slug]
    H -- no --> J[serve /mcp]
    I --> K{response.status === 204 AND DELETE?}
    J --> K
    K -- yes --> L[Rewrite 200 + headers]
    K -- no --> M[wrapMcpSseResponse]
    F -- no --> N{tracingInstalled?}
    N -- no --> O[fetchHandler SPA/SSR]
    N -- yes --> P{isAppOwnedPath?}
    P -- yes --> Q[fetchHandler with Effect tracing]
    P -- no --> R[Worker span + fetchHandler]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[Incoming Request] --> B{browserTracesResponse?}
    B -- yes --> C[Return OTLP response]
    B -- no --> D[classifyMcpPath]
    D --> E[installTracerProvider]
    E --> F{mcpRoute?.kind === 'mcp'?}
    F -- yes --> G[mcpAgentHandler]
    G --> H{resource.kind === 'toolkit'?}
    H -- yes --> I[serveToolkit /mcp/toolkits/:slug]
    H -- no --> J[serve /mcp]
    I --> K{response.status === 204 AND DELETE?}
    J --> K
    K -- yes --> L[Rewrite 200 + headers]
    K -- no --> M[wrapMcpSseResponse]
    F -- no --> N{tracingInstalled?}
    N -- no --> O[fetchHandler SPA/SSR]
    N -- yes --> P{isAppOwnedPath?}
    P -- yes --> Q[fetchHandler with Effect tracing]
    P -- no --> R[Worker span + fetchHandler]
Loading

Reviews (8): Last reviewed commit: "e2e: bump @executor-js/emulate to 0.10.0..." | Re-trigger Greptile

Comment thread apps/cloud/src/mcp/agent-handler.ts
Comment thread e2e/scripts/cli.ts
@RhysSullivan RhysSullivan reopened this Jul 2, 2026
- e2e/AGENTS.md: the anatomy example predated the service-yielding scenario()
  signature (no more needs/ctx); capability notes said browser was cloud-only
  and mcp-oauth selfhost-only, both wrong per targets/*.ts; file placement now
  lists cloudflare/, local/, cli/; document summary, motel, test:* scripts,
  the viewer/ SPA, pr-media, and the Windows desktop/cli VM targets.
- e2e dev CLI status: probe the app URL before reporting ready (a zombie
  runner with a dead server used to read as healthy), and only parse real
  state files in .dev/ (cloud.journey.json rendered as a garbage DEAD line).
- CI: run the cloud and selfhost e2e projects on every PR/push with failure
  artifacts (trace.zip, session.mp4, step screenshots) uploaded per target.
Cloud (hibernatable MCP DO rework fallout):
- server.ts no longer gates MCP dispatch behind the Axiom tracer install: with
  AXIOM_TOKEN unset (any dev boot without motel) every /mcp request fell
  through to the SPA router and 404ed.
- agent-handler mounts a second serve() on /mcp/toolkits/:slug — the agents
  SDK builds an exact-match URLPattern, so the single /mcp handler never saw
  toolkit paths.
- Restore the old envelope's transport contract: JSON-RPC 405 for verbs
  outside GET/POST/DELETE/OPTIONS (was a bare 404), 200 for session DELETE
  (agents SDK answers 204), and a reconnect-worded 404 for requests that
  race a condemned DO's abort.

Selfhost (org-scoped MCP OAuth discovery):
- The org-segment strip middleware now carries the original pathname in an
  internal header, and the protected-resource metadata echoes it, so a client
  that dialed /<org>/mcp/... passes the MCP SDK's RFC 9728 resource check.
  Bare paths are untouched; the header is stripped from unrewritten requests.

Microsoft Graph URL policy:
- microsoftHttpPlugin gains the hosts' local-network dev posture: selfhost,
  cloud, and the cloudflare host thread allowLocalNetwork into
  allowUnsafeUrlOverrides, and the override now also admits plain-http
  loopback URLs (local emulators). Production behavior is unchanged: the
  flag is unset there, and non-loopback http stays rejected even with it.

Stale e2e assertion refreshed for an intentional product change:
- tool-descriptions: the execute inventory is names-only since the skills
  tool slimming; drop the per-connection description assertions.
The self-host e2e project never ran in CI, so it drifted red while the app
moved on. Repair the failing scenarios (stale connect-modal selectors, a racy
action-bar position read, a shared-admin connection-count assertion, a
multi-tenant-only org-slug 404 step, and a cloud-shaped toolkit MCP URL), add a
documented skip affordance to the scenario helper, and quarantine the two
Microsoft emulator scenarios that need a canonical block-YAML Graph spec
(tracked separately).

Cherry-picked from origin/fix-selfhost-e2e-and-ci (PR #1239); its CI job is
superseded by the cloud+selfhost matrix job already on this branch.
Both are real gaps in the hibernatable Agent bridge (standalone SSE
supersede never resolves; response routing scopes JSON-RPC ids per
session instead of per stream), not regressions on this branch. Skip
with reasons so the suite gates CI while the gaps stay visible;
fixing the bridge is tracked separately.
The cloud e2e project never gated CI either, so ten scenarios rotted.
Refresh the four whose product behavior moved intentionally:
- connect-card-ssr-origin: install URLs are org-slug-scoped since the
  org-slug console URLs change (#974); accept the slug form.
- connection-owner-isolation: /api/auth/switch-organization was deleted
  with cookie-based org switching (#1000); switch orgs the way the web
  client does, via the x-executor-organization selector header.
- oauth-connections: the popup-state fix (#1235) envelopes the callback
  state as base64url JSON; decode it and assert the inner state + orgSlug.
- unauthenticated-skeleton: the 404 page shipped as a standalone page in
  the same commit as the shell-framed assertion (#986); assert the page
  it actually renders.

Quarantine the six that need product/harness work, each with a reason:
mcp-browser-approval-org-scope + the two browser-approval scenarios
(cloud-only: the mcporter browser-approval completion never lands),
cli-device-login (device-flow terminal never reaches the emulator), and
run-panel-auto-approve (autoApprove leaves the run paused; never green
since the feature landed in #1183).
…dler

The condemned-DO abort surfaces as a plain runtime Error thrown out of the
agents SDK's serve.fetch; its message string is the only signal. Narrow
suppressions with boundary reasons, per the typed-errors skill.
…tumn gap

emulate 0.9.0's Autumn customer balances omit the expanded feature object
autumn-js asserts, so useCustomer crashes the org page into the error
boundary. Fixed upstream in UsefulSoftwareCo/emulate#8 (0.9.1); unskip
once the publish lands and the e2e dependency is bumped.
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Cloudflare preview

Torn down — the PR is closed.

@pkg-pr-new

pkg-pr-new Bot commented Jul 2, 2026

Copy link
Copy Markdown

Open in StackBlitz

@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1258

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1258

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1258

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1258

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1258

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1258

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1258

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1258

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1258

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1258

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1258

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1258

executor

npm i https://pkg.pr.new/executor@1258

commit: 6e7bd93

A full-suite run against one long-lived cloud dev server degrades partway
through: sign-in starts refusing connections and everything after fails
with fetch errors (the same SSE/OTel memory growth being instrumented on
main). Four shards, each booting its own stack, stay under the threshold.
Re-merge into one job once the leak is fixed.
Four shards still hit the dev-server degradation a few minutes in on
2-core runners; eight keeps each stack's lifetime under the threshold.
The remaining shard failures are scattered single-test Playwright
waitFor timeouts on 2-core runners, not systemic stack death; vitest
--retry clears them without hiding real regressions (a consistent
failure still fails after 3 attempts).
Compiling the Graph spec inside dev workerd 500s on 2-core GitHub
runners and takes the dev stack down for every scenario after it in the
shard (the auth-hint/org-slug/docs-link failures in the same shard were
all downstream of this). Local runs are unaffected; skip only under CI.
plugins() runs per request; loadConfig() does filesystem work (data
dir, secret key resolution) that should not ride the request path. The
env read is the same computation loadConfig makes for the flag.
0.10.0 ships the Autumn balances.feature expansion autumn-js asserts
(UsefulSoftwareCo/emulate#8), so the org page renders again and the
scenario passes.
@RhysSullivan RhysSullivan merged commit 76fcb1c into main Jul 2, 2026
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant