FE-872: Install-failure classification — infra-vs-test split by kostandinang · Pull Request #218 · hashintel/brunch

kostandinang · 2026-06-15T16:39:30Z

Stacks on FE-871. Third Arc-1 frontier -- the FE-843-deferred fail/infra test-outcome split. No install verb: the install action stays agent-native (bash + FE-843 testConventions, A98).

What?

Make a broken toolchain distinguishable from a failing test, report it honestly, pin the promoted tree as reproducible, and make that classification visible everywhere tests run.

Slice 1 -- classify (TestResult.failureKind, classifyTestFailure): a failed run carries 'infra' | 'test'. Classified conservatively -- only an unambiguous "runner isn't there" signal is infra (spawn ENOENT, shell command not found). Everything else, incl. a missing module, stays test (ambiguous with a legitimate TDD red -- mislabeling would silently skip a real failure). The tests-run report surfaces an aggregate failureKind.
Slice 2 -- react (net-compiler.ts): when a slice exhausts retries because tests never ran, the halt reads toolchain/install failure (tests never ran in N attempts) instead of the misdirecting retry exhaustion.
Slice 3 -- greenfield dep capture (promote-run.test.ts): the agent's manifest + lockfile are pinned as a promotion invariant via git ls-files -- turning promoteGreenfieldRun's incidental copy into an asserted reproducible-tree guarantee.
Slice 4 -- unify the seam (runVerification): collapse the three diverged test-execution paths onto one TestRunner + one verdict helper, so failureKind is visible at every site, not just the net path.

Why?

TestResult was a single { passed } boolean, so a failed npm install looked identical to a logic bug -- sending the code-writer to "fix the code" while the toolchain never installed, then halting with a cause that named the wrong thing. And without a pinned capture invariant, a promoted tree could silently drop the lockfile. app-runtime-probe / integration-oracle depend on both: infra separated from test, and deps reproducible in the promoted tree.

Slice 4 closes the gap that slices 1-2 left: classification was only half-wired. evaluate-done and verify-epic ran tests through a private, spawn-based runTest that returned a bare boolean, so the new failureKind never reached them -- only the net run-tests path saw it. The duplicate execution path had also drifted (spawn vs spawnSync). Now there is one execution seam (TestRunner.run) and one verification seam (runVerification) owning the >=1-and-all-pass verdict rule and the infra-dominates aggregate; a runner that throws is treated as infra. pi-actions.runTest and evaluateVerificationTargets are deleted.

Scope discipline

Not a bespoke re-install arc -- the loop already retries and the agent re-installs via bash on its next turn. The harness owns only what bash install can't give: the classification, the honest terminal cause, the capture invariant, and one path that carries them.

Deferred

Brownfield dep-delta capture over the CoW baseline is blocked on brownfield-promotion (no brownfield promote path yet).

Co-authored-by: Amp amp@ampcode.com

kostandinang · 2026-06-15T16:39:48Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

cursor · 2026-06-16T14:18:10Z

PR Summary

Medium Risk
Changes cook-loop halt semantics and shared test execution paths; behavior is heavily unit/integration tested and classification is intentionally conservative to avoid skipping real test failures.

Overview
FE-872 separates a broken toolchain from a failing test so the cook loop stops blaming “retry exhaustion” when tests never ran, and promotion can assert a reproducible dependency tree.

Failed runs now carry failureKind: 'infra' | 'test' via classifyTestFailure (missing runner / command not found → infra; missing modules and assertion failures stay test so TDD reds are not skipped). ToolchainTestRunner stamps that on failures; tests-run, eval-done, and epic-verified reports include an aggregate failureKind where infra dominates.

runVerification replaces the duplicated spawn-based paths in pi-actions (evaluateVerificationTargets / private runTest removed). The Petri net run-tests transition, evaluate-done, and verify-epic all use the same TestRunner injection (wired from cook-cli). When retries exhaust with failureKind === 'infra', the slice halt reason names toolchain/install failure (tests never ran…) instead of generic retry exhaustion.

Greenfield promotion gains a test that package.json and lockfile are tracked in git after promoteGreenfieldRun. memory/PLAN.md and SPEC.md record FE-872 progress, dogfood-spike outcomes, and partial validation of assumption A98.

^{Reviewed by Cursor Bugbot for commit c902500. Bugbot is set up for automated code reviews on this repo. Configure here.}

TestResult gains a failureKind?: 'infra' | 'test' discriminant so a broken toolchain (missing runner binary / deps never installed) is no longer indistinguishable from a logic failure that should send the code-writer to fix the code. ToolchainTestRunner.run classifies a failed run via classifyTestFailure, deliberately conservative: only an unambiguous "the runner itself isn't there" signal (spawn ENOENT, or a shell command-not-found) is infra; everything else is test, because a missing module is ambiguous with a legitimate TDD red and mislabeling a real failure as infra would silently skip it. The tests-run net report surfaces an aggregate failureKind (infra dominates) so consumers don't rescan results. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>

When a slice exhausts its retry budget because the tests never executed (toolchain missing / deps not installed), the halt reason now reads "toolchain/install failure" instead of the misdirecting "retry exhaustion", using the failureKind classified in slice 1. Deliberately not a bespoke re-install net arc: the loop already loops back and the agent re-installs natively via bash on its next turn, so the harness only owns the honest terminal cause. Completes acceptance 1 (classify + react). Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>

…ice 3) promoteGreenfieldRun's blanket copy already lands the manifest + lockfile the cook agent produced; this turns that incidental behavior into an asserted promotion invariant (git ls-files), guaranteeing a reproducible promoted tree. Closes acceptance 2 for greenfield; brownfield dep-delta capture stays blocked on the brownfield-promotion frontier. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>

Ran a real brownfield cook (hand-authored 2-slice plan + node:http app, throwaway repo) to de-risk app-runtime-probe / integration-oracle before building them. Verdict: the chain works end-to-end (CoW worktree, clean-tree gate, per-slice->__epic__ merge composed the wiring, TDD red/green, working branch untouched). The agent wired the feature reachable and self-authored a genuine boot-and-probe integration test — the orphan did not reproduce. But reachability was agent-discretion, not enforced, confirming the value of an independent integration-oracle. Two refinements: app-runtime-probe should own the boot mechanism (the agent had to invent a .js->.ts resolve hook); dep-install was unexercised (zero-dep app). Bonus: the 'Cannot find module' TDD red was handled as a test-red, not infra — validates FE-872 slice 1 live. Marks dogfood-spike done; folds findings into app-runtime-probe and integration-oracle; updates SPEC A98 to partially-validated. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>

Collapse three diverged test-execution paths onto a single TestRunner seam and one runVerification verdict helper (Design C). evaluate-done and verify-epic previously used a private spawn-based runTest that returned a bare boolean, so FE-872's infra-vs-test failureKind was only visible on the net run-tests path. - add VerificationOutcome/VerificationResult to types - add runVerification (test-runner.ts): the one place the >=1-and-all-pass verdict rule and the infra-dominates aggregate live; a throwing runner is an infra failure - delete pi-actions.runTest + evaluateVerificationTargets; thread TestRunner (and an injectable session factory for tests) through createPiActions - net-compiler run-tests now calls runVerification, dropping its inline loop + aggregate - evaluate-done and verify-epic now surface failureKind in their reports Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.

^{Reviewed by Cursor Bugbot for commit c902500. Configure here.}

cursor · 2026-06-16T23:46:23Z

+              const haltReason =
+                failureKind === 'infra'
+                  ? `Slice ${sliceId} toolchain/install failure (tests never ran in ${maxRetries + 1} attempts)`
+                  : `Slice ${sliceId} retry exhaustion`;


Halt reason overclaims never ran

Medium Severity

When retries are exhausted, an aggregate failureKind of infra always produces a halt reason stating tests never ran in all attempts. That aggregate reflects only the latest runVerification outcome and treats any infra signal (including a single target or a thrown runner) as toolchain failure, so earlier attempts or sibling targets that did execute tests still get a “never ran” terminal message.

^{Reviewed by Cursor Bugbot for commit c902500. Configure here.}

This was referenced Jun 15, 2026

FE-843: Toolchain profile expansion — TS runtimes + live profile selection #198

Open

FE-864: Plan orchestrator brownfield enhancements #212

Open

FE-867: Agent extension host — dual-mode pi-harness contract #213

Open

kostandinang mentioned this pull request Jun 15, 2026

FE-871: Brunch toolchain detection — detect + plan-emitter wiring #214

Open

kostandinang changed the title ~~FE-872: classify test-run failures as infra vs test (slice 1)~~ FE-872: Install-failure classification — infra-vs-test split + honest halt reason Jun 15, 2026

kostandinang changed the title ~~FE-872: Install-failure classification — infra-vs-test split + honest halt reason~~ FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam Jun 16, 2026

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from a87e942 to ff041c6 Compare June 16, 2026 10:00

This was referenced Jun 16, 2026

FE-875: App runtime probe — boot, probe, classify #219

Open

FE-876: Integration oracle — reachability gate + grounding seam #220

Open

kostandinang changed the title ~~FE-872: Install-failure classification — infra-vs-test split, honest halt, unified runner seam~~ FE-872: Install-failure classification — infra-vs-test split Jun 16, 2026

This was referenced Jun 16, 2026

FE-877: Brownfield promotion — commit the cook result onto cook/<runId> #221

Open

FE-878: Brunch serve — one-shot plan-then-cook capstone #222

Open

kostandinang force-pushed the ka/fe-871-brunch-detect branch from c949e53 to a1afc98 Compare June 16, 2026 12:53

kostandinang force-pushed the ka/fe-872-dep-install-classification branch 3 times, most recently from 2cec280 to fecc5c6 Compare June 16, 2026 13:38

kostandinang mentioned this pull request Jun 16, 2026

FE-879: Lazy per-slice cook worktrees and shared node_modules for brownfield #223

Open

kostandinang marked this pull request as ready for review June 16, 2026 14:18

kostandinang mentioned this pull request Jun 16, 2026

FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec #224

Open

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/orchestrator/src/test-runner.ts Outdated

kostandinang and others added 6 commits June 17, 2026 00:43

FE-872: classify only missing runner spawn errors as infra

c902500

Co-authored-by: Cursor <cursoragent@cursor.com>

kostandinang force-pushed the ka/fe-872-dep-install-classification branch from 4ef65e5 to c902500 Compare June 16, 2026 23:45

kostandinang force-pushed the ka/fe-871-brunch-detect branch from f60b7ea to 625b1b5 Compare June 16, 2026 23:45

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FE-872: Install-failure classification — infra-vs-test split#218

FE-872: Install-failure classification — infra-vs-test split#218
kostandinang wants to merge 6 commits into
ka/fe-871-brunch-detectfrom
ka/fe-872-dep-install-classification

kostandinang commented Jun 15, 2026 •

edited

Loading

Uh oh!

kostandinang commented Jun 15, 2026 •

edited

Loading

Uh oh!

cursor Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kostandinang commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What?

Why?

Scope discipline

Deferred

Uh oh!

kostandinang commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 16, 2026

Choose a reason for hiding this comment

Halt reason overclaims never ran

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kostandinang commented Jun 15, 2026 •

edited

Loading

kostandinang commented Jun 15, 2026 •

edited

Loading

cursor Bot commented Jun 16, 2026 •

edited

Loading