FE-872: Install-failure classification — infra-vs-test split#218
FE-872: Install-failure classification — infra-vs-test split#218kostandinang wants to merge 6 commits into
Conversation
a87e942 to
ff041c6
Compare
c949e53 to
a1afc98
Compare
2cec280 to
fecc5c6
Compare
PR SummaryMedium Risk Overview Failed runs now carry
Greenfield promotion gains a test that Reviewed by Cursor Bugbot for commit c902500. Bugbot is set up for automated code reviews on this repo. Configure here. |
TestResult gains a failureKind?: 'infra' | 'test' discriminant so a broken toolchain (missing runner binary / deps never installed) is no longer indistinguishable from a logic failure that should send the code-writer to fix the code. ToolchainTestRunner.run classifies a failed run via classifyTestFailure, deliberately conservative: only an unambiguous "the runner itself isn't there" signal (spawn ENOENT, or a shell command-not-found) is infra; everything else is test, because a missing module is ambiguous with a legitimate TDD red and mislabeling a real failure as infra would silently skip it. The tests-run net report surfaces an aggregate failureKind (infra dominates) so consumers don't rescan results. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>
When a slice exhausts its retry budget because the tests never executed (toolchain missing / deps not installed), the halt reason now reads "toolchain/install failure" instead of the misdirecting "retry exhaustion", using the failureKind classified in slice 1. Deliberately not a bespoke re-install net arc: the loop already loops back and the agent re-installs natively via bash on its next turn, so the harness only owns the honest terminal cause. Completes acceptance 1 (classify + react). Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>
…ice 3) promoteGreenfieldRun's blanket copy already lands the manifest + lockfile the cook agent produced; this turns that incidental behavior into an asserted promotion invariant (git ls-files), guaranteeing a reproducible promoted tree. Closes acceptance 2 for greenfield; brownfield dep-delta capture stays blocked on the brownfield-promotion frontier. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>
Ran a real brownfield cook (hand-authored 2-slice plan + node:http app, throwaway repo) to de-risk app-runtime-probe / integration-oracle before building them. Verdict: the chain works end-to-end (CoW worktree, clean-tree gate, per-slice->__epic__ merge composed the wiring, TDD red/green, working branch untouched). The agent wired the feature reachable and self-authored a genuine boot-and-probe integration test — the orphan did not reproduce. But reachability was agent-discretion, not enforced, confirming the value of an independent integration-oracle. Two refinements: app-runtime-probe should own the boot mechanism (the agent had to invent a .js->.ts resolve hook); dep-install was unexercised (zero-dep app). Bonus: the 'Cannot find module' TDD red was handled as a test-red, not infra — validates FE-872 slice 1 live. Marks dogfood-spike done; folds findings into app-runtime-probe and integration-oracle; updates SPEC A98 to partially-validated. Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>
Collapse three diverged test-execution paths onto a single TestRunner seam and one runVerification verdict helper (Design C). evaluate-done and verify-epic previously used a private spawn-based runTest that returned a bare boolean, so FE-872's infra-vs-test failureKind was only visible on the net run-tests path. - add VerificationOutcome/VerificationResult to types - add runVerification (test-runner.ts): the one place the >=1-and-all-pass verdict rule and the infra-dominates aggregate live; a throwing runner is an infra failure - delete pi-actions.runTest + evaluateVerificationTargets; thread TestRunner (and an injectable session factory for tests) through createPiActions - net-compiler run-tests now calls runVerification, dropping its inline loop + aggregate - evaluate-done and verify-epic now surface failureKind in their reports Amp-Thread-ID: https://ampcode.com/threads/T-019ecb9a-9a08-733b-833d-76885fc8243a Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
4ef65e5 to
c902500
Compare
f60b7ea to
625b1b5
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want reviews to match your repository better? Bugbot Learning can learn team-specific rules from PR activity. A team admin can enable Learning in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c902500. Configure here.
| const haltReason = | ||
| failureKind === 'infra' | ||
| ? `Slice ${sliceId} toolchain/install failure (tests never ran in ${maxRetries + 1} attempts)` | ||
| : `Slice ${sliceId} retry exhaustion`; |
There was a problem hiding this comment.
Halt reason overclaims never ran
Medium Severity
When retries are exhausted, an aggregate failureKind of infra always produces a halt reason stating tests never ran in all attempts. That aggregate reflects only the latest runVerification outcome and treats any infra signal (including a single target or a thrown runner) as toolchain failure, so earlier attempts or sibling targets that did execute tests still get a “never ran” terminal message.
Reviewed by Cursor Bugbot for commit c902500. Configure here.



Stacks on FE-871. Third Arc-1 frontier -- the FE-843-deferred fail/infra test-outcome split. No install verb: the install action stays agent-native (
bash+ FE-843testConventions, A98).What?
Make a broken toolchain distinguishable from a failing test, report it honestly, pin the promoted tree as reproducible, and make that classification visible everywhere tests run.
TestResult.failureKind,classifyTestFailure): a failed run carries'infra' | 'test'. Classified conservatively -- only an unambiguous "runner isn't there" signal isinfra(spawnENOENT, shellcommand not found). Everything else, incl. a missing module, staystest(ambiguous with a legitimate TDD red -- mislabeling would silently skip a real failure). Thetests-runreport surfaces an aggregatefailureKind.net-compiler.ts): when a slice exhausts retries because tests never ran, the halt readstoolchain/install failure (tests never ran in N attempts)instead of the misdirectingretry exhaustion.promote-run.test.ts): the agent's manifest + lockfile are pinned as a promotion invariant viagit ls-files-- turningpromoteGreenfieldRun's incidental copy into an asserted reproducible-tree guarantee.runVerification): collapse the three diverged test-execution paths onto oneTestRunner+ one verdict helper, sofailureKindis visible at every site, not just the net path.Why?
TestResultwas a single{ passed }boolean, so a failednpm installlooked identical to a logic bug -- sending the code-writer to "fix the code" while the toolchain never installed, then halting with a cause that named the wrong thing. And without a pinned capture invariant, a promoted tree could silently drop the lockfile.app-runtime-probe/integration-oracledepend on both: infra separated from test, and deps reproducible in the promoted tree.Slice 4 closes the gap that slices 1-2 left: classification was only half-wired.
evaluate-doneandverify-epicran tests through a private, spawn-basedrunTestthat returned a bare boolean, so the newfailureKindnever reached them -- only the netrun-testspath saw it. The duplicate execution path had also drifted (spawnvsspawnSync). Now there is one execution seam (TestRunner.run) and one verification seam (runVerification) owning the>=1-and-all-passverdict rule and the infra-dominates aggregate; a runner that throws is treated asinfra.pi-actions.runTestandevaluateVerificationTargetsare deleted.Scope discipline
Not a bespoke re-install arc -- the loop already retries and the agent re-installs via
bashon its next turn. The harness owns only what bash install can't give: the classification, the honest terminal cause, the capture invariant, and one path that carries them.Deferred
Brownfield dep-delta capture over the CoW baseline is blocked on
brownfield-promotion(no brownfield promote path yet).Co-authored-by: Amp amp@ampcode.com