Skip to content

Robustness + review fixes for nightly bench sysroot#677

Merged
MauroToscano merged 2 commits into
fix/nightly-bench-sysrootfrom
fix/sysroot-download-robustness
Jun 18, 2026
Merged

Robustness + review fixes for nightly bench sysroot#677
MauroToscano merged 2 commits into
fix/nightly-bench-sysrootfrom
fix/sysroot-download-robustness

Conversation

@MauroToscano

@MauroToscano MauroToscano commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Follow-up review fixes layered on top of #675 (targets fix/nightly-bench-sysroot, not main). Found via a multi-agent review of #675; the headline bug was reproduced against the real recipe.

Fixed

1. Failed download was not fatal — could silently extract a stale sysroot (Medium, reproduced)

prepare-sysroot's recipe had no set -e, and the curl -fL … -o "$(SYSROOT_TARBALL)" line ended with ; instead of being chained into the extraction. So when curl failed, control fell through to the tar step.

  • In the common case tar chokes on the missing/partial tarball and the || { … exit 1; } handler aborts — but with a misleading "Extracting…" message.
  • The bad case (reproduced): SYSROOT_TARBALL is a fixed /tmp path. If a tarball is already there (left by an interrupted prior run, since the success-path rm never executed), a failed curl falls through, tar extracts the stale tarball, and make exits 0. The next run then sees include/stdlib.h and treats the sysroot as complete.

Repro before the fix (connection-failing URL + pre-existing tarball): make returned 0 and provisioned a stale sysroot. After the fix: make aborts (exit 2), no sysroot is created, and the stale tarball is removed.

Fix: add set -e, rm -f "$(SYSROOT_TARBALL)" before downloading (kills the stale-tarball case and the predictable-path reuse), and make the download explicitly fatal (curl … || { rm -f …; exit 1; }). Same fix applied to prepare-test-data, which had the identical download-fall-through pattern.

2. Download/extraction hardening (defense-in-depth)

  • curl --proto '=https'-L follows redirects; this prevents a redirect from downgrading to plain http.
  • tar … --no-same-owner — relevant in the sudo (root) extraction branch so archive-specified ownership isn't restored.

Note: this does not add tarball checksum verification. That's the larger pre-existing supply-chain gap (an unauthenticated tarball is used as the compiler --sysroot and linked into guest ELFs), but pinning a SHA-256 needs a maintainer-blessed known-good hash and would break CI on legitimate tarball updates, so it's left as a tracked follow-up.

3. Comment / doc fixes

  • bench-vs-nightly.yml: the comment claimed the sysroot "is cached in $HOME across nightly runs," but there is no actions/cache step — it persists only because the self-hosted runner's filesystem is persistent. Reworded so it doesn't imply a cache key/restore mechanism.
  • Removed the now-stale # Note: the tarball rm above only runs on success… comment (no longer accurate after adding the pre-download rm -f and switching the final rm to rm -f).
  • rmrm -f for the success-path tarball cleanup (consistent with the error handlers; won't error if already gone).
  • ASCII - instead of an em-dash in the ::error:: output (greppability / repo consistency).
  • README: lead with the recommended user-writable (no-sudo) SYSROOT_DIR=$HOME/... invocation instead of the /opt + sudo one.

Verified

  • Failure aborts: failed curl + pre-existing tarball → make exits non-zero, no sysroot created, stale tarball removed (was exit 0 before).
  • Happy path: real HTTPS provision against the actual SYSROOT_URL → exit 0, include/stdlib.h + lib/ present, tarball cleaned up.
  • Idempotent: re-run on a complete sysroot prints "already exists" and performs no download.

Considered but intentionally not changed

  • Concurrency / locking around rm -rf + extract: safe within a single make -j (Make builds the shared prepare-sysroot prereq once); only at risk if two separate make processes share one SYSROOT_DIR, which the current CI does not do. Not worth the complexity.
  • Stronger partial-sysroot guard (e.g. probing a specific lib/*.a): skipped to avoid hard-coding a library filename that could be wrong and cause endless re-provisioning. The include/stdlib.h check already fixes the reported failure.
  • infra/provision.sh provisions a complete /opt sysroot, which appears to contradict the Fix/nightly bench sysroot #675 root-cause note ("never fully provisioned"). Left for the author to reconcile — changing the provisioner needs knowledge of how the bench runner is actually set up.

Follow-up review fixes for the nightly-bench-sysroot change.

- prepare-sysroot/prepare-test-data: a failed `curl` was `;`-separated
  with no `set -e`, so it fell through to `tar` and could extract a
  stale tarball from the fixed /tmp path and exit 0 (reproduced). Add
  `set -e`, remove any pre-existing tarball before downloading, and make
  the download explicitly fatal. Apply the same hardening to
  prepare-test-data (same download-fall-through class).
- Hardening: `curl --proto '=https'` (no http downgrade on redirect) and
  `tar --no-same-owner` on extraction (matters in the sudo/root branch).
- `rm` -> `rm -f` for the success-path tarball cleanup; drop the now-stale
  "rm only runs on success" note.
- Workflow comment: the sysroot is not `actions/cache`-cached; it persists
  because the runner filesystem is persistent. Reword to avoid implying a
  cache mechanism.
- Use ASCII `-` instead of an em-dash in the `::error::` output.
- README: lead with the recommended user-writable (no-sudo) sysroot path.
@github-actions

Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

The changes are limited to Makefile provisioning hardening, CI text/comment updates, and README command ordering. I didn’t see introduced security vulnerabilities, behavioral bugs, or significant performance concerns.

@claude

claude Bot commented Jun 18, 2026

Copy link
Copy Markdown

Code Review

Overall: Correct fix for the stale-tarball silent-success bug. Logic is sound, hardening additions are appropriate.

What's good

  • Core fix (rm -f "$(SYSROOT_TARBALL)" before download + set -e + explicit || { … exit 1; }) correctly eliminates the reported silent-success path. Reproduced and verified per the PR description.
  • --proto '=https' — prevents a redirect from downgrading to plain HTTP. Correct placement on both curl calls.
  • --no-same-owner — critical in the sudo branch where root can restore archive ownership; harmless in the user branch. Right call.
  • rm -f on the success-path tarball cleanup is cleaner than bare rm (won't fail if already gone).
  • README reorder — leading with the no-sudo invocation is the right default.
  • Comment fixes are accurate.

Findings

Low — no tarball integrity check (pre-existing, acknowledged in PR body)

SYSROOT_TARBALL is fetched over HTTPS from a fixed URL and extracted directly as the compiler --sysroot for guest ELFs. --proto '=https' prevents protocol downgrade but does not verify the tarball content. A compromised server or CDN edge node could serve a malicious tarball that gets compiled into guest programs. The PR explicitly defers a SHA-256 pin; that deferral is reasonable, but the gap is worth calling out in a tracking issue if one doesn't already exist.

Informational — set -e + explicit || { … exit 1; } is redundant for exit, not for cleanup

set -e alone would cause the recipe to fail on curl error, but it wouldn't clean up the partial file. The explicit handlers are therefore still necessary for cleanup. Keeping both is correct and idiomatic — no change needed.

Informational — /tmp/lambda-vm-sysroot-rv64im.tar.gz is a global path

Two concurrent make prepare-sysroot invocations on the same host race on this path. The PR documents this as acceptable (CI doesn't run concurrent jobs sharing one SYSROOT_DIR), which is fine. Worth keeping in mind if the CI topology changes.

No blocking issues. The fix is correct and the hardening is proportionate.

Comment thread Makefile
Addresses review feedback on #677: document that the fixed /tmp tarball
path assumes one writer at a time, so a future CI topology change that
shares a runner across concurrent jobs doesn't reintroduce a race silently.
@MauroToscano

Copy link
Copy Markdown
Contributor Author

Thanks for the review. Addressing the three points:

  • Global /tmp tarball path / concurrency — added a comment at the SYSROOT_TARBALL definition documenting the single-writer assumption and pointing at mktemp/flock if the CI topology ever shares a runner across concurrent jobs (046481f). Kept it to a comment rather than locking, since the current CI doesn't run concurrent jobs sharing a SYSROOT_DIR.
  • No tarball integrity check — agreed it's the real remaining gap; leaving it as a tracked follow-up (SHA-256 pin needs a maintainer-blessed known-good hash and would otherwise break CI on legitimate tarball updates). I'll open a tracking issue.
  • set -e + explicit || { … exit 1; } — kept both intentionally: set -e handles the abort, the explicit handler does the partial-file cleanup. No change, as you noted.

@MauroToscano MauroToscano merged commit 50162a5 into fix/nightly-bench-sysroot Jun 18, 2026
11 checks passed
@MauroToscano MauroToscano deleted the fix/sysroot-download-robustness branch June 18, 2026 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants