fix(adjudicator): refute exploitable verdicts with no evidence anchor + clarify runtime/secret evidence in the prompt by thejefflarson · Pull Request #130 · thejefflarson/protector

thejefflarson · 2026-06-29T22:45:11Z

Why

An internet-facing watcher-server Pod came back exploitable with the reason "connects to exposed secrets which are mounted into the pod…", a false breach. Its evidence was thin. It had no CVEs and no exposed secret baked into the image. Its runtime behavior was three benign NetworkConnections to its own DB/metrics. Every objective was either [MOUNTED] (its own creds) or [NETWORK] [same-ns] (its own DB). The correct verdict was refuted. The 1B judge fabricated evidence in two ways: (a) it treated benign network connections as a live signal, and (b) it conflated reaching a secret/… objective with an exposed secret baked into the image.

A guard_fabricated_cve backstop already existed. This PR adds the symmetric zero-anchor backstop for unsupported Exploitable verdicts.

The guard

guard_unsupported_exploitable (in guards.rs, mirroring guard_fabricated_cve's shape via the shared guard_exploitable gate) downgrades an Exploitable verdict to Refuted ONLY when ALL THREE exploitation anchors are absent:

the CVE evidence list is empty (no CVE shown to the model), AND
there is no exposed-secret finding for the entry, AND
no observed behavior is corroborating.
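The three-anchor rule reduces to a small pure function. The sketch below is illustrative and is not the code in guards.rs. `Verdict` and the argument shapes are hypothetical. The shared guard_exploitable gate is modeled as an early return for anything other than Exploitable. `any_corroborating` is assumed to be computed by the caller.

```rust
/// Hypothetical verdict type; the engine's real enum may carry more data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Exploitable,
    Confirmed,
    Uncertain,
    Refuted,
}

/// Sketch of the zero-anchor backstop.
///
/// * `cves` - the CVE evidence list that was shown to the model.
/// * `has_exposed_secret` - a usable credential is baked into the image.
/// * `any_corroborating` - some observed behavior is an alert or notable exec.
pub fn guard_unsupported_exploitable(
    verdict: Verdict,
    cves: &[String],
    has_exposed_secret: bool,
    any_corroborating: bool,
) -> Verdict {
    // Like the fabrication guard, only Exploitable verdicts are ever touched.
    if verdict != Verdict::Exploitable {
        return verdict;
    }
    // Downgrade only when ALL THREE anchors are absent; any single anchor
    // (even a CVE whose reachability is "not-observed") keeps the model's call.
    let no_anchor = cves.is_empty() && !has_exposed_secret && !any_corroborating;
    if no_anchor {
        Verdict::Refuted
    } else {
        verdict
    }
}
```

The condition is a conjunction of absences. Setting any one input to "anchored" therefore preserves the model's verdict, which is the intended conservative behavior.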

"Corroborating runtime behavior" reuses the engine's existing definition — Behavior::is_alert() (a critical Falco alert) OR exec_class::notable_exec(&behavior).is_some() (a notable shell/pkg-manager exec, JEF-117). Benign NetworkConnection/FileRead/LibraryLoaded/SecretRead are not corroborating and never anchor an exploitable.

Conservative by design: if any anchor is present — a CVE in the list (even reachability:not-observed), an exposed secret, or a corroborating behavior — the model's (debatable) call stands untouched. This is purely the zero-anchor safety net. Like the fabrication guard it only ever acts on Exploitable; the entry is re-judged next pass.
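Here is a minimal sketch of the definition of corroborating behavior, using a simplified, hypothetical `Behavior` enum. It follows the rule stated for Behavior::is_alert() and exec_class::notable_exec: a critical alert or a notable exec counts, while the workload's own network, file, library, and secret reads do not. `is_notable_exec` is a hypothetical stand-in for exec_class::notable_exec, whose real classification rules this PR does not describe.

```rust
/// Hypothetical, simplified stand-in for the engine's observed-behavior type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum Behavior {
    /// A Falco alert; `critical` marks the severities that count as an alert.
    FalcoAlert { rule: String, critical: bool },
    /// A process exec observed in the container.
    Exec { binary: String },
    NetworkConnection { dest: String },
    FileRead { path: String },
    LibraryLoaded { path: String },
    SecretRead { name: String },
}

impl Behavior {
    /// Stand-in for `Behavior::is_alert()`: only a critical Falco alert counts.
    pub fn is_alert(&self) -> bool {
        matches!(self, Behavior::FalcoAlert { critical: true, .. })
    }
}

/// Hypothetical stand-in for `exec_class::notable_exec`: flags a few shells
/// and package managers by file name. The real list is not given in the PR.
pub fn is_notable_exec(b: &Behavior) -> bool {
    const NOTABLE: [&str; 6] = ["sh", "bash", "ash", "apt", "apk", "pip"];
    match b {
        Behavior::Exec { binary } => {
            // Compare only the file name, so "/bin/sh" matches "sh".
            let name = binary.rsplit('/').next().unwrap_or(binary.as_str());
            NOTABLE.contains(&name)
        }
        _ => false,
    }
}

/// A behavior corroborates exploitation only if it is an alert or a notable
/// exec; the workload's own network/file/library/secret activity never does.
pub fn is_corroborating(b: &Behavior) -> bool {
    b.is_alert() || is_notable_exec(b)
}
```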

Exposed-secret presence is read from the same source the prompt uses. entry_findings(graph, entry) returns (secret_lines, posture_lines), and a non-empty secret_lines means a usable credential is baked into the image. Posture/RBAC findings are not an anchor. The guard is wired into model_call.rs, chained after guard_fabricated_cve.
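This is a sketch of how the exposed-secret anchor and the guard chain might fit together. `Findings`, `has_exposed_secret`, and `apply_guards` are hypothetical. The real entry_findings(graph, entry) signature and the model_call.rs wiring may differ. The sketch shows two things. First, only `secret_lines` feeds the anchor, and posture/RBAC lines are ignored. Second, guards run in a fixed order, with each one receiving the previous guard's output.

```rust
/// Hypothetical verdict type, trimmed to two variants for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Exploitable,
    Refuted,
}

/// Hypothetical stand-in for the (secret_lines, posture_lines) pair that
/// entry_findings(graph, entry) returns.
#[allow(dead_code)]
pub struct Findings {
    pub secret_lines: Vec<String>,
    pub posture_lines: Vec<String>,
}

/// Only a credential baked into the image anchors an Exploitable verdict;
/// posture/RBAC findings are deliberately not consulted.
pub fn has_exposed_secret(f: &Findings) -> bool {
    !f.secret_lines.is_empty()
}

/// Run guards in order, feeding each one the previous guard's verdict,
/// e.g. [fabricated-CVE guard, unsupported-exploitable guard].
pub fn apply_guards(initial: Verdict, guards: &[&dyn Fn(Verdict) -> Verdict]) -> Verdict {
    guards.iter().fold(initial, |v, g| g(v))
}
```

In a real call site the guards would be closures that capture the entry's evidence. The ordering reflects the PR's wiring, with the fabricated-CVE guard first and the zero-anchor guard second. Both guards only ever move Exploitable to Refuted, so running either one on an already-refuted verdict is a no-op.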

Prompt clarifications

Two surgical additions (existing structure/wording preserved):

Runtime-behavior bullet: a workload's OWN observed activity (outbound network connections, file reads, library loads, reading its own mounted secrets) is normal behavior and NOT a live signal — only an ALERT or hands-on-keyboard action counts.
Secrets bullet: reaching a secret/… objective (a Credential-Access OUTCOME in the reachable-objectives list) is NOT an exposed secret baked into the image — only a credential in the "Exposed secrets baked into this image" field is exploitation evidence.
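This sketch shows one way the two clarifications could sit in a static prompt template, together with the kind of prompt-content assertions the Tests section mentions. `EVIDENCE_RULES` and `build_prompt` are hypothetical. The wording paraphrases the two bullets and does not reproduce the real prompt text.

```rust
/// Hypothetical excerpt of the static adjudication template.
pub const EVIDENCE_RULES: &str = concat!(
    "- Runtime behavior: a workload's OWN observed activity (outbound network ",
    "connections, file reads, library loads, reading its own mounted secrets) ",
    "is normal behavior and NOT a live signal. Only an ALERT or ",
    "hands-on-keyboard action counts.\n",
    "- Secrets: reaching a secret/… objective (a Credential-Access OUTCOME in ",
    "the reachable-objectives list) is NOT an exposed secret baked into the ",
    "image. Only a credential in the \"Exposed secrets baked into this image\" ",
    "field is exploitation evidence.\n"
);

/// Hypothetical prompt assembly: static rules first, then the entry's evidence.
pub fn build_prompt(entry_summary: &str) -> String {
    format!("{}\nEntry:\n{}\n", EVIDENCE_RULES, entry_summary)
}
```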

Fingerprint shift: changing the prompt string deterministically changes the verdict-cache fingerprint, so every entry is re-judged once. This is expected. No code-level snapshot pins the prompt text. The only affected test was the prompt-size bound, which was raised from 4,000 to 5,000 to account for the larger static template. The assertion still proves the untrusted-payload cap, because a megabyte-long title would exceed it by orders of magnitude.
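The sketch below illustrates two points: why a prompt edit forces one re-judge, and what the size bound still proves. It assumes the verdict-cache key hashes the static template together with the entry's evidence. It also assumes untrusted fields such as the title are truncated to a fixed cap before interpolation. `fingerprint`, `cap`, `build_capped_prompt`, and `TITLE_CAP` are hypothetical. std's `DefaultHasher` is only a stand-in, because its algorithm is not guaranteed stable across Rust releases. A persisted cache would want a stable hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cap on untrusted text interpolated into the prompt.
pub const TITLE_CAP: usize = 200;

/// Truncate by chars so a multi-byte title is never split mid code point.
pub fn cap(untrusted: &str, max_chars: usize) -> String {
    untrusted.chars().take(max_chars).collect()
}

/// Cache key over the template and the evidence: editing either changes it,
/// which is why a prompt-text change re-judges every entry once.
pub fn fingerprint(template: &str, evidence: &str) -> u64 {
    let mut h = DefaultHasher::new();
    template.hash(&mut h);
    evidence.hash(&mut h);
    h.finish()
}

/// Only the capped prefix of the untrusted title reaches the prompt.
pub fn build_capped_prompt(template: &str, title: &str) -> String {
    format!("{}\nTitle: {}\n", template, cap(title, TITLE_CAP))
}
```

With a 4,500-byte template and a one-megabyte ASCII title, the prompt is 4,709 bytes. A size assertion therefore fails only if the cap stops being applied or the static template itself grows.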

Tests

guard fires: Exploitable + empty CVEs + no exposed secret + only benign behaviors (the watcher case + misc benign) → Refuted.
guard preserves the verdict in each anchored case: a CVE present, an exposed-secret finding present, a corroborating alert, and a notable exec.
guard leaves non-Exploitable verdicts (Refuted/Confirmed/Uncertain) untouched.
two prompt-content assertions for the clarifications.

All existing adjudicate tests kept green.

Gates (from `engine/`)

cargo fmt · cargo build · cargo clippy --all-targets (clean, warnings = errors) · cargo test — 353 passed, 0 failed, 1 ignored (the e2e test needing PROTECTOR_E2E_MODEL). File-size guard green.

Closes JEF-watcher-false-breach.

🤖 Generated with Claude Code

… + clarify runtime/secret evidence in the prompt

An internet-facing watcher-server Pod came back `exploitable` ("connects to exposed secrets which are mounted into the pod…"), a false breach. Its evidence: CVEs (none), no exposed secret baked into the image, runtime = three benign NetworkConnections to its own DB/metrics. The 1B judge fabricated evidence by treating benign connections as a live signal and conflating reaching a secret/… objective with an exposed secret in the image. Correct verdict: refuted.

Add the symmetric backstop to guard_fabricated_cve: guard_unsupported_exploitable downgrades an Exploitable verdict to Refuted ONLY when ALL THREE exploitation anchors are absent: empty CVE list, no exposed-secret finding, and no corroborating runtime behavior (Behavior::is_alert() or exec_class::notable_exec, the engine's existing definition; benign Network/File/Library/SecretRead are NOT corroborating). Any anchor present leaves the model's call untouched. Wired after guard_fabricated_cve in model_call; exposed-secret presence read from the same entry_findings source the prompt uses.

Also two surgical prompt clarifications: a workload's own activity (network connections, file reads, library loads, reading its own mounted secrets) is NOT a live signal; only an ALERT or hands-on-keyboard action is. And reaching a secret/… objective is NOT an exposed secret baked into the image. This shifts the verdict fingerprint, so entries re-judge once.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VtjoJttCvBY4dzCoE4f9vP

thejefflarson merged commit f8dd4b4 into main Jun 30, 2026
4 of 5 checks passed

thejefflarson deleted the fix/guard-unsupported-exploitable branch June 30, 2026 03:08

thejefflarson merged 1 commit into main from fix/guard-unsupported-exploitable
