Warm-pooled sandboxes: RFC 0005 + install agent-sandbox extensions #1813
Open
rmalani-nv wants to merge 3 commits into
Propose adopting the upstream agent-sandbox warm-pool extension CRDs (SandboxTemplate / SandboxWarmPool / SandboxClaim, extensions.agents.x-k8s.io/v1alpha1) on the Kubernetes driver to hand out pre-warmed sandbox pods in ~milliseconds instead of cold-starting a Sandbox CR per request. Documents the claim-based create flow, what bakes into the shared template vs. late-binds over the supervisor relay, the one security-sensitive change (re-anchoring sandbox identity to the gateway-created SandboxClaim in auth/k8s_sa.rs), risks, alternatives, and a phased rollout. Drafted from a local spike validated against agent-sandbox v0.4.6. Signed-off-by: Roshni Malani <rmalani@nvidia.com>
…e2e clusters Apply extensions.yaml alongside manifest.yaml when bootstrapping the local k3d dev cluster and the e2e kube harness, reusing the pinned AGENT_SANDBOX_VERSION already used for core. This installs the SandboxTemplate / SandboxWarmPool / SandboxClaim CRDs and reconfigures the existing agent-sandbox-controller, so clusters are ready for the warm-pooled sandbox path (RFC 0005). extensions.yaml rolls the controller deployment, so the e2e harness waits for the rollout after both applies and for the new extension CRDs to be Established. No gateway behavior changes yet. Signed-off-by: Roshni Malani <rmalani@nvidia.com>
The local k3d bootstrap now also applies the agent-sandbox warm-pool extensions; reflect that in the helm-dev-environment skill description. Signed-off-by: Roshni Malani <rmalani@nvidia.com>
ba13a44 to 9dd7e1a
Summary
Groundwork for warm-pooled sandboxes on the Kubernetes compute driver: adds the design as RFC 0005 and installs the upstream agent-sandbox warm-pool extension CRDs (SandboxTemplate / SandboxWarmPool / SandboxClaim) into the local k3d dev cluster and the e2e kube harness. No gateway runtime behavior changes yet; this prepares the clusters and records the plan for the follow-up driver work.
Installing the extensions before the gateway consumes them is intentional: it keeps the dev and e2e clusters ready for the phase-2 driver work, completes the existing AGENT_SANDBOX_VERSION "pinned for … extensions" intent already noted in those scripts, and is behavior-preserving, since the extensions only add three CRDs and re-roll the shared agent-sandbox-controller. The install path was validated on a live k3s cluster (idempotent apply, all three CRDs Established, controller rolled out, and the cold-path sandbox lifecycle still works).
Related Issue
N/A. The design is captured in RFC 0005 in this PR. A spike/build issue can follow per the create-spike → build-from-issue workflow.
Changes
rfc/0005-warm-pooled-sandboxes/README.md: propose claiming pre-warmed pods via the agent-sandbox extension CRDs (extensions.agents.x-k8s.io/v1alpha1). Documents the claim-based create flow, what bakes into the shared SandboxTemplate vs. late-binds over the supervisor relay, the one security-sensitive change (re-anchoring sandbox identity to the gateway-created SandboxClaim in auth/k8s_sa.rs), risks, alternatives, and a phased rollout.
tasks/scripts/helm-k3s-local.sh, e2e/with-kube-gateway.sh: apply extensions.yaml alongside manifest.yaml, reusing the already-pinned AGENT_SANDBOX_VERSION (v0.4.6). The e2e harness waits for the three new extension CRDs to be Established and for the (re-rolled) agent-sandbox-controller.
.agents/skills/helm-dev-environment/SKILL.md: note that the dev bootstrap now installs the warm-pool extensions.
Three stacked commits: RFC → extension install → skill doc.
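The install step described under Changes can be sketched roughly as follows. This is not the content of the modified scripts: the release URL layout, the agent-sandbox-system namespace, the plural CRD names, the timeouts, and the DRY_RUN helper are all assumptions made for illustration. Only the v0.4.6 pin, the two manifest file names, and the agent-sandbox-controller deployment name come from this PR.

```shell
#!/usr/bin/env bash
set -eu

# Version pinned in this PR; everything else below is an assumed default.
AGENT_SANDBOX_VERSION="${AGENT_SANDBOX_VERSION:-v0.4.6}"
# Assumption: upstream publishes both manifests as release assets at this URL.
BASE_URL="https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${AGENT_SANDBOX_VERSION}"
# Assumption: namespace the controller is deployed into.
CONTROLLER_NS="${CONTROLLER_NS:-agent-sandbox-system}"
# Assumption: plural CRD names derived from the three kinds and the API group.
EXTENSION_CRDS="sandboxtemplates.extensions.agents.x-k8s.io
sandboxwarmpools.extensions.agents.x-k8s.io
sandboxclaims.extensions.agents.x-k8s.io"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_agent_sandbox() {
  # Core first, then extensions; kubectl apply is safe to repeat.
  run kubectl apply -f "${BASE_URL}/manifest.yaml"
  run kubectl apply -f "${BASE_URL}/extensions.yaml"
  # extensions.yaml re-rolls the controller, so wait only after both applies.
  run kubectl -n "${CONTROLLER_NS}" rollout status deployment/agent-sandbox-controller --timeout=180s
  # Block until each extension CRD reports the Established condition.
  for crd in ${EXTENSION_CRDS}; do
    run kubectl wait --for=condition=Established "crd/${crd}" --timeout=60s
  done
}
```

Running `DRY_RUN=1 install_agent_sandbox` prints the six kubectl commands without touching a cluster, which is a cheap way to review the ordering: both applies, then the rollout wait, then the three CRD waits.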
Testing
Validated end-to-end on a local k3s (k3d) cluster:
Installed agent-sandbox core + warm-pool extensions (v0.4.6) and drove a real SandboxTemplate → SandboxWarmPool → SandboxClaim cycle: the claim bound a warm pod in ~0.13s, the claim-injected openshell.io/sandbox-id annotation landed on the pod, and the pool self-replenished.
Deployed OpenShell via Skaffold and confirmed the cold-path baseline still works: sandbox create → Ready, IssueSandboxToken TokenReview → minted gateway JWT, and an echo executed inside the sandbox over the supervisor relay.
bash -n passes on both modified scripts.
mise run pre-commit passes: ran the relevant lint sub-tasks (license:check ✓, markdown:lint ✓) and bash -n on the scripts ✓. Did not run the full ci (Rust compile/tests) locally because no Rust/Python sources changed; CI covers it.
Unit tests added/updated: N/A (no code changes)
E2E tests added/updated: the e2e harness now installs the extensions; a warm-pool e2e assertion follows in the driver-path PR (RFC 0005, phase 2)
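One way the annotation check from the warm-pool cycle could be expressed is sketched below. The function names and the namespace, pod, and id arguments are hypothetical; only the openshell.io/sandbox-id annotation key comes from this PR. The sketch relies on kubectl's JSONPath output, where dots inside a key are escaped with a backslash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read the sandbox-id annotation from a pod.
# Usage: sandbox_id_of_pod <namespace> <pod-name>
sandbox_id_of_pod() {
  # The dot in "openshell.io" is escaped so JSONPath treats it as part of the key.
  kubectl -n "$1" get pod "$2" \
    -o 'jsonpath={.metadata.annotations.openshell\.io/sandbox-id}'
}

# Hypothetical helper: succeed only when the pod carries the expected id.
# Usage: assert_sandbox_id <namespace> <pod-name> <expected-id>
assert_sandbox_id() {
  ns="$1"
  pod="$2"
  expected="$3"
  actual="$(sandbox_id_of_pod "$ns" "$pod")"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $pod carries sandbox-id $actual"
  else
    echo "mismatch: expected $expected, got ${actual:-<none>}" >&2
    return 1
  fi
}
```

A call such as `assert_sandbox_id default my-warm-pod sb-123` (all three values are placeholders) returns non-zero when the annotation is missing or different, so it can gate a later e2e step.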
Checklist
architecture/ ("how it works today") is unchanged.