docs(rfc): add Policy subsystem (RFC 0005) by dvavili · Pull Request #1728 · NVIDIA/OpenShell

dvavili · 2026-06-03T21:12:06Z

Summary

Proposes RFC 0005 — Policy Subsystem: promote policy to a first-class gateway subsystem that delegates where policy comes from to a driver, mirroring the subsystem-and-driver model RFC 0001 defines for the gateway (and implements today for compute).

- `builtin` driver (the default): today's in-process, store-backed policy path, unchanged.
- Third-party driver: implements a `PolicyDriver` gRPC contract and runs as an operator-managed process the gateway connects to over a UDS. The gateway does not launch or supervise it.

The change is additive and opt-in per deployment. Enforcement stays the gateway's: projections are verified against a trust store (authentic), admitted all-or-nothing (complete), and gated against mutation (unaltered). A driver's internals — packaging, policy sourcing, remote backends, trust establishment — are out of scope; OpenShell consumes only the projected SandboxPolicy.
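The three enforcement properties above (authentic, complete, unaltered) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the RFC's design: the `Projection` fields, the `admit` and `is_unaltered` helpers, and especially the use of HMAC-SHA256 as a stand-in for whatever signature scheme the trust store actually uses (a real deployment would more likely use asymmetric signatures).

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class Projection:
    # Hypothetical shape; the RFC only says the gateway consumes a projected SandboxPolicy.
    sandbox_id: str
    policy_bytes: bytes   # serialized SandboxPolicy, treated as opaque bytes here
    signing_key_id: str
    signature: bytes


class AdmissionError(Exception):
    """Raised when any projection in a batch fails verification."""


def admit(projections, trust_store):
    """Return {sandbox_id: sha256 hex digest} only if EVERY projection verifies.

    trust_store maps signing_key_id -> key bytes (assumption: HMAC stand-in).
    The first failure raises, so nothing is admitted from a partial batch.
    """
    admitted = {}
    for p in projections:
        key = trust_store.get(p.signing_key_id)
        if key is None:
            raise AdmissionError(f"unknown signing key: {p.signing_key_id}")
        expected = hmac.new(key, p.policy_bytes, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, p.signature):
            raise AdmissionError(f"bad signature for sandbox {p.sandbox_id}")
        admitted[p.sandbox_id] = hashlib.sha256(p.policy_bytes).hexdigest()
    return admitted


def is_unaltered(policy_bytes, admitted_digest):
    """Mutation gate: compare current bytes against the digest recorded at admission."""
    current = hashlib.sha256(policy_bytes).hexdigest()
    return hmac.compare_digest(current, admitted_digest)
```

In this sketch a single bad projection rejects the whole batch, which is one way to read "admitted all-or-nothing"; the RFC text itself is the authority on the actual semantics.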

Scoping issue: #1713. Related: RFC 0001, RFC 0002, #1703.

DCO: the commit is signed off.

Proposes a Policy subsystem on the gateway that delegates where policy comes from to a driver: a first-party builtin driver (the default, in-process store-backed path) or a third-party driver over a PolicyDriver gRPC contract. Third-party drivers follow the out-of-tree model: operator-run, with the gateway connecting to a provided UDS. The change is additive and opt-in per deployment.

Signed-off-by: Divya Vavili <dvavili@nvidia.com>

copy-pr-bot · 2026-06-03T21:12:10Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-06-03T21:12:22Z

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

dvavili · 2026-06-03T21:12:34Z

I have read the DCO document and I hereby sign the DCO.

dvavili · 2026-06-03T21:12:50Z

recheck

dvavili · 2026-06-03T21:35:02Z

I have read the Contributor Agreement including DCO and I hereby sign the Contributor Agreement and DCO

drew · 2026-06-08T18:04:55Z

+
+### The `PolicyDriver` contract
+
+A third-party driver implements one gRPC service — `PolicyDriver` (versioned `openshell.policy.driver.v1`), the same kind of contract compute, credentials, and identity drivers already define ([RFC 0001](https://github.com/NVIDIA/OpenShell/blob/8bf667f377d567e4c7638db8ca70ce13ecdeb0da/rfc/0001-core-architecture/README.md)). Four RPCs:


If I understand this RFC correctly, policy bodies are no longer stored on the gateway. Is that right?

If so, is there a way to keep the policy itself stored on the gateway, but then delegate signing and attestation infrastructure to some PolicySigningDriver/PolicyGovernanceDriver instead?

I think that is going to be a cleaner boundary for the driver interface and better match how our other drivers work.

The comment in the earlier thread applies to this case as well. The hook mechanism should be agnostic to the requirements of the enterprise and keep the clean separation we discussed.

johntmyers · 2026-06-08T18:26:23Z

+Introduce a **Policy subsystem** on the gateway. Like compute, credentials, and identity ([RFC 0001](https://github.com/NVIDIA/OpenShell/blob/8bf667f377d567e4c7638db8ca70ce13ecdeb0da/rfc/0001-core-architecture/README.md)), the subsystem owns the policy semantics and delegates *where policy comes from* to a **driver**:
+
+- A **first-party `builtin` driver** *(the default, used when no driver socket is configured)* — today's store-backed path, unchanged: ships with the gateway and is satisfied in-process.
+- A **third-party driver** implements the `PolicyDriver` gRPC contract and runs as an **operator-managed process**; the gateway connects to a UDS the operator provides. The gateway does not launch or supervise it — the same out-of-tree model proposed for compute drivers in [#1703](https://github.com/NVIDIA/OpenShell/pull/1703) (the `--compute-driver-socket` flag). How the driver is packaged, where it sources policy (including fronting a remote backend), and how its socket is secured are all the operator's, invisible to the gateway.


Policy objects (and, by extension, provider v2 profiles) in our database are constructed in a very specific way for sandbox operations. The gateway does JIT construction of a sandbox's effective running policy, and our policy prover also needs JIT access to policy data.

I would lean towards ensuring the OpenShell db is still the system of record for all sandbox operations.

Other patterns to explore:

Could the third party service replicate approved objects into OpenShell so all operations still use the OpenShell DB?

Is there a narrower hook we can put into OpenShell specifically around signature attestation? You could still use existing OpenShell APIs to import policies and provider profiles, but the Gateway (or Driver) would do the signature verification.

Now I'm wondering if Gateway configuration would also potentially need to be governed, especially with certain knobs for volume mounts. Thoughts @drew?

Definitely. I think we'll need to enforce specific gateway settings from some sort of enterprise/IT-managed authority.

> Is there a narrower hook we can put into OpenShell specifically around signature attestation? You could still use existing OpenShell APIs to import policies and provider profiles, but the Gateway (or Driver) would do the signature verification.

We are narrowing in on this design: a hook to GetPolicy from any OpenShell component that requires it. Note that policy signature verification is just one use case this model unlocks. We should keep the requirements of the external Policy Provider service out of the gateway. So, instead of thinking of it as a hook to sign/attest the policy, it's a general adapter that fetches an enterprise policy for the sandbox. What the adapter does (policy signature verification, multi-tenancy support, and so on) can then be tailored to the requirements of the enterprise itself rather than built into OpenShell as an opinionated flow.

Regarding Gateway configuration needing to be governed: it's a good point. I'd imagine an out-of-band operator responsible for the deployment could pull the gateway's config from a governed source. We could certainly use a similar hook mechanism to fetch the config from a governed location.
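To make the "general adapter" idea in this reply concrete, here is a minimal sketch of a single get-policy hook with two interchangeable sources. All names (`PolicySource`, `BuiltinSource`, `ExternalSource`, `select_source`, the `fetch` callable) are hypothetical and chosen for illustration. There is no real gRPC or UDS transport in it, and the actual `PolicyDriver` contract is whatever the RFC defines.

```python
from typing import Callable, Dict, Optional, Protocol


class PolicySource(Protocol):
    """The one hook callers see: fetch the policy for a sandbox."""

    def get_policy(self, sandbox_id: str) -> bytes: ...


class BuiltinSource:
    """Stand-in for today's store-backed path; a dict plays the role of the store."""

    def __init__(self, store: Dict[str, bytes]):
        self._store = store

    def get_policy(self, sandbox_id: str) -> bytes:
        return self._store[sandbox_id]


class ExternalSource:
    """Stand-in for a third-party driver reached over an operator-provided socket.

    `fetch` hides whatever the enterprise service does (signature checks,
    multi-tenancy, remote backends); the caller only receives policy bytes.
    """

    def __init__(self, socket_path: str, fetch: Callable[[str, str], bytes]):
        self._socket_path = socket_path
        self._fetch = fetch

    def get_policy(self, sandbox_id: str) -> bytes:
        return self._fetch(self._socket_path, sandbox_id)


def select_source(
    driver_socket: Optional[str],
    store: Dict[str, bytes],
    fetch: Callable[[str, str], bytes],
) -> PolicySource:
    # No socket configured means the builtin default; otherwise use the driver.
    if driver_socket is None:
        return BuiltinSource(store)
    return ExternalSource(driver_socket, fetch)
```

Callers depend only on `get_policy`, so what the external side does before returning a policy stays outside the gateway, which is the separation the comment argues for.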

@dvavili, I'd like to clarify this

> So, instead of thinking of it as a hook to sign/attest the policy, it's a general adapter that fetches an enterprise policy for the sandbox. What the adapter does (policy signature verification, multi-tenancy support, and so on) can then be tailored to the requirements of the enterprise itself rather than built into OpenShell as an opinionated flow.

In our conversation earlier today, you said that OpenShell itself would have to verify the policy signature using the signing_key_id. Are you saying that's not the case, and the general adapter would do this instead?

drew · 2026-06-08T20:14:24Z

+Policy in OpenShell is store-backed and gateway-owned: user-authored, validated, persisted, and composed inside the gateway. That works when one party owns both OpenShell and policy. Some policy ecosystems do not fit that shape — **enterprise deployment models in particular require strict attestation and independent auditability**:
+
+- Policy is authored and signed by a central authority in a separate trust domain.
+- What a sandbox enforces must be tamper-evident even against a compromised gateway.


Can you expand on what these threat vectors look like? Some examples would be good. How is a gateway compromised, and what can an attacker do once the gateway is owned? What does a policy provider outside the gateway trust domain protect against?

dvavili requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners June 3, 2026 21:12

pimlock added the rfc label Jun 5, 2026

johntmyers reviewed Jun 8, 2026


Comment thread rfc/0005-policy-subsystem/README.md

drew reviewed Jun 8, 2026


johntmyers reviewed Jun 8, 2026


drew reviewed Jun 8, 2026


