feat(gpu): move device selection to driver config #1815
🌿 Preview your docs: https://nvidia-preview-pr-1815.docs.buildwithfern.com/openshell
BREAKING CHANGE: The openshell sandbox create --gpu-device flag and the corresponding API field were removed. Select specific GPUs through driver-specific driver_config fields instead.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
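A rough sketch of the validation this move implies, following the Changes list in the PR description: exact selection through `cdi_devices` or `gpu_device_ids` is rejected for Kubernetes, and `gpu_device_ids` accepts at most one entry. The `Driver`, `DriverConfig`, `ConfigError`, and `validate` names are hypothetical and are not OpenShell's actual types. The sketch does not model which driver owns which field, nor the `gpu=true` rule, whose exact semantics the list leaves open.

```rust
// Hypothetical sketch: these types are illustrative, not OpenShell's real API.

/// Sandbox drivers mentioned in the PR discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Driver {
    Docker,
    Podman,
    Kubernetes,
}

/// Stand-in for the driver-specific `driver_config` GPU fields.
#[derive(Debug, Default)]
struct DriverConfig {
    cdi_devices: Vec<String>,
    gpu_device_ids: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
enum ConfigError {
    ExactSelectionUnsupported(Driver),
    TooManyGpuDeviceIds(usize),
}

fn validate(driver: Driver, cfg: &DriverConfig) -> Result<(), ConfigError> {
    // "Exact selection" here means either GPU field is non-empty.
    let exact = !cfg.cdi_devices.is_empty() || !cfg.gpu_device_ids.is_empty();
    // Kubernetes rejects exact device selection.
    if exact && driver == Driver::Kubernetes {
        return Err(ConfigError::ExactSelectionUnsupported(driver));
    }
    // gpu_device_ids is currently limited to a single entry.
    if cfg.gpu_device_ids.len() > 1 {
        return Err(ConfigError::TooManyGpuDeviceIds(cfg.gpu_device_ids.len()));
    }
    Ok(())
}

fn main() {
    let single = DriverConfig { gpu_device_ids: vec!["0".to_string()], ..Default::default() };
    assert_eq!(validate(Driver::Docker, &single), Ok(()));

    let cdi = DriverConfig { cdi_devices: vec!["nvidia.com/gpu=0".to_string()], ..Default::default() };
    assert_eq!(validate(Driver::Podman, &cdi), Ok(()));
    assert_eq!(
        validate(Driver::Kubernetes, &cdi),
        Err(ConfigError::ExactSelectionUnsupported(Driver::Kubernetes))
    );

    let two = DriverConfig {
        gpu_device_ids: vec!["0".to_string(), "1".to_string()],
        ..Default::default()
    };
    assert_eq!(validate(Driver::Docker, &two), Err(ConfigError::TooManyGpuDeviceIds(2)));

    // Kubernetes without exact selection is still accepted.
    assert_eq!(validate(Driver::Kubernetes, &DriverConfig::default()), Ok(()));
    println!("all checks passed");
}
```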
```diff
 gpu.then(|| {
-    if gpu_device.is_empty() {
+    if cdi_devices.is_empty() {
         vec![CDI_GPU_DEVICE_ALL.to_string()]
```
I would suggest falling back to a single GPU instead. Or is there a technical reason I am missing?
The behavioural change of falling back to a single GPU is handled in #1675. It requires adding device detection at a driver level AND allowing round-robin selection of devices for both Docker and Podman.
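The defaulting in the reviewed hunk can be sketched as a standalone function: when a GPU is requested and no explicit CDI devices are given, every GPU is requested rather than a single one. `resolve_cdi_devices` is a hypothetical name, and the value `nvidia.com/gpu=all` for `CDI_GPU_DEVICE_ALL` is an assumption based on the conventional NVIDIA CDI device name, not taken from the PR.

```rust
// Assumed value; the PR only shows the constant's name.
const CDI_GPU_DEVICE_ALL: &str = "nvidia.com/gpu=all";

/// Hypothetical helper: returns the CDI device names to inject,
/// or None when the sandbox did not request a GPU.
fn resolve_cdi_devices(gpu: bool, cdi_devices: &[String]) -> Option<Vec<String>> {
    gpu.then(|| {
        if cdi_devices.is_empty() {
            // No explicit selection: request all GPUs.
            vec![CDI_GPU_DEVICE_ALL.to_string()]
        } else {
            // Explicit selection from driver_config is passed through unchanged.
            cdi_devices.to_vec()
        }
    })
}

fn main() {
    assert_eq!(resolve_cdi_devices(false, &[]), None);
    assert_eq!(
        resolve_cdi_devices(true, &[]),
        Some(vec!["nvidia.com/gpu=all".to_string()])
    );
    let exact = vec!["nvidia.com/gpu=0".to_string()];
    assert_eq!(resolve_cdi_devices(true, &exact), Some(exact.clone()));
    println!("ok");
}
```

Falling back to a single GPU, as the review suggests, would change only the empty-list branch, but choosing which GPU to use needs the driver-level device detection and round-robin selection deferred to #1675.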
PR Review Status
Validation: this is maintainer-authored, project-valid GPU driver/API boundary work related to #1716 and #1812.
Review findings:
Docs: Fern docs were updated for the new
Next state:
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Re-check After Author Update
I re-evaluated latest head
Disposition: partially resolved.
Remaining items:
Next state:
Re-check After Author Update
I re-evaluated latest head
Disposition: resolved.
Remaining items:
Checks: required checks need to run again for the new head.
Next state:
Maintainer Approval Needed
Gator validation and PR monitoring are complete.
Validation: maintainer-authored GPU driver/API boundary work scoped to moving device selection into driver configuration.
Human maintainer approval or a merge decision is now required.
Summary
Move exact GPU device selection out of the public sandbox proto/API and into driver-specific `driver_config` fields.
Related Issue
Related to #1716 and #1812.
Changes
- `gpu_device` fields, reserving the field numbers and names.
- `--gpu-device` flag.
- `driver_config.cdi_devices`.
- `driver_config.gpu_device_ids`, currently limited to one entry.
- `gpu=true` when exact GPU device config is supplied, and reject exact selection for Kubernetes.
- `--driver-config-json` path.

Testing
- `mise run pre-commit` passes

Checklist