Rescan NVMe namespaces when a published namespace device is missing #1159
Open
grandeit wants to merge 1 commit into
When a namespace is mapped to an already-connected NVMe subsystem, the host only creates its device node in response to an asynchronous notification from the controller. If that notification is missed, the device never appears and NodeStageVolume keeps failing with "no device found for the given namespace". Recover the namespace by issuing "nvme ns-rescan" on the subsystem's controllers: from the AttachNVMeVolume retry loop when the device is missing on an already-connected subsystem, and from NVMe self-healing when a published namespace has no device on the host. The self-healing case mirrors iSCSI self-healing, which already rescans the host for devices. A rescan only adds missing namespaces and never renames existing devices, so it is safe on a connected subsystem. Signed-off-by: Manuel Grandeit <m.grandeit@gmail.com>
Change description
Rescan NVMe namespaces when a published namespace device is missing
When a namespace is mapped to an already-connected NVMe subsystem, the
host only creates its device node in response to an asynchronous
notification from the controller. If that notification is missed, the
device never appears and NodeStageVolume keeps failing with "no device
found for the given namespace".
Recover the namespace by issuing "nvme ns-rescan" on the subsystem's
controllers: from the AttachNVMeVolume retry loop when the device is
missing on an already-connected subsystem, and from NVMe self-healing
when a published namespace has no device on the host. The self-healing
case mirrors iSCSI self-healing, which already rescans the host for
devices. A rescan only adds missing namespaces and never renames
existing devices, so it is safe on a connected subsystem.
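The retry-loop half of the change can be pictured with a minimal Go sketch. It is not the code in this PR: `waitForDevice`, `rescanFn`, `errNoDevice`, and the attempt/delay parameters are hypothetical names chosen for illustration, and only the `nvme ns-rescan <controller>` invocation comes from nvme-cli itself. The sketch assumes the caller already knows the controller devices of the connected subsystem.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// errNoDevice stands in for the "no device found for the given namespace"
// error quoted above. The variable name is hypothetical.
var errNoDevice = errors.New("no device found for the given namespace")

// rescanFn rescans one controller device, e.g. /dev/nvme0.
type rescanFn func(controller string) error

// nvmeNsRescan shells out to nvme-cli: `nvme ns-rescan <controller>`.
func nvmeNsRescan(controller string) error {
	out, err := exec.Command("nvme", "ns-rescan", controller).CombinedOutput()
	if err != nil {
		return fmt.Errorf("nvme ns-rescan %s: %w (%s)", controller, err, out)
	}
	return nil
}

// waitForDevice polls find until it returns a device path or the attempts
// run out. After each miss it rescans every controller of the subsystem,
// so a missed notification no longer leaves the loop re-reading stale state.
func waitForDevice(find func() (string, error), controllers []string,
	rescan rescanFn, attempts int, delay time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		dev, err := find()
		if err == nil {
			return dev, nil
		}
		if !errors.Is(err, errNoDevice) {
			return "", err // unrelated failure: do not mask it with a rescan
		}
		if i == attempts-1 {
			break // no point rescanning if we will not look again
		}
		for _, c := range controllers {
			_ = rescan(c) // best effort: one failing controller must not block the rest
		}
		time.Sleep(delay)
	}
	// A genuinely absent namespace still surfaces the original error.
	return "", errNoDevice
}
```

In production the caller would pass `nvmeNsRescan` as the `rescan` argument; passing a stub instead keeps the loop testable without a real NVMe host.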
Project tracking
External community contribution. NetApp KB "Unable to mount PVC due to Kubernetes namespace error" documents this exact error but blames a missing namespace. The error fires during node staging, after the controller Publish has already mapped the namespace on ONTAP, so the namespace exists; the host just never enumerated it (nvme ns-rescan recovers it). This PR fixes that cause.
Do any added TODOs have an issue in the backlog?
None added.
Did you add unit tests? Why not?
The behavior that matters, namely whether the host re-enumerates the namespace after the rescan, is host/kernel behavior that a unit test can't reproduce; a test could only assert that nvme ns-rescan is issued, not that the device returns. That recovery is covered by functional testing. Existing utils/nvme tests pass.
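For reference, the limited assertion a unit test could make (that the command is issued) might look like the sketch below. The `commandRunner` interface, `recordingRunner`, and `rescanAll` are hypothetical names invented for this illustration and are not Trident's actual command abstraction.

```go
package main

import "strings"

// commandRunner is a hypothetical seam for executing host commands.
type commandRunner interface {
	Run(name string, args ...string) error
}

// recordingRunner is a test double that records each command line
// instead of executing it.
type recordingRunner struct {
	calls []string
}

func (r *recordingRunner) Run(name string, args ...string) error {
	r.calls = append(r.calls, strings.Join(append([]string{name}, args...), " "))
	return nil
}

// rescanAll issues `nvme ns-rescan` for every controller and returns the
// first error seen, while still attempting the remaining controllers.
func rescanAll(r commandRunner, controllers []string) error {
	var firstErr error
	for _, c := range controllers {
		if err := r.Run("nvme", "ns-rescan", c); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}
```

A test built on this double verifies only the command lines that were requested; whether the kernel then creates the device node remains a matter for functional testing.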
Does this code need functional testing?
Yes. This is host-level NVMe/TCP behavior that unit tests can't fully cover. It is best validated on NVMe/TCP with raw-block volumes by reproducing the hang under concurrent VM clones. Already confirmed manually that nvme ns-rescan recovers the stuck namespace on the affected node.
Is a code review walkthrough needed? why or why not?
A short one would help. The root cause (a missed NVMe discovery notification on an already-connected, namespace-dense subsystem) is non-obvious, and the change adds a new host command and a self-healing remediation.
Should additional test coverage be executed in addition to pre-merge?
NVMe/TCP with raw-block volumes (many namespaces in one subsystem) under concurrent volume creation/clone, plus a check that NodeStage and NodeUnstage are otherwise unaffected.
Does this code need a note in the changelog?
Yes:
Does this code require documentation changes?
No. Internal recovery mechanism, no config, CRD, or API change.
Additional Information
Under concurrent VM clones on NVMe/TCP, PVCs intermittently hang in NodeStageVolume with "no device found for the given namespace" while sibling volumes on the same subsystem mount fine. For raw-block volumes Trident packs many namespaces into one shared subsystem (getSuperSubsystemName, up to 1024), so after the first volume connects it, discovery of each later namespace depends on that one controller's notifications, and a clone burst makes a missed one likely. Trident's only recovery was a 20s retry that re-reads stale sysfs, with no rescan fallback. The self-healing addition mirrors iSCSI self-healing, which already rescans the host for LUNs (scanForAllLUNs).
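Because every namespace in the shared subsystem depends on the same controllers, the rescan has to reach each of them. The sketch below shows one way to list them, assuming the usual Linux sysfs layout in which a directory such as /sys/class/nvme-subsystem/nvme-subsysN contains one `nvmeM` entry per controller next to namespace entries like `nvmeMnK`. The function names are hypothetical and this is not necessarily how the PR locates controllers.

```go
package main

import (
	"os"
	"regexp"
	"sort"
)

// controllerName matches controller entries such as "nvme0" but not
// namespace entries such as "nvme0n1" or "nvme0c0n1".
var controllerName = regexp.MustCompile(`^nvme[0-9]+$`)

// filterControllers keeps only controller entries and maps them to
// /dev paths suitable for `nvme ns-rescan`.
func filterControllers(entries []string) []string {
	var devs []string
	for _, e := range entries {
		if controllerName.MatchString(e) {
			devs = append(devs, "/dev/"+e)
		}
	}
	sort.Strings(devs) // deterministic order for logging and tests
	return devs
}

// subsystemControllers reads a subsystem directory (assumed layout:
// /sys/class/nvme-subsystem/nvme-subsysN) and returns its controllers.
func subsystemControllers(subsysDir string) ([]string, error) {
	ents, err := os.ReadDir(subsysDir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(ents))
	for _, e := range ents {
		names = append(names, e.Name())
	}
	return filterControllers(names), nil
}
```

Splitting the name filter from the directory read keeps the matching logic checkable without a real NVMe host.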
This is the error from NetApp KB "Unable to mount PVC due to Kubernetes namespace error", which blames a missing namespace and says to recreate the PVC. That cause is wrong here: the error fires after Publish has already mapped the namespace on ONTAP, so it exists, and nvme ns-rescan recovers it (impossible if it didn't exist). Recreating the PVC only works by forcing re-enumeration, at the cost of destroying the PVC's data (e.g. a KubeVirt VM disk). This PR rescans to recover the namespace directly; if one is genuinely absent the rescan is a harmless no-op and the existing error still surfaces.
Builds for linux and darwin, go vet clean, existing utils/nvme tests pass.