DAOS-18972 cart: configurable address family for fabric init (#18254) #18491
Open
frostedcmos wants to merge 1 commit into
By default, CaRT initializes Mercury with na_init_info.addr_format set
to NA_ADDR_UNSPEC, which Mercury's na_ofi plugin maps to its per-provider
preference table. For verbs/RoCE this resolves to FI_SOCKADDR_IN (IPv4),
which causes libfabric's fabric scan and QP-attach to prefer IPv4 even
when the operator has explicitly configured the fabric NIC for IPv6.
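A toy model of the resolution step, to make the motivation concrete.
This is not Mercury's or libfabric's code. The enums are stand-ins
(comments name the real FI_SOCKADDR_IN / FI_SOCKADDR_IN6 formats they
model), and the rule "an explicit hint wins, UNSPEC defers to the
provider's preferred format" is an assumed simplification of na_ofi's
per-provider preference table.

```c
#include <assert.h>

/* Stand-in for the hint CaRT passes down (models enum na_addr_format). */
enum hint_sketch { HINT_UNSPEC, HINT_IPV4, HINT_IPV6, HINT_NATIVE };

/* Stand-in for the libfabric address format finally requested. */
enum fi_fmt_sketch {
	FMT_SOCKADDR_IN,  /* models FI_SOCKADDR_IN (IPv4) */
	FMT_SOCKADDR_IN6, /* models FI_SOCKADDR_IN6 (IPv6) */
	FMT_NATIVE,       /* models a provider-specific native format */
};

/* Toy model, not Mercury's implementation: an explicit hint selects the
 * format directly; UNSPEC falls back to the provider's preference, which
 * the PR says is FI_SOCKADDR_IN for verbs/RoCE. */
static enum fi_fmt_sketch
resolve_addr_format_sketch(enum hint_sketch hint,
			   enum fi_fmt_sketch provider_pref)
{
	switch (hint) {
	case HINT_IPV4:
		return FMT_SOCKADDR_IN;
	case HINT_IPV6:
		return FMT_SOCKADDR_IN6;
	case HINT_NATIVE:
		return FMT_NATIVE;
	case HINT_UNSPEC:
	default:
		return provider_pref;
	}
}
```

With provider_pref set to the FI_SOCKADDR_IN stand-in (the verbs/RoCE
preference described above), an UNSPEC hint yields IPv4 while an IPv6
hint yields the FI_SOCKADDR_IN6 stand-in, which is the gap the new
option closes.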
Add a per-provider address-family hint, configurable in two ways:
- crt_init_options_t::cio_addr_format (API field)
- D_ADDR_FORMAT environment variable
mirroring the existing pattern used by cio_provider / D_PROVIDER,
cio_interface / D_INTERFACE, cio_domain / D_DOMAIN, cio_port / D_PORT,
and cio_auth_key / D_PROVIDER_AUTH_KEY. Accepted values are "unspec"
(default), "ipv4", "ipv6", and "native". Unrecognized values fall back
silently to "unspec" rather than failing initialization, matching the
behavior of crt_str_to_tc() for traffic classes.
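A minimal sketch of the string-to-enum step, assuming a lookup table
over the four accepted spellings. Only CRT_AF_UNSPEC is named in this
PR; CRT_AF_IPV4, CRT_AF_IPV6, CRT_AF_NATIVE and the helper
crt_str_to_addr_format_sketch() are hypothetical names for
illustration. Matching is exact (case-sensitive) here because the
accepted values above are given in lower case; the real parser may
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the patch's enum; only CRT_AF_UNSPEC is named
 * in the PR text. */
enum crt_addr_format {
	CRT_AF_UNSPEC = 0,
	CRT_AF_IPV4,
	CRT_AF_IPV6,
	CRT_AF_NATIVE,
};

/* Hypothetical helper: map a cio_addr_format / D_ADDR_FORMAT value to
 * the enum. NULL or unrecognized input returns CRT_AF_UNSPEC instead of
 * an error, the fall-back policy the PR cites for crt_str_to_tc(). */
static enum crt_addr_format
crt_str_to_addr_format_sketch(const char *str)
{
	static const struct {
		const char          *name;
		enum crt_addr_format af;
	} table[] = {
		{"unspec", CRT_AF_UNSPEC},
		{"ipv4", CRT_AF_IPV4},
		{"ipv6", CRT_AF_IPV6},
		{"native", CRT_AF_NATIVE},
	};
	size_t i;

	if (str == NULL)
		return CRT_AF_UNSPEC; /* option not set: keep the default */

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(str, table[i].name) == 0)
			return table[i].af;
	}
	return CRT_AF_UNSPEC; /* e.g. "garbagez": silent fallback, no failure */
}
```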
The value is stored per-provider on struct crt_prov_gdata::cpg_addr_format
and forwarded to Mercury via init_info.na_init_info.addr_format in
crt_hg_class_init(). For multi-provider configurations the option
accepts a comma-separated list, matched one-to-one with the entries of
D_PROVIDER (same parsing pattern as the existing fields).
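A sketch of the positional pairing for multi-provider setups, assuming
fixed-size buffers. split_csv_sketch() and the SKETCH_* limits are
hypothetical and are not the CaRT parser; the point is only that entry
i of D_ADDR_FORMAT pairs with entry i of D_PROVIDER, so empty fields
are kept to preserve positions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_PROV  4  /* hypothetical cap on providers */
#define SKETCH_TOKEN_LEN 32 /* hypothetical max length of one entry */

/* Hypothetical helper: split a comma-separated option value (such as
 * D_ADDR_FORMAT or D_PROVIDER) into at most max entries and return how
 * many were written. Entries longer than SKETCH_TOKEN_LEN - 1 are
 * truncated; empty fields ("a,,b") are kept as empty strings so that
 * positions stay aligned with the provider list. */
static size_t
split_csv_sketch(const char *value, char out[][SKETCH_TOKEN_LEN], size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	if (value == NULL)
		return 0;

	while (n < max) {
		size_t len  = strcspn(value, ",");
		size_t copy = len < SKETCH_TOKEN_LEN - 1 ? len : SKETCH_TOKEN_LEN - 1;

		memcpy(out[n], value, copy);
		out[n][copy] = '\0';
		n++;

		if (value[len] == '\0')
			break;          /* last entry consumed */
		value += len + 1;       /* skip past the comma */
	}
	return n;
}
```

For example, splitting "ipv6,unspec" yields two entries, so the first
provider listed in D_PROVIDER would get the IPv6 hint and the second
would keep the default. How CaRT handles lists of different lengths is
not described in this PR, and the sketch does not model it.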
Mercury already supports IPv6 end-to-end in the na_ofi plugin: the
addr_format mapping, addr_size, raw_addr serialize/deserialize, and
addr_to_key paths all handle FI_SOCKADDR_IN6. No Mercury change is
required for this patch.
Implementation notes:
* enum crt_addr_format mirrors enum na_addr_format. Static assertions
  guard the alignment so crt_hg.c can cast the enum directly when
  assigning na_init_info.addr_format, matching the existing idiom used
  for cg_swim_tc -> enum na_traffic_class.
* Default behavior is preserved: omitting D_ADDR_FORMAT leaves the
  per-provider gdata at CRT_AF_UNSPEC, which casts to NA_ADDR_UNSPEC,
  yielding the previous IPv4-preferring fabric scan. Existing
  deployments see no functional change.
* This is the CaRT half of IPv6 fabric support. Two companion changes,
  each independent of and separate from this patch, are required to
  complete the full v6 fabric story:
  - src/control/server/server_utils.go: "tcp4" -> "tcp" and
    "0.0.0.0:%d" -> "[::]:%d" for the gRPC management listener;
  - HG_Get_na_protocol_info (or an equivalent Mercury API) must accept
    an addr_format hint so that the daos_server fabric scan respects
    v6 too.
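A minimal sketch of the static-assertion-plus-cast idiom from the first
note, using C11 static_assert. The enums and structs are stand-ins:
real code takes enum na_addr_format and struct na_init_info from
Mercury's na.h, and the NA_ADDR_IPV4 / NA_ADDR_IPV6 / NA_ADDR_NATIVE
names and their ordering are assumed here. Only cpg_addr_format,
CRT_AF_UNSPEC, NA_ADDR_UNSPEC and the addr_format field are named in
the PR; fill_na_init_info_sketch() is a hypothetical stand-in for the
relevant assignment in crt_hg_class_init().

```c
#include <assert.h> /* assert, static_assert (C11) */
#include <stddef.h> /* NULL */

/* Stand-in for Mercury's enum na_addr_format (real code includes na.h).
 * Member names other than NA_ADDR_UNSPEC, and the ordering, are assumed. */
enum na_addr_format {
	NA_ADDR_UNSPEC,
	NA_ADDR_IPV4,
	NA_ADDR_IPV6,
	NA_ADDR_NATIVE,
};

/* CaRT-side mirror, declared independently of the Mercury enum, so the
 * values must be kept in sync by hand. Names other than CRT_AF_UNSPEC
 * are hypothetical. */
enum crt_addr_format {
	CRT_AF_UNSPEC = 0,
	CRT_AF_IPV4   = 1,
	CRT_AF_IPV6   = 2,
	CRT_AF_NATIVE = 3,
};

/* Compile-time guards: the direct cast below is only safe while the
 * numeric values of both enums line up. */
static_assert((int)CRT_AF_UNSPEC == (int)NA_ADDR_UNSPEC, "UNSPEC mismatch");
static_assert((int)CRT_AF_IPV4 == (int)NA_ADDR_IPV4, "IPV4 mismatch");
static_assert((int)CRT_AF_IPV6 == (int)NA_ADDR_IPV6, "IPV6 mismatch");
static_assert((int)CRT_AF_NATIVE == (int)NA_ADDR_NATIVE, "NATIVE mismatch");

/* Stand-ins carrying only the fields this sketch touches. */
struct na_init_info_stub {
	enum na_addr_format addr_format;
};

struct crt_prov_gdata_stub {
	enum crt_addr_format cpg_addr_format;
};

/* Hypothetical excerpt of the forwarding done in crt_hg_class_init(). */
static void
fill_na_init_info_sketch(const struct crt_prov_gdata_stub *prov,
			 struct na_init_info_stub *na)
{
	assert(prov != NULL && na != NULL);
	na->addr_format = (enum na_addr_format)prov->cpg_addr_format;
}
```

If either enum is reordered, compilation fails at a static_assert
rather than silently handing the wrong family to Mercury. Because
CRT_AF_UNSPEC is 0 in this sketch, a zero-initialized gdata maps to
NA_ADDR_UNSPEC, mirroring the default-behavior note above.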
Validation performed before submission (2-node DAOS 2.6.5 cluster,
Mellanox ConnectX-7, RoCEv2, ofi+verbs;ofi_rxm):
Test                                          Result
----                                          ------
Patch builds against runtime libmercury       OK
Default behaviour (D_ADDR_FORMAT unset)       OK - all ranks Joined, dmg pool
                                                   create and storage format
                                                   succeed unchanged
Explicit D_ADDR_FORMAT=ipv4                   OK - identical to default
Unknown value (e.g. D_ADDR_FORMAT=garbagez)   OK - silent fallback to unspec,
                                                   no init failure
D_ADDR_FORMAT=ipv6 hint reaches libfabric     OK - confirmed via 'rdma resource
                                                   show cm_id': engines LISTEN on
                                                   [2a04:f547:93:3082::20bc]:20000
                                                   (IPv6), not 10.92.32.188:20000
                                                   (IPv4)
End-to-end IPv6 RPC between engines was *not* exercisable on this
specific test cluster: libfabric 1.22.0's verbs;ofi_rxm provider
returns ENODATA from fi_getinfo when an IPv6 addr_format hint is
passed, even though the fabric NIC has a global IPv6 address
configured. The same failure reproduces with
'fi_pingpong -p "verbs;ofi_rxm" -6' without any DAOS code involved,
while the kernel RDMA-CM + RoCEv2 v6 path itself works (verified via
rping). The v6 gap is therefore in the libfabric verbs provider,
downstream of every layer this patch touches.
For unit testing, this option follows the existing convention for
crt_str_to_tc() and the other CRT_ENV_OPT_GET-mediated options: they
are exercised via the cart ftest suite rather than per-function unit
tests.
Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>

Ticket title is 'cart: configurable address family for fabric init'

Collaborator: is that a clean cherry pick?

Contributor, Author: yep