DAOS-18972 cart: configurable address family for fabric init (#18254) #18491

Open: frostedcmos wants to merge 1 commit into release/2.8 from frostedcmos/DAOS-18972-2.8
@frostedcmos

Contributor

By default, CaRT initializes Mercury with na_init_info.addr_format set to NA_ADDR_UNSPEC, which Mercury's na_ofi plugin maps to its per-provider preference table. For verbs/RoCE this resolves to FI_SOCKADDR_IN (IPv4), which causes libfabric's fabric scan and QP-attach to prefer IPv4 even when the operator has explicitly configured the fabric NIC for IPv6.

Add a per-provider address-family hint, configurable two ways:

  • crt_init_options_t::cio_addr_format (API field)
  • D_ADDR_FORMAT environment variable

mirroring the existing pattern used by cio_provider / D_PROVIDER, cio_interface / D_INTERFACE, cio_domain / D_DOMAIN, cio_port / D_PORT, and cio_auth_key / D_PROVIDER_AUTH_KEY. Accepted values are "unspec" (default), "ipv4", "ipv6", and "native". Unrecognized values fall back silently to "unspec" rather than failing initialization, matching the behavior of crt_str_to_tc() for traffic classes.
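
The parsing rule described above (four accepted keywords, silent fallback to "unspec") could be sketched roughly as follows. This is an illustration only, not the code in the patch: the helper name `addr_format_from_str` is hypothetical, the enumerators other than `CRT_AF_UNSPEC` are assumed names, and exact, case-sensitive matching is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the patch's enum crt_addr_format. Only CRT_AF_UNSPEC
 * is named in the description; the other enumerator names are assumed. */
enum crt_addr_format {
	CRT_AF_UNSPEC = 0,
	CRT_AF_IPV4,
	CRT_AF_IPV6,
	CRT_AF_NATIVE,
};

/* Hypothetical helper: map a D_ADDR_FORMAT-style string to the enum.
 * Unknown or missing input silently maps to CRT_AF_UNSPEC. */
static enum crt_addr_format
addr_format_from_str(const char *str)
{
	static const struct {
		const char          *name;
		enum crt_addr_format fmt;
	} table[] = {
		{"unspec", CRT_AF_UNSPEC},
		{"ipv4", CRT_AF_IPV4},
		{"ipv6", CRT_AF_IPV6},
		{"native", CRT_AF_NATIVE},
	};
	size_t i;

	if (str == NULL)
		return CRT_AF_UNSPEC;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(str, table[i].name) == 0)
			return table[i].fmt;
	}

	/* Unrecognized value: fall back instead of failing initialization. */
	return CRT_AF_UNSPEC;
}
```

With this sketch, `addr_format_from_str("garbagez")` yields `CRT_AF_UNSPEC`, which corresponds to the silent-fallback behavior described above.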

The value is stored per-provider on struct crt_prov_gdata::cpg_addr_format and forwarded to Mercury via init_info.na_init_info.addr_format in crt_hg_class_init(). For multi-provider configurations the option accepts a comma-separated list, matched one-to-one with the entries of D_PROVIDER (same parsing pattern as the existing fields).
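
To illustrate the one-to-one matching between the comma-separated list and the provider entries, here is a minimal sketch that looks up the address format for the provider at a given index. The function names are hypothetical, the enumerators other than `CRT_AF_UNSPEC` are assumed, and falling back to "unspec" when the list has fewer entries than providers is an assumption rather than documented behavior. Whitespace trimming is not handled.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for the patch's enum; names other than CRT_AF_UNSPEC
 * are assumed for illustration. */
enum crt_addr_format {
	CRT_AF_UNSPEC = 0,
	CRT_AF_IPV4,
	CRT_AF_IPV6,
	CRT_AF_NATIVE,
};

/* Compare a token of length len (not NUL-terminated) against a keyword. */
static int
token_is(const char *tok, size_t len, const char *keyword)
{
	return strlen(keyword) == len && strncmp(tok, keyword, len) == 0;
}

/* Hypothetical helper: return the format listed at position prov_idx of a
 * comma-separated list such as "ipv6,ipv4". */
static enum crt_addr_format
addr_format_for_provider(const char *list, size_t prov_idx)
{
	const char *start = list;
	const char *end;
	size_t      len;

	if (list == NULL)
		return CRT_AF_UNSPEC;

	/* Advance past prov_idx commas to reach the matching entry. */
	while (prov_idx > 0) {
		start = strchr(start, ',');
		if (start == NULL)
			return CRT_AF_UNSPEC; /* assumed: short list means unspec */
		start++;
		prov_idx--;
	}

	end = strchr(start, ',');
	len = (end != NULL) ? (size_t)(end - start) : strlen(start);

	if (token_is(start, len, "ipv4"))
		return CRT_AF_IPV4;
	if (token_is(start, len, "ipv6"))
		return CRT_AF_IPV6;
	if (token_is(start, len, "native"))
		return CRT_AF_NATIVE;

	/* "unspec", empty, or unrecognized entries all map to unspec. */
	return CRT_AF_UNSPEC;
}
```

For a two-provider setup, a list of `"ipv6,ipv4"` would give the first provider `CRT_AF_IPV6` and the second `CRT_AF_IPV4` under this sketch.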

Mercury already supports IPv6 end-to-end in the na_ofi plugin: the addr_format mapping, addr_size, raw_addr serialize/deserialize, and addr_to_key paths all handle FI_SOCKADDR_IN6. No Mercury change is required.

Implementation notes:

  • enum crt_addr_format mirrors enum na_addr_format. Static assertions guard the alignment so crt_hg.c can cast the enum directly when assigning na_init_info.addr_format, matching the existing idiom used for cg_swim_tc -> enum na_traffic_class.

  • Default behavior is preserved: omitting D_ADDR_FORMAT leaves the per-provider gdata at CRT_AF_UNSPEC, which casts to NA_ADDR_UNSPEC, yielding the previous IPv4-preferring fabric scan. Existing deployments see no functional change.

  • This is the CaRT half of IPv6 fabric support. Two companion changes are required to round out the full v6 fabric story (each independent and separate from this patch):

    • src/control/server/server_utils.go: "tcp4" -> "tcp", "0.0.0.0:%d" -> "[::]:%d" for the gRPC management listener;
    • HG_Get_na_protocol_info or equivalent Mercury API to accept an addr_format hint so the daos_server fabric scan respects v6 too.
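
The enum-mirroring idiom from the first implementation note can be sketched as below. Both enums are local stand-ins so the example compiles on its own: in real code `enum na_addr_format` comes from Mercury's NA headers, and the enumerator names and order shown here (beyond `NA_ADDR_UNSPEC` and `CRT_AF_UNSPEC`, which appear in the description) are assumptions. The helper `to_na_addr_format` is hypothetical.

```c
#include <assert.h> /* static_assert (C11) */

/* Stand-in for Mercury's enum; assumed order, normally provided by the
 * NA headers rather than redefined. */
enum na_addr_format {
	NA_ADDR_UNSPEC,
	NA_ADDR_IPV4,
	NA_ADDR_IPV6,
	NA_ADDR_NATIVE,
};

/* Stand-in for the CaRT-side enum that mirrors it. */
enum crt_addr_format {
	CRT_AF_UNSPEC,
	CRT_AF_IPV4,
	CRT_AF_IPV6,
	CRT_AF_NATIVE,
};

/* Compile-time guards: if either enum is reordered, the build fails
 * instead of the cast silently selecting the wrong address family. */
static_assert((int)CRT_AF_UNSPEC == (int)NA_ADDR_UNSPEC, "unspec mismatch");
static_assert((int)CRT_AF_IPV4 == (int)NA_ADDR_IPV4, "ipv4 mismatch");
static_assert((int)CRT_AF_IPV6 == (int)NA_ADDR_IPV6, "ipv6 mismatch");
static_assert((int)CRT_AF_NATIVE == (int)NA_ADDR_NATIVE, "native mismatch");

/* Hypothetical helper: with the assertions in place, a direct cast is safe. */
static enum na_addr_format
to_na_addr_format(enum crt_addr_format fmt)
{
	return (enum na_addr_format)fmt;
}
```

The design trade-off is that the cast costs nothing at runtime, while the assertions move any divergence between the two enums to a build failure.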

Validation performed before submission (2-node DAOS 2.6.5 cluster, Mellanox ConnectX-7, RoCEv2, ofi+verbs;ofi_rxm):

  Test                                            Result
  ----                                            ------
  Patch builds against runtime libmercury         OK
  Default behaviour (D_ADDR_FORMAT unset)         OK - all ranks Joined,
                                                       dmg pool create
                                                       and storage format
                                                       succeed unchanged
  Explicit D_ADDR_FORMAT=ipv4                     OK - identical to default
  Unknown value (e.g. D_ADDR_FORMAT=garbagez)     OK - silent fallback
                                                       to unspec, no init
                                                       failure
  D_ADDR_FORMAT=ipv6 hint reaches libfabric       OK - confirmed via
                                                       'rdma resource show cm_id':
                                                       engines LISTEN on
                                                       [2a04:f547:93:3082::20bc]:20000
                                                       (IPv6), not
                                                       10.92.32.188:20000 (IPv4)

End-to-end IPv6 RPC between engines was not exercisable on this specific test cluster: libfabric 1.22.0's verbs;ofi_rxm provider returns ENODATA from fi_getinfo when an IPv6 addr_format hint is passed, even though the fabric NIC has a global IPv6 address configured. This was reproduced independently with 'fi_pingpong -p "verbs;ofi_rxm" -6', which fails identically without any DAOS code involved. The kernel RDMA-CM + RoCEv2 v6 path itself is fine (verified via rping), so the v6 gap is in the libfabric verbs provider, downstream of every layer this patch touches.

Local unit testing follows the existing convention for crt_str_to_tc() and the other CRT_ENV_OPT_GET-mediated options, which are exercised via the cart ftest suite rather than per-function unit tests.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).


Signed-off-by: Alex Timofeyev <atimofeyev@linkedin.com>
@frostedcmos frostedcmos requested review from a team as code owners June 15, 2026 05:24
@github-actions


Ticket title is 'cart: configurable address family for fabric init'
Status is 'In Review'
Labels: 'linkedin'
https://daosio.atlassian.net/browse/DAOS-18972

@soumagne

Collaborator

is that a clean cherry pick ?

@frostedcmos

Contributor Author

> is that a clean cherry pick ?

yep

@frostedcmos frostedcmos added the clean-cherry-pick label (Cherry-pick from another branch that did not require additional edits) Jun 15, 2026