MPI support#7
Conversation
…I with NVIDIA and AMD GPUs
vancraar
left a comment
There was a problem hiding this comment.
Thanks, Tim, for this comprehensive MPI addition! The implementation is well-structured overall. I commented to a few points that that caught my eye. What's your opinion on them?
| # Get mpi4py's C header location, simultaneously checking if mpi4py is importable in the current Python environment | ||
| execute_process( | ||
| COMMAND "${Python_EXECUTABLE}" -c | ||
| "import mpi4py, sys; sys.stdout.write(mpi4py.get_include())" |
There was a problem hiding this comment.
Is it possible, that mpi4py is installed, but broken without include path? Should we additionally check for that?
There was a problem hiding this comment.
Why do we now have to link PUBLIC against CUDA?
There was a problem hiding this comment.
Same as with CUDA why now PUBLIC?
| #include <utility> // std::move | ||
|
|
||
| #if defined(HWS_MPI_SUPPORT_ENABLED) | ||
| #include <mpi.h> // MPI_Comm |
| ] | ||
| # optional dependencies | ||
| [project.optional-dependencies] | ||
| mpi = ["mpi4py>=4"] |
There was a problem hiding this comment.
Why check here for Version >=4 but not in the CMakeLists.txt? Should we do this consistent on both sides?
There was a problem hiding this comment.
Should we really include the backend includes in the main utility.hpp, or wouldn't it be better to modularize them?
There was a problem hiding this comment.
In my view it looks like lines 70-111: The block inside else if (mode == detail::mpi_sampling_mode::whole_node)
needs one more level of indentation. The #if defined preprocessor directives and
detail::free_hostname_comm(nc) should be indented consistently with the code inside
the conditional block.
| #include <vector> // std::vector | ||
|
|
||
| #if defined(HWS_MPI_SUPPORT_ENABLED) | ||
| #include <mpi.h> // MPI_Comm, MPI_Gatherv, MPI_Gather, MPI_Initialized, MPI_Comm_rank, MPI_Comm_size |
vancraar
left a comment
There was a problem hiding this comment.
Thanks, Tim, for addressing all my review points! The additional CMake check for a broken mpi4py include path, the mpi4py version check, reverting the GPU library linkage back to PRIVATE, and the cleanup of the backend includes from utility.hpp all look good now. The PR is in great shape – happy to approve!
Summary
This PR adds optional MPI support to hws, enabling MPI-aware hardware sampling across distributed multi-process jobs.
API extensions
system_hardware_sampler(MPI_Comm, mpi_sampling_mode, ...): MPI-aware constructors that take a communicator and a sampling mode. Passinghws::detail::mpi_sampling_mode::whole_nodeensures that every device visible to at least one sampler is sampled exactly once, even if multiplesystem_hardware_samplerinstances from different ranks see the same device.start_sampling(MPI_Comm) / stop_sampling(MPI_Comm): synchronized start/stop with MPI barriers to align sampling windows across ranksdump_yaml_global(filename, MPI_Comm): gathers YAML output from all ranks on rank 0 and writes a single combined file; available on both hardware_sampler and system_hardware_samplerPython bindings
CMake
HWS_ENABLE_MPI_SUPPORT=AUTO|ON|OFFoption (default AUTO). When MPI is found and Python bindings are enabled, mpi4py must be importable in the configured Python environment.