Skip to content

TimThuering/N-Body-Simulation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

119 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

N-Body-Simulation

This project started as part of a bachelor thesis.

It contains code for n-body simulations that were used to analyze the runtime behavior of two widely known n-body algorithms: the naive approach and the Barnes-Hut algorithm.

Both algorithms are implemented using SYCL and can be executed in parallel on GPUs and CPUs.

Features

  • Parallel SYCL implementation of the naive algorithm
  • Parallel SYCL implementation of the Barnes-Hut algorithm
  • Support for CPUs as well as GPUs from NVIDIA and AMD (not tested on Intel GPUs)
  • ParaView output for visualization

Installation

The project supports two different SYCL implementations: AdaptiveCpp and DPC++. The project requires Linux.

After the installation of one of the two supported SYCL implementations, clone this repository and create a build directory:

git clone https://github.com/TimThuering/N-Body-Simulation.git
cd N-Body-Simulation
mkdir build
cd build

Building the project with AdaptiveCpp

AdaptiveCpp is supported with the CUDA, ROCm, and OpenMP backends. Note that since we use sycl::nd_ranges, AdaptiveCpp should be built with the omp.accelerated target enabled in order to achieve acceptable performance on CPUs.

The following commands build the project with AdaptiveCpp for CUDA and OpenMP. Replace sm_XX with the compute capability of your GPU, e.g., sm_75.

cmake -DUSE_DPCPP=OFF -DHIPSYCL_TARGETS="cuda:sm_XX;omp.accelerated" -DCMAKE_BUILD_TYPE=release ..
make

To build the project for AMD GPUs, replace cuda:sm_XX with hip:gfxXXX and replace gfxXXX according to your AMD GPU.

If you do not wish to build the project with support for CPUs with OpenMP, delete ;omp.accelerated.

For further details, please refer to the AdaptiveCpp documentation.

The project was tested with this AdaptiveCpp commit.

Note: using AdaptiveCpp's new generic target may work, but hasn't been tested.

Building the project with DPC++

DPC++ is supported with the CUDA backend. AMD GPUs are only supported with DPC++ when using the naive algorithm.

The following commands build the project with DPC++ for CUDA.

cmake -DUSE_DPCPP=ON -DCMAKE_BUILD_TYPE=release ..
make

The following commands build the project with DPC++ for AMD GPUS. Make sure that the DEVICE_LIB_PATH environment variable points the the location of <path to amdgcn>/amdgcn/bitcode/

cmake -DUSE_DPCPP=ON -DUSE_DPCPP_AMD=ON -DDPCPP_ARCH="gfxXXX" -DCMAKE_BUILD_TYPE=release ..
make

Replace gfxXXX according to your AMD GPU. For more information, please refer to the DPC++ documentation.

The project was tested with this DPC++ commit.

Running the program

The program has several optional and mandatory program arguments.

Mandatory program arguments

Argument Description Notes
--file Path to a .csv file containing the data for the simulation -
--dt Width of the time step for the simulation E.g.: 1h for one hour
--t_end The internal time until the system will be simulated E.g.: 365d for 365 days or 12y for twelve years
--vs The time step width of the visualization E.g.: 1d to visualize every day
--vs_dir The top-level output directory for the output files A separate folder (with a timestamp) that
contains all output files will be created in this directory
--algorithm The algorithm to use for the simulation Either <naive> or <BarnesHut>

Optional program arguments

Argument Description Notes
--use_gpus Enable / disable execution on GPUs
(GPU execution is enabled by default)
true or false
--energy Enable / disable computation of the energy of the system in each
visualized step (disabled by default).
The results will be written to the ParaView files.
true or false,
can result into long runtimes with large datasets

Optional arguments for the naive algorithm

Argument Description Notes
--block_size The size of the blocks after which the
local memory is updated in the naive algorithm
Should be a power of 2, e.g., 128
--opt_stage Selects the optimization stage of the naive algorithm (default: 2) Possible values:
  • 0 (non-optimized)
  • 1
  • 2 (highest degree of optimization)

Optional arguments for the Barnes-Hut algorithm

Argument Description Notes
--theta The theta-value which determines the
accuracy of the Barnes-Hut algorithm
Smaller values like 0.2
mean worse performance but better accuracy
Larger values like 1 result into better
performance but worse accuracy
--num_wi_octree Determines the number of work-items
used to build the octree
-
--num_wi_top_octree Determines the number of work-items
used to build the top of the octree
-
--num_wi_AABB Determines the number of work-items
used to calculate the AABB
-
--num_wi_com Determines the number of work-items
used to calculate the center of mass
-
--max_level_top_octree Determines the maximum level
to which the top of the octree gets build
-
--wg_size_barnes_hut Determines the work-group size of the acceleration kernel -
--sort_bodies Enable / disable sorting of the bodies
according to their in-order position in the
octree (enabled by default)
true or false
--storage_size_param Scales the amount of memory for the
octree data structures
Use only if you encounter problems
with specific datasets
--stack_size_param Scales the amount of memory for the
stack used to traverse the octree
Use only if you encounter problems
with specific datasets

CMake options

Option Description Notes
-DUSE_DPCPP Enable / disable the usage of DPC++ ON (Default) or OFF
-DUSE_OCTREE_TOP_DOWN_SYNC Use the top-down synchronized approach without subtrees for the tree creation Default OFF
-DENABLE_TESTS Enable building of tests Only supported with DPC++
-DUSE_DPCPP_AMD Use DPC++ with AMD GPUs -
-DDPCPP_ARCH Specify the GPU architecture for DPC++ when using AMD GPUs Not recommended when using NVIDIA GPUs

Input data format

The input data for the simulation has to contain information about all bodies of the system. The columns of the .csv file have to contain the following information about each body, starting in the second line:

id name of body name for class of body mass of body in kg x position y position z position x velocity y velocity z velocity

Use a , as the separator. The unit of length has to be the astronomical unit (AU).

The dataset converter can be used to generate datasets in this format.

Examples

The following command starts a simulation using the naive algorithm.

./N_Body_Simulation --file=<path to simulation data> --dt=1h --t_end=365d --vs=1d --vs_dir=<path to output directory> --algorithm=naive

The following command starts a simulation using the Barnes-Hut algorithm, specifying work-item counts for the tree creation and the theta value explicitly.

./N_Body_Simulation --file=<path to simulation data> --dt=1h --t_end=365d --vs=1d --vs_dir=<path to output directory> --algorithm=BarnesHut --theta=0.6 --num_wi_top_octree=640 --num_wi_octree=640

References

The implementation of the naive algorithm is based on the work by Nyland et al. The implementation of the Barnes-Hut algorithm is based on the work by Burtscher et al.

About

SYCL code for n-body simulations with the naive algorithm and the Barnes-Hut algorithm

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors