Multi-Node Usage
ultraFluidX can generally be used on an arbitrary number of compute nodes.
However, be aware of typical strong scaling behavior: for each problem size there is a maximum reasonable number of GPU devices. If the problem size per GPU device becomes too small, the communication between devices starts to dominate and the overall compute performance no longer increases.
To start ultraFluidX on multiple compute nodes, it is recommended to use one of the two MPI options --hostfile or --host. Just like on a single machine, the main rank 0 handles the preprocessing and the I/O, whereas each additional secondary rank 1...n handles one GPU device. Therefore, to run ultraFluidX on, for example, three machines with two GPU devices each, the simulation must be started with a total of seven MPI ranks (main rank 0 + 3×2 GPU ranks).
Assuming the three machines are named node1, node2, and node3, with node1 containing the main rank 0, the BASH command when using the --host option would be (-np is not needed in this case):

mpirun --host node1,node1,node1,node2,node2,node3,node3 ultraFluidX case.xml
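The --host argument above simply repeats each node name once per rank it should host, with the first node getting one extra entry for the main rank 0. As an illustration, the following sketch builds such a list automatically; the helper function build_host_list is hypothetical and not part of ultraFluidX or Open MPI:

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not part of ultraFluidX):
# builds the --host list for mpirun from a comma-separated node list
# and the number of GPU devices per node. The first node also hosts
# the main rank 0, so it receives one extra slot.
build_host_list() {
  nodes=$1          # e.g. "node1,node2,node3"
  gpus_per_node=$2  # e.g. 2
  list=""
  first=1
  for n in $(echo "$nodes" | tr ',' ' '); do
    slots=$gpus_per_node
    if [ "$first" -eq 1 ]; then
      slots=$((slots + 1))  # extra slot for main rank 0
      first=0
    fi
    i=0
    while [ "$i" -lt "$slots" ]; do
      list="$list,$n"
      i=$((i + 1))
    done
  done
  echo "${list#,}"  # strip the leading comma
}

build_host_list "node1,node2,node3" 2
# → node1,node1,node1,node2,node2,node3,node3
```

The resulting string can then be passed directly as the --host argument, matching the example command above.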
Alternatively, when using the --hostfile option, create a hostfile (here named uFX_hosts) with the following content:

node1 slots=3
node2 slots=2
node3 slots=2

The corresponding command would then be:

mpirun --hostfile uFX_hosts -np 7 ultraFluidX case.xml
For further documentation on multi-node usage, as well as on general handling and run-time tuning of Open MPI, refer to the respective FAQ on the Open MPI web page: https://www.open-mpi.org/faq/