Multi-node Jobs
Recommendations for running multi-node jobs.
- Check that a standard single node job runs correctly to validate the installation, paths, and submission line.
- A host file must be defined and then passed to mpirun/mpiexec with --hostfile <hostfilename>.
  mpirun -np <numprocs> --hostfile <hostfilename> <nfx_exe> -i <casefile>
  Tip: Learn how to define a host file on the OpenMPI FAQ page.
  Note: Host files depend on the system topology. If PBS is used for scheduling jobs, it is aware of the topology, and you can use PBS_NODEFILE by passing --hostfile $PBS_NODEFILE.
- Use PBS or an equivalent job scheduler for multi-node runs.
- If launching directly from the command line without PBS or an equivalent job scheduler, ssh access between nodes without a password prompt is needed.
  Tip: Learn how to get ssh access without a password on the OpenMPI FAQ page.
  Important: Only use this method if you have an advanced understanding. Consult your system admin for more information or recommendations.
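As a minimal sketch of the host file recommendation above, the following script writes a two-node host file in the OpenMPI "host slots=N" format, sums the slots to get the process count, and prints the resulting launch line instead of running it. The node names, slot counts, the file name nfx_hostfile, the executable path, and the case file name are placeholders chosen for illustration, not values from the nanoFluidX documentation.

```shell
#!/bin/sh
# Dry-run sketch: build a host file and the matching mpirun line.
# All names below are placeholders (assumptions); adapt them to your cluster.
HOSTFILE="${HOSTFILE:-nfx_hostfile}"
NFX_EXE="${NFX_EXE:-/path/to/nfx_exe}"
CASEFILE="${CASEFILE:-case.cfg}"

# OpenMPI host file format: one node per line, slots = ranks on that node.
cat > "$HOSTFILE" <<'EOF'
node01 slots=4
node02 slots=4
EOF

# Total number of MPI processes = sum of all slots= values in the host file.
NUMPROCS=$(awk -F'slots=' 'NF > 1 { n += $2 } END { print n + 0 }' "$HOSTFILE")

# Same argument order as the command line given in the recommendation above.
CMD="mpirun -np $NUMPROCS --hostfile $HOSTFILE $NFX_EXE -i $CASEFILE"

# Print instead of executing, so the line can be reviewed before submission.
echo "$CMD"
```

Inside a PBS job, the host file step can be skipped and $PBS_NODEFILE passed to --hostfile instead, since the scheduler already knows which nodes were allocated.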
2022.1 or Newer
It is advised to run general diagnostics on the system to make sure the InfiniBand connection (packages, connections, etc.) is working.
2021 or Older
It is advised to run general diagnostics on the system to make sure ibverbs (packages, connections, etc.) is working.
For recent versions of nanoFluidX, the ibverbs version of OpenMPI must be sourced. Starting from 2019.1, you can source set_nFX_environment.sh ibverbs.
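The general diagnostics mentioned for both version ranges could start with something like the sketch below. It assumes a typical Linux cluster where the standard InfiniBand utilities ibstat and ibv_devinfo and the OpenMPI tool ompi_info may be installed. The choice of tools is an assumption, not a list prescribed by nanoFluidX. The script only reports which tools are present and runs them when available.

```shell
#!/bin/sh
# Sketch of a first-pass InfiniBand/ibverbs check (tool list is an assumption).

# Print "<tool>: found" or "<tool>: missing" depending on PATH lookup.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

REPORT=$(for tool in ibstat ibv_devinfo ompi_info; do check_tool "$tool"; done)
echo "$REPORT"

# Query the adapters only when the tools exist; look for active ports.
if command -v ibstat >/dev/null 2>&1; then
    ibstat || echo "ibstat returned an error"
fi
if command -v ibv_devinfo >/dev/null 2>&1; then
    ibv_devinfo || echo "ibv_devinfo returned an error"
fi

# List InfiniBand-related OpenMPI components, if any are built in.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i -E 'openib|ucx' || echo "no openib/ucx components listed"
fi
```

A missing tool or an inactive port in this output is a reason to consult your system admin before attempting a multi-node run.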