SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100000 cores.
Abstract
We present a new open-source cosmological code, called \swift{}, designed to solve
the equations of hydrodynamics using a particle-based approach (Smoothed Particle
Hydrodynamics) on hybrid shared/distributed-memory architectures. \swift{} was
designed from the bottom up to provide excellent {\em strong scaling} on both
commodity clusters (Tier-2 systems) and Top100 supercomputers (Tier-0 systems),
without relying on architecture-specific features or specialized accelerator
hardware. This performance is due to three main computational approaches:
\begin{itemize} \item \textbf{Task-based parallelism} within each shared-memory
node, which provides fine-grained load balancing and thus strong scaling on
large numbers of cores. \item \textbf{Graph-based domain decomposition}, which uses
the task graph to decompose the simulation domain such that the {\em work}, rather
than just the {\em data} (as in most partitioning schemes), is
distributed equally across all nodes. \item \textbf{Fully dynamic and asynchronous
communication}, in which communication is modelled as just another task in the
task-based scheme: data is sent as soon as it is ready, and tasks that depend on
data from other nodes are deferred until it arrives. \end{itemize} In order to use these
approaches, the code had to be rewritten from scratch and its algorithms
adapted to the task-based paradigm. As a result, we obtain upwards of 60\%
parallel efficiency for moderate-sized problems when increasing the number of cores
512-fold, on both x86-based and Power8-based architectures.
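To make the headline figure concrete, assume the usual definition of strong-scaling efficiency relative to a baseline run on $p_0$ cores, where $T(p)$ is the wall-clock time on $p$ cores (this notation is ours, not taken from the text above). A 512-fold increase in cores at an efficiency of at least 60\% then corresponds to the following speed-up:
\[
E(p) = \frac{p_0\,T(p_0)}{p\,T(p)}, \qquad
\frac{T(p_0)}{T(512\,p_0)} = 512\,E(512\,p_0) \geq 0.6 \times 512 = 307.2 .
\]
In other words, under this definition the same problem runs roughly 300 times faster than on the baseline core count.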
