Cori (NERSC)

The Cori cluster is located at NERSC.

If you are new to this system, please see the NERSC documentation for Cori.

Installation

Use the following command to download the ImpactX source code:

git clone https://github.com/ECP-WarpX/impactx.git $HOME/src/impactx

KNL

We use the following modules and environments on the system, stored in $HOME/knl_impactx.profile:

module swap craype-haswell craype-mic-knl
module swap PrgEnv-intel PrgEnv-gnu
module load cmake/3.22.1
module load cray-hdf5-parallel/1.10.5.2
module load cray-fftw/3.3.8.10
module load cray-python/3.9.7.1

export PKG_CONFIG_PATH=$FFTW_DIR/pkgconfig:$PKG_CONFIG_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/knl/adios2-2.7.1-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/knl/venvs/impactx" ]
then
  source $HOME/sw/knl/venvs/impactx/bin/activate
fi

export CXXFLAGS="-march=knl"
export CFLAGS="-march=knl"

For PICMI and Python workflows, also set up a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/knl/venvs/impactx
source $HOME/sw/knl/venvs/impactx/bin/activate

python3 -m pip install --upgrade pip
MPICC="cc -shared" python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install -r $HOME/src/impactx/requirements.txt

Haswell

We use the following modules and environments on the system, stored in $HOME/haswell_impactx.profile:

module swap PrgEnv-intel PrgEnv-gnu
module load cmake/3.22.1
module load cray-hdf5-parallel/1.10.5.2
module load cray-fftw/3.3.8.10
module load cray-python/3.9.7.1

export PKG_CONFIG_PATH=$FFTW_DIR/pkgconfig:$PKG_CONFIG_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/haswell/adios2-2.7.1-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/haswell/venvs/impactx" ]
then
  source $HOME/sw/haswell/venvs/impactx/bin/activate
fi

For PICMI and Python workflows, also set up a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/haswell/venvs/impactx
source $HOME/sw/haswell/venvs/impactx/bin/activate

python3 -m pip install --upgrade pip
MPICC="cc -shared" python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install -r $HOME/src/impactx/requirements.txt

GPU (V100)

Cori provides a partition with 18 nodes that include V100 (16 GB) GPUs. We use the following modules and environments on the system, stored in $HOME/gpu_impactx.profile:

export proj="m1759"

module purge
module load modules
module load cgpu
module load esslurm
module load gcc/8.3.0 cuda/11.4.0 cmake/3.22.1
module load openmpi

export CMAKE_PREFIX_PATH=$HOME/sw/cori_gpu/adios2-2.7.1-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/cori_gpu/venvs/impactx" ]
then
  source $HOME/sw/cori_gpu/venvs/impactx/bin/activate
fi

# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=$(which g++)

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# allocate a GPU, e.g. to compile on
#   10 logical cores (5 physical), 1 GPU
function getNode() {
    salloc -C gpu -N 1 -t 30 -c 10 --gres=gpu:1 -A $proj
}

For PICMI and Python workflows, also set up a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/cori_gpu/venvs/impactx
source $HOME/sw/cori_gpu/venvs/impactx/bin/activate

python3 -m pip install --upgrade pip
python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install -r $HOME/src/impactx/requirements.txt

Building ImpactX

We recommend storing the above lines in the individual ...impactx.profile files suggested above. To run on any of the three Cori partitions, open a new terminal, log into Cori, and source the environment you want to work with:

# KNL:
source $HOME/knl_impactx.profile

# Haswell:
#source $HOME/haswell_impactx.profile

# GPU:
#source $HOME/gpu_impactx.profile

Warning

Note that the three Cori partitions are incompatible with each other: an environment (and binary) built for one partition will not work on another.

Do not source multiple ...impactx.profile files in the same terminal session. If you want to switch the targeted Cori partition, open a new terminal and log into Cori again.

If you re-submit an already compiled simulation that you ran on another day or in another session, make sure to source the corresponding ...impactx.profile again after login!
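
A quick way to check which environment is active in the current shell is to inspect a few of the variables set by the profiles above. This is a minimal sketch; CRAY_CPU_TARGET is set by the craype modules and may be unset after the module purge in the GPU profile:

echo "craype target:     ${CRAY_CPU_TARGET:-unset}"
echo "CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}"
echo "active venv:       ${VIRTUAL_ENV:-none}"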

Then, cd into the directory $HOME/src/impactx and use the following commands to compile:

cd $HOME/src/impactx
rm -rf build

#                       append if you target GPUs:    -DImpactX_COMPUTE=CUDA
cmake -S . -B build -DImpactX_OPENPMD=ON -DImpactX_DIMS=3
cmake --build build -j 16
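
For example, a GPU build (after sourcing $HOME/gpu_impactx.profile in a fresh session) uses the same commands with the CUDA option from the comment above appended:

cmake -S . -B build -DImpactX_OPENPMD=ON -DImpactX_DIMS=3 -DImpactX_COMPUTE=CUDA
cmake --build build -j 16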

Testing

To run all tests (here on KNL), do:

srun -C knl -N 1 -t 30 -q debug ctest --test-dir build --output-on-failure
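
On the Haswell partition, the analogous call uses the haswell constraint (a sketch, assuming the same debug queue limits apply):

srun -C haswell -N 1 -t 30 -q debug ctest --test-dir build --output-on-failure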

Running

Navigate (i.e. cd) into one of the production directories (e.g. $SCRATCH) before executing the instructions below.
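
For example, a fresh run directory on $SCRATCH could be created like this (the directory name is just a suggestion):

mkdir -p $SCRATCH/impactx/run_001
cd $SCRATCH/impactx/run_001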

KNL

The batch script below can be used to run an ImpactX simulation on 2 KNL nodes on the supercomputer Cori at NERSC. Replace the descriptions in chevrons <> with relevant values; for instance, <job name> could be laserWakefield.

Do not forget to first source $HOME/knl_impactx.profile if you have not done so already for this terminal session.

For PICMI Python runs, <path/to/executable> has to be python3 and <input file> is the path to your PICMI input script.

#!/bin/bash -l

# Copyright 2019 Maxence Thevenet
#
# This file is part of ImpactX.
#
# License: BSD-3-Clause-LBNL


#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -q regular
#SBATCH -C knl
#SBATCH -S 4
#SBATCH -J <job name>
#SBATCH -A <allocation ID>
#SBATCH -e ImpactX.e%j
#SBATCH -o ImpactX.o%j

export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# KNLs have 4 hyperthreads max
export CORI_MAX_HYPERTHREAD_LEVEL=4
# We use 64 cores out of the 68 available on Cori KNL,
# and leave 4 to the system (see "#SBATCH -S 4" above).
export CORI_NCORES_PER_NODE=64

# Typically use 8 MPI ranks per node without hyperthreading,
# i.e., OMP_NUM_THREADS=8
export IMPACTX_NMPI_PER_NODE=8
export IMPACTX_HYPERTHREAD_LEVEL=1

# Compute OMP_NUM_THREADS and the thread count (-c option)
export CORI_NHYPERTHREADS_MAX=$(( ${CORI_MAX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export IMPACTX_NTHREADS_PER_NODE=$(( ${IMPACTX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export OMP_NUM_THREADS=$(( ${IMPACTX_NTHREADS_PER_NODE} / ${IMPACTX_NMPI_PER_NODE} ))
export IMPACTX_THREAD_COUNT=$(( ${CORI_NHYPERTHREADS_MAX} / ${IMPACTX_NMPI_PER_NODE} ))

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${IMPACTX_NMPI_PER_NODE} )) -c ${IMPACTX_THREAD_COUNT} \
  <path/to/executable> <input file> \
  > output.txt

To run a simulation, copy the lines above to a file batch_cori.sh and run

sbatch batch_cori.sh

to submit the job.
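
For a PICMI (Python) run, only the executable line of the batch script changes. As a sketch, with a hypothetical script name run_impactx_picmi.py:

srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${IMPACTX_NMPI_PER_NODE} )) -c ${IMPACTX_THREAD_COUNT} \
  python3 run_impactx_picmi.py \
  > output.txt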

For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell solver on Cori KNL for a well load-balanced problem (in our case, a laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance:

  • amr.max_grid_size=64 and amr.blocking_factor=64, so that the size of each grid is fixed to 64**3 (we are not using load balancing here); see the sketch after this list.

  • 8 MPI ranks per KNL node, with OMP_NUM_THREADS=8 (that is 64 threads per KNL node, i.e. 1 thread per physical core, and 4 cores left to the system).

  • 2 grids per MPI rank, i.e., 16 grids per KNL node.
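
As a minimal sketch, the grid settings from the first item could be appended to an inputs file like this (inputs_3d is a hypothetical file name; all other required parameters are omitted):

cat >> inputs_3d <<'EOF'
amr.max_grid_size = 64
amr.blocking_factor = 64
EOF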

Haswell

The batch script below can be used to run an ImpactX simulation on 1 Haswell node on the supercomputer Cori at NERSC.

Do not forget to first source $HOME/haswell_impactx.profile if you have not done so already for this terminal session.

#!/bin/bash -l

# Just increase this number if you need more nodes.
#SBATCH -N 1
#SBATCH -t 03:00:00
#SBATCH -q regular
#SBATCH -C haswell
#SBATCH -J <job name>
#SBATCH -A <allocation ID>
#SBATCH -e ImpactX.e%j
#SBATCH -o ImpactX.o%j
# one MPI rank per half-socket (see below)
#SBATCH --tasks-per-node=4
# request all logical (virtual) cores per half-socket
#SBATCH --cpus-per-task=16


# each Cori Haswell node has 2 sockets of Intel Xeon E5-2698 v3
# each Xeon CPU is divided into 2 bus rings that each have direct L3 access
export IMPACTX_NMPI_PER_NODE=4

# each MPI rank per half-socket has 8 physical cores
#   or 16 logical (virtual) cores
# over-subscribing each physical core with 2x
#   hyperthreading leads to a slight (3.5%) speedup
# the settings below make sure threads are close to the
#   controlling MPI rank (process) per half socket and
#   distribute equally over close-by physical cores and,
#   for N>8, also equally over close-by logical cores
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_NUM_THREADS=16

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

EXE="<path/to/executable>"

srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${IMPACTX_NMPI_PER_NODE} )) \
  ${EXE} <input file> \
  > output.txt

To run a simulation, copy the lines above to a file batch_cori_haswell.sh and run

sbatch batch_cori_haswell.sh

to submit the job.

For a 3D simulation with a few (1-4) particles per cell using the FDTD Maxwell solver on Cori Haswell for a well load-balanced problem (in our case, a laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance:

GPU (V100)

Do not forget to first source $HOME/gpu_impactx.profile if you have not done so already for this terminal session.

Due to the limited number of GPU development nodes, request just a single node with the getNode function defined above. For single-node runs, try to run one grid per GPU.
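
For example, an interactive single-GPU test could look like this (a sketch; <path/to/executable> and <input file> are placeholders as in the batch scripts below):

getNode   # allocate 1 GPU (10 logical cores) via the helper from gpu_impactx.profile
srun --cpu_bind=cores --gpus-per-task=1 -n 1 \
  <path/to/executable> <input file> \
  > output.txt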

A multi-node batch script template can be found below:

#!/bin/bash -l

# Copyright 2021 Axel Huebl
# This file is part of ImpactX.
# License: BSD-3-Clause-LBNL
#
# Ref:
# - https://docs-dev.nersc.gov/cgpu/hardware/
# - https://docs-dev.nersc.gov/cgpu/access/
# - https://docs-dev.nersc.gov/cgpu/usage/#controlling-task-and-gpu-binding

# Just increase this number if you need more nodes.
#SBATCH -N 2
#SBATCH -t 03:00:00
#SBATCH -J <job name>
#SBATCH -A m1759
#SBATCH -q regular
#SBATCH -C gpu
# 8 V100 GPUs (16 GB) per node
#SBATCH --gres=gpu:8
#SBATCH --exclusive
# one MPI rank per GPU (a quarter-socket)
#SBATCH --tasks-per-node=8
# request all logical (virtual) cores per quarter-socket
#SBATCH --cpus-per-task=10
#SBATCH -e ImpactX.e%j
#SBATCH -o ImpactX.o%j


# each Cori GPU node has 2 sockets of Intel Xeon Gold 6148 ('Skylake') @ 2.40 GHz
export IMPACTX_NMPI_PER_NODE=8

# each MPI rank per half-socket has 10 physical cores
#   or 20 logical (virtual) cores
# we split half-sockets again by 2 to have one MPI rank per GPU
# over-subscribing each physical core with 2x
#   hyperthreading often leads to a slight speedup on Intel CPUs
# the settings below make sure threads are close to the
#   controlling MPI rank (process) per half socket and
#   distribute equally over close-by physical cores and,
#   for N>20, also equally over close-by logical cores
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_NUM_THREADS=10

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

EXE="<path/to/executable>"

srun --cpu_bind=cores --gpus-per-task=1 --gpu-bind=map_gpu:0,1,2,3,4,5,6,7 \
  -n $(( ${SLURM_JOB_NUM_NODES} * ${IMPACTX_NMPI_PER_NODE} )) \
  ${EXE} <input file> \
  > output.txt
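
To run a simulation, copy the lines above to a file, e.g. batch_cori_gpu.sh (the file name is just a suggestion), and run

sbatch batch_cori_gpu.sh

to submit the job.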

Post-Processing

For post-processing, most users use Python via NERSC’s Jupyter service (Docs).

As a one-time preparatory setup, create your own Conda environment as described in the NERSC docs. In this manual, we often use the following conda create line instead of the officially documented one:

conda create -n myenv -c conda-forge python mamba ipykernel ipympl matplotlib numpy pandas yt openpmd-viewer openpmd-api h5py fast-histogram

We then follow the Customizing Kernels with a Helper Shell Script section of the NERSC docs to finish setting up this conda environment as a custom Jupyter kernel.

When opening a Jupyter notebook, just select the name you picked for your custom kernel on the top right of the notebook.

Additional software can be installed later on, e.g., in a Jupyter cell using !mamba install -c conda-forge .... Software that is not available via conda can be installed via !python -m pip install ....
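
For example, to add packages later from within a notebook cell of the custom kernel (scipy is just an example package; <package> is a placeholder):

# add a package from conda-forge:
!mamba install -c conda-forge scipy
# add a package that is only available on PyPI:
!python -m pip install <package>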