Notes for the GPU session

We will run Python notebooks inside a RAPIDS Apptainer container built from an NVIDIA NGC image. Building this container from scratch takes anywhere from 40 minutes (in /localscratch) to several hours (in /scratch). Today we will skip that step: the container is already built and available at /scratch/razoumov/rapids.sif on Cedar.

Note: for future reference, here is how we recommend building this container in /localscratch:

Log in to the Cedar cluster and cd into ~/scratch

Create a job submission script distributed.sh, filling in your Slurm account in the blank --account field:
#!/bin/bash
#SBATCH --time=0:90:0
#SBATCH --mem-per-cpu=7200
#SBATCH --account=...
WORKDIR=$(pwd)
cd $SLURM_TMPDIR
module load apptainer
export APPTAINER_TMPDIR=$SLURM_TMPDIR   # otherwise temporary files will go into /scratch/$USER (slower)
apptainer build rapids.sif docker://nvcr.io/nvidia/rapidsai/rapidsai:cuda11.5-runtime-centos7-py3.9
/bin/mv -f rapids.sif $WORKDIR
Submit this job with sbatch distributed.sh and wait for it to finish.
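
While the build job runs, you can check on it from the login node. The sketch below assumes you are still in the directory you submitted from and that squeue (the standard Slurm queue listing command) is on your PATH; the image and status variables are illustrative names, not part of the build script.

```shell
#!/bin/bash
# Sketch only: check on the build job and see whether the image has arrived.
image=rapids.sif                      # file name used in the build script above

# List your own pending/running jobs, if Slurm's squeue is available here.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER:-$(id -un)}" || true
fi

# The build script moves the image into the submit directory as its last step.
if [ -f "$image" ]; then
    status="ready"
    ls -lh "$image"
else
    status="not built yet"
fi
echo "$image: $status"
```

Once your job no longer appears in the squeue listing and the image shows as ready, the build is done.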

Today we will use the prebuilt image. Log in to cedar.computecanada.ca with your guest account and copy the container and the notebooks into your /scratch directory. Then start an interactive GPU job:

cd ~/scratch
cp /scratch/razoumov/rapids.sif .
cp /scratch/razoumov/notebook-1-cupy-intro.ipynb .
cp /scratch/razoumov/notebook-2-rapids-intro.ipynb .
cp /scratch/razoumov/notebook-3-numba-intro.ipynb .
module load apptainer
salloc --time=3:00:0 --mem-per-cpu=3600 --gpus-per-node=1 --account=def-training-wa_gpu --reservation=westdri-wr_gpu

Wait for the job to start, and then, inside the job, start a shell inside the container. Pay attention to how the prompt changes in the lines below, i.e. don’t blindly copy the prompt into your command line:

apptainer shell --nv -B /home -B /project -B /scratch rapids.sif

Apptainer> source /opt/conda/etc/profile.d/conda.sh
Apptainer> conda activate rapids

(rapids) Apptainer> jupyter-lab --ip $(hostname -f) --no-browser
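
If a notebook later fails to find a GPU, the usual culprits are a job without a GPU or a missing --nv flag. A quick check, best run at the Apptainer prompt before the jupyter-lab command above, might look like the following sketch. It assumes that --nv has made nvidia-smi available inside the container; gpu_check is a hypothetical variable name used only here.

```shell
#!/bin/bash
# Sketch only: confirm that the container shell can see a GPU.
# nvidia-smi -L prints one line per visible GPU and fails if there is none.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L; then
    gpu_check="ok"
else
    gpu_check="no GPU visible"
    echo "No GPU visible: check that you are on a GPU node and used --nv"
fi
echo "GPU check: $gpu_check"
```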

When the JupyterLab server starts, it will produce something like:

To access the server, open this file in a browser:
    file:///home/.../.local/share/jupyter/runtime/jpserver-...-open.html
Or copy and paste one of these URLs:
    http://node_name.int.cedar.computecanada.ca:8888/lab?token=896fb...c28e1
 or http://127.0.0.1:8888/lab?token=896fb...c28e1

Take note of (1) the node name, (2) the token and possibly (3) the port if it is different from 8888.
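
If you prefer not to pick these values out by eye, bash parameter expansion can split the URL for you. This is a minimal sketch: the URL is an example in the shape JupyterLab prints, the token in it is a made-up placeholder, and the variable names are illustrative.

```shell
#!/bin/bash
# Example URL in the shape JupyterLab prints; the token is a made-up placeholder.
url='http://node_name.int.cedar.computecanada.ca:8888/lab?token=PLACEHOLDER_TOKEN'

hostport=${url#*://}        # drop the scheme, leaving host:port/path?query
hostport=${hostport%%/*}    # drop everything from the first "/" onwards
node=${hostport%%:*}        # text before the ":" is the node name
port=${hostport##*:}        # text after the ":" is the port
token=${url##*token=}       # text after "token=" is the token

echo "node:  $node"
echo "port:  $port"
echo "token: $token"
```

Paste your own URL (the one containing the node name, not the 127.0.0.1 one) in place of the example.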

On your computer, open a new local terminal (on macOS or Linux, or inside MobaXterm on Windows). In that window, run the following command, replacing username with your username, node_name with the node name you noted, and the remote port (the second 8888) with the actual port if it differs from 8888. This starts SSH port forwarding from local port 8888 to the remote port on the compute node:

ssh username@cedar.computecanada.ca -L 8888:node_name.int.cedar.computecanada.ca:8888
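
The argument to -L has the form local_port:destination_host:destination_port. The sketch below assembles the command from its parts so you can see which field is which; all four values are placeholders to replace with your own, and the variable names are illustrative. It only prints the command rather than running it.

```shell
#!/bin/bash
# Sketch only: all four values are placeholders to replace with your own.
user=username                                # your cluster username
node=node_name.int.cedar.computecanada.ca    # node name from the JupyterLab output
remote_port=8888                             # port from the JupyterLab output
local_port=8888                              # any free port on your computer

# -L takes local_port:destination_host:destination_port
cmd="ssh ${user}@cedar.computecanada.ca -L ${local_port}:${node}:${remote_port}"
echo "$cmd"
echo "then browse to http://localhost:${local_port}/ and supply your token"
```

If local port 8888 is already in use on your computer, set local_port to another free port and use that same port in the browser URL.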

Finally, in the browser on your computer, go to http://localhost:8888/?token=896fb…c28e1, pasting in the full token. This will open JupyterLab. Inside it, start a Python 3 notebook.