Run on a Slurm cluster

This section describes how to work more efficiently by offloading heavy processing to cluster resources.

You can use the following bash script to run jobs on a Slurm cluster. Running on a multi-core CPU node speeds up processing and keeps your local computer free for other work.

#!/bin/bash

# Adjust the #SBATCH directives below to your cluster; each option is
# explained after the script.
#SBATCH --job-name=wikidata
#SBATCH --partition=longq
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=14-00:00
#SBATCH --mem-per-cpu=6000
# %j in the log file names expands to the Slurm job ID
#SBATCH --output=log_%j.out
#SBATCH --error=log_%j.error

python your_script.py
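
To submit the script, save it to a file (say run_job.sh; the name is just an example) and hand it to sbatch. squeue and scancel are the standard Slurm commands for monitoring and cancelling:

sbatch run_job.sh   # submit; prints "Submitted batch job <jobid>"
squeue -u $USER     # list your pending and running jobs
scancel <jobid>     # cancel the job if something goes wrong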

You can adjust these configurations as needed:

  • Partition depends on your cluster. Check your cluster's documentation to find out which partitions are available; you can also list them directly, as shown in the sinfo example after this list.

  • CPUs per task is the number of CPUs (worker processes) your job can use, and you should use all of them. Set the num_proc argument in your script to exactly the number of CPUs you configure here, and never request fewer CPUs than num_proc! A sketch for keeping the two values in sync automatically follows this list.

  • Time is the maximum execution time (wall time). The configuration I set is 14 days (14-00:00, in days-hours:minutes format). For a job that only needs 3 hours, set it to 3:00:00 (hours:minutes:seconds).

    • If your task finishes early, the job terminates right away, so there is no harm in setting a generous limit.

    • Be careful only if your program could get stuck in an infinite loop or otherwise never terminate.

  • Mem per cpu is the amount of memory allocated per CPU, in megabytes by default. The memory is not tied to a specific core; the total memory given to the task is simply mem per cpu times the number of CPUs, here 6000 MB × 8 = 48000 MB, about 48 GB.
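
To find out which partitions your cluster offers, sinfo is the standard Slurm command; the partition names and limits it prints will of course differ from cluster to cluster:

sinfo -s    # one summary line per partition: availability, time limit, node counts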

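To keep cpus-per-task and num_proc in sync automatically, you can read the allocated CPU count from the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside any job that specifies --cpus-per-task. A minimal sketch, assuming your_script.py accepts a hypothetical --num-proc flag that it forwards to num_proc:

# In the batch script, replace the hard-coded call with:
# (the --num-proc flag is an assumed interface of your_script.py)
python your_script.py --num-proc "${SLURM_CPUS_PER_TASK:-1}"

The ${...:-1} fallback defaults to 1 so the same line still works when you run the script outside Slurm.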