
Run on Slurm cluster

This section explains how to improve processing efficiency by utilizing cluster resources.

You can use the following bash script to run jobs on a Slurm cluster. Running on a multi-core CPU cluster speeds up processing and frees your local computer for other work.

#!/bin/bash

#SBATCH --job-name=wikidata       # job name shown in the queue
#SBATCH --partition=longq         # partition (queue) to submit to
#SBATCH --ntasks=1                # a single task (one Python process)
#SBATCH --cpus-per-task=8         # CPU cores allocated to that task
#SBATCH --time=14-00:00           # time limit in days-hours:minutes (14 days)
#SBATCH --mem-per-cpu=6000        # memory per CPU, in MB
#SBATCH --output=log_%j.out       # stdout log; %j expands to the job ID
#SBATCH --error=log_%j.error      # stderr log

python your_script.py
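
If you save this script as, say, run_wikidata.sbatch (the filename is arbitrary), you can submit it with sbatch and check on your jobs with squeue:

sbatch run_wikidata.sbatch
squeue -u $USER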

You can adjust the configuration to fit your cluster and workload:

  • Partition depends on your cluster. Check your cluster's documentation (or run sinfo) to find out which partitions are available.

  • CPUs per task is the number of CPU cores (processes) your job can use, and you should use all of them. Set the num_proc argument to exactly the number of CPUs configured here; never request fewer CPUs than the num_proc you pass (see the sketch after this list).

  • Time is the maximum execution time, in days-hours:minutes format. The configuration I set is 14 days; for a job that only runs about 3 hours, set it to 3:00:00.

    • If your task finishes early, the job terminates on its own, so there is no harm in setting a generous limit.

    • Only be careful if your program could get stuck in an infinite loop and never end.

  • Mem per cpu is the memory allocated per CPU, in MB. It is not tied to a specific core; Slurm simply multiplies mem-per-cpu by the number of CPUs to determine the total memory given to the task (here, 8 x 6000 MB = 48 GB).
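
To keep the num_proc argument mentioned above in sync with the #SBATCH header, a minimal sketch in Python is to read the allocation from the SLURM_CPUS_PER_TASK environment variable (set by Slurm when --cpus-per-task is specified) instead of hard-coding it. The process_dataset call is a hypothetical placeholder for whatever function in your script accepts num_proc:

import os

# Read the CPU allocation Slurm granted this job; fall back to 1 when
# running outside Slurm (where SLURM_CPUS_PER_TASK is not set).
num_proc = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

print(f"Running with num_proc={num_proc}")

# Hypothetical call: pass the same value wherever num_proc is expected,
# so it always matches the #SBATCH --cpus-per-task setting.
# process_dataset(..., num_proc=num_proc)

This way the script and the #SBATCH header cannot drift apart when you change the CPU request.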
