TACC User Guides

Batch Systems Comparison

Comparison Tables

This document lists basic resource specifications and batch options for the LoadLeveler (Champion), LSF (Lonestar), and SGE (Ranger and Stampede) batch utilities and compares them side by side. Although no TACC system currently uses the PBS batch system, its commands are included to help users migrate from sites that use PBS to one of the batch systems used at TACC.

Table 1 compares the resource syntax of each batch system for the most commonly employed user specifications. Examples of these specifications include the total number of nodes and tasks per node, the wall clock time, and the peak memory usage per task. Additional specifications include the delineation of a specific "class" or "queue" to designate a relative priority in advancing through the queue structure and an email request for notifying the user at the beginning or end of a given job execution.

Table 2 lists important environment variables under each batch system, and Table 3 compares the resource management commands used to submit, monitor, and cancel queued jobs. As a final comparison, an example submission script is provided for each batch system; the scripts request comparable resources and run a parallel executable named mpihello.

Table 1 Important Resource Syntax for LoadLeveler, PBS, LSF, and SGE.

| Utility | LoadLeveler (LL) | PBS | LSF | SGE |
|---|---|---|---|---|
| Resource Sentinel | # @ | #PBS | #BSUB | #$ |
| Nodes/Processors | node = <#>; tasks_per_node = <#> | -l nodes=<#>:ppn=<#> (ppn = processors per node) | -n <#> | -pe <#>way <#cc> (<#> = wayness = cores/node; <#cc> = core count) |
| Wall Clock Limit | wall_clock_limit = [dd:]hh:mm:ss | -l walltime=hh:mm:ss | -W hh:mm | -l h_rt=hh:mm:ss |
| Queue | class = <queue> | -q <queue> | -q <queue> | -q <queue> |
| Email Notification | notification = <type> (always, error, start, never, or complete) | -me | -B (sends mail when job begins execution); -N (sends job report by mail when job finishes) | -m be (sends mail when job begins/ends execution) |
| Email Address | notify_user = <email> | -M <email_address> | -u <email_address> | -M <email_address> |
| Initial Directory | initialdir = <directory> | (default = $HOME) | (default = job submission directory) | (default = $HOME) |
| Job Name | job_name = <name> | -N <name> | -J <name> | -N <name> |
| STDERR & STDOUT to Same File | output = <file>; error = $(output) | -j oe | (use -o without -e) | -j y |
| Project to Charge | account_no = <project> | | -P <project> | -A <project> |
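
The wall clock row is the one place in Table 1 where the value format itself differs: LSF takes hh:mm, while the other systems take hh:mm:ss. The following is a minimal sketch of converting between the two, assuming a well-formed hh:mm:ss input. The function name to_lsf_walltime is hypothetical and not part of any batch system; leftover seconds are rounded up so the LSF limit is never shorter than the original request.

```shell
#!/bin/bash
# Convert an hh:mm:ss wall clock limit (PBS, SGE, LoadLeveler form) to the
# hh:mm form accepted by the LSF -W option.
# to_lsf_walltime is a hypothetical helper name used only for illustration.
to_lsf_walltime() {
    echo "$1" | awk -F: '{
        h = $1 + 0; m = $2 + 0; s = $3 + 0
        if (s > 0) m++                    # round leftover seconds up
        if (m >= 60) { h++; m -= 60 }     # carry a full hour
        printf "%d:%02d\n", h, m
    }'
}

to_lsf_walltime 06:00:00   # prints 6:00, as in "#BSUB -W 6:00"
```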

 

Table 2 Important Environment Variables

| | LoadLeveler | PBS | LSF | SGE |
|---|---|---|---|---|
| Processor List | $LOADL_PROCESSOR_LIST | cat -n $PBS_NODEFILE | $LSB_HOSTS | (not available) |
| Submission Directory | $LOADL_STEP_INITDIR | $PBS_O_WORKDIR | $LS_SUBCWD | #$ -V or $SGE_O_WORKDIR |
| Job ID | $LOADL_STEP_ID | $PBS_JOBID | $LSB_JOBID | $JOB_ID |

 

Table 3 Queue Management Commands for Each System

| Purpose | LoadLeveler | PBS | LSF | SGE |
|---|---|---|---|---|
| Submission | llsubmit job | qsub job | bsub < job | qsub job |
| Deletion | llcancel | qdel | bkill | qdel |
| Status | llq | qstat | bjobs | qstat |
| List Queue | llclass | qstat -Q | bqueues -l | qconf -sql |
| GUI Monitor | xloadl | xpbsmon | (not available) | (not available) |
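
As an illustration of the Submission row of Table 3, the sketch below prints the submission command for a given batch system without running it. The function name submit_command and the lowercase system labels are hypothetical, chosen only for this example. Note the redirection in the LSF case: bsub reads the job script, including its #BSUB lines, from standard input.

```shell
#!/bin/bash
# Print the Table 3 submission command for a batch system.
# submit_command and the system labels are hypothetical; nothing is submitted.
submit_command() {
    system=$1
    script=$2
    case "$system" in
        loadleveler) echo "llsubmit $script" ;;
        pbs|sge)     echo "qsub $script" ;;
        lsf)         echo "bsub < $script" ;;   # script is read from stdin
        *)           echo "unknown batch system: $system" >&2
                     return 1 ;;
    esac
}

submit_command lsf job.lsf   # prints: bsub < job.lsf
```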

 

Example Batch Scripts

Example job scripts for each batch system follow. The PBS, LSF, and LoadLeveler scripts request 16 processors for six hours, the SGE script requests 32 cores for 1.5 hours, and all four run the same parallel executable, mpihello.

Example PBS job script

In the example below, the environment variables PBS_O_HOST, PBS_NODEFILE, and PBS_O_WORKDIR contain the master host, the name of the file listing the assigned compute nodes, and the submission directory, respectively. The mpirun command launches the parallel application on 16 processors (the "-np" argument), which matches the request of 8 nodes with 2 processors per node.

#!/bin/csh
#PBS -l nodes=8:ppn=2
#PBS -l walltime=6:00:00
#PBS -q normal
#PBS -N hello
#PBS -j oe
#PBS -me -M <email_address>

echo "Master Host: $PBS_O_HOST"
echo "Nodes:"; cat -n $PBS_NODEFILE; echo ""
echo "-----------------------------------------------"

cd $PBS_O_WORKDIR
mpirun -np 16 ./mpihello
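
The task count passed to "-np" can also be derived from the node file rather than hard-coded. The following is a minimal bash sketch, assuming the node file lists one host name per allocated processor. The function name count_tasks and the stand-in node file are hypothetical; inside a job the argument would be "$PBS_NODEFILE".

```shell
#!/bin/bash
# Count the entries in a PBS-style node file.
# Assumption: the file holds one host name per allocated processor.
count_tasks() {
    # wc -l < file prints only the count; awk strips any padding
    wc -l < "$1" | awk '{ print $1 + 0 }'
}

# Stand-in node file for illustration; inside a job use "$PBS_NODEFILE".
demo_nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$demo_nodefile"
count_tasks "$demo_nodefile"   # prints 4 (2 nodes x 2 processors per node)
rm -f "$demo_nodefile"
```

In a bash job script, the result could then be passed on as mpirun -np "$(count_tasks "$PBS_NODEFILE")" ./mpihello.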

Example LSF job script

In the LSF example below, LSF replaces the "%J" expression with the job ID. The environment variables LSB_HOSTS and LS_SUBCWD contain the list of assigned compute nodes and the submission directory, respectively.

#!/bin/csh
#BSUB -n 16
#BSUB -W 6:00
#BSUB -q normal
#BSUB -J hello
#BSUB -o out.o%J
#BSUB -u <email_address>

echo "Master Host: `hostname` "
echo "Node List: $LSB_HOSTS "

cd $LS_SUBCWD
ibrun ./mpihello
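
The host list in LSB_HOSTS can be summarized inside a job to check the allocation. The sketch below assumes the list is space-separated with one entry per task slot, so a host that runs several tasks appears several times. The function name summarize_hosts and the sample host names are hypothetical.

```shell
#!/bin/bash
# Summarize an LSB_HOSTS-style host list.
# Assumption: space-separated, one entry per task slot.
summarize_hosts() {
    # Word splitting on the unquoted $1 is intentional: one host per line.
    tasks=$(printf '%s\n' $1 | wc -l | awk '{ print $1 + 0 }')
    nodes=$(printf '%s\n' $1 | sort -u | wc -l | awk '{ print $1 + 0 }')
    echo "$tasks tasks on $nodes nodes"
}

# Stand-in value for illustration; inside a job pass "$LSB_HOSTS".
summarize_hosts "c1 c1 c2 c2"   # prints: 4 tasks on 2 nodes
```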

Example LoadLeveler job script

In the following LoadLeveler example, the "environment" keyword provides a semicolon-separated list of environment variable settings (variable_name=variable_value).

Note: The environment resource specification must be on a single line (the expression below is wrapped only for display). Setting COPY_ALL (without a value) tells LoadLeveler to copy all of your interactive environment variables to the batch environment. MP_EUILIB=us and MP_SHARED_MEMORY=yes select the correct software (user space) and shared memory MPI buffers, respectively. The network.MPI resource (csss,shared,US) specifies the SP2 dual-plane adapters, shared memory, and the "us" software stack, respectively.

The $LOADL_PROCESSOR_LIST and $LOADL_STEP_INITDIR environment variables contain the list of processors and the submission directory, respectively. The "poe" command launches the parallel application on 16 processors (the product of node and tasks_per_node, here 4 x 4). For code compiled with the "MP" compilers (mpxlf90, mpcc, etc.), "poe" is not necessary. Using hpmcount in place of poe provides hardware counter information for the parallel execution.

#!/usr/bin/csh
# @ environment = COPY_ALL;MP_EUILIB=us;MP_INTRDELAY=100;
XLSMPOPTS=parthds=1;SPINLOOPTIME=10000;
YIELDLOOPTIME=10000;MP_CPU_USE=multiple;
MP_SHARED_MEMORY=yes;MP_INTRDELAY=100

# @ node = 4
# @ tasks_per_node = 4
# @ resources = ConsumableCpus(1) ConsumableMemory(1800MB)

# @ wall_clock_limit = 06:00:00
# @ class = normal

# @ job_name = hello
# @ output = $(job_name).o$(jobid)
# @ error = $(job_name).o$(jobid)
# @ notification = never

# @ network.MPI = csss,shared,US
# @ job_type = parallel

# @ notify_user = <email_address>
# @ queue

echo "Master Host: `hostname`"
echo "NODELIST: $LOADL_PROCESSOR_LIST"
echo "----------------------------------"

cd $LOADL_STEP_INITDIR
poe ./mpihello

Example SGE job script

#!/bin/bash  
#$ -V # Inherit the submission environment
#$ -cwd # Start job in submission directory
#$ -N myMPI # Job Name
#$ -j y # Combine stderr and stdout
#$ -o ${JOB_NAME}.o${JOB_ID} # Name of the output file (eg. myMPI.oJobID)
#$ -pe 16way 32 # Requests 16 tasks/node, 32 cores total
#$ -q normal # Queue name "normal"
#$ -l h_rt=01:30:00 # Run time (hh:mm:ss) - 1.5 hours
#$ -M <email_address> # Email notification address
#$ -m be # Email at Begin and End of job
set -x # Echo commands, use "set echo" with csh
ibrun ./mpihello # Run the MPI executable named "mpihello"

 

SGE allows the "$JOB_NAME" variable in batch resource specifications. SGE launches 16 executables on each node (16way) and uses the core count (32) to determine the number of nodes; hence the core count must be divisible by 16 at TACC. If the number of tasks to be launched is not divisible by 16, set the MY_NSLOTS environment variable to the number of tasks and set the core count to the next number divisible by 16 (in order to get the correct number of nodes).
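
The rounding rule above can be worked through with a short sketch. It assumes a wayness of 16 by default, and the function name sge_core_count is hypothetical. For 20 tasks it reports 2 nodes, a core count of 32 for the "-pe 16way" line, and MY_NSLOTS set to 20.

```shell
#!/bin/bash
# Round a task count up to the next multiple of the wayness and report the
# node count, the core count for "-pe <#>way <#cc>", and MY_NSLOTS.
# sge_core_count is a hypothetical helper name used only for illustration.
sge_core_count() {
    tasks=$1
    wayness=${2:-16}                                # tasks per node
    nodes=$(( (tasks + wayness - 1) / wayness ))    # ceiling division
    cores=$(( nodes * wayness ))
    echo "nodes=$nodes cores=$cores MY_NSLOTS=$tasks"
}

sge_core_count 20   # prints: nodes=2 cores=32 MY_NSLOTS=20
```

With these values, the job script would request "#$ -pe 16way 32" and set MY_NSLOTS to 20 before calling ibrun.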

Note: "ibrun" is a wrapper script at TACC for launching executables with an InfiniBand-aware MPI.