PBS: Portable Batch System on komp

Jobs on komp02-24 (except komp07 and komp11), dual850 and the -bans must now be submitted through PBS via komp01. Direct logins are disabled except on komp01, komp07 and komp11. This page gives the relevant instructions. See http://www.openpbs.com for more information on PBS.

General

PBS is a queueing system similar to those used at supercomputer centers. Jobs are submitted to the queue that reflects the resources needed, and a scheduler decides which ones to run when nodes become available. These decisions are based on the requested length of the run, how long a job has been waiting, and fair sharing of resources among different users.

Monitoring nodes and queues: xpbs and xpbsmon

Two X-windows based tools are available on komp. xpbsmon shows the status of the various nodes (free, used, down etc.) and xpbs shows the various queues and jobs in them. By default xpbs shows only your jobs, but you can set it to show all users' jobs. For non-graphical information use qstat (type 'man qstat' for more details).

The queues

Presently there are 6 queues. The details may be adjusted based on user experience.

Rather than having different queues for different run lengths, you specify the time required when submitting a job (see below). If you don't specify a time, it defaults to 24 hours of komp-equivalent time (i.e., on the faster cpus time passes more rapidly, in proportion to their speed). The maximum run duration is 1 or 2 weeks of komp-time, depending on the queue. Once your time has elapsed, PBS kills the job whether or not it has finished, so it is best to request somewhat more time than you need (but not excessively more, because the scheduler gives higher priority to shorter jobs). Time is measured as (scaled) wall-clock time, not CPU time; since there is only 1 job per cpu, the two should be about the same unless a job hangs.
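
As a rough illustration of the komp-time scaling described above, the sketch below converts a limit in komp-equivalent hours into real wall-clock seconds. The speed figure passed in is a hypothetical input (a percentage of komp cpu speed); the actual scaling factors used by the scheduler are not listed on this page.

```shell
#!/bin/sh
# real_seconds KOMP_HOURS SPEED_PERCENT
# Convert a limit given in komp-equivalent hours into real wall-clock
# seconds on a cpu running at SPEED_PERCENT of komp speed (100 = same).
# SPEED_PERCENT is an assumed, illustrative value, not a documented one.
real_seconds() {
    komp_hours=$1
    speed_percent=$2
    echo $(( komp_hours * 3600 * 100 / speed_percent ))
}

real_seconds 24 100   # default limit on a komp-speed cpu: 86400
real_seconds 24 200   # same limit on a cpu assumed twice as fast: 43200
```

So, under that assumption, the default 24 hours of komp-time would run out after 12 real hours on a cpu twice as fast as a komp.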

How to submit jobs: qsub

e.g., qsub -q n1fast myscript

This submits the script file myscript to the queue n1fast. When a cpu is allocated, PBS starts a login shell on that PC, and the shell executes the script. The script must therefore contain all the unix commands needed to run your job starting from a fresh login. A minimal script file might contain something like:

cd /home/pjt/+stag ; cp par.test par ; stag3d

To specify the run (wall-clock) time, use the -l option, e.g., qsub -q n1fast -l walltime=48:00:00 myscript
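
The queue and time limit can also be written into the script itself as "#PBS" directive lines, which qsub reads from the top of the script. The sketch below prints such a script; make_job is a hypothetical helper name, and it assumes the local PBS uses the default "#PBS" directive prefix. Options given on the qsub command line still take precedence over directives in the script.

```shell
#!/bin/sh
# make_job QUEUE HOURS
# Print a job script with the queue and wall-clock limit embedded as
# "#PBS" directive lines. The job commands are the ones from the
# example script above; make_job itself is only an illustrative helper.
make_job() {
    queue=$1
    hours=$2
    cat <<EOF
#PBS -q $queue
#PBS -l walltime=$(printf '%02d' "$hours"):00:00
cd /home/pjt/+stag
cp par.test par
stag3d
EOF
}

# Show the generated script on standard output.
make_job n1fast 48
```

To use it, redirect the output to a file (make_job n1fast 48 > myscript) and submit that file from komp01 with a plain "qsub myscript".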

To delete a job, use qdel, e.g., qdel 123.komp01.ess.ucla.edu

To monitor the job use qstat or xpbs as discussed earlier.

MPI jobs

mpirun must specify the number of processors (the same as the queue specification) but the nodes are chosen automatically, e.g.,

qsub -q n16 myscript

myscript contains: "cd /home/pjt/+stag; cp par.test par; mpirun -np 16 stag3d"
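
Because the -np value has to agree with the queue, one way to avoid a mismatch is to derive it from the queue name. The sketch below assumes that queue names have the form n<count> with an optional suffix (as in n16 and n1fast above); that naming rule is inferred from the examples on this page, and queue_np and mpirun_cmd are hypothetical helper names.

```shell
#!/bin/sh
# queue_np QUEUE
# Print the processor count implied by a queue name, assuming names of
# the form n<count>[suffix], e.g. n16 -> 16, n1fast -> 1.
# Prints nothing if the name does not match that pattern.
queue_np() {
    echo "$1" | sed -n 's/^n\([0-9][0-9]*\).*$/\1/p'
}

# mpirun_cmd QUEUE PROGRAM
# Print (not run) the mpirun line that matches the queue.
mpirun_cmd() {
    np=$(queue_np "$1")
    if [ -z "$np" ]; then
        echo "cannot infer processor count from queue '$1'" >&2
        return 1
    fi
    echo "mpirun -np $np $2"
}

mpirun_cmd n16 stag3d   # prints: mpirun -np 16 stag3d
```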

A slight idiosyncrasy of PBS on dual-CPU PCs is that it spreads parallel jobs across multiple PCs with only one process per PC. So a 4-cpu job might be placed on komp02, komp03, komp04 and komp05. It does, however, know about the dual CPUs, so another 4-cpu job might be placed on the same PCs at the same time.

 

.login files

If your .login or .cshrc file sets terminal characteristics it may be necessary to bypass that for PBS login processes using an "if" block:

if ( ! $?PBS_ENVIRONMENT ) then
    <terminal setting stuff>
endif
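
For users whose login shell is sh or bash rather than csh, the same guard can be sketched as follows. It assumes the startup file is .profile (or similar) and, like the csh version, relies on PBS setting PBS_ENVIRONMENT for its jobs; in_pbs_job is a hypothetical helper name.

```shell
# in_pbs_job: succeed if PBS_ENVIRONMENT is set, i.e. the shell was
# started by PBS rather than by an ordinary login.
in_pbs_job() {
    [ -n "${PBS_ENVIRONMENT+set}" ]
}

if ! in_pbs_job; then
    # <terminal setting stuff> goes here; ":" is only a placeholder
    :
fi
```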