NQS allows one to submit batch jobs to queues on local or remote machines and have the log file returned to the originating machine or another machine. After submitting the request, the user can watch the progress of the request. The user can also affect the job after it is submitted by holding a queued job from being scheduled, releasing a held job, suspending a running request, resuming a suspended request, or deleting a queued or running request.
There are two main types of queues: batch and pipe. A batch queue is an execution queue where the request actually runs. A pipe queue provides routing capabilities; when a request is submitted to a pipe queue it is passed on to another pipe queue for further routing or a batch queue for execution on the same or another machine.
The core NQS user commands are as follows:
    Qsub  - Submit an NQS job
    Qdel  - Delete an NQS job
    Qstat - Determine the status of a job or a queue
Here is a sample NQS script with embedded switches:
    #
    # QSUB -eo
    # QSUB -r cvtabc
    # QSUB -q batch
    # QSUB
    .
    .
    Various script commands follow here
Note that the lines starting with "#" appear as comments to the shell, but that Qsub interprets the lines starting with "# QSUB" as indicators that a Qsub switch follows. This script indicates that stdout and stderr should be combined into one file (-eo), that the request should be called "cvtabc" (-r), and that the job should be queued to the queue called "batch" (-q). The final QSUB line without any parameters indicates to Qsub that no more switches follow.
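As an illustration of the convention just described, the following is a minimal sketch of a shell function that lists the switches embedded in a script. The function name list_qsub_switches is hypothetical and is not part of NQS. It assumes only what is stated above: switch lines begin with "# QSUB", and a bare "# QSUB" line ends the embedded switches. Qsub itself may apply further rules not covered here.

```shell
# Hypothetical helper (not part of NQS): print the switches embedded
# in the script named by $1, one per line.
# Assumption: switch lines start with "# QSUB" followed by a switch,
# and a "# QSUB" line with no parameters ends the embedded switches.
list_qsub_switches() {
    awk '
        /^# QSUB[ \t]*$/ { exit }      # bare "# QSUB": no more switches
        /^# QSUB[ \t]/   {
            sub(/^# QSUB[ \t]+/, "")   # drop the "# QSUB" prefix
            print
        }
    ' "$1"
}
```

For the sample script above, list_qsub_switches scriptname.sh would print the three lines "-eo", "-r cvtabc", and "-q batch".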
If this script was called scriptname.sh, it could be submitted using the command:
    qsub scriptname.sh
If there was a similar script called anotherscript.sh without the embedded NQS commands, it could be submitted using the following command and run exactly as the above script:
    qsub -eo -r cvtabc -q batch anotherscript.sh
It is also possible to have switches both embedded in the script and on the command line.
Here are several of the most often used Qsub switches:
    Switch   Action
    -a       run request after stated time
    -e       direct stderr output to the given destination
    -eo      combine stdout and stderr in one file
    -o       direct stdout output to the given destination
    -r       give the request a name
    -q       indicate to which queue to submit the job
All of the Qsub switches are explained in detail in the Qsub man pages.
The script file is spooled when you submit it, so you can modify the script after submission and not affect the request.
By default the sequence number of the request is printed when Qsub completes its processing. This number combined with the hostname makes up the unique identifier for the request.
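The identifiers that appear elsewhere in this document (for example 129.beaker) have the form sequence number, a period, then hostname. Assuming that form, a minimal sketch of how a script might build or split such an identifier follows. The function names are hypothetical and are not NQS commands.

```shell
# Hypothetical helpers (not part of NQS) for identifiers of the
# form <sequence number>.<hostname>, such as 129.beaker.

# Combine a sequence number ($1) and a hostname ($2).
make_request_id() { printf '%s.%s\n' "$1" "$2"; }

# Print the sequence number: everything before the first ".".
request_seq() { printf '%s\n' "${1%%.*}"; }

# Print the hostname: everything after the first ".".
request_host() { printf '%s\n' "${1#*.}"; }
```

For example, request_seq 129.beaker prints 129 and request_host 129.beaker prints beaker.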
The general purpose command for determining the status of queues and jobs is Qstat. To find out what queues are present on the local machine, use the following command:
    qstat -x
If you add the "-b" switch you will get a brief version of the information, and if you add the "-l" switch you will see a lot more. Sample output from qstat -x is:
    batch@beaker.monsanto.com; type=BATCH; [ENABLED, INACTIVE]; pri=16 lim=1
      0 exit; 0 run; 0 stage; 0 queued; 0 wait; 0 hold; 0 arrive;
      User run limit= 1
    helium@beaker.monsanto.com; type=PIPE; [ENABLED, INACTIVE]; pri=16 lim=1
      0 depart; 0 route; 0 queued; 0 wait; 0 hold; 0 arrive;
      Destset = {batch@helium};
The first queue is a batch queue, and jobs actually run in this queue. The second queue is a pipe queue, which means jobs submitted to it are transferred to another queue, either on the same machine or another, to execute. The destset on the helium queue indicates that the jobs submitted to that queue are transferred to the batch queue on the node helium to run.
If you want to learn more about queues on remote machines, use a command of the form:
    qstat -x @ddcs1
which indicates that the request should be forwarded to the machine ddcs1 and the appropriate information printed on your screen.
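If the qstat -x output needs to be examined from a script, the queue header lines can be picked out by their "type=" field. The following is a minimal sketch that reads saved qstat -x text on standard input and prints each queue name with its type. The function name list_queues is hypothetical, and the sketch assumes the header layout matches the sample output shown in this document.

```shell
# Hypothetical helper (not part of NQS): read "qstat -x" text on
# stdin and print "<queue> <type>" for each queue header line.
# Assumption: header lines look like the sample in this document,
#   batch@beaker.monsanto.com; type=BATCH; [ENABLED, INACTIVE]; ...
list_queues() {
    awk -F'; ' '/; type=/ {
        sub(/^[ \t]+/, "", $1)     # ignore any leading indentation
        split($1, q, "@")          # "batch@host" -> q[1] = "batch"
        sub(/^type=/, "", $2)      # "type=BATCH" -> "BATCH"
        print q[1], $2
    }'
}
```

Fed the sample output above, this would print "batch BATCH" and "helium PIPE".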
There are also switches which control the format of the output. The default Monsanto format is a single line for each request. The -s switch gives the standard COSMIC NQS format, and the -l switch provides much more detail in a long format.
Other Qstat switches select which requests are shown:
    Qstat switch   Effect
    -a             show all requests
    -u username    show requests belonging to a specific user
    -o             select jobs which originated on the local machine
    -d             show jobs on all machines within the local NQS domain
The systems in the local NQS domain are listed in the file /usr/lib/nqs/nqs-domain (by default). This is a list of systems which can be considered a unit; jobs can be submitted between systems on the list. The -d switch then requests information from each system on the list.
This list can be modified by having a file called .qstat in your home directory which has the same format as the system-wide file, but has only the systems in which you are interested. Then you will get NQS status only from that list of systems.
The Qcat utility is also available to get information on the status of a job. It will list the spooled input script or the available output or error files. Since applications may not flush the stdout or stderr streams frequently, the available information may be limited, but it can be helpful in indicating how a job is progressing.
Here is an example of the default qstat output:
    Request        I.D.   Owner    Queue    Start Time  Time Limit Total Time St
    -------------- ------ -------- -------- ----------- ---------- ---------- --
    example        129    jrroma   batch    4/30 10:11  4 04:00:00 0 00:00:00 R
The columns are self-explanatory, except perhaps for the last one, which indicates the status of the request. Possible statuses include R for running, Q for queued, H for holding, W for waiting, and S for suspended.
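The status letters listed above can be translated in a script with a simple lookup. The following is a minimal sketch. Both function names are hypothetical, and since the text says only that possible statuses "include" these five, anything else is reported as unknown.

```shell
# Hypothetical helpers (not part of NQS).

# Translate a status letter from the last qstat column into a word.
# Only the five letters named in this document are known here.
status_name() {
    case "$1" in
        R) echo running ;;
        Q) echo queued ;;
        H) echo holding ;;
        W) echo waiting ;;
        S) echo suspended ;;
        *) echo unknown ;;
    esac
}

# Print the status letter (the last column) of one line of
# default-format qstat output passed as $1.
status_of_line() { printf '%s\n' "$1" | awk '{ print $NF }'; }
```

For the example request above, status_of_line returns R, which status_name translates to running.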
The -s switch gives the following information in the standard COSMIC NQS format:
    batch@beaker.monsanto.com; type=BATCH; [ENABLED, RUNNING]; pri=16 lim=1
      0 exit; 1 run; 0 stage; 0 queued; 0 wait; 0 hold; 0 arrive;
      User run limit= 1
         REQUEST NAME  REQUEST ID  USER    PRI  STATE    PGRP
      1: example       129.beaker  jrroma  31   RUNNING  7835
    helium@beaker.monsanto.com; type=PIPE; [ENABLED, INACTIVE]; pri=16 lim=1
      0 depart; 0 route; 0 queued; 0 wait; 0 hold; 0 arrive;
And an example of output from the -l switch is as follows:
    batch@beaker.monsanto.com; type=BATCH; [ENABLED, RUNNING]; pri=16 lim=1
      0 exit; 1 run; 0 stage; 0 queued; 0 wait; 0 hold; 0 arrive;
      User run limit= 1
      Request 1: Name=example Id=129.beaker Owner=jrroma Priority=31 RUNNING Pgrp=7835
      Created at Thu Apr 30 10:11:09 CDT 1992
      Mail = [NONE]
      Mail address = jrroma@beaker
      Owner user name at originating machine = jrroma
      Request is not restartable, not recoverable.
      Broadcast = [NONE]
      Per-proc. core file size limit= [32 megabytes, 32 megabytes]
      Per-proc. data size limit= [32 megabytes, 32 megabytes]
      Per-proc. permanent file size limit= [500 megabytes, 500 megabytes]
      Per-proc. execution nice priority = 0
      Per-proc. stack size limit= [32 megabytes, 32 megabytes]
      Per-proc. CPU time limit= [360000.0, 360000.0]
      Per-proc. working set limit= [32 megabytes, 32 megabytes]
      Standard-error access mode = EO
      Standard-output access mode = SPOOL
      Standard-output name = beaker:/usr2/jrroma/tmp/example.o129
      Shell = DEFAULT
      Umask = 22
    helium@beaker.monsanto.com; type=PIPE; [ENABLED, INACTIVE]; pri=16 lim=1
      0 depart; 0 route; 0 queued; 0 wait; 0 hold; 0 arrive;
Again, information on the status of jobs on remote machines can be obtained by using the "@node" syntax to indicate where to get the information.
Requests are deleted with Qdel. For example, if request 217.beaker is queued on beaker, the appropriate command is:
    qdel 217.beaker
If the job is running, you must add the -k switch, which indicates that the running job is to be killed.
Local jobs can be deleted by request name with the -r switch. The argument to the -r switch is the request pattern to delete. If the -c switch is used with the -r switch, then the user is prompted to confirm the deletion of the job.
If this job submitted from beaker is now running on a remote machine, you will need to add the remote system name:
    qdel -k 217.beaker@ddcs1
where ddcs1 is the name of the remote machine where the job is running. This will send a message to ddcs1 to delete request 217, which originated on beaker.
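The three forms of the qdel command described above (queued, running, and running on a remote machine) can be summarized in a small sketch. The function qdel_command is hypothetical and only prints the command line it would use; it does not run qdel. Its argument conventions are an assumption made for this illustration.

```shell
# Hypothetical helper (not part of NQS): print, but do not run,
# the qdel command for a request.
#   $1 = request identifier, for example 217.beaker
#   $2 = "queued" or "running"
#   $3 = optional name of the remote machine where the job runs
qdel_command() {
    cmd="qdel"
    if [ "$2" = "running" ]; then
        cmd="$cmd -k"               # -k: kill a running job
    fi
    target="$1"
    if [ -n "${3:-}" ]; then
        target="${target}@${3}"     # add the remote system name
    fi
    printf '%s %s\n' "$cmd" "$target"
}
```

For example, qdel_command 217.beaker running ddcs1 prints "qdel -k 217.beaker@ddcs1", matching the command shown above.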
Core file size limit (-lc)
Data segment size limit (-ld)
Per-process permanent file size limit (-lf)
Nice value (-ln)
Stack segment size limit (-ls)
Per-process cpu time limit (-lt)
Working set limit (-lw)
Shell strategy = FREE