The lsb.queues file defines the batch queues in an openlava cluster.
This file is optional; if no queues are configured, openlava creates a queue named default, with all parameters set to default values.
lsb.queues Structure
Each queue definition begins with the line Begin Queue and ends with the line End Queue. The queue name must be specified; all other parameters are optional.
ADMINISTRATORS = user_name | user_group ...
List of queue administrators.
Queue administrators can perform operations on any user's job in the queue, as well as on the queue itself.
Default: Undefined (you must be a cluster administrator to operate on this queue).
CHKPNT = chkpnt_dir [chkpnt_period]
Enables automatic checkpointing.
The checkpoint directory is the directory where the checkpoint files are created. Specify an absolute path or a path relative to the current working directory (CWD); do not use environment variables.
Specify the checkpoint period in minutes.
Job-level checkpoint parameters override queue-level checkpoint parameters.
CORELIMIT = integer
The per-process (hard) core file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
CPULIMIT = [default_limit] maximum_limit
where default_limit and maximum_limit are:
[hours:]minutes[/host_name | /host_model]
Maximum normalized CPU time and optionally, the default normalized CPU time allowed for all processes of a job running in this queue. The name of a host or host model specifies the CPU time normalization host to use.
If a job dynamically spawns processes, the CPU time used by these processes is accumulated over the life of the job.
Processes that exist for fewer than 30 seconds may be ignored.
By default, if a default CPU limit is specified, jobs submitted to the queue without a job-level CPU limit are killed when the default CPU limit is reached.
If you specify only one limit, it is the maximum, or hard, CPU limit. If you specify two limits, the first one is the default, or soft, CPU limit, and the second one is the maximum CPU limit. The number of minutes may be greater than 59. Therefore, three and a half hours can be specified either as 3:30 or 210.
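The time format can be illustrated with a minimal sketch. The function name parse_limit_minutes is hypothetical and is not part of openlava; it only shows how a value in the [hours:]minutes[/host_name | /host_model] form maps to a number of minutes, and it ignores the normalization host.

```python
def parse_limit_minutes(spec: str) -> int:
    """Convert '[hours:]minutes[/host]' to total minutes.

    Illustrative helper only (not an openlava API). The optional
    normalization host after '/' is dropped, not applied.
    """
    time_part = spec.split("/", 1)[0].strip()
    if ":" in time_part:
        hours, minutes = time_part.split(":", 1)
        return int(hours) * 60 + int(minutes)
    return int(time_part)


# Three and a half hours, written both ways:
print(parse_limit_minutes("3:30"))        # 210
print(parse_limit_minutes("210"))         # 210
print(parse_limit_minutes("3:30/hostA"))  # 210
```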
You can define whether the CPU limit is a per-process limit enforced by the OS or a per-job limit enforced by openlava with LSB_JOB_CPULIMIT in lsf.conf.
DATALIMIT = [default_limit] maximum_limit
The per-process data segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
By default, if a default data limit is specified, jobs submitted to the queue without a job-level data limit are killed when the default data limit is reached.
If you specify only one limit, it is the maximum, or hard, data limit. If you specify two limits, the first one is the default, or soft, data limit, and the second one is the maximum data limit.
DEFAULT_HOST_SPEC = host_name | host_model
The default CPU time normalization host for the queue.
The CPU factor of the specified host or host model will be used to normalize the CPU time limit of all jobs in the queue, unless the CPU time normalization host is specified at the job level.
DESCRIPTION = text
Description of the job queue that will be displayed by bqueues -l.
This description should clearly describe the service features of this queue, to help users select the proper queue for each job.
The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 512 characters.
DISPATCH_WINDOW = time_window ...
The time windows in which jobs from this queue are dispatched. Once dispatched, jobs are no longer affected by the dispatch window.
Default: Undefined (always open).
EXCLUSIVE = Y | N
If Y, specifies an exclusive queue.
Jobs submitted to an exclusive queue with bsub -x will only be dispatched to a host that has no other openlava jobs running.
FILELIMIT = integer
The per-process (hard) file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
HJOB_LIMIT = integer
Per-host job slot limit.
Maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it may have.
This may be useful if the queue dispatches jobs that require a node-locked license. If there is only one node-locked license per host, then the system should not dispatch more than one job to the host, even if it is a multiprocessor host.
The following will run a maximum of one job on each of hostA, hostB, and hostC:
HJOB_LIMIT = 1
HOSTS=hostA hostB hostC
HOSTS = [~]host_name[+pref_level] | [~]host_group[+pref_level] | others[+pref_level] | all | none ...
A space-separated list of hosts, host groups, and host partitions on which jobs from this queue can be run. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected.
Any item can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host, host group, or host partition. A higher number indicates a higher preference. If a host preference is not given, it is assumed to be 0. Hosts at the same level of preference are ordered by load.
Use the keyword others to indicate all hosts not explicitly listed.
Use the not operator (~) to exclude hosts or host groups from the queue. This is useful if you have a large cluster but only want to exclude a few hosts from the queue definition.
Use the keyword all to indicate all hosts not explicitly excluded.
Host preferences specified by bsub -m override the queue specification.
HOSTS = hostA+1 hostB hostC+1 GroupX+3
This example defines three levels of preferences: run jobs on hosts in GroupX as much as possible, otherwise run on either hostA or hostC if possible, otherwise run on hostB. Jobs should not run on hostB unless all other hosts are too busy to accept more jobs.
HOSTS = hostD+1 others
Run jobs on hostD as much as possible, otherwise run jobs on the least-loaded host available.
HOSTS = Group1 ~hostA hostB hostC
Run jobs on hostB, hostC, and all hosts in Group1 except for hostA.
HOSTS = all ~group2 ~hostA
Run jobs on all hosts in the cluster, except for hostA and the hosts in group2.
Default: all (the queue can use all hosts in the cluster, and every host has equal preference).
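The preference and exclusion rules for HOSTS can be sketched as follows. This is an illustration only, not how openlava parses the file: the function parse_hosts is hypothetical, it treats host names, host groups, and the keywords others and all as opaque names, and it does not expand groups or order hosts by load.

```python
from collections import defaultdict


def parse_hosts(spec):
    """Split a HOSTS value into preference levels and exclusions.

    Returns (levels, excluded): 'levels' is a list of name lists,
    highest preference first; 'excluded' holds names prefixed by '~'.
    Hypothetical helper for illustration, not an openlava API.
    """
    by_pref = defaultdict(list)
    excluded = []
    for item in spec.split():
        negated = item.startswith("~")
        body = item[1:] if negated else item
        name, _, pref = body.partition("+")
        if negated:
            excluded.append(name)
        else:
            # A missing preference is assumed to be 0.
            by_pref[int(pref) if pref else 0].append(name)
    levels = [by_pref[p] for p in sorted(by_pref, reverse=True)]
    return levels, excluded


print(parse_hosts("hostA+1 hostB hostC+1 GroupX+3"))
# ([['GroupX'], ['hostA', 'hostC'], ['hostB']], [])
print(parse_hosts("all ~group2 ~hostA"))
# ([['all']], ['group2', 'hostA'])
```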
IGNORE_DEADLINE = Y
If Y, disables deadline constraint scheduling (starts all jobs regardless of deadline constraints).
INTERACTIVE = NO | ONLY
Causes the queue to reject interactive batch jobs (NO) or accept nothing but interactive batch jobs (ONLY).
Interactive batch jobs are submitted via bsub -I.
Default: Undefined (the queue accepts both interactive and non-interactive jobs).
JOB_ACCEPT_INTERVAL = integer
The number of dispatch turns to wait after dispatching a job to a host, before dispatching a second job to the same host. By default, a dispatch turn lasts 60 seconds (MBD_SLEEP_TIME in lsb.params).
If 0 (zero), a host may accept more than one job in each dispatch turn. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. You may notice performance problems if this occurs.
JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).
Default: Undefined (the queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1).
JOB_CONTROLS = SUSPEND[signal | command | CHKPNT] RESUME[signal | command] TERMINATE[signal | command | CHKPNT]
o CHKPNT is a special action, which causes the system to checkpoint the job. If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically.
o signal is a UNIX signal name (such as SIGSTOP or SIGTSTP).
o command specifies a /bin/sh command line to be invoked. Do not specify a signal followed by an action that triggers the same signal (for example, do not specify JOB_CONTROLS=TERMINATE[bkill] or JOB_CONTROLS=TERMINATE[brequeue]). This will cause a deadlock between the signal and the action.
Changes the behaviour of the SUSPEND, RESUME, and TERMINATE actions in openlava.
For SUSPEND and RESUME, if the action is a command, the following points should be considered:
o The contents of the configuration line for the action are run with /bin/sh -c so you can use shell features in the command.
o The standard input, output, and error of the command are redirected to the NULL device.
o The command is run as the user of the job.
o All environment variables set for the job are also set for the command action. The following additional environment variables are set:
o LSB_JOBPGIDS -- a list of current process group IDs of the job
o LSB_JOBPIDS -- a list of current process IDs of the job
For the SUSPEND action command, the following environment variable is also set:
o LSB_SUSP_REASONS -- an integer representing a bitmap of suspending reasons as defined in lsbatch.h. The suspending reason can allow the command to take different actions based on the reason for suspending the job.
On LINUX, by default, SUSPEND sends SIGTSTP for parallel or interactive jobs and SIGSTOP for other jobs. RESUME sends SIGCONT. TERMINATE sends SIGINT, SIGTERM and SIGKILL in that order.
JOB_STARTER = starter [starter] ["%USRCMD"] [starter]
Creates a specific environment for submitted jobs prior to execution.
starter is any executable that can be used to start the job (i.e., can accept the job as an input argument). Optionally, additional strings can be specified.
By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user's job in the job starter command line. The %USRCMD string may be enclosed in quotes or followed by additional commands.
JOB_STARTER = csh -c "%USRCMD;sleep 10"
In this case, if a user submits a job
% bsub myjob arguments
the command that actually runs is:
% csh -c "myjob arguments;sleep 10"
Default: Undefined (no job starter).
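The %USRCMD substitution described for JOB_STARTER can be sketched as plain string handling. The function build_command is hypothetical; the actual quoting and execution are done by openlava, and this sketch only assumes that the user command replaces %USRCMD when present and is otherwise appended after the starter.

```python
def build_command(job_starter: str, user_command: str) -> str:
    """Combine a JOB_STARTER value with the user's job command.

    Illustrative only (not an openlava API): if the starter contains
    %USRCMD, the user command takes its place; otherwise the user
    command runs after the starter.
    """
    if "%USRCMD" in job_starter:
        return job_starter.replace("%USRCMD", user_command)
    return f"{job_starter} {user_command}"


print(build_command('csh -c "%USRCMD;sleep 10"', "myjob arguments"))
# csh -c "myjob arguments;sleep 10"
```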
load_index = loadSched[/loadStop]
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index. Specify multiple lines to configure thresholds for multiple load indices.
Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.
Scheduling and suspending thresholds for the specified dynamic load index.
The loadSched condition must be satisfied before a job is dispatched to the host. If a RESUME_COND is not specified, the loadSched condition must also be satisfied before a suspended job can be resumed.
If the loadStop condition is satisfied, a job on the host will be suspended.
The loadSched and loadStop thresholds permit the specification of conditions using simple AND/OR logic. Any load index that does not have a configured threshold has no effect on job scheduling.
openlava will not suspend a job if the job is the only batch job running on the host and the machine is interactively idle (it>0).
The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalized for multiprocessor hosts. Thresholds for these parameters should be set at appropriate levels for single processor hosts.
These two lines translate into a loadSched condition of
mem>=2.0 && swap>=200
and a loadStop condition of
mem < 10 || swap < 30
MEMLIMIT = [default_limit] maximum_limit
The per-process (hard) process resident set size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process.
By default, if a default memory limit is specified, jobs submitted to the queue without a job-level memory limit are killed when the default memory limit is reached.
If you specify only one limit, it is the maximum, or hard, memory limit. If you specify two limits, the first one is the default, or soft, memory limit, and the second one is the maximum memory limit.
openlava has two methods of enforcing memory usage:
o OS Memory Limit Enforcement
o openlava Memory Limit Enforcement
OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. openlava passes MEMLIMIT to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from, and lowers the scheduling priority (re-nice) of, a process that has exceeded its declared MEMLIMIT. Only available on systems that support RLIMIT_RSS for setrlimit().
To enable openlava memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. openlava memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT.
You can also enable openlava memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y. The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by openlava is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by openlava and the per-process memory limit enforced by the OS are enabled.
Available for all systems on which openlava collects total memory usage.
The following configuration defines a queue with a memory limit of 5000 KB:
QUEUE_NAME = default
DESCRIPTION = Queue with memory limit of 5000 kbytes
MEMLIMIT = 5000
MIG = minutes
Enables automatic job migration and specifies the migration threshold, in minutes.
If a checkpointable or rerunnable job dispatched to the host is suspended (SSUSP state) for longer than the specified number of minutes, the job is migrated (unless another job on the same host is being migrated). A value of 0 (zero) specifies that a suspended job should be migrated immediately.
If a migration threshold is defined at both host and queue levels, the lower threshold is used.
Default: Undefined (no automatic job migration).
NEW_JOB_SCHED_DELAY = seconds
The maximum or minimum length of time that a new job waits before being dispatched; the behavior depends on whether the delay period specified is longer or shorter than a regular dispatch interval (MBD_SLEEP_TIME in lsb.params, 60 seconds by default).
o If less than the dispatch interval, specifies the maximum number of seconds to wait, after a new job is submitted, before starting a new dispatch turn and scheduling the job. Usually, this causes openlava to schedule dispatch turns more frequently. You might notice performance problems (affecting the entire cluster) if this value is set too low in a busy queue.
o If 0 (zero), starts a new dispatch turn as soon as a job is submitted to this queue (affecting the entire cluster).
o If greater than the dispatch interval, specifies the minimum number of seconds to wait, after a new job is submitted, before scheduling the job. Has no effect on the timing of the dispatch turns, but new jobs in this queue are always delayed by one or more dispatch turns.
NICE = integer
Adjusts the LINUX scheduling priority at which jobs from this queue execute.
The default value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs on a queue-by-queue basis, to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.
PJOB_LIMIT = integer
Per-processor job slot limit for the queue.
Maximum number of job slots that this queue can use on any processor. This limit is configured per processor, so that multiprocessor hosts automatically run more jobs.
POST_EXEC = command
A command run on the execution host after the job.
The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command.
The pre- and post-execution commands are run in /tmp.
Standard input, standard output, and standard error are set to /dev/null.
The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
The PATH environment variable is set to:
"/bin /usr/bin /sbin /usr/sbin"
Default: no post-execution commands.
PRE_EXEC = command
A command run on the execution host before the job.
To specify a pre-execution command at the job level, use bsub -E. If both queue and job level pre-execution commands are specified, the job level pre-execution is run after the queue level pre-execution command.
o The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command.
o The pre- and post-execution commands are run in /tmp.
o Standard input, standard output, and standard error are set to /dev/null.
o The output from the pre- and post-execution commands can be explicitly redirected to a file for debugging purposes.
o The PATH environment variable is set to: /bin /usr/bin /sbin /usr/sbin
o If the pre-execution command exits with a non-zero exit code, it is considered to have failed, and the job is requeued to the head of the queue. This feature can be used to implement customized scheduling by having the pre-execution command fail if conditions for dispatching the job are not met.
o Other environment variables set for the job are also set for the pre- and post-execution commands.
Default: no pre-execution commands.
PROCESSLIMIT = [default_limit] maximum_limit
Limits the number of concurrent processes that can be part of a job.
By default, if a default process limit is specified, jobs submitted to the queue without a job-level process limit are killed when the default process limit is reached.
If you specify only one limit, it is the maximum, or hard, process limit. If you specify two limits, the first one is the default, or soft, process limit, and the second one is the maximum process limit.
PROCLIMIT = [minimum_limit [default_limit]] maximum_limit
Maximum number of slots that can be allocated to a job. For parallel jobs, the maximum number of processors that can be allocated to the job.
Optionally specifies the minimum and default number of job slots.
Jobs that specify fewer slots than the minimum PROCLIMIT or more slots than the maximum PROCLIMIT cannot use this queue and are rejected.
All limits must be positive numbers greater than or equal to 1 that satisfy the following relationship:
1 <= minimum <= default <= maximum
You can specify up to three limits in the PROCLIMIT parameter:
If you specify one limit, it is the maximum processor limit. The minimum and default limits are set to 1.
If you specify two limits, the first is the minimum processor limit, and the second one is the maximum. The default is set equal to the minimum. The minimum must be less than or equal to the maximum.
If you specify three limits, the first is the minimum processor limit, the second is the default processor limit, and the third is the maximum. The minimum must be less than or equal to the default, and the default must be less than or equal to the maximum.
Default: unlimited; the default number of slots is 1.
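The one-, two-, and three-limit forms of PROCLIMIT can be summarized in a short sketch. The function resolve_proclimit is hypothetical and is not part of openlava; it only applies the rules stated above and checks the relationship 1 <= minimum <= default <= maximum.

```python
def resolve_proclimit(spec: str):
    """Return (minimum, default, maximum) for a PROCLIMIT value.

    Illustrative helper only (not an openlava API).
    """
    values = [int(v) for v in spec.split()]
    if len(values) == 1:
        # One limit: it is the maximum; minimum and default are 1.
        minimum, default, maximum = 1, 1, values[0]
    elif len(values) == 2:
        # Two limits: minimum and maximum; default equals the minimum.
        minimum, maximum = values
        default = minimum
    elif len(values) == 3:
        minimum, default, maximum = values
    else:
        raise ValueError("PROCLIMIT takes one, two, or three limits")
    if not 1 <= minimum <= default <= maximum:
        raise ValueError("limits must satisfy 1 <= minimum <= default <= maximum")
    return minimum, default, maximum


print(resolve_proclimit("8"))      # (1, 1, 8)
print(resolve_proclimit("2 8"))    # (2, 2, 8)
print(resolve_proclimit("2 4 8"))  # (2, 4, 8)
```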
QJOB_LIMIT = integer
Job slot limit for the queue. Total number of job slots that this queue can use.
QUEUE_NAME = string
Required. Name of the queue.
Specify any ASCII string up to 40 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces. You cannot specify the reserved name default.
You must specify this parameter to define a queue. The default queue automatically created by openlava is named default.
REQUEUE_EXIT_VALUES = [exit_code ...] [EXCLUDE(exit_code ...)]
Enables automatic job requeue and sets the LSB_EXIT_REQUEUE environment variable.
Separate multiple exit codes with spaces. Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue. Exclusive job requeue does not work for parallel jobs.
Jobs are requeued to the head of the queue from which they were dispatched. The output from the failed run is not saved, and the user is not notified by openlava.
A job terminated by a signal is not requeued.
If MBD is restarted, it will not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.
Automatic job requeue and exclusive job requeue are described in the openlava Administrators Guide.
REQUEUE_EXIT_VALUES = 30 EXCLUDE(20)
means that jobs with exit code 30 are requeued, jobs with exit code 20 are requeued exclusively, and jobs with any other exit code are not requeued.
Default: Undefined (jobs in this queue are not requeued).
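How a REQUEUE_EXIT_VALUES setting classifies exit codes can be sketched as follows. The functions parse_requeue_exit_values and requeue_action are hypothetical and only mirror the syntax shown above; they are not how openlava itself processes the parameter.

```python
import re

_EXCLUDE = re.compile(r"EXCLUDE\(([^)]*)\)")


def parse_requeue_exit_values(spec: str):
    """Return (requeue_codes, exclusive_codes) as sets of integers.

    Illustrative helper only (not an openlava API).
    """
    exclusive = set()
    for group in _EXCLUDE.findall(spec):
        exclusive.update(int(code) for code in group.split())
    # Whatever remains outside EXCLUDE(...) is an ordinary requeue code.
    remainder = _EXCLUDE.sub(" ", spec)
    requeue = {int(code) for code in remainder.split()}
    return requeue, exclusive


def requeue_action(exit_code: int, spec: str) -> str:
    """Classify an exit code as 'exclusive', 'requeue', or 'none'."""
    requeue, exclusive = parse_requeue_exit_values(spec)
    if exit_code in exclusive:
        return "exclusive"
    if exit_code in requeue:
        return "requeue"
    return "none"


print(requeue_action(30, "30 EXCLUDE(20)"))  # requeue
print(requeue_action(20, "30 EXCLUDE(20)"))  # exclusive
print(requeue_action(1, "30 EXCLUDE(20)"))   # none
```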
RERUNNABLE = yes | no
If yes, enables automatic job rerun (restart).
RES_REQ = res_req
Resource requirements used to determine eligible hosts. Specify a resource requirement string as usual. The resource requirement string lets you specify conditions in a more flexible manner than using the load thresholds.
The select section defined at the queue level must be satisfied in addition to any job-level requirements or load thresholds.
The rusage section defined at the queue level overrides the rusage section defined at the job level, and jobs are rejected if they specify resource reservation requirements that exceed the requirements specified at the queue level.
The order section defined at the queue level is ignored if any resource requirements are specified at the job level (if the job-level resource requirements do not include the order section, the default order, r15s:pg, is used instead of the queue-level resource requirement).
The span section defined at the queue level is ignored if the span section is also defined at the job level.
If RES_REQ is defined at the queue level and there are no load thresholds defined, the pending reasons for each individual load index will not be displayed by bjobs.
Default: select[type==local] order[r15s:pg]. If this parameter is defined and a host model or Boolean resource is specified, the default type is any.
RESUME_COND = res_req
Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.
openlava automatically resumes a suspended (SSUSP) job in this queue if the load on the host satisfies the specified conditions.
If RESUME_COND is not defined, then the loadSched thresholds are used to control resuming of jobs. The loadSched thresholds are ignored, when resuming jobs, if RESUME_COND is defined.
RUN_WINDOW = time_window ...
Time periods during which jobs in the queue are allowed to run.
When the window closes, openlava suspends jobs running in the queue and stops dispatching jobs from the queue. When the window reopens, openlava resumes the suspended jobs and begins dispatching additional jobs.
Default: Undefined (queue is always active).
RUNLIMIT = [default_limit] maximum_limit
where default_limit and maximum_limit are:
[hours:]minutes[/host_name | /host_model]
The maximum run limit and optionally the default run limit. The name of a host or host model specifies the run time normalization host to use.
By default, jobs that are in the RUN state for longer than the specified maximum run limit are killed by openlava. You can optionally provide your own termination job action to override this default.
Jobs submitted with a job-level run limit (bsub -W) that is less than the maximum run limit are killed when their job-level run limit is reached. Jobs submitted with a run limit greater than the maximum run limit are rejected by the queue.
If a default run limit is specified, jobs submitted to the queue without a job-level run limit are killed when the default run limit is reached.
If you specify only one limit, it is the maximum, or hard, run limit. If you specify two limits, the first one is the default, or soft, run limit, and the second one is the maximum run limit. The number of minutes may be greater than 59. Therefore, three and a half hours can be specified either as 3:30, or 210.
SLOT_RESERVE = MAX_RESERVE_TIME[integer]
Enables processor reservation and specifies the number of dispatch turns over which a parallel job can reserve job slots.
After this time, if a job has not accumulated enough job slots to start, it releases all its reserved job slots. This means a job cannot reserve job slots for more than (integer * MBD_SLEEP_TIME) seconds.
MBD_SLEEP_TIME is defined in lsb.params; the default value is 60 seconds.
SLOT_RESERVE = MAX_RESERVE_TIME[5]
This example specifies that parallel jobs have up to 5 dispatch turns to reserve sufficient job slots (equal to 5 minutes, by default).
Default: Undefined (no processor reservation).
STACKLIMIT = integer
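The reservation time limit follows directly from the formula integer * MBD_SLEEP_TIME. A minimal sketch, assuming a value written as MAX_RESERVE_TIME[n] and the default MBD_SLEEP_TIME of 60 seconds; the function max_reserve_seconds is hypothetical and not an openlava API.

```python
import re


def max_reserve_seconds(slot_reserve: str, mbd_sleep_time: int = 60) -> int:
    """Longest time, in seconds, a job can hold reserved slots.

    'slot_reserve' is the SLOT_RESERVE value, e.g. 'MAX_RESERVE_TIME[5]'.
    Illustrative helper only.
    """
    match = re.fullmatch(r"MAX_RESERVE_TIME\[(\d+)\]", slot_reserve.strip())
    if match is None:
        raise ValueError("expected MAX_RESERVE_TIME[integer]")
    return int(match.group(1)) * mbd_sleep_time


print(max_reserve_seconds("MAX_RESERVE_TIME[5]"))  # 300 seconds, i.e. 5 minutes
```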
The per-process (hard) stack segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
STOP_COND = res_req
Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.
openlava automatically suspends a running job in this queue if the load on the host satisfies the specified conditions.
o openlava will not suspend the only job running on the host if the machine is interactively idle (it > 0).
o openlava will not suspend a forced job (brun -f).
o openlava will not suspend a job because of paging rate if the machine is interactively idle.
If STOP_COND is specified in the queue and there are no load thresholds, the suspending reasons for each individual load index will not be displayed by bjobs.
STOP_COND= select[((!cs && it < 5) || (cs && mem < 15 && swap < 50))]
In this example, assume "cs" is a Boolean resource indicating that the host is a computer server. The stop condition for jobs running on computer servers is based on the availability of swap memory. The stop condition for jobs running on other kinds of hosts is based on the idle time.
SWAPLIMIT = integer
The total virtual memory limit (in KB) for a job from this queue.
This limit applies to the whole job, no matter how many processes the job may contain.
The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL.
TERMINATE_WHEN = WINDOW | LOAD
Configures the queue to invoke the TERMINATE action instead of the SUSPEND action in the specified circumstance.
o WINDOW -- kills jobs if the run window closes.
o LOAD -- kills jobs when the load exceeds the suspending thresholds.
Set TERMINATE_WHEN to WINDOW to define a night queue that will kill jobs if the run window closes:
QUEUE_NAME = night
RUN_WINDOW = 20:00-08:00
TERMINATE_WHEN = WINDOW
JOB_CONTROLS = TERMINATE[kill -KILL $LSB_JOBPGIDS; mail -s "job $LSB_JOBID killed by queue run window" $USER < /dev/null]
UJOB_LIMIT = integer
Per-user job slot limit for the queue. Maximum number of job slots that each user can use in this queue.
USERS = all | user_name | user_group ...
A list of users or user groups that can submit jobs to this queue.
Use the reserved word all to specify all openlava users.
openlava cluster administrators can submit jobs to this queue or switch any user's jobs into this queue, even if they are not listed.
lsf.cluster(5), lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.users(5), busers(1), bugroup(1), bchkpnt(1), nice(1), getgrnam(3), getrlimit(2), bmgroup(1), bqueues(1), bhosts(1), bsub(1), lsid(1), mbatchd(8), badmin(8)
lsb.queues(5)    openlava Version 2.0 - Jan 2012