---
title: IBM Spectrum LSF
description: Monitor your IBM Spectrum LSF workloads.
---

# IBM Spectrum LSF
Integration version: 1.3.0
## Overview{% #overview %}

This check monitors [IBM Spectrum LSF](https://www.ibm.com/products/hpc-workload-management) using the Datadog Agent.

This integration gives an overview of the performance of your IBM Spectrum LSF environment. It also provides detailed information about running and completed jobs, slot utilization, and queues.

## Setup{% #setup %}

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the [Autodiscovery Integration Templates](https://docs.datadoghq.com/containers/kubernetes/integrations/) for guidance on applying these instructions.

### Installation{% #installation %}

The IBM Spectrum LSF check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package.

Install the Datadog Agent and configure the IBM Spectrum LSF check on the management host of your cluster. This integration monitors the entire cluster.

#### Additional Configuration on Linux{% #additional-configuration-on-linux %}

Add the `dd-agent` user as an LSF [administrator](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=cluster-adding-administrators).

The integration runs commands such as `lsid`, `bhosts`, and `lsclusters`. To run these commands, the Agent needs them in its `PATH`. In an interactive shell, this is typically done by running `source $LSF_HOME/conf/profile.lsf`. However, the `datadog-agent` service is managed by upstart or systemd, which does not source that profile, so you may need to add the environment variables to the service configuration files:

1. To get the environment variables necessary for the Agent service, locate the `<LSF_TOP_DIR>/conf/profile.lsf` file and run the following command:

   ```
   env -i bash -c "source <LSF_TOP_DIR>/conf/profile.lsf; env"
   ```

   Running this command outputs a list of environment variables necessary to run the IBM Spectrum LSF commands.

1. Add these environment variables to the configuration file for either systemd or upstart:

   - systemd: `/etc/datadog-agent/environment`. Here is an example configuration:

     ```
     LSF_SERVERDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/etc
     LSF_ENVDIR=<LSF_TOP_DIR>/conf
     LSF_BINDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/bin
     LSF_LIBDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/lib
     PATH=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/etc:<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/bin:/usr/local/bin:/usr/bin:/bin:.
     LD_LIBRARY_PATH=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/lib
     ```

   - upstart: `/etc/init/datadog-agent.conf`. (Note that each time there is an Agent update, `/etc/init/datadog-agent.conf` is wiped and needs to be updated again.) Here is an example configuration:

     ```
     description "Datadog Agent"
     
     start on started networking
     stop on runlevel [!2345]
     
     respawn
     respawn limit 10 5
     normal exit 0
     
     console log
     env DD_LOG_TO_CONSOLE=false
     env LSF_SERVERDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/etc
     env LSF_ENVDIR=<LSF_TOP_DIR>/conf
     env LSF_BINDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/bin
     env LSF_LIBDIR=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/lib
     env PATH=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/etc:<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/bin:/usr/local/bin:/usr/bin:/bin:.
     env LD_LIBRARY_PATH=<LSF_TOP_DIR>/10.1/linux3.10-glibc2.17-x86_64/lib
     
     setuid dd-agent
     
     script
       exec /opt/datadog-agent/bin/agent/agent start -p /opt/datadog-agent/run/agent.pid
     end script
     
     post-stop script
       rm -f /opt/datadog-agent/run/agent.pid
     end script
     ```

1. Restart the Agent.
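
To sanity-check the result, the sketch below tests whether the LSF commands named above resolve on a given `PATH` value. It is a minimal illustration, not part of the integration: the `missing_commands` helper is hypothetical, and when run directly it reads the `PATH` of the current process, which may differ from the one the Agent service sees.

```python
import os
import shutil

# Commands this page names as examples of what the integration runs.
LSF_COMMANDS = ("lsid", "bhosts", "lsclusters")


def missing_commands(path_value, commands=LSF_COMMANDS):
    """Return the commands that cannot be found as executables on path_value."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path_value) is None]


if __name__ == "__main__":
    # Assumption: run this with the same PATH value you gave the Agent service.
    missing = missing_commands(os.environ.get("PATH", ""))
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed LSF commands resolve on PATH")
```

Passing the `PATH` string from your environment file as `path_value` checks the value the service is configured with rather than the one in your shell.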

View more information about setting environment variables for the Datadog Agent [here](https://docs.datadoghq.com/agent/guide/environment-variables/#using-environment-variables-in-systemd-units).
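
Steps 1 and 2 above can be partly scripted. The sketch below takes the `KEY=VALUE` text printed by the `env -i bash -c ...` command and formats the LSF-related lines for either service manager. The helper names and the choice of variables to keep (anything starting with `LSF_`, plus `PATH` and `LD_LIBRARY_PATH`, mirroring the examples above) are assumptions for illustration; review the output before copying it into a configuration file.

```python
# Assumption: besides LSF_* variables, keep the two extra keys shown in the examples.
EXTRA_KEYS = {"PATH", "LD_LIBRARY_PATH"}


def select_lsf_vars(env_output):
    """Parse `env` output and keep LSF_* variables plus PATH and LD_LIBRARY_PATH."""
    selected = {}
    for line in env_output.splitlines():
        key, sep, value = line.partition("=")
        # Lines without "=" are not variable assignments and are skipped.
        if sep and (key.startswith("LSF_") or key in EXTRA_KEYS):
            selected[key] = value
    return selected


def format_vars(variables, upstart=False):
    """Render KEY=VALUE lines; upstart job files need an `env ` prefix."""
    prefix = "env " if upstart else ""
    return [f"{prefix}{key}={value}" for key, value in variables.items()]


# Placeholder input; in practice this is the text printed by the command in step 1.
sample = "LSF_ENVDIR=<LSF_TOP_DIR>/conf\nHOME=/root\nPATH=/usr/bin:/bin\n"
print("\n".join(format_vars(select_lsf_vars(sample), upstart=True)))
```

With `upstart=False`, the same lines are produced without the `env ` prefix, which matches the format of the `/etc/datadog-agent/environment` example.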

### Configuration{% #configuration %}

1. Edit the `ibm_spectrum_lsf.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your `ibm_spectrum_lsf` performance data. See the [sample ibm_spectrum_lsf.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/ibm_spectrum_lsf/datadog_checks/ibm_spectrum_lsf/data/conf.yaml.example) for all available configuration options.

The IBM Spectrum LSF integration runs a series of management commands to collect data. To control which commands are run and which metrics are emitted, use the `metric_sources` configuration option. By default, data is collected from the following commands: `lsclusters`, `lshosts`, `bhosts`, `lsload`, `bqueues`, `bslots`, `bjobs`. You can enable additional optional sources or opt out of any of the defaults.

For example, to collect only GPU-specific metrics, set `metric_sources` as follows:

   ```yaml
     metric_sources:
       - lsload_gpu
       - bhosts_gpu
   ```

The `badmin_perfmon` metric source collects data from the `badmin perfmon view -json` command. This collects [overall statistics](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=performance-monitor-metrics-in-real-time) about the cluster. To collect these metrics, performance collection must be enabled on your server using the `badmin perfmon start <COLLECTION_INTERVAL>` command. By default, the integration runs this command automatically (and stops collection once the Agent is turned off). However, you can turn off this behavior by setting `badmin_perfmon_auto: false`.

Because collecting these metrics can add extra load on your server, we recommend a longer collection interval for them, of at least 60 seconds. The exact interval depends on the load and size of your cluster. View IBM Spectrum LSF's [recommendations](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=tips-maintaining-cluster-performance) for managing high query load.

Similarly, the `bhist` command collects information about completed jobs, which can be query-intensive, so we recommend monitoring this command with the `min_collection_interval` set to 60 or higher. The `bhist_details` command involves running `bhist -l` for each completed job, so we recommend monitoring it with a higher `min_collection_interval` along with `bhist`.

Here is a sample configuration monitoring all available metrics:

   ```yaml
   instances:
   - cluster_name: test-cluster
     metric_sources:
       - lsclusters
       - lshosts
       - bhosts
       - lsload
       - bqueues
       - bslots
       - bjobs
       - lsload_gpu
       - bhosts_gpu
   - cluster_name: test-cluster
     badmin_perfmon_auto: false
     metric_sources:
       - badmin_perfmon
       - bhist
       - bhist_details
     min_collection_interval: 60
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/configuration/agent-commands/#start-stop-and-restart-the-agent).
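
The guidance above can be turned into a quick pre-flight check. The sketch below inspects an `instances` list that has already been loaded into Python (for example, from `conf.yaml`) and flags metric sources not named on this page, as well as query-intensive sources configured with a `min_collection_interval` below 60. The `review_instances` function and its messages are hypothetical, and the set of known sources is taken only from the text above, so it may not be exhaustive.

```python
# Sources collected by default, plus the optional ones this page mentions.
DEFAULT_SOURCES = {"lsclusters", "lshosts", "bhosts", "lsload", "bqueues", "bslots", "bjobs"}
OPTIONAL_SOURCES = {"lsload_gpu", "bhosts_gpu", "badmin_perfmon", "bhist", "bhist_details"}
# Sources the text recommends collecting at an interval of 60 or higher.
HEAVY_SOURCES = {"badmin_perfmon", "bhist", "bhist_details"}
# Assumption: the Agent's usual 15-second default applies when no interval is set.
AGENT_DEFAULT_INTERVAL = 15


def review_instances(instances):
    """Return a list of human-readable warnings for a list of instance dicts."""
    warnings = []
    for index, instance in enumerate(instances):
        sources = set(instance.get("metric_sources", DEFAULT_SOURCES))
        for name in sorted(sources - DEFAULT_SOURCES - OPTIONAL_SOURCES):
            warnings.append(f"instance {index}: unknown metric source '{name}'")
        interval = instance.get("min_collection_interval", AGENT_DEFAULT_INTERVAL)
        heavy = sorted(sources & HEAVY_SOURCES)
        if heavy and interval < 60:
            warnings.append(
                f"instance {index}: {', '.join(heavy)} with interval {interval} (< 60)"
            )
    return warnings


instances = [
    {"cluster_name": "test-cluster", "metric_sources": ["lsclusters", "bjobs"]},
    {"cluster_name": "test-cluster", "metric_sources": ["bhist", "bhist_details"]},
]
for warning in review_instances(instances):
    print(warning)
```

Splitting lightweight and query-intensive sources into separate instances, as in the sample configuration, lets each group run at its own interval.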

#### Logs{% #logs %}

The IBM Spectrum LSF integration collects two types of logs: system logs and job logs.

##### Collecting system logs{% #collecting-system-logs %}

System logs provide diagnostic information from the IBM Spectrum LSF [daemons](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=files-about-lsf-log#concept_bvz_5gb_kv__title__2). You can collect them from the management host and execution hosts. To collect system logs:

1. Enable log collection in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `ibm_spectrum_lsf.d/conf.yaml` file. For example:

   ```yaml
   logs:
     - type: file
       source: ibm_spectrum_lsf
       tags:
         - log_type:system
       path: <LSF_TOP_DIR>/log/*
       service: <SERVICE_NAME>
   ```

##### Collecting job logs{% #collecting-job-logs %}

{% alert level="info" %}
Job logs are located on the job submission host, which is typically different from the management host. Ensure that the Datadog Agent is installed and running on the host where jobs are submitted.
{% /alert %}

Job logs are generated by job tasks and are useful for debugging failed jobs. To collect job logs:

1. Ensure that the IBM Spectrum LSF job log files you want to monitor are named `<JOB_ID>.out` and `<JOB_ID>.err`. Configure this when submitting jobs by using the following [`bsub`](https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=bsub-options) options:

`bsub -o %J.out -e %J.err`

1. Enable log collection in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `ibm_spectrum_lsf.d/conf.yaml` file. For example:

   ```yaml
    logs:
     - type: file
       source: ibm_spectrum_lsf
       tags:
       - log_type:job
       path:
       - <PATH_TO_JOB_LOGS>/*.out
       - <PATH_TO_JOB_LOGS>/*.err
       service: <SERVICE_NAME>
   ```
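
Because the file names carry the job ID, the same logs can also be grouped per job outside of Datadog. The sketch below assumes the `%J.out` / `%J.err` naming from step 1 and numeric job IDs; the `group_job_logs` helper is hypothetical and only illustrates the naming convention.

```python
import re
from pathlib import Path

# Matches names produced by `bsub -o %J.out -e %J.err`, for example 1234.out.
JOB_LOG_NAME = re.compile(r"^(\d+)\.(out|err)$")


def group_job_logs(directory):
    """Map job ID -> {"out": Path, "err": Path} for matching files in directory."""
    jobs = {}
    for path in sorted(Path(directory).iterdir()):
        match = JOB_LOG_NAME.match(path.name)
        if match and path.is_file():
            job_id, kind = match.groups()
            jobs.setdefault(int(job_id), {})[kind] = path
    return jobs
```

Files that do not follow the naming convention are ignored, which is also why the log configuration above matches only `*.out` and `*.err`.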

### Validation{% #validation %}

[Run the Agent's status subcommand](https://docs.datadoghq.com/agent/configuration/agent-commands/#agent-status-and-information) and look for `ibm_spectrum_lsf` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **ibm\_spectrum\_lsf.can\_connect**(gauge)                                   | Whether or not the integration can run LSF commands [Always reported]                                                                                                                                                                       |
| **ibm\_spectrum\_lsf.cluster.hosts**(gauge)                                  | The number of hosts in the cluster. [Reported by lsclusters]                                                                                                                                                                                |
| **ibm\_spectrum\_lsf.cluster.servers**(gauge)                                | The number of servers in the cluster. [Reported by lsclusters]                                                                                                                                                                              |
| **ibm\_spectrum\_lsf.cluster.status**(gauge)                                 | The status of the cluster. [Reported by lsclusters]                                                                                                                                                                                         |
| **ibm\_spectrum\_lsf.gpu.ecc**(gauge)                                        | Number of ECC errors. [Reported by lsload_gpu]                                                                                                                                                                                              |
| **ibm\_spectrum\_lsf.gpu.error**(gauge)                                      | Whether or not the GPU is in an error state. [Reported by lsload_gpu]                                                                                                                                                                       |
| **ibm\_spectrum\_lsf.gpu.mem.total**(gauge)                                  | The total memory available on the GPU. [Reported by lsload_gpu]                                                                                                                                                                             |
| **ibm\_spectrum\_lsf.gpu.mem.used**(gauge)                                   | The total memory used on the GPU. [Reported by lsload_gpu]                                                                                                                                                                                  |
| **ibm\_spectrum\_lsf.gpu.mem.utilization**(gauge)                            | The percentage of the GPU's memory currently in use. [Reported by lsload_gpu]                                                                                                                                                               |
| **ibm\_spectrum\_lsf.gpu.mode**(gauge)                                       | The GPU's compute mode, 0 is default. [Reported by lsload_gpu]                                                                                                                                                                              |
| **ibm\_spectrum\_lsf.gpu.power**(gauge)                                      | Current power draw of the GPU in watts. [Reported by lsload_gpu]*Shown as watt*                                                                                                                                                             |
| **ibm\_spectrum\_lsf.gpu.pstate**(gauge)                                     | Current performance state of the GPU. [Reported by lsload_gpu]                                                                                                                                                                              |
| **ibm\_spectrum\_lsf.gpu.status**(gauge)                                     | Whether or not the GPU is OK. [Reported by lsload_gpu]                                                                                                                                                                                      |
| **ibm\_spectrum\_lsf.gpu.temperature**(gauge)                                | The current temperature of the GPU. [Reported by lsload_gpu]*Shown as degree celsius*                                                                                                                                                       |
| **ibm\_spectrum\_lsf.gpu.utilization**(gauge)                                | The current GPU utilization. [Reported by lsload_gpu]                                                                                                                                                                                       |
| **ibm\_spectrum\_lsf.host.cpu\_factor**(gauge)                               | The relative CPU performance factor. [Reported by lshosts]                                                                                                                                                                                  |
| **ibm\_spectrum\_lsf.host.is\_server**(gauge)                                | Indicates whether the host is a server or client host. [Reported by lshosts]                                                                                                                                                                |
| **ibm\_spectrum\_lsf.host.max\_mem**(gauge)                                  | The maximum amount of physical memory available for user processes. [Reported by lshosts]                                                                                                                                                   |
| **ibm\_spectrum\_lsf.host.max\_swap**(gauge)                                 | The total available swap space. [Reported by lshosts]                                                                                                                                                                                       |
| **ibm\_spectrum\_lsf.host.max\_temp**(gauge)                                 | The maximum /tmp space in MB configured on a host. [Reported by lshosts]*Shown as megabyte*                                                                                                                                                 |
| **ibm\_spectrum\_lsf.host.num\_cores**(gauge)                                | The number of cores per processor that is configured on a host. [Reported by lshosts]*Shown as core*                                                                                                                                        |
| **ibm\_spectrum\_lsf.host.num\_cpus**(gauge)                                 | The number of processors on this host. [Reported by lshosts]                                                                                                                                                                                |
| **ibm\_spectrum\_lsf.host.num\_procs**(gauge)                                | The number of physical processors per CPU configured on a host. [Reported by lshosts]                                                                                                                                                       |
| **ibm\_spectrum\_lsf.host.num\_threads**(gauge)                              | The number of threads per core that is configured on a host. [Reported by lshosts]*Shown as thread*                                                                                                                                         |
| **ibm\_spectrum\_lsf.job.completed.details.avg\_memory**(gauge)              | The average memory used by the completed job. [Reported by bhist_details]*Shown as megabyte*                                                                                                                                                |
| **ibm\_spectrum\_lsf.job.completed.details.cpu\_average\_efficiency**(gauge) | The CPU average efficiency percentage of the completed job. [Reported by bhist_details]*Shown as percent*                                                                                                                                   |
| **ibm\_spectrum\_lsf.job.completed.details.cpu\_peak**(gauge)                | The CPU peak value for the completed job. [Reported by bhist_details]                                                                                                                                                                       |
| **ibm\_spectrum\_lsf.job.completed.details.cpu\_peak\_duration**(gauge)      | The duration of CPU peak usage for the completed job. [Reported by bhist_details]*Shown as second*                                                                                                                                          |
| **ibm\_spectrum\_lsf.job.completed.details.cpu\_peak\_efficiency**(gauge)    | The CPU peak efficiency percentage of the completed job. [Reported by bhist_details]*Shown as percent*                                                                                                                                      |
| **ibm\_spectrum\_lsf.job.completed.details.cpu\_time**(gauge)                | The total CPU time consumed by the completed job. [Reported by bhist_details]*Shown as second*                                                                                                                                              |
| **ibm\_spectrum\_lsf.job.completed.details.exit\_code**(gauge)               | The exit code returned by the completed job. [Reported by bhist_details]                                                                                                                                                                    |
| **ibm\_spectrum\_lsf.job.completed.details.max\_memory**(gauge)              | The maximum memory used by the completed job. [Reported by bhist_details]*Shown as megabyte*                                                                                                                                                |
| **ibm\_spectrum\_lsf.job.completed.details.mem\_efficiency**(gauge)          | The memory efficiency percentage of the completed job. [Reported by bhist_details]*Shown as percent*                                                                                                                                        |
| **ibm\_spectrum\_lsf.job.completed.details.status**(gauge)                   | The status of the completed job (1). Tagged with status:success or status:failure. [Reported by bhist_details]                                                                                                                              |
| **ibm\_spectrum\_lsf.job.completed.details.success**(gauge)                  | Indicates whether the job completed successfully (1) or failed (0). [Reported by bhist_details]                                                                                                                                             |
| **ibm\_spectrum\_lsf.job.completed.pending**(gauge)                          | The total amount of time spent by the job in the pending state. [Reported by bhist]*Shown as second*                                                                                                                                        |
| **ibm\_spectrum\_lsf.job.completed.pending\_user\_suspended**(gauge)         | The total amount of time the job spent suspended by a user while in the pending state. [Reported by bhist]*Shown as second*                                                                                                                 |
| **ibm\_spectrum\_lsf.job.completed.running**(gauge)                          | The total run time of the job. [Reported by bhist]*Shown as second*                                                                                                                                                                         |
| **ibm\_spectrum\_lsf.job.completed.system\_suspended**(gauge)                | The total amount of time the job was in the system suspended state. [Reported by bhist]*Shown as second*                                                                                                                                    |
| **ibm\_spectrum\_lsf.job.completed.total**(gauge)                            | The total amount of time spent by the job from submission to completion. [Reported by bhist]*Shown as second*                                                                                                                               |
| **ibm\_spectrum\_lsf.job.completed.unknown**(gauge)                          | The total amount of time spent by the job in an unknown state. [Reported by bhist]*Shown as second*                                                                                                                                         |
| **ibm\_spectrum\_lsf.job.completed.user\_suspended**(gauge)                  | The total amount of time spent by the job in the user suspended state. [Reported by bhist]*Shown as second*                                                                                                                                 |
| **ibm\_spectrum\_lsf.job.cpu\_used**(gauge)                                  | The CPU used by the job. [Reported by bjobs]                                                                                                                                                                                                |
| **ibm\_spectrum\_lsf.job.idle\_factor**(gauge)                               | Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue, and the job has triggered an idle exception. [Reported by bjobs]                                                                                            |
| **ibm\_spectrum\_lsf.job.mem**(gauge)                                        | Total resident memory usage of all processes in a job. [Reported by bjobs]                                                                                                                                                                  |
| **ibm\_spectrum\_lsf.job.percent\_complete**(gauge)                          | The estimated completion percentage of the job. [Reported by bjobs]                                                                                                                                                                         |
| **ibm\_spectrum\_lsf.job.run\_time**(gauge)                                  | Estimated run time for the job. [Reported by bjobs]*Shown as second*                                                                                                                                                                        |
| **ibm\_spectrum\_lsf.job.swap**(gauge)                                       | Total virtual memory and swap usage of all processes in a job. [Reported by bjobs]                                                                                                                                                          |
| **ibm\_spectrum\_lsf.job.time\_left**(gauge)                                 | The estimated run time that the job has remaining. [Reported by bjobs]*Shown as second*                                                                                                                                                     |
| **ibm\_spectrum\_lsf.load.cpu.run\_queue\_length.15m**(gauge)                | The 15 minute exponentially averaged CPU run queue length. [Reported by lsload]                                                                                                                                                             |
| **ibm\_spectrum\_lsf.load.cpu.run\_queue\_length.15s**(gauge)                | The 15 second exponentially averaged CPU run queue length. [Reported by lsload]                                                                                                                                                             |
| **ibm\_spectrum\_lsf.load.cpu.run\_queue\_length.1m**(gauge)                 | The 1 minute exponentially averaged CPU run queue length. [Reported by lsload]                                                                                                                                                              |
| **ibm\_spectrum\_lsf.load.cpu.utilization**(gauge)                           | The CPU utilization exponentially averaged over the last minute, 0 - 1. [Reported by lsload]                                                                                                                                                |
| **ibm\_spectrum\_lsf.load.disk.io**(gauge)                                   | The disk I/O rate exponentially averaged over the last minute, in KB per second. [Reported by lsload]*Shown as kilobyte*                                                                                                                    |
| **ibm\_spectrum\_lsf.load.idle\_time**(gauge)                                | On UNIX, the idle time of the host (keyboard is not touched on all logged in sessions), in minutes. On Windows, the `it` index is based on the time that a screen saver is active on a particular host. [Reported by lsload]*Shown as minute* |
| **ibm\_spectrum\_lsf.load.login\_users**(gauge)                              | The number of current login users. [Reported by lsload]                                                                                                                                                                                     |
| **ibm\_spectrum\_lsf.load.mem.available\_ram**(gauge)                        | The amount of available RAM. [Reported by lsload]*Shown as megabyte*                                                                                                                                                                        |
| **ibm\_spectrum\_lsf.load.mem.available\_swap**(gauge)                       | The amount of available swap space. [Reported by lsload]*Shown as megabyte*                                                                                                                                                                 |
| **ibm\_spectrum\_lsf.load.mem.free**(gauge)                                  | The amount of free space in /tmp, in MB. [Reported by lsload]*Shown as megabyte*                                                                                                                                                            |
| **ibm\_spectrum\_lsf.load.mem.paging\_rate**(gauge)                          | The memory paging rate exponentially averaged over the last minute, in pages per second. [Reported by lsload]*Shown as page*                                                                                                                |
| **ibm\_spectrum\_lsf.load.status**(gauge)                                    | Status of the host. [Reported by lsload]                                                                                                                                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.host.queries.avg**(gauge)                       | The average number of host information queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.host.queries.current**(gauge)                   | The current number of host information queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.host.queries.max**(gauge)                       | The max number of host information queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                             |
| **ibm\_spectrum\_lsf.perfmon.host.queries.min**(gauge)                       | The min number of host information queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                             |
| **ibm\_spectrum\_lsf.perfmon.host.queries.total**(gauge)                     | The total number of host information queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                           |
| **ibm\_spectrum\_lsf.perfmon.jobs.accepted\_remote.avg**(gauge)              | The average number of jobs accepted from remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                  |
| **ibm\_spectrum\_lsf.perfmon.jobs.accepted\_remote.current**(gauge)          | The current number of jobs accepted from remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                  |
| **ibm\_spectrum\_lsf.perfmon.jobs.accepted\_remote.max**(gauge)              | The max number of jobs accepted from remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.accepted\_remote.min**(gauge)              | The min number of jobs accepted from remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.accepted\_remote.total**(gauge)            | The total number of jobs accepted from remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.jobs.buckets.avg**(gauge)                       | The average number of scheduler buckets in which jobs are put based on resource requirements and different scheduling policies. [Reported by badmin_perfmon]                                                                                |
| **ibm\_spectrum\_lsf.perfmon.jobs.buckets.current**(gauge)                   | The current number of scheduler buckets in which jobs are put based on resource requirements and different scheduling policies. [Reported by badmin_perfmon]                                                                                |
| **ibm\_spectrum\_lsf.perfmon.jobs.buckets.max**(gauge)                       | The max number of scheduler buckets in which jobs are put based on resource requirements and different scheduling policies. [Reported by badmin_perfmon]                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.jobs.buckets.min**(gauge)                       | The min number of scheduler buckets in which jobs are put based on resource requirements and different scheduling policies. [Reported by badmin_perfmon]                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.jobs.buckets.total**(gauge)                     | The total number of scheduler buckets in which jobs are put based on resource requirements and different scheduling policies. [Reported by badmin_perfmon]                                                                                  |
| **ibm\_spectrum\_lsf.perfmon.jobs.completed.avg**(gauge)                     | The average number of jobs completed in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                     |
| **ibm\_spectrum\_lsf.perfmon.jobs.completed.current**(gauge)                 | The number of jobs completed in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                             |
| **ibm\_spectrum\_lsf.perfmon.jobs.completed.max**(gauge)                     | The max number of jobs completed in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.jobs.completed.min**(gauge)                     | The min number of jobs completed in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.jobs.completed.total**(gauge)                   | The total number of jobs completed in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                       |
| **ibm\_spectrum\_lsf.perfmon.jobs.dispatched.avg**(gauge)                    | The average number of jobs dispatched in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.jobs.dispatched.current**(gauge)                | The number of jobs dispatched. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                                                   |
| **ibm\_spectrum\_lsf.perfmon.jobs.dispatched.max**(gauge)                    | The max number of jobs dispatched in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.jobs.dispatched.min**(gauge)                    | The min number of jobs dispatched in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.jobs.dispatched.total**(gauge)                  | The total number of jobs dispatched in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.queries.avg**(gauge)                       | The average number of job queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.queries.current**(gauge)                   | The number of job queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                              |
| **ibm\_spectrum\_lsf.perfmon.jobs.queries.max**(gauge)                       | The max number of job queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.queries.min**(gauge)                       | The min number of job queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.queries.total**(gauge)                     | The total number of job queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.jobs.reordered.avg**(gauge)                     | The average number of jobs reordered in the sampling period, that is, the number of jobs that reused the resource allocation of a finished job. [Reported by badmin_perfmon]*Shown as job*                                                  |
| **ibm\_spectrum\_lsf.perfmon.jobs.reordered.current**(gauge)                 | The number of jobs reordered in the sampling period, that is, the number of jobs that reused the resource allocation of a finished job. [Reported by badmin_perfmon]*Shown as job*                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.reordered.max**(gauge)                     | The max number of jobs reordered in the sampling period, that is, the number of jobs that reused the resource allocation of a finished job. [Reported by badmin_perfmon]*Shown as job*                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.reordered.min**(gauge)                     | The min number of jobs reordered in the sampling period, that is, the number of jobs that reused the resource allocation of a finished job. [Reported by badmin_perfmon]*Shown as job*                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.reordered.total**(gauge)                   | The total number of jobs reordered in the sampling period, that is, the number of jobs that reused the resource allocation of a finished job. [Reported by badmin_perfmon]*Shown as job*                                                    |
| **ibm\_spectrum\_lsf.perfmon.jobs.scheduling\_interval.avg**(gauge)          | The average scheduling interval in the sampling period. [Reported by badmin_perfmon]*Shown as second*                                                                                                                                       |
| **ibm\_spectrum\_lsf.perfmon.jobs.scheduling\_interval.current**(gauge)      | The current scheduling interval in the sampling period. [Reported by badmin_perfmon]*Shown as second*                                                                                                                                       |
| **ibm\_spectrum\_lsf.perfmon.jobs.scheduling\_interval.max**(gauge)          | The max scheduling interval in the sampling period. [Reported by badmin_perfmon]*Shown as second*                                                                                                                                           |
| **ibm\_spectrum\_lsf.perfmon.jobs.scheduling\_interval.min**(gauge)          | The min scheduling interval in the sampling period. [Reported by badmin_perfmon]*Shown as second*                                                                                                                                           |
| **ibm\_spectrum\_lsf.perfmon.jobs.scheduling\_interval.total**(gauge)        | The total scheduling interval in the sampling period. [Reported by badmin_perfmon]*Shown as second*                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.jobs.sent\_remote.avg**(gauge)                  | The average number of jobs sent to a remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.jobs.sent\_remote.current**(gauge)              | The number of jobs sent to a remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                              |
| **ibm\_spectrum\_lsf.perfmon.jobs.sent\_remote.max**(gauge)                  | The max number of jobs sent to a remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.sent\_remote.min**(gauge)                  | The min number of jobs sent to a remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.sent\_remote.total**(gauge)                | The total number of jobs sent to a remote cluster in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.jobs.submission\_requests.avg**(gauge)          | The average number of job submission requests in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.jobs.submission\_requests.current**(gauge)      | The number of job submission requests in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                                |
| **ibm\_spectrum\_lsf.perfmon.jobs.submission\_requests.max**(gauge)          | The max number of job submission requests in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                            |
| **ibm\_spectrum\_lsf.perfmon.jobs.submission\_requests.min**(gauge)          | The min number of job submission requests in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                            |
| **ibm\_spectrum\_lsf.perfmon.jobs.submission\_requests.total**(gauge)        | The total number of job submission requests in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                          |
| **ibm\_spectrum\_lsf.perfmon.jobs.submitted.avg**(gauge)                     | The average number of jobs submitted in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                     |
| **ibm\_spectrum\_lsf.perfmon.jobs.submitted.current**(gauge)                 | The number of jobs submitted in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                             |
| **ibm\_spectrum\_lsf.perfmon.jobs.submitted.max**(gauge)                     | The max number of jobs submitted in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.jobs.submitted.min**(gauge)                     | The min number of jobs submitted in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.jobs.submitted.total**(gauge)                   | The total number of jobs submitted in the sampling period. [Reported by badmin_perfmon]*Shown as job*                                                                                                                                       |
| **ibm\_spectrum\_lsf.perfmon.mbatchd.processed\_requests.avg**(gauge)        | The average number of queries handled by mbatchd in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                     |
| **ibm\_spectrum\_lsf.perfmon.mbatchd.processed\_requests.current**(gauge)    | The number of queries handled by mbatchd in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                             |
| **ibm\_spectrum\_lsf.perfmon.mbatchd.processed\_requests.max**(gauge)        | The max number of queries handled by mbatchd in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.mbatchd.processed\_requests.min**(gauge)        | The min number of queries handled by mbatchd in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                         |
| **ibm\_spectrum\_lsf.perfmon.mbatchd.processed\_requests.total**(gauge)      | The total number of queries handled by mbatchd in the sampling period. [Reported by badmin_perfmon]*Shown as request*                                                                                                                       |
| **ibm\_spectrum\_lsf.perfmon.memory.utilization.current**(gauge)             | Current memory utilization. [Reported by badmin_perfmon]                                                                                                                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.memory.utilization.total**(gauge)               | Total memory utilization. [Reported by badmin_perfmon]                                                                                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.queue.queries.avg**(gauge)                      | The average number of queue queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                    |
| **ibm\_spectrum\_lsf.perfmon.queue.queries.current**(gauge)                  | The number of queue queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                            |
| **ibm\_spectrum\_lsf.perfmon.queue.queries.max**(gauge)                      | The max number of queue queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.queue.queries.min**(gauge)                      | The min number of queue queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                        |
| **ibm\_spectrum\_lsf.perfmon.queue.queries.total**(gauge)                    | The total number of queue queries in the sampling period. [Reported by badmin_perfmon]                                                                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.scheduler.host\_matches.avg**(gauge)            | The average number of hosts matching the resource criteria for a job. [Reported by badmin_perfmon]*Shown as host*                                                                                                                           |
| **ibm\_spectrum\_lsf.perfmon.scheduler.host\_matches.current**(gauge)        | The number of hosts matching the resource criteria for a job. [Reported by badmin_perfmon]*Shown as host*                                                                                                                                   |
| **ibm\_spectrum\_lsf.perfmon.scheduler.host\_matches.max**(gauge)            | The max number of hosts matching the resource criteria for a job. [Reported by badmin_perfmon]*Shown as host*                                                                                                                               |
| **ibm\_spectrum\_lsf.perfmon.scheduler.host\_matches.min**(gauge)            | The min number of hosts matching the resource criteria for a job. [Reported by badmin_perfmon]*Shown as host*                                                                                                                               |
| **ibm\_spectrum\_lsf.perfmon.scheduler.host\_matches.total**(gauge)          | The total number of hosts matching the resource criteria for a job in the sampling period. [Reported by badmin_perfmon]*Shown as host*                                                                                                      |
| **ibm\_spectrum\_lsf.perfmon.slots.utilization.current**(gauge)              | The current slot utilization. [Reported by badmin_perfmon]                                                                                                                                                                                  |
| **ibm\_spectrum\_lsf.perfmon.slots.utilization.total**(gauge)                | The total slot utilization of the sampling period. [Reported by badmin_perfmon]                                                                                                                                                             |
| **ibm\_spectrum\_lsf.queue.is\_active**(gauge)                               | Whether or not jobs in the queue can be started. [Reported by bqueues]                                                                                                                                                                      |
| **ibm\_spectrum\_lsf.queue.is\_open**(gauge)                                 | Whether or not the queue can accept jobs. [Reported by bqueues]                                                                                                                                                                             |
| **ibm\_spectrum\_lsf.queue.max\_jobs**(gauge)                                | The maximum number of job slots that can be used by the jobs from the queue. These job slots are used by dispatched jobs that are not yet finished, and by pending jobs that reserve slots. [Reported by bqueues]                           |
| **ibm\_spectrum\_lsf.queue.max\_jobs\_per\_host**(gauge)                     | The maximum number of job slots a host can allocate from this queue. [Reported by bqueues]                                                                                                                                                  |
| **ibm\_spectrum\_lsf.queue.max\_jobs\_per\_processor**(gauge)                | The maximum number of job slots a processor can process from the queue. [Reported by bqueues]                                                                                                                                               |
| **ibm\_spectrum\_lsf.queue.max\_jobs\_per\_user**(gauge)                     | The maximum number of job slots each user can use for jobs in the queue. [Reported by bqueues]                                                                                                                                              |
| **ibm\_spectrum\_lsf.queue.num\_job\_slots**(gauge)                          | The total number of slots for jobs in the queue. [Reported by bqueues]                                                                                                                                                                      |
| **ibm\_spectrum\_lsf.queue.pending**(gauge)                                  | The total number of tasks for all pending jobs in the queue. [Reported by bqueues]*Shown as job*                                                                                                                                            |
| **ibm\_spectrum\_lsf.queue.priority**(gauge)                                 | The priority of the queue. The larger the value, the higher the priority. [Reported by bqueues]                                                                                                                                             |
| **ibm\_spectrum\_lsf.queue.running**(gauge)                                  | The total number of tasks for all running jobs in the queue. If the -alloc option is used, the total is allocated slots for the jobs in the queue. [Reported by bqueues]*Shown as task*                                                     |
| **ibm\_spectrum\_lsf.queue.suspended**(gauge)                                | The total number of tasks for all suspended jobs in the queue. [Reported by bqueues]*Shown as task*                                                                                                                                         |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus**(gauge)                           | The total number of GPUs. [Reported by bhosts_gpu]                                                                                                                                                                                          |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_alloc**(gauge)                    | The current total number of GPUs that are allocated to be used by a job. [Reported by bhosts_gpu]                                                                                                                                           |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_exclusive\_alloc**(gauge)         | The current total number of GPUs that are allocated for exclusive use by a job. [Reported by bhosts_gpu]                                                                                                                                    |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_exclusive\_available**(gauge)     | The current total number of GPUs that are available for exclusive use by a job. [Reported by bhosts_gpu]                                                                                                                                    |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_jexclusive\_alloc**(gauge)        | The total number of GPUs allocated exclusively for a job. [Reported by bhosts_gpu]                                                                                                                                                          |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_shared\_alloc**(gauge)            | The total number of GPUs allocated but shared. [Reported by bhosts_gpu]                                                                                                                                                                     |
| **ibm\_spectrum\_lsf.server.gpu.num\_gpus\_shared\_available**(gauge)        | The current total number of GPUs that are available for concurrent use by multiple jobs. [Reported by bhosts_gpu]                                                                                                                           |
| **ibm\_spectrum\_lsf.server.max\_jobs**(gauge)                               | The maximum number of job slots available. A -1 indicates no limit. [Reported by bhosts]*Shown as job*                                                                                                                                      |
| **ibm\_spectrum\_lsf.server.num\_jobs**(gauge)                               | The number of tasks for all jobs that are dispatched to the host. The NJOBS value includes running, suspended, and chunk jobs. [Reported by bhosts]*Shown as task*                                                                          |
| **ibm\_spectrum\_lsf.server.reserved**(gauge)                                | The number of tasks for all pending jobs with reserved slots on the host. [Reported by bhosts]*Shown as task*                                                                                                                               |
| **ibm\_spectrum\_lsf.server.running**(gauge)                                 | The number of tasks for all running jobs on the host. [Reported by bhosts]                                                                                                                                                                  |
| **ibm\_spectrum\_lsf.server.slots\_per\_user**(gauge)                        | The maximum number of job slots that the host can process on a per-user basis. A -1 indicates no limit. [Reported by bhosts]                                                                                                                |
| **ibm\_spectrum\_lsf.server.status**(gauge)                                  | The status of the host and the sbatchd daemon. Batch jobs can be dispatched only to hosts with an ok status. 1 if ok, 0 otherwise. [Reported by bhosts]                                                                                     |
| **ibm\_spectrum\_lsf.server.suspended**(gauge)                               | The number of tasks for all system-suspended jobs on the host. [Reported by bhosts]                                                                                                                                                         |
| **ibm\_spectrum\_lsf.server.user\_suspended**(gauge)                         | The number of tasks for all user-suspended jobs on the host. Jobs can be suspended by the user or by the LSF administrator. [Reported by bhosts]                                                                                            |
| **ibm\_spectrum\_lsf.slots.backfill.available**(gauge)                       | The available slots for backfill jobs. [Reported by bslots]                                                                                                                                                                                 |
| **ibm\_spectrum\_lsf.slots.runtime\_limit**(gauge)                           | The runtime limit for the backfill slots. [Reported by bslots]                                                                                                                                                                              |

### Events{% #events %}

The IBM Spectrum LSF integration does not include any events.

### Service Checks{% #service-checks %}

The IBM Spectrum LSF integration does not include any service checks.

## Troubleshooting{% #troubleshooting %}

Use the `datadog-agent check` command to view the metrics the integration is collecting, as well as debug logs from the check:

```
sudo -u dd-agent bash -c "source /usr/share/lsf/conf/profile.lsf && datadog-agent check ibm_spectrum_lsf -l debug"
```
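
If the check runs but reports no metrics, one possible cause is that the LSF commands are not resolvable on the `PATH` of the user running the Agent. The following is a minimal sketch for checking that, not something shipped with the integration: `check_lsf_commands` is a hypothetical helper, and the command list is an assumption based on the commands named on this page (the check may invoke others).

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints one "missing: <name>" line per unresolved command and returns
# a non-zero status if any command is missing.
check_lsf_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Illustrative command list (an assumption, not the check's full list).
check_lsf_commands lsid bhosts lsclusters bqueues bslots badmin \
    || echo "Some LSF commands are not on PATH"
```

To get a meaningful result, run it in the same environment as the check, for example as the `dd-agent` user after sourcing `profile.lsf`.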

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
