slurm.node.cpu.allocated (gauge) | Number of CPUs allocated on the node for job-related tasks. Shown as cpu |
slurm.node.cpu.idle (gauge) | Number of idle CPUs on the node. Shown as cpu |
slurm.node.cpu.other (gauge) | Number of CPUs performing other or non-job-related tasks on the node. Shown as cpu |
slurm.node.cpu.total (gauge) | Total number of CPUs on the node. Shown as cpu |
slurm.node.cpu_load (gauge) | CPU load on the node as reported by the OS. |
slurm.node.free_mem (gauge) | Free memory on the node as reported by the OS. Shown as megabyte |
slurm.node.gpu_total (gauge) | Total number of GPUs on the node. |
slurm.node.gpu_used (gauge) | Number of GPUs used on the node. |
slurm.node.info (gauge) | Information about the Slurm node. |
slurm.node.tmp_disk (gauge) | Temporary disk space on the node as reported by the OS. Shown as megabyte |
slurm.partition.cpu.allocated (gauge) | Number of CPUs allocated on the partition for job-related tasks. Shown as cpu |
slurm.partition.cpu.idle (gauge) | Number of idle CPUs on the partition. Shown as cpu |
slurm.partition.cpu.other (gauge) | Number of CPUs performing other or non-job-related tasks on the partition. Shown as cpu |
slurm.partition.cpu.total (gauge) | Total number of CPUs on the partition. Shown as cpu |
slurm.partition.gpu_total (gauge) | Total number of GPUs on the partition. |
slurm.partition.gpu_used (gauge) | Number of GPUs used on the partition. |
slurm.partition.info (gauge) | Information about the Slurm partition. |
slurm.partition.nodes.count (gauge) | Number of nodes in the partition. Shown as node |
slurm.sacct.enabled (gauge) | Whether sacct metrics are being collected for this host. |
slurm.sacct.job.duration (gauge) | Duration of the job in seconds. Shown as second |
slurm.sacct.job.info (gauge) | Information about the Slurm job in sacct. |
slurm.sacct.slurm_job_avgcpu (gauge) | Average (system + user) CPU time of all tasks in job. Shown as second |
slurm.sacct.slurm_job_avgrss (gauge) | Average resident set size of all tasks in job. |
slurm.sacct.slurm_job_cputime (gauge) | Time used (Elapsed time * CPU count) by a job or step in cpu-seconds. Shown as second |
slurm.sacct.slurm_job_maxrss (gauge) | Maximum resident set size of all tasks in job. |
slurm.sdiag.agent_count (gauge) | Number of agent threads. Shown as thread |
slurm.sdiag.agent_queue_size (gauge) | Number of enqueued outgoing RPC requests in an internal retry list. Shown as request |
slurm.sdiag.agent_thread_count (gauge) | Total count of active threads created by all the agent threads. Shown as thread |
slurm.sdiag.backfill.depth_mean (gauge) | Mean count of jobs processed during all backfilling scheduling cycles since last reset. Shown as job |
slurm.sdiag.backfill.depth_mean_try_depth (gauge) | The subset of Depth Mean that the backfill scheduler attempted to schedule. Shown as job |
slurm.sdiag.backfill.last_cycle (gauge) | Time in microseconds of last backfill scheduling cycle. Shown as microsecond |
slurm.sdiag.backfill.last_depth_cycle (gauge) | Number of processed jobs during last backfilling scheduling cycle. Shown as job |
slurm.sdiag.backfill.last_depth_try_schedule (gauge) | Number of processed jobs during last backfilling scheduling cycle, counting only jobs with a chance to start using available resources. Shown as job |
slurm.sdiag.backfill.last_queue_length (gauge) | Number of jobs pending to be processed by backfilling algorithm. Shown as job |
slurm.sdiag.backfill.last_table_size (gauge) | Number of different time slots tested by the backfill scheduler in its last iteration. |
slurm.sdiag.backfill.max_cycle (gauge) | Time in microseconds of maximum backfill scheduling cycle execution since last reset. Shown as microsecond |
slurm.sdiag.backfill.mean_cycle (gauge) | Mean time in microseconds of backfilling scheduling cycles since last reset. Shown as microsecond |
slurm.sdiag.backfill.mean_table_size (gauge) | Mean count of different time slots tested by the backfill scheduler. |
slurm.sdiag.backfill.queue_length_mean (gauge) | Mean count of jobs pending to be processed by backfilling algorithm. Shown as job |
slurm.sdiag.backfill.total_cycles (gauge) | Number of backfill scheduling cycles since last reset. |
slurm.sdiag.backfill.total_heterogeneous_components (gauge) | Number of heterogeneous job components started thanks to backfilling since last Slurm start. |
slurm.sdiag.backfill.total_jobs_since_cycle_start (gauge) | Total backfilled jobs since last stats cycle restart. Shown as job |
slurm.sdiag.backfill.total_jobs_since_start (gauge) | Total backfilled jobs since last Slurm restart. Shown as job |
slurm.sdiag.cycles_per_minute (gauge) | Scheduling executions per minute. |
slurm.sdiag.dbd_agent_queue_size (gauge) | DBD Agent message queue size for SlurmDBD. Shown as message |
slurm.sdiag.enabled (gauge) | Whether sdiag metrics are being collected for this host. |
slurm.sdiag.jobs_canceled (gauge) | Number of jobs canceled since last reset. Shown as job |
slurm.sdiag.jobs_completed (gauge) | Number of jobs completed since last reset. Shown as job |
slurm.sdiag.jobs_failed (gauge) | Number of jobs failed since last reset. Shown as job |
slurm.sdiag.jobs_pending (gauge) | Number of jobs pending at the time the sdiag data was reported. Shown as job |
slurm.sdiag.jobs_running (gauge) | Number of jobs running at the time the sdiag data was reported. Shown as job |
slurm.sdiag.jobs_started (gauge) | Number of jobs started since last reset. Shown as job |
slurm.sdiag.jobs_submitted (gauge) | Number of jobs submitted since last reset. Shown as job |
slurm.sdiag.last_cycle (gauge) | Time in microseconds for last scheduling cycle. Shown as microsecond |
slurm.sdiag.last_queue_length (gauge) | Length of jobs pending queue. Shown as job |
slurm.sdiag.max_cycle (gauge) | Maximum time in microseconds for any scheduling cycle since last reset. Shown as microsecond |
slurm.sdiag.mean_cycle (gauge) | Mean time in microseconds for all scheduling cycles since last reset. Shown as microsecond |
slurm.sdiag.mean_depth_cycle (gauge) | Mean of cycle depth. Depth means number of jobs processed in a scheduling cycle. Shown as job |
slurm.sdiag.server_thread_count (gauge) | The number of current active slurmctld threads. Shown as thread |
slurm.sdiag.total_cycles (gauge) | The total run time in microseconds for all scheduling cycles since the last reset. Shown as microsecond |
slurm.share.effective_usage (gauge) | The association's usage normalized with its parent. |
slurm.share.fair_share (gauge) | The Fair-Share factor, based on a user or account's assigned shares and the effective usage charged to them or their accounts. |
slurm.share.level_fs (gauge) | The association's fair-share value compared to its siblings, calculated as norm_shares / effective_usage. |
slurm.share.norm_shares (gauge) | The shares assigned to the user or account normalized to the total number of assigned shares. |
slurm.share.norm_usage (gauge) | The Raw Usage normalized to the total number of tres-seconds of all jobs run on the cluster. |
slurm.share.raw_shares (gauge) | The raw shares assigned to the user or account. |
slurm.share.raw_usage (gauge) | The number of tres-seconds (cpu-seconds if TRESBillingWeights is not defined) of all the jobs charged to the account or user. |
slurm.sinfo.node.enabled (gauge) | Whether node metrics are being collected for this host. |
slurm.sinfo.partition.enabled (gauge) | Whether partition metrics are being collected for this host. |
slurm.sinfo.squeue.enabled (gauge) | Whether squeue metrics are being collected for this host. |
slurm.squeue.job.info (gauge) | Information about the Slurm job in squeue. |
slurm.sshare.enabled (gauge) | Whether sshare metrics are being collected for this host. |
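
Two of the descriptions above state a formula: `slurm.sacct.slurm_job_cputime` (elapsed time * CPU count) and `slurm.share.level_fs` (normalized shares divided by effective usage). The following is a minimal Python sketch of that arithmetic, not the integration's actual code. The function names are hypothetical, and the handling of zero effective usage is an assumption made for illustration.

```python
def job_cputime_seconds(elapsed_seconds: float, cpu_count: int) -> float:
    """CPU time as described for slurm.sacct.slurm_job_cputime:
    elapsed time multiplied by the number of CPUs, in cpu-seconds."""
    return elapsed_seconds * cpu_count


def level_fs(norm_shares: float, effective_usage: float) -> float:
    """Level fair-share as described for slurm.share.level_fs:
    normalized shares divided by effective usage."""
    if effective_usage == 0:
        # Assumption for this sketch: with no recorded usage the ratio is
        # unbounded, so return infinity instead of raising ZeroDivisionError.
        return float("inf")
    return norm_shares / effective_usage


if __name__ == "__main__":
    # A job that ran for one hour on 8 CPUs.
    print(job_cputime_seconds(3600, 8))
    # An association holding 25% of shares that has consumed 50% of usage.
    print(level_fs(0.25, 0.5))
```

A `level_fs` value below 1 suggests the association has used more than its assigned share relative to its siblings, and a value above 1 suggests it has used less.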