This check monitors Lustre through the Datadog Agent.
Lustre is a distributed file system commonly used in high-performance computing (HPC) environments. This integration monitors Lustre cluster performance, health, and operations across all node types: clients, metadata servers (MDS), and object storage servers (OSS).
The Datadog Agent can collect the following data from Lustre clusters:
Device Health: Monitor the status and health of all Lustre devices and targets
Job Statistics: Track per-job I/O operations, latency, and throughput on MDS and OSS nodes
Network Statistics: Monitor LNET performance including local and peer network interface metrics
General Performance: Collect detailed statistics on file system operations, locks, and client activities
Changelog Events: Capture filesystem change events for audit and analysis (client nodes only)
Setup
Follow the instructions below to install and configure this check for an Agent running on a host.
Installation
The Lustre check is included in the Datadog Agent package. No additional installation is needed on your server.
Configuration
To configure the Agent check:
Edit the lustre.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory to start collecting your Lustre performance data. See the sample lustre.d/conf.yaml for all available configuration options.
Allow the dd-agent user to run the Lustre commands (lctl, lnetctl, lfs) through sudo without a password. Edit the sudoers file with visudo and add a NOPASSWD rule for dd-agent that covers these three binaries, using their full paths on your system.
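A minimal lustre.d/conf.yaml could look like the sketch below. It uses only options that every Agent check accepts (min_collection_interval and tags). Lustre-specific options, such as how the node type or changelog collection is configured, are deliberately left out because their exact names are not given here. Take those from the sample lustre.d/conf.yaml.

```yaml
init_config:

instances:
  # One instance for this Lustre node. Only generic Agent check options are shown;
  # Lustre-specific options come from the sample lustre.d/conf.yaml.
  - min_collection_interval: 15
    # Hypothetical tag values, for illustration only.
    tags:
      - "cluster:hpc-example"
      - "lustre_role:client"
```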
Note: The Datadog Agent must have sufficient privileges to execute Lustre commands (lctl, lnetctl, lfs). This typically requires running the Agent as root or with appropriate sudo permissions.
On client nodes, the Lustre integration can collect changelog events as structured logs. These logs contain:
operation_type: The type of filesystem operation
timestamp: When the operation occurred
flags: Operation flags
message: Detailed operation information
Important: Changelog users must be registered before changelogs can be collected. Register them on the MDS with the lctl changelog_register command. See the Lustre Operations Manual for details.
To collect Lustre changelogs:
Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:
logs_enabled: true
Uncomment and edit the logs configuration block in your lustre.d/conf.yaml file.
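A minimal sketch of that logs block follows. It assumes the check submits changelog events itself (type: integration) rather than tailing a log file, and that lustre is the intended log source. The service value is a placeholder to replace with your own.

```yaml
logs:
  # Assumption: changelog events are emitted by the check, not read from a file.
  - type: integration
    source: lustre
    # Hypothetical service name; choose one that fits your environment.
    service: lustre
```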