Collect resource usage metrics for specific running processes on any host: CPU, memory, I/O, number of threads, etc.
Use Process Monitors: configure thresholds for how many instances of a specific process ought to be running and get alerts when the thresholds aren’t met (see Service Checks below).
Setup
Installation
The Process Check is included in the Datadog Agent package, so you don’t need to install anything else on your server.
Configuration
Unlike many checks, the Process Check doesn’t monitor anything useful by default. You must configure which processes you want to monitor, and how.
While there’s no standard default check configuration, here’s an example process.d/conf.yaml that monitors SSH/SSHD processes. See the sample process.d/conf.yaml for all available configuration options:
    init_config:

    instances:

      ## @param name - string - required
      ## Used to uniquely identify your metrics as they are tagged with this name in Datadog.
      #
      - name: ssh

        ## @param search_string - list of strings - optional
        ## If one of the elements in the list matches, it returns the count of
        ## all the processes that match the string exactly by default. Change this behavior with the
        ## parameter `exact_match: false`.
        ##
        ## Note: Exactly one of search_string, pid or pid_file must be specified per instance.
        #
        search_string:
          - ssh
          - sshd
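The comments in the example mention two alternatives to an exact `search_string` match: relaxing the match with `exact_match: false`, or selecting a process by `pid` or `pid_file` instead. A minimal sketch of what such instances might look like follows. The instance names and the PID file path are hypothetical placeholders, and the sample process.d/conf.yaml remains the authority on the exact option formats.

```yaml
init_config:

instances:
  # Hypothetical instance: relax the default exact matching
  - name: python_workers
    search_string:
      - python
    exact_match: false

  # Hypothetical instance: select one process through its PID file
  # (exactly one of search_string, pid or pid_file per instance)
  - name: my_daemon
    pid_file: /var/run/my_daemon.pid
```

Each instance gets its own `name`, so the two sets of metrics can be told apart in Datadog.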
Some process metrics can be retrieved only when the Datadog collector runs as the same user as the monitored process or has privileged access. If running as the same user is not desired, and to avoid running the Datadog collector as `root`, the `try_sudo` option lets the Process Check try using `sudo` to collect these metrics. Currently, only the `open_file_descriptors` metric on Unix platforms takes advantage of this setting. Note: the appropriate sudoers rules must be configured for this to work.
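As an illustration of the Agent-side half of this setup, the fragment below enables `try_sudo` on the SSH instance from the earlier example. It is a sketch that assumes `try_sudo` is set per instance as a boolean. It covers only the check configuration; the sudoers rule itself is not shown here.

```yaml
init_config:

instances:
  - name: ssh
    search_string:
      - ssh
      - sshd
    # Let the check try sudo when it cannot read the data directly
    try_sudo: true
```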
Note: Some metrics are not available on every platform:
Process I/O metrics are not available by default on Linux or OSX, because the files that the Agent reads (/proc/<pid>/io) are readable only by the process’s owner. For more information, read the Agent FAQ.
system.cpu.iowait is not available on Windows.
All metrics are reported per instance configured in process.d/conf.yaml, and are tagged process_name:<instance_name>.
The system.processes.cpu.pct metric sent by this check is only accurate for processes that live for more than 30 seconds. Do not expect its value to be accurate for shorter-lived processes.
The number of major page faults per second for children of this process. Shown as occurrence
system.processes.mem.pct (gauge)
The process memory consumption. Shown as percent
system.processes.mem.real (gauge)
The non-swapped physical memory that a process has used and that cannot be shared with another process (Linux only). Shown as byte
system.processes.mem.rss (gauge)
The non-swapped physical memory a process has used, also known as "Resident Set Size". Shown as byte
system.processes.mem.vms (gauge)
The total amount of virtual memory used by the process, also known as "Virtual Memory Size". Shown as byte
system.processes.number (gauge)
The number of processes. Shown as process
system.processes.open_file_descriptors (gauge)
The number of file descriptors used by this process (only available for processes run as the dd-agent user)
system.processes.open_handles (gauge)
The number of handles used by this process.
system.processes.threads (gauge)
The number of threads used by this process. Shown as thread
system.processes.voluntary_ctx_switches (gauge)
The number of voluntary context switches performed by this process. Shown as event
system.processes.run_time.avg (gauge)
The average running time of all instances of this process. Shown as second
system.processes.run_time.max (gauge)
The longest running time of all instances of this process. Shown as second
system.processes.run_time.min (gauge)
The shortest running time of all instances of this process. Shown as second
Events
The Process Check does not include any events.
Service Checks
process.up
Returns OK if the number of matching processes is within the warning thresholds, WARNING if it is outside the warning thresholds, and CRITICAL if it is outside the critical thresholds. Statuses: ok, warning, critical
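To make the threshold behavior concrete, here is a sketch of an instance with a `thresholds` block. The numeric bounds are hypothetical, and the `[low, high]` pair layout is assumed from the sample process.d/conf.yaml, so check that file before relying on it.

```yaml
init_config:

instances:
  - name: ssh
    search_string:
      - ssh
      - sshd
    thresholds:
      # Hypothetical bounds, each written as [low, high] process counts
      warning: [3, 5]
      critical: [1, 7]
```

Under the semantics described above, and assuming the bounds are inclusive, a count of 3 to 5 matching processes would report OK, a count of 1, 2, 6, or 7 would report WARNING, and a count of 0 or more than 7 would report CRITICAL.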