Datadog’s Process Monitoring gives you real-time visibility into the most granular elements of a deployment. Inspired by bedrock tools like htop, this centralized view, combined with existing tagging capabilities, lets you understand what is going on at any level of your system and drill all the way down to the finest details.
The Process Agent ships by default with Agent 6 in Linux packages only. Refer to the standard Agent installation instructions for platform-specific details.

Once the Datadog Agent is installed, enable Live Processes collection by setting the following parameter to true in the Agent's main configuration file:

```yaml
process_config:
  enabled: "true"
```
The enabled value is a string with the following options:

- "true": Enable the process-agent to collect processes and containers.
- "false": Only collect containers if available (the default).
- "disabled": Don't run the process-agent at all.
Additionally, some configuration options may be set as environment variables.
Note: options set as environment variables override the settings defined in the configuration file.
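For example, the process_config.enabled setting has an environment-variable equivalent, DD_PROCESS_AGENT_ENABLED (the same variable used in the container instructions below). A minimal sketch:

```shell
# Setting the variable in the Agent's environment overrides
# process_config.enabled in datadog.yaml.
export DD_PROCESS_AGENT_ENABLED=true
```

After setting the variable, restart the Agent for it to take effect.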
After configuration is complete, restart the Agent.
Follow the instructions for the Docker Agent, passing in the following attributes, in addition to any other custom settings as appropriate:
```shell
-v /etc/passwd:/etc/passwd:ro \
-e DD_PROCESS_AGENT_ENABLED=true
```
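Put together, a full invocation might look like the following sketch. The image name, the API-key placeholder, and the additional host mounts from the standard Docker Agent instructions are assumptions here, not part of the attributes above:

```shell
# Sketch of a Docker Agent run with Live Processes enabled.
# DD_API_KEY and the extra mounts follow the standard Docker Agent
# install instructions; adjust for your environment.
docker run -d --name datadog-agent \
  -e DD_API_KEY=<YOUR_API_KEY> \
  -e DD_PROCESS_AGENT_ENABLED=true \
  -v /etc/passwd:/etc/passwd:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  datadog/agent:latest
```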
The dd-agent user needs to have permissions to access /etc/passwd.
In the dd-agent.yaml manifest used to create the DaemonSet, add the following environment variables, volume mount, and volume:

```yaml
env:
  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"
volumeMounts:
  - name: passwd
    mountPath: /etc/passwd
    readOnly: true
volumes:
  - hostPath:
      path: /etc/passwd
    name: passwd
```
Note: Running the Agent as a container still allows you to collect host processes.
In order to hide sensitive data on the Live Processes page, the Agent scrubs sensitive arguments from the process command line. This feature is enabled by default and any process argument that matches one of the following words has its value hidden.
"password", "passwd", "mysql_pwd", "access_token", "auth_token", "api_key", "apikey", "secret", "credentials", "stripetoken"
Note: The matching is case insensitive.
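As an illustration of this behavior, the following is a simplified model of case-insensitive, word-based scrubbing, not the Agent's actual implementation:

```python
# Simplified model of command-line argument scrubbing: if an argument's
# name contains a sensitive word (case-insensitively), its value is
# replaced with a placeholder. Illustration only, not Agent code.
SENSITIVE_WORDS = [
    "password", "passwd", "mysql_pwd", "access_token", "auth_token",
    "api_key", "apikey", "secret", "credentials", "stripetoken",
]

def scrub(cmdline: list[str]) -> list[str]:
    scrubbed = []
    hide_next = False
    for arg in cmdline:
        if hide_next:
            scrubbed.append("********")
            hide_next = False
            continue
        name, sep, _value = arg.partition("=")
        lowered = name.lower()
        if any(word in lowered for word in SENSITIVE_WORDS):
            if sep:  # --password=hunter2 style: hide the value part
                scrubbed.append(name + "=********")
            else:    # --password hunter2 style: hide the next token
                scrubbed.append(arg)
                hide_next = True
        else:
            scrubbed.append(arg)
    return scrubbed

print(scrub(["mysqld", "--password=hunter2"]))
# → ['mysqld', '--password=********']
```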
Define your own list to be merged with the default one, using the custom_sensitive_words field in the datadog.yaml file under the process_config section. Use wildcards (*) to define your own matching scope. However, a single wildcard ('*') is not supported as a sensitive word.

```yaml
process_config:
  scrub_args: true
  custom_sensitive_words: ['personal_key', '*token', 'sql*', '*pass*d*']
```
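The wildcard semantics can be sketched with Python's fnmatch, which implements the same * glob style; this is an illustration under that assumption, not the Agent's matching code:

```python
from fnmatch import fnmatch

# Custom patterns from the configuration above; matching is
# case-insensitive, so names are lowercased before comparison.
patterns = ["personal_key", "*token", "sql*", "*pass*d*"]

def is_sensitive(arg_name: str) -> bool:
    lowered = arg_name.lower()
    return any(fnmatch(lowered, pat) for pat in patterns)

print(is_sensitive("stripetoken"))   # matches '*token'
print(is_sensitive("sql_password"))  # matches 'sql*' and '*pass*d*'
print(is_sensitive("hostname"))      # no match
```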
The next image shows one process on the Live Processes page whose arguments have been hidden by using the configuration above.
Set scrub_args to false to completely disable process arguments scrubbing.
You can also scrub all arguments from processes by enabling the strip_proc_arguments flag in your datadog.yaml configuration file:

```yaml
process_config:
  strip_proc_arguments: true
```
Processes and containers are, by their nature, extremely high-cardinality objects. Fuzzy string search gives you a view into exactly what you want. Below is Datadog’s demo environment, filtered with the string shown: /9 has matched in the command path, and postgres matches the command itself.
Tagging makes navigation easy. In addition to all existing host-level tags, processes are tagged by:
Furthermore, processes in ECS containers are also tagged by:
Processes in Kubernetes containers are tagged by:
First, you can filter down to
role:McNulty-Query, Datadog’s front-end query service, in order to narrow the search. Then you can search for the NGINX master processes, and pivot the table by Availability-Zone, to be confident about that service staying highly available.
Here, you are checking the Elasticsearch processes for an individual feature team. You have also added metrics for voluntary and involuntary context switches, available in the gear menu on the upper-right of the table.
Below, you have searched for SSH processes and pivoted by user to understand who is logged into which hosts.
Perhaps this one is less exciting after redaction.
Use the ScatterPlot analytic to compare two metrics with one another in order to better understand the performance of your containers.
To access the ScatterPlot analytic on the Processes page, click the Show Summary graph button, then select the ScatterPlot tab:
By default, the graph groups by the command tag key. The size of each dot represents the number of processes in that group, and clicking on a dot drills into it to display the individual PIDs and containers that contribute to the group.
The query at the top of the graph allows you to control your ScatterPlot analytic:
Live Processes adds extra visibility to your container deployments. The Live Containers feature gives you a similarly comprehensive view of your container and orchestrator environment. When Live Processes is enabled, the process tree for each container is included in the container inspection panel on that page.
While you are actively working with Live Processes, metrics are collected at 2s resolution. This matters for highly volatile metrics such as CPU. In the background, metrics are collected at 10s resolution for historical context.
Collection of open files and current working directory is limited based on the level of privilege of the user running dd-process-agent. In the event that dd-process-agent is able to access these fields, they are collected automatically.
Real-time (2s) data collection is turned off after 30 minutes. To resume real-time collection, refresh the page.
In container deployments, the /etc/passwd file mounted into docker-dd-agent is necessary to collect usernames for each process. This is a public file, and the Process Agent does not use any fields except the username. All features except the user metadata field function without access to this file. Note: the Agent has access only to the mounted passwd file and will not perform username resolution for users created within containers.