Live Containers

Overview

Datadog Live Containers enables real-time visibility into all containers across your environment.

Taking inspiration from bedrock tools like htop, ctop, and kubectl, live containers give you complete coverage of your container infrastructure in a continuously updated table with resource metrics at two-second resolution, faceted search, and streaming container logs.

Coupled with Docker, Kubernetes, ECS, and other container technologies, plus built-in tagging of dynamic components, the live container view provides a detailed overview of your containers' health, resource consumption, logs, and deployment in real time:

Configuration

Kubernetes resources

The Datadog Agent and Cluster Agent can be configured to retrieve Kubernetes resources for Live Containers. This feature allows you to monitor the state of pods, deployments, and other Kubernetes concepts in a specific namespace or availability zone, view resource specifications for failed pods within a deployment, correlate node activity with related logs, and more.

Kubernetes resources for Live Containers requires Agent version >= 7.27.0 and Cluster Agent version >= 1.11.0 before you apply the configurations below.

If you are using the official Datadog Helm Chart:

  • Use chart version 2.10.0 or above.
    Note: Ensure the Agent and Cluster Agent versions are hardcoded with the minimum versions required or above in your Helm chart values.yaml file.

  • Make sure the Process Agent is enabled. You can do this by modifying your datadog-values.yaml file to include:

    datadog:
        # (...)
        processAgent:
            enabled: true
    
  • Deploy a new release.
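
As a rough sketch of the version pinning mentioned in the note above, the image tags can be set explicitly in values.yaml. The keys agents.image.tag and clusterAgent.image.tag are assumed here from the Datadog Helm chart's values layout and do not appear elsewhere on this page, so check them against the chart version you deploy. The tags shown are the minimum versions stated above.

```yaml
# Assumed keys (verify against your chart version): agents.image.tag, clusterAgent.image.tag
agents:
  image:
    tag: "7.27.0"   # minimum Agent version stated above
clusterAgent:
  image:
    tag: "1.11.0"   # minimum Cluster Agent version stated above
```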

In some setups, the Process Agent and Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens the feature does not start, and the following warning displays in the Cluster Agent log: Orchestrator explorer enabled but no cluster name set: disabling. In this case you must set datadog.clusterName to your cluster name in values.yaml.
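
A minimal sketch of that setting in values.yaml, assuming the key nests under datadog in the same way as processAgent above. The placeholder is yours to replace.

```yaml
datadog:
  # Set explicitly when the cluster name cannot be auto-detected
  clusterName: <YOUR_CLUSTER_NAME>
```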

Cluster Agent version >= 1.11.0 is required before configuring the DaemonSet. The Cluster Agent must be running, and the Agent must be able to communicate with it. See the Cluster Agent Setup for configuration.

  1. Set the Cluster Agent container with the following environment variable:

      - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
        value: "true"
    
  2. Set the Cluster Agent ClusterRole with the following RBAC permissions.

    Note in particular the apps and batch apiGroups. Live Containers also needs permissions to collect common Kubernetes resources (pods, services, nodes, etc.), which should already be in the RBAC if you followed Cluster Agent Setup. If they are missing, add them (after deployments, replicasets):

      ClusterRole:
      - apiGroups:  # To create the datadog-cluster-id ConfigMap
        - ""
        resources:
        - configmaps
        verbs:
        - create
        - get
        - update
      ...
      - apiGroups:  # Required to get the kube-system namespace UID and generate a cluster ID
        - ""
        resources:
        - namespaces
        verbs:
        - get
      ...
      - apiGroups:  # To collect new resource types
        - "apps"
        resources:
        - deployments
        - replicasets
        verbs:
        - list
        - get
        - watch
      - apiGroups:
        - "batch"
        resources:
        - cronjobs
        - jobs
        verbs:
        - list
        - get
        - watch
      ...
    

    These permissions are needed to create a datadog-cluster-id ConfigMap in the same Namespace as the Agent DaemonSet and the Cluster Agent Deployment, as well as to collect supported Kubernetes resources.

    If the cluster-id ConfigMap isn’t created by the Cluster Agent, the Agent pod cannot collect resources. In such a case, update the Cluster Agent permissions and restart its pods to let it create the ConfigMap, and then restart the Agent pod.

  3. The Process Agent, which runs in the Agent DaemonSet, must be enabled and running (it doesn’t need to run the process collection), and configured with the following options:

    - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
      value: "true"
    

In some setups, the Process Agent and Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens the feature does not start, and the following warning displays in the Cluster Agent log: Orchestrator explorer enabled but no cluster name set: disabling. In this case you must add the following options in the env section of both the Cluster Agent and the Process Agent:

- name: DD_CLUSTER_NAME
  value: "<YOUR_CLUSTER_NAME>"

Resource collection compatibility matrix

The following table presents the list of collected resources and the minimal Agent, Cluster Agent and Helm chart versions for each.

Resource      Minimal Agent version  Minimal Cluster Agent version  Minimal Helm chart version
Clusters      7.27.0                 1.12.0                         2.10.0
Deployments   7.27.0                 1.11.0                         2.10.0
Nodes         7.27.0                 1.11.0                         2.10.0
Pods          7.27.0                 1.11.0                         2.10.0
ReplicaSets   7.27.0                 1.11.0                         2.10.0
Services      7.27.0                 1.11.0                         2.10.0
Jobs          7.27.0                 1.13.1                         2.15.5
CronJobs      7.27.0                 1.13.1                         2.15.5
DaemonSets    7.27.0                 1.14.0                         2.16.3
StatefulSets  7.27.0                 1.15.0                         2.20.1

Instructions for previous Agent and Cluster Agent versions.

The Kubernetes resources view for Live Containers used to require Agent version >= 7.21.1 and Cluster Agent version >= 1.9.0 before minimal versions were updated. For those older versions, the DaemonSet configuration was slightly different and full instructions are retained here for reference.

If you are using the official Datadog Helm Chart:

  • Use a chart version above 2.4.5 and below 2.10.0. For chart version 2.10.0 onwards, see the latest configuration instructions instead.
    Note: Ensure the Agent and Cluster Agent versions are hardcoded with the minimum versions required or above in your Helm chart values.yaml file.
  • Set datadog.orchestratorExplorer.enabled to true in values.yaml.
  • Deploy a new release.
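
The values.yaml change from the steps above can be sketched as follows. This only expands the dotted key named in the text into nested YAML.

```yaml
datadog:
  # (...)
  orchestratorExplorer:
    enabled: true
```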

In some setups, the Process Agent and Cluster Agent cannot automatically detect a Kubernetes cluster name. If this happens the feature does not start, and the following warning displays in the Cluster Agent log: Orchestrator explorer enabled but no cluster name set: disabling. In this case you must set datadog.clusterName to your cluster name in values.yaml.

The Cluster Agent must be running, and the Agent must be able to communicate with it. See the Cluster Agent Setup for configuration.

  1. Set the Cluster Agent container with the following environment variable:

      - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
        value: "true"
    
  2. Set the Cluster Agent ClusterRole with the following RBAC permissions.

    Note: In addition to the apps apiGroups, Live Containers needs permissions to collect common Kubernetes resources (pods, services, nodes, etc.), which should already be in the RBAC if you followed Cluster Agent Setup. If they are missing, add them (after deployments, replicasets):

      ClusterRole:
      - apiGroups:  # To create the datadog-cluster-id ConfigMap
        - ""
        resources:
        - configmaps
        verbs:
        - create
        - get
        - update
      ...
      - apiGroups:  # Required to get the kube-system namespace UID and generate a cluster ID
        - ""
        resources:
        - namespaces
        verbs:
        - get
      ...
      - apiGroups:  # To collect new resource types
        - "apps"
        resources:
        - deployments
        - replicasets
        - daemonsets
        - statefulsets
        verbs:
        - list
        - get
        - watch
    

    These permissions are needed to create a datadog-cluster-id ConfigMap in the same Namespace as the Agent DaemonSet and the Cluster Agent Deployment, as well as to collect Deployments and ReplicaSets.

    If the cluster-id ConfigMap doesn’t get created by the Cluster Agent, the Agent pod does not start, and falls into CreateContainerConfigError status. If the Agent pod is stuck because the ConfigMap doesn’t exist, update the Cluster Agent permissions and restart its pods. This creates the ConfigMap and the Agent pod recovers automatically.

  3. The Process Agent, which runs in the Agent DaemonSet, must be enabled and running (it doesn’t need to run the process collection), and configured with the following options:

    - name: DD_ORCHESTRATOR_EXPLORER_ENABLED
      value: "true"
    - name: DD_ORCHESTRATOR_CLUSTER_ID
      valueFrom:
        configMapKeyRef:
          name: datadog-cluster-id
          key: id
    

In some setups, the Process Agent and Cluster Agent are unable to automatically detect a Kubernetes cluster name. If this happens the feature does not start, and the following warning displays in the Cluster Agent log: Orchestrator explorer enabled but no cluster name set: disabling. In this case you must add the following options in the env section of both the Cluster Agent and the Process Agent:

- name: DD_CLUSTER_NAME
  value: "<YOUR_CLUSTER_NAME>"

Add custom tags to resources

You can add custom tags to Kubernetes resources to ease filtering inside the Kubernetes resources view.

Additional tags are added through the DD_ORCHESTRATOR_EXPLORER_EXTRA_TAGS environment variable.

Note: These tags only show up in the Kubernetes resources view.

If you are using the official Helm chart, add the environment variable on both the Process Agent and the Cluster Agent by setting agents.containers.processAgent.env and clusterAgent.env respectively in values.yaml.

  agents:
    containers:
      processAgent:
        env:
          - name: "DD_ORCHESTRATOR_EXPLORER_EXTRA_TAGS"
            value: "tag1:value1 tag2:value2"
  clusterAgent:
    env:
      - name: "DD_ORCHESTRATOR_EXPLORER_EXTRA_TAGS"
        value: "tag1:value1 tag2:value2"

Then deploy a new release.

If you are using a DaemonSet instead, set the environment variable on both the Process Agent and Cluster Agent containers:

- name: DD_ORCHESTRATOR_EXPLORER_EXTRA_TAGS
  value: "tag1:value1 tag2:value2"

Include or exclude containers

It is possible to include and/or exclude containers from real-time collection:

  • Exclude containers either by passing the environment variable DD_CONTAINER_EXCLUDE or by adding container_exclude: in your datadog.yaml main configuration file.
  • Include containers either by passing the environment variable DD_CONTAINER_INCLUDE or by adding container_include: in your datadog.yaml main configuration file.

Both arguments take an image name as value; regular expressions are also supported.

For example, to exclude all Debian images except containers with a name starting with frontend, add these two configuration lines in your datadog.yaml file:

container_exclude: ["image:debian"]
container_include: ["name:frontend.*"]

Note: For Agent 5, instead of including the above in the datadog.conf main configuration file, explicitly add a datadog.yaml file to /etc/datadog-agent/, as the Process Agent requires all configuration options here. This configuration only excludes containers from real-time collection, not from Autodiscovery.
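
For containerized Agents, the same filter as the datadog.yaml example above can be expressed with the environment variables named earlier. This is a sketch, assuming the variables take the same image: and name: rules as strings, and that the env block sits on the Agent container in your manifest.

```yaml
env:
  # Exclude all Debian images...
  - name: DD_CONTAINER_EXCLUDE
    value: "image:debian"
  # ...except containers whose name starts with frontend
  - name: DD_CONTAINER_INCLUDE
    value: "name:frontend.*"
```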

Container scrubbing

To prevent the leaking of sensitive data, you can scrub sensitive words in container YAML files. Container scrubbing is enabled by default for Helm charts, and some default sensitive words are provided:

  • password
  • passwd
  • mysql_pwd
  • access_token
  • auth_token
  • api_key
  • apikey
  • pwd
  • secret
  • credentials
  • stripetoken

You can set additional sensitive words by providing a list of words to the environment variable DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_WORDS. This adds to, and does not overwrite, the default words. You need to set up this environment variable for the following agents:

  • process-agent
  • cluster-agent

env:
    - name: DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_WORDS
      value: "customword1 customword2 customword3"
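
If you deploy with the official Helm chart, one way to set this variable on both agents is through the same agents.containers.processAgent.env and clusterAgent.env keys used for custom tags earlier on this page. The fragment below is a sketch under that assumption, not a confirmed chart recipe for this variable.

```yaml
agents:
  containers:
    processAgent:
      env:
        - name: DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_WORDS
          value: "customword1 customword2 customword3"
clusterAgent:
  env:
    - name: DD_ORCHESTRATOR_EXPLORER_CUSTOM_SENSITIVE_WORDS
      value: "customword1 customword2 customword3"
```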

For example, because password is a sensitive word, the scrubber changes <MY_PASSWORD> in any of the following to a string of asterisks, ***********:

password <MY_PASSWORD>
password=<MY_PASSWORD>
password: <MY_PASSWORD>
password::::== <MY_PASSWORD>

However, it does not scrub paths that contain sensitive words. For example, it does not overwrite /etc/vaultd/secret/haproxy-crt.pem with /etc/vaultd/secret/****** even though secret is a sensitive word.

Getting started

Navigate to the Containers page. This automatically brings you to the Containers view.

Searching, filtering, and pivoting

Containers are, by their nature, extremely high cardinality objects. Datadog’s flexible string search matches substrings in the container name, ID, or image fields.

If you’ve enabled Kubernetes Resources, strings such as pod, deployment, ReplicaSet, and service name, as well as Kubernetes labels are searchable in a Kubernetes Resources view.

To combine multiple string searches into a complex query, you can use any of the following Boolean operators:

AND
Intersection: both terms are in the selected events (if nothing is added, AND is taken by default)
Example: java AND elasticsearch
OR
Union: either term is contained in the selected events
Example: java OR python
NOT / !
Exclusion: the following term is NOT in the event. You may use the word NOT or the ! character to perform the same operation
Example: java NOT elasticsearch or java !elasticsearch

Use parentheses to group operators together. For example, (NOT (elasticsearch OR kafka) java) OR python.

Filtering and pivoting

The screenshot below displays a system that has been filtered down to a Kubernetes cluster of 25 nodes. RSS and CPU utilization on containers are reported relative to the provisioned limits on the containers, when they exist. Here, it is apparent that the containers in this cluster are over-provisioned. You could use tighter limits and bin packing to achieve better utilization of resources.

Container environments are dynamic and can be hard to follow. The following screenshot displays a view that has been pivoted by kube_service and host—and, to reduce system noise, filtered to kube_namespace:default. You can see what services are running where, and how saturated key metrics are:

You could pivot by ECS ecs_task_name and ecs_task_version to understand changes to resource utilization between updates.

For Kubernetes resources, select Datadog tags such as environment, service, or pod_phase to filter by. You can also use the container facets on the left to filter a specific Kubernetes resource. Group pods by Datadog tags to get an aggregated view that helps you find information more quickly.

Tagging

Containers are tagged with all existing host-level tags, as well as with metadata associated with individual containers.

All containers are tagged by image_name. Integrations with popular orchestrators, such as ECS and Kubernetes, provide further container-level tags. Additionally, each container is decorated with Docker, ECS, or Kubernetes icons so you can tell at a glance which are being orchestrated.

ECS containers are tagged by:

  • task_name
  • task_version
  • ecs_cluster

Kubernetes containers are tagged by:

  • pod_name
  • kube_pod_ip
  • kube_service
  • kube_namespace
  • kube_replica_set
  • kube_daemon_set
  • kube_job
  • kube_deployment
  • kube_cluster

If you have a configuration for Unified Service Tagging in place, env, service, and version are picked up automatically. Having these tags available lets you tie together APM, logs, metrics, and live container data.
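
For Kubernetes workloads, Unified Service Tagging is commonly configured with tags.datadoghq.com labels. The fragment below is a minimal sketch under that assumption. The label keys come from Datadog's Unified Service Tagging documentation rather than from this page, and the placeholder values are hypothetical.

```yaml
# Fragment of a Deployment manifest (placeholders are illustrative)
metadata:
  labels:
    tags.datadoghq.com/env: "<ENV>"
    tags.datadoghq.com/service: "<SERVICE>"
    tags.datadoghq.com/version: "<VERSION>"
spec:
  template:
    metadata:
      labels:
        # Repeat the labels on the pod template so they reach the pods
        tags.datadoghq.com/env: "<ENV>"
        tags.datadoghq.com/service: "<SERVICE>"
        tags.datadoghq.com/version: "<VERSION>"
```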

Views

Containers view

The Containers view includes Scatter Plot and Timeseries views, and a table to better organize your container data by fields such as container name, status, and start time.

Scatter plot

Use the scatter plot analytic to compare two metrics with one another in order to better understand the performance of your containers.

You can switch between the “Scatter Plot” and “Timeseries” tabs in the collapsible Summary Graphs section in the Containers page:

By default, the graph groups by the short_image tag key. The size of each dot represents the number of containers in that group, and clicking on a dot displays the individual containers and hosts that contribute to the group.

The query at the top of the scatter plot lets you control:

  • The metrics to display.
  • The aggregation method for both metrics.
  • The scale of both the X and Y axes (Linear/Log).

Real-time monitoring

While you are actively working with the Containers page, metrics are collected at 2-second resolution. This is important for highly volatile metrics such as CPU. In the background, for historical context, metrics are collected at 10-second resolution.

Kubernetes resources view

If you have enabled Kubernetes Resources for Live Containers, toggle among the Clusters, Pods, Deployments, ReplicaSets, DaemonSets, Services, CronJobs, Jobs, and Nodes views in the “Select a resource” dropdown menu in the top left corner of the page.

Each of these views includes a data table to help you better organize your data by field such as status, name, and Kubernetes labels, and a detailed Cluster Map to give you a bigger picture of your pods and Kubernetes clusters.

Group by functionality and facets

Group pods by tags or Kubernetes labels to get an aggregated view that helps you find information more quickly. You can group by using the “Group by” bar on the top right of the page, or by clicking on a particular tag or label and locating the group by function in the context menu as shown below.

An example of grouping by team

You can also leverage facets on the left hand side of the page to quickly group resources or filter for resources you care most about, such as pods with a CrashLoopBackOff pod status.

An example of grouping the CrashLoopBackOff pod status

Cluster map

A Kubernetes Cluster Map gives you a bigger picture of your pods and Kubernetes clusters. You can see all of your resources together on one screen with customized groups and filters, and choose which metrics to fill the color of the pods by.

Drill down into resources from Cluster Maps by clicking on any circle or group to populate a detailed panel.

A cluster map with customized groups and filters

Information panel

Click on any row in the table or on any object in a Cluster Map to view information about a specific resource in a side panel.

A view of resources in the side panel

For a detailed dashboard of this resource, click View Dashboard in the top right corner of this panel.

This panel is useful for troubleshooting and finding information about a selected container or resource, such as:

  • Logs: View logs from your container or resource. Click on any log to view related logs in Logs Explorer.
  • Metrics: View live metrics for your container or resource. You can view any graph full screen, share a snapshot of it, or export it from this tab.
  • Network: View a container or resource’s network performance, including source, destination, sent and received volume, and throughput fields. Use the Destination field to search by tags like DNS or ip_type, or use the Group by filter in this view to group network data by tags, like pod_name or service.
  • Traces: View traces from your container or resource, including the date, service, duration, method, and status code of a trace.

Kubernetes Resources views have a few additional tabs:

  • Processes: View all processes running in the container of this resource.
  • YAML: A detailed YAML overview for the resource.
  • Events: View all Kubernetes events for your resource.

Container logs

View streaming logs for any container in Datadog, similar to docker logs -f or kubectl logs -f. Click any container in the table to inspect it. Click the Logs tab to see real-time data from live tail or indexed logs for any time in the past.

Live tail

With live tail, all container logs are streamed. Pausing the stream allows you to easily read logs that are quickly being written; unpause to continue streaming.

Streaming logs can be searched with simple string matching. See Live Tail for more details.

Note: Streaming logs are not persisted, and entering a new search or refreshing the page clears the stream.

Indexed logs

You can see indexed logs that you have chosen to index and persist by selecting a corresponding timeframe. Indexing allows you to filter your logs using tags and facets. For example, to search for logs with an Error status, type status:error into the search box. Autocompletion can help you locate the particular tag that you want. Key attributes about your logs are already stored in tags, which enables you to search, filter, and aggregate as needed.

Notes and known issues

  • Real-time (2s) data collection is turned off after 30 minutes. To resume real-time collection, refresh the page.
  • RBAC settings can restrict Kubernetes metadata collection. See the RBAC entities for the Datadog Agent.
  • In Kubernetes, the health value is the container's readiness probe, not its liveness probe.

Kubernetes resources

  • Data is updated automatically at constant intervals. Update intervals may change during beta.
  • In clusters with 1000+ Deployments or ReplicaSets, you may notice elevated CPU usage from the Cluster Agent. There is an option to disable container scrubbing in the Helm chart. See the Helm Chart repo for more details.
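
As a hedged sketch of the Helm option mentioned above: the key datadog.orchestratorExplorer.container_scrubbing.enabled is assumed from the Datadog Helm chart's values file and is not spelled out on this page, so confirm it in the Helm Chart repo for your chart version. Disabling scrubbing means sensitive words in container YAML are no longer masked.

```yaml
datadog:
  orchestratorExplorer:
    enabled: true
    # Assumed key; verify in the Helm Chart repo before use
    container_scrubbing:
      enabled: false
```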
