Orchestrator Explorer

Orchestrator Explorer, showing Kubernetes Pods.

Overview

The Datadog Agent and Cluster Agent can retrieve Kubernetes resources for the Orchestrator Explorer. This feature allows you to monitor the state of pods, deployments, and other Kubernetes concepts in a specific namespace or availability zone, view resource specifications for failed pods within a deployment, correlate node activity with related logs, and more.

Orchestrator Explorer requires Agent version >= 7.27.0 and Cluster Agent version >= 1.11.0.

Note: For Kubernetes version 1.25 and above, the minimum required Cluster Agent version is 7.40.0.

Setup

Ensure that you have enabled the Process Agent. If you are using Datadog Operator or the official Helm chart, the Orchestrator Explorer is enabled by default.

The Orchestrator Explorer is enabled in the Datadog Operator by default.

For verification, ensure that the features.orchestratorExplorer.enabled parameter is set to true in your datadog-agent.yaml:

apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiKey: <DATADOG_API_KEY>
      appKey: <DATADOG_APP_KEY>
  features:
    orchestratorExplorer:
      enabled: true

If you are using the official Helm chart, Orchestrator Explorer is enabled by default.

For verification, ensure that the datadog.orchestratorExplorer.enabled parameter is set to true in your values.yaml file:

datadog:
  # (...)
  processAgent:
    enabled: true
  orchestratorExplorer:
    enabled: true

Then, upgrade your Helm chart.

Usage

Views

Toggle among the Pods, Clusters, Namespaces, and other Kubernetes resources in the Select Resources dropdown menu in the top left corner of the page.

Each of these views includes a data table to help you better organize your data by field such as status, name, and Kubernetes labels, and a detailed Cluster Map to give you a bigger picture of your pods and Kubernetes clusters.

See Query filter details for more details on how to filter these views.

Orchestrator Explorer opened to show Workloads > Replica Sets, in Summary mode

Group by functionality and facets

Group pods by tags, Kubernetes labels, or Kubernetes annotations to get an aggregated view that helps you find information more quickly. To group resources, use the “Group by” bar on the top right of the page, or click a particular tag or label and select the group by function in the context menu, as shown below.

An example of grouping by team

You can also use facets on the left-hand side of the page to group resources or to filter for the resources you care most about, such as pods with a CrashLoopBackOff pod status.

Cluster map

A cluster map gives you a bigger picture of your pods and Kubernetes clusters. You can see all of your resources together on one screen with customized groups and filters, and choose which metric determines the fill color of the nodes.

Examine resources from cluster maps by clicking on any circle or group to populate a detailed panel.

Information panel

Click on any row in the table or on any object in a Cluster Map to view information about a specific resource in a side panel.

A view of resources in the side panel, opened to processes.

The side panel’s YAML tab shows the full resource definition. Starting in Agent version 7.44.0, it also includes seven days of definition history. You can compare what changed over time and across different versions. The time indicated is approximately when the changes were applied to the resource.

To prevent displaying a large number of irrelevant changes, updates affecting only the following fields are ignored:

  • metadata.resourceVersion
  • metadata.managedFields
  • metadata.generation
  • status

A view of resources in the side panel, showing the YAML history feature
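
The filtering of irrelevant changes described above can be pictured with a short Python sketch. This is only an illustration of the idea, not Datadog's implementation: the function names strip_ignored and is_relevant_change are hypothetical, and manifests are assumed to be plain dictionaries.

```python
import copy

# Fields the text above lists as ignored when detecting definition changes.
IGNORED_PATHS = [
    ("metadata", "resourceVersion"),
    ("metadata", "managedFields"),
    ("metadata", "generation"),
    ("status",),
]


def strip_ignored(manifest):
    """Return a copy of the manifest without the ignored fields (hypothetical helper)."""
    cleaned = copy.deepcopy(manifest)
    for path in IGNORED_PATHS:
        node = cleaned
        for key in path[:-1]:
            node = node.get(key, {})
        node.pop(path[-1], None)
    return cleaned


def is_relevant_change(old, new):
    """True when two manifests differ outside the ignored fields."""
    return strip_ignored(old) != strip_ignored(new)


old = {
    "metadata": {"name": "web", "resourceVersion": "1"},
    "spec": {"replicas": 2},
    "status": {"readyReplicas": 2},
}
# Only resourceVersion and status differ, so this update would be ignored.
new = {
    "metadata": {"name": "web", "resourceVersion": "2"},
    "spec": {"replicas": 2},
    "status": {"readyReplicas": 1},
}
# A change to spec is a relevant change.
scaled = dict(new, spec={"replicas": 3})
```

Under these assumptions, comparing old with new reports no relevant change, while comparing old with scaled does.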

The other tabs show more information for troubleshooting the selected resource:

  • Logs: View logs from your container or resource. Click on any log to view related logs in the Log Explorer.
  • APM: View traces from your container or resource, including the date, service, duration, method, and status code of a trace.
  • Metrics: View live metrics for your container or resource. You can view any graph full screen, share a snapshot of it, or export it from this tab.
  • Processes: View all processes running in the container of this resource.
  • Network: View a container or resource’s network performance, including source, destination, sent and received volume, and throughput fields. Use the Destination field to search by tags like DNS or ip_type, or use the Group by filter in this view to group network data by tags, like pod_name or service.
  • Events: View all Kubernetes events for your resource.
  • Monitors: View monitors tagged, scoped, or grouped for this resource.

For a detailed dashboard of this resource, click View Dashboard in the top right corner of this panel.

A link to a pod dashboard from Live Containers overview

Resource utilization

For the Resource Utilization page, see Resource Utilization.

Within the Kubernetes Explorer tab, you can explore a selection of resource utilization metrics.

Container Resource Utilization

All of these columns support sorting, which helps you to pinpoint individual workloads based on their resource utilization.

Container Resource Utilization Sorted Columns

Query filter details

You can narrow down the displayed resources by supplying a query within the “Filter by” search bar on the top left of the page.

Syntax

A query filter is composed of terms and operators. Example:

Orchestrator Explorer query filter syntax.

Terms

There are multiple types of terms available:

  • Tags: Attached to resources by the Agent collecting them. There are also additional tags that Datadog generates for Kubernetes resources. Examples: datacenter:staging or tag#datacenter:staging (the tag# prefix is optional).
  • Labels: Extracted from a resource’s metadata. They are typically used to organize your cluster and target specific resources with selectors. Example: label#chart_version:2.1.0
  • Annotations: Extracted from a resource’s metadata. They are generally used to support tooling that aids in cluster management. Example: annotation#checksum/configmap:a1bc23d4
  • Metrics: Added to workload resources (pods, deployments, etc.). You can find resources based on their utilization. To see what metrics are supported, see Resource Utilization Filters. Example: metric#cpu_usage_pct_limits_avg15:>80%
  • String matching: Supported by some specific resource attributes (see below). String matching does not use the key-value format, and you cannot specify the attribute to match on. Examples: "10.132.6.23" (IP), "9cb4b43f-8dc1-4a0e" (UID), web-api-3 (name).
  • Fields: Extracted from a resource’s metadata. Examples: field#metadata.creationTimestamp:>=4wk, field#metadata.deletionTimestamp:<=1hr

Note: You might find the same key-value pairs as both a tag and label (or annotation) - this is dependent on how your cluster is configured.

The following resource attributes are supported in arbitrary String Matching:

  • metadata.name
  • metadata.uid
  • IP Addresses found in:
    • Pods
    • Nodes (internal and external)
    • Services (cluster, external, and load balancer IPs)

You do not need to specify a key to search for a resource by name or IP. Quotes are not required unless your search string includes certain special characters.

Comparators

All terms support the : equality operator. Metric value terms support numeric comparisons as well:

  • :> Greater than (for example, metric#cpu_usage_avg15:>0.9)
  • :>= Greater than or equal
  • :< Less than
  • :<= Less than or equal
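
To make the comparator semantics concrete, here is a minimal Python sketch that evaluates a single metric term against a resource's metric values. It is an illustration only: matches_metric_term and the metrics dictionary are hypothetical, and the sketch accepts only plain decimal numbers as values.

```python
import operator
import re

# Comparators listed above, mapped to Python operators.
# A bare ":" means equality.
COMPARATORS = {
    ":>=": operator.ge,
    ":<=": operator.le,
    ":>": operator.gt,
    ":<": operator.lt,
    ":": operator.eq,
}

# Longer comparators come first so ":>=" is not read as ":>" followed by "=".
TERM_RE = re.compile(
    r"^metric#(?P<name>\w+)(?P<cmp>:>=|:<=|:>|:<|:)(?P<value>-?\d+(?:\.\d+)?)$"
)


def matches_metric_term(term, metrics):
    """Evaluate one metric term against a dict of metric name -> value."""
    m = TERM_RE.match(term)
    if m is None:
        raise ValueError(f"not a metric term: {term!r}")
    name = m.group("name")
    if name not in metrics:
        return False  # assumption: a resource without the metric never matches
    compare = COMPARATORS[m.group("cmp")]
    return compare(metrics[name], float(m.group("value")))


busy_pod = {"cpu_usage_avg15": 1.2}
idle_pod = {"cpu_usage_avg15": 0.1}
```

With these sample values, metric#cpu_usage_avg15:>0.9 matches busy_pod but not idle_pod.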

Operators

To combine multiple terms into a complex query, you can use any of the following case-sensitive Boolean operators:

  • AND: Intersection. Both terms are in the selected resources. If no operator is given between terms, AND is used by default. Example: a AND b
  • OR: Union. Either term is contained in the selected resources. Example: a OR b
  • NOT / -: Exclusion. The following term is not in the resource (applies to each individual raw text search). Example: a AND NOT b or a AND -b
  • ( ): Grouping. Specify how to group terms logically. Example: a AND (b OR c) or (a AND b) OR c

OR value shorthand

Multiple terms sharing the same key can be combined into a single term if they all use the OR operator. For example, this query:

app_name:web-server OR app_name:database OR app_name:event-consumer

Can be reduced to:

app_name:(web-server OR database OR event-consumer)
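
The equivalence between the two forms can be shown with a small Python sketch that expands a shorthand term back into its long form. The helper expand_or_shorthand is hypothetical and handles a single standalone term only; it does not parse full queries.

```python
import re

# key:(value OR value ...) with no nested parentheses.
SHORTHAND_RE = re.compile(r"(?P<key>[^\s():]+):\((?P<values>[^()]+)\)")


def expand_or_shorthand(term):
    """Rewrite key:(a OR b) as key:a OR key:b; return other terms unchanged."""
    match = SHORTHAND_RE.fullmatch(term)
    if match is None:
        return term
    key = match.group("key")
    values = [value.strip() for value in match.group("values").split(" OR ")]
    return " OR ".join(f"{key}:{value}" for value in values)


expanded = expand_or_shorthand("app_name:(web-server OR database OR event-consumer)")
```

Here, expanded holds the long form of the query shown above. When the shorthand is part of a larger query, the expanded form would need its own parentheses to keep the same meaning.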

Wildcards

You can use * wildcards as part of a term to filter by partial matches, both for values and keys. Some examples:

  • kube_job:stats-*: Find all resources with a kube_job tag value starting with stats-.
  • pod_name:*canary: Find all resources with a pod_name value ending in canary.
  • label#release:*: Find all resources with a release label, regardless of its value.
  • -label#*.datadoghq.com/*: Find resources that do not have any Datadog scoped labels.
  • kube_*:*stats*canary: Find resources that have related resource tags (kube_*), with stats in the middle of the value, also ending with canary.
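
The following Python sketch illustrates how wildcard terms like these could be matched, assuming that * stands for any sequence of characters (including none) in both keys and values. The names wildcard_to_regex and term_matches are hypothetical, tags are modeled as (key, value) pairs, and the sketch covers only positive key:value terms, not the negated or key-only forms.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern where '*' matches any run of characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def term_matches(term, tags):
    """True if any (key, value) pair matches the key:value term."""
    key_pattern, _, value_pattern = term.partition(":")
    key_re = wildcard_to_regex(key_pattern)
    value_re = wildcard_to_regex(value_pattern)
    return any(
        key_re.fullmatch(key) is not None and value_re.fullmatch(value) is not None
        for key, value in tags
    )


# Sample tags for a hypothetical resource.
tags = [
    ("kube_job", "stats-daily"),
    ("pod_name", "web-canary"),
    ("kube_deployment", "api-stats-eu-canary"),
]
```

With this sample data, kube_job:stats-*, pod_name:*canary, and kube_*:*stats*canary all match, while pod_name:canary* does not.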

Extracted tags

In addition to the tags you have configured within your Datadog Agent, Datadog injects generated tags based on resource attributes to help with searching and grouping. These tags are added to resources conditionally, when they are relevant.

All resources

All resources have the kube_cluster_name tag and all namespaced resources have the kube_namespace tag added to them.

Additionally, resources contain a kube_<api_kind>:<metadata.name> tag. For example, a deployment named web-server-2 would have the kube_deployment:web-server-2 tag automatically added to it.

Note: There are some exceptions to this pattern:

  • Pods use pod_name instead.
  • VPAs use verticalpodautoscaler.
  • HPAs use horizontalpodautoscaler.
  • Persistent Volume Claims use persistentvolumeclaim.

Based on the labels attached to the resource, the following tags will also be extracted:

Tag                    Source label
kube_app_name          app.kubernetes.io/name
kube_app_instance      app.kubernetes.io/instance
kube_app_version       app.kubernetes.io/version
kube_app_component     app.kubernetes.io/component
kube_app_part_of       app.kubernetes.io/part-of
kube_app_managed_by    app.kubernetes.io/managed-by
env                    tags.datadoghq.com/env
version                tags.datadoghq.com/version
service                tags.datadoghq.com/service
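
As an illustration of this label-to-tag mapping, the Python sketch below derives the extracted tags from a resource's labels. The mapping comes from the table above; the function extract_tags and the sample labels are hypothetical, and the sketch is not Datadog's implementation.

```python
# Source label -> extracted tag, as listed in the table above.
LABEL_TO_TAG = {
    "app.kubernetes.io/name": "kube_app_name",
    "app.kubernetes.io/instance": "kube_app_instance",
    "app.kubernetes.io/version": "kube_app_version",
    "app.kubernetes.io/component": "kube_app_component",
    "app.kubernetes.io/part-of": "kube_app_part_of",
    "app.kubernetes.io/managed-by": "kube_app_managed_by",
    "tags.datadoghq.com/env": "env",
    "tags.datadoghq.com/version": "version",
    "tags.datadoghq.com/service": "service",
}


def extract_tags(labels):
    """Return sorted key:value tags for the labels that have a known mapping."""
    return sorted(
        f"{LABEL_TO_TAG[label]}:{value}"
        for label, value in labels.items()
        if label in LABEL_TO_TAG
    )


labels = {
    "app.kubernetes.io/name": "web-server",
    "tags.datadoghq.com/env": "prod",
    "team": "core",  # no mapping in the table, so no extracted tag here
}
extracted = extract_tags(labels)
```

For the sample labels, extracted contains env:prod and kube_app_name:web-server.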

Relationships

Related Resources will be tagged with each other. Some examples:

  • A pod that is part of the “XYZ” deployment will have a kube_deployment:xyz tag.
  • An ingress that points at service “A” will have a kube_service:a tag.

Resources that are spawned from “parent” resources (such as pods and jobs) have the kube_ownerref_kind and kube_ownerref_name tags.

Tip: Utilize the filter query autocomplete feature to discover what related resource tags are available. Type kube_ and see what results are suggested.

Pods

Pods are given the following tags:

  • pod_name
  • pod_phase (extracted from the manifest)
  • pod_status (calculated similarly to kubectl)

Workloads

Workload resources (pods, deployments, stateful sets, etc.) have the following tags, which indicate their support within the Resource Utilization page:

  • resource_utilization (supported or unsupported)
  • missing_cpu_requests
  • missing_cpu_limits
  • missing_memory_requests
  • missing_memory_limits

Conditions

Some conditions, for some resources, are extracted as tags. For example, you can find the kube_condition_available tag on deployments. The tag format is always kube_condition_<name> with a true or false value.

Tip: Use the autocomplete feature to discover what conditions are available on a given resource type by entering kube_condition and reviewing the results.

Resource specific tags

Some resources have specific tags that are extracted based on your cluster’s environment. The following tags are available in addition to the shared tags above.

  • Cluster: api_server_version, kubelet_version
  • Custom Resource Definitions and Custom Resources: kube_crd_kind, kube_crd_group, kube_crd_version, kube_crd_scope
  • Namespace: phase
  • Node: kube_node_unschedulable, kube_node_kubelet_version, kube_node_kernel_version, kube_node_runtime_version, eks_fargate_node, node_schedulable, node_status
  • Persistent Volume: kube_reclaim_policy, kube_storage_class_name, pv_type, pv_phase
  • Persistent Volume Claim: pvc_phase, kube_storage_class_name
  • Pod: pod_name (instead of kube_pod), pod_phase (extracted from the manifest), pod_status (calculated similarly to kubectl)
  • Service: kube_service_type, kube_service_port
Resource Utilization Filters

The following workload resources are enriched with resource utilization metrics:

  • Clusters
  • Daemonsets
  • Deployments
  • Nodes
  • Pods
  • Replica Sets
  • Stateful Sets

These metrics are calculated at the time of collection, based on the average values over the last 15 minutes. You can filter by metric values like so: metric#<metric_name><comparator><numeric_value>.

  • metric_name is an available metric (see below)
  • comparator is a supported comparator
  • and numeric_value is a floating point value.

For these workload resources, the following metric names are available:

CPU                             Memory
cpu_limits_avg15                mem_limits_avg15
cpu_requests_avg15              mem_requests_avg15
cpu_usage_avg15                 mem_usage_avg15
cpu_usage_pct_limits_avg15      mem_usage_pct_limits_avg15
cpu_usage_pct_requests_avg15    mem_usage_pct_requests_avg15
cpu_waste_avg15                 mem_waste_avg15

In addition, clusters and nodes have the following metrics available to them:

  • cpu_usage_pct_alloc_avg15
  • cpu_requests_pct_alloc_avg15
  • mem_usage_pct_alloc_avg15
  • mem_requests_pct_alloc_avg15
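
A small Python sketch can show how a filter string in the metric#<metric_name><comparator><numeric_value> format might be assembled and checked against the metric names listed above. The function build_metric_filter and the resource kind strings "cluster" and "node" are hypothetical names chosen for this example.

```python
# Metric names available to all enriched workload resources (from the table above).
WORKLOAD_SUFFIXES = [
    "limits_avg15",
    "requests_avg15",
    "usage_avg15",
    "usage_pct_limits_avg15",
    "usage_pct_requests_avg15",
    "waste_avg15",
]
WORKLOAD_METRICS = {
    f"{prefix}_{suffix}" for prefix in ("cpu", "mem") for suffix in WORKLOAD_SUFFIXES
}

# Extra metric names available to clusters and nodes only.
ALLOC_METRICS = {
    "cpu_usage_pct_alloc_avg15",
    "cpu_requests_pct_alloc_avg15",
    "mem_usage_pct_alloc_avg15",
    "mem_requests_pct_alloc_avg15",
}

COMPARATORS = {":", ":>", ":>=", ":<", ":<="}


def build_metric_filter(resource_kind, metric_name, comparator, value):
    """Build a metric filter term, rejecting unknown names and comparators."""
    allowed = set(WORKLOAD_METRICS)
    if resource_kind in ("cluster", "node"):
        allowed |= ALLOC_METRICS
    if metric_name not in allowed:
        raise ValueError(f"{metric_name} is not available for {resource_kind}")
    if comparator not in COMPARATORS:
        raise ValueError(f"unsupported comparator: {comparator}")
    return f"metric#{metric_name}{comparator}{float(value)}"


pod_filter = build_metric_filter("pod", "cpu_usage_pct_limits_avg15", ":>", 0.8)
node_filter = build_metric_filter("node", "mem_usage_pct_alloc_avg15", ":>=", 0.9)
```

Requesting an allocation metric such as cpu_usage_pct_alloc_avg15 for a pod raises an error in this sketch, because those metrics are listed for clusters and nodes only.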

Metric units

CPU metrics are stored as a number of cores.

Memory metrics are stored as bytes.

Percentages (*_pct_*) are stored as floats, where 0.0 is 0% and 1.0 is 100%. The value is the ratio of the two indicated metrics. For example, cpu_usage_pct_limits_avg15 is the value of usage / limits. Metric values can be above 100%, such as Percentage CPU Usage of Requests when a workload uses more CPU than it requested.
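
The units above can be checked with a short worked example in Python. The numbers are made up for illustration, usage_ratio is a hypothetical helper, and returning None when no limit or request is set is an assumption of this sketch rather than documented behavior.

```python
MIB = 1024 ** 2  # bytes in one mebibyte


def usage_ratio(usage, reference):
    """Ratio of usage to limits or requests, where 1.0 means 100%."""
    if not reference:
        return None  # assumption: the ratio is undefined without a limit or request
    return usage / reference


# CPU values are in cores.
cpu_usage = 0.75
cpu_limits = 1.0
cpu_requests = 0.5

cpu_pct_limits = usage_ratio(cpu_usage, cpu_limits)      # 0.75, that is 75% of limits
cpu_pct_requests = usage_ratio(cpu_usage, cpu_requests)  # 1.5, that is 150% of requests

# Memory values are in bytes, so a 512 MiB threshold is written in bytes.
mem_threshold_bytes = 512 * MIB
mem_filter = f"metric#mem_usage_avg15:>{mem_threshold_bytes}"
```

In this example the CPU usage is 75% of limits but 150% of requests, which is why percentage values can exceed 1.0.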

Notes and known issues

  • Data is updated automatically at constant intervals. Update intervals may change during beta.
  • In clusters with 1000+ Deployments or ReplicaSets, you may notice elevated CPU usage from the Cluster Agent. There is an option to disable container scrubbing in the Helm chart. See the Helm Chart repo for more details.