Google Cloud Run for Anthos

Overview

Google Cloud Run for Anthos is a flexible serverless development platform for hybrid and multicloud environments. Cloud Run for Anthos is Google’s managed and fully supported Knative offering. If you are using fully managed Cloud Run, see the Google Cloud Run documentation.

Use the Datadog Google Cloud Platform integration to collect metrics from Google Cloud Run for Anthos.

Setup

Metric collection

Installation

If you haven’t already, set up the Google Cloud Platform integration.

If your Cloud Run for Anthos services already authenticate using Workload Identity, no further steps are needed.

If you have not enabled Workload Identity, you must migrate to Workload Identity to start collecting Knative metrics. This involves binding a Kubernetes service account to a Google service account, then configuring each service you want to collect metrics from to use Workload Identity.

For detailed setup instructions, see Google Cloud Workload Identity.
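The binding described above involves three pieces, modeled here as plain Python data for illustration. This is a minimal sketch, not a setup tool: all project, namespace, and account names are hypothetical placeholders. It assumes the standard Workload Identity conventions: the Kubernetes service account carries the `iam.gke.io/gcp-service-account` annotation, the Google service account grants `roles/iam.workloadIdentityUser` to the member string shown, and the Knative Service runs as the Kubernetes service account through `spec.template.spec.serviceAccountName`.

```python
"""Sketch of the objects involved in a Workload Identity binding.

All names passed in are hypothetical placeholders; nothing here calls
Google Cloud or Kubernetes APIs.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadIdentityBinding:
    project_id: str  # project hosting the cluster (workload pool: PROJECT.svc.id.goog)
    namespace: str   # Kubernetes namespace of the Cloud Run for Anthos service
    ksa_name: str    # Kubernetes service account the service runs as
    gsa_email: str   # Google service account the KSA acts as

    def iam_member(self) -> str:
        # Principal to grant roles/iam.workloadIdentityUser on the Google service account.
        return f"serviceAccount:{self.project_id}.svc.id.goog[{self.namespace}/{self.ksa_name}]"

    def ksa_manifest(self) -> dict:
        # Kubernetes ServiceAccount annotated with the Google service account it maps to.
        return {
            "apiVersion": "v1",
            "kind": "ServiceAccount",
            "metadata": {
                "name": self.ksa_name,
                "namespace": self.namespace,
                "annotations": {"iam.gke.io/gcp-service-account": self.gsa_email},
            },
        }

    def knative_service_patch(self) -> dict:
        # Fragment of a Knative Service spec so its revisions run as the KSA.
        return {"spec": {"template": {"spec": {"serviceAccountName": self.ksa_name}}}}


if __name__ == "__main__":
    binding = WorkloadIdentityBinding(
        "my-project", "default", "run-ksa", "run-gsa@my-project.iam.gserviceaccount.com"
    )
    print(binding.iam_member())
    print(binding.ksa_manifest())
    print(binding.knative_service_patch())
```

Repeat the service patch for each service you want to collect metrics from. The Google Cloud Workload Identity documentation remains the authoritative reference for the exact commands.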

Log collection

Google Cloud Run for Anthos exposes service logs. These logs can be collected with Google Cloud Logging and sent to a Dataflow job through a Cloud Pub/Sub topic. If you haven’t already, set up logging with the Datadog Dataflow template.

Once this is done, export your Google Cloud Run logs from Google Cloud Logging to the Pub/Sub topic:

  1. Go to Cloud Run for Anthos, select the services you want, and navigate to the Logs tab.

  2. Click View in Logs Explorer to open the Google Cloud Logging page.

  3. Click Create Sink and give the sink a descriptive name.

  4. Choose “Cloud Pub/Sub” as the destination and select the Pub/Sub topic that was created for that purpose. Note: The Pub/Sub topic can be located in a different project.

  5. Click Create and wait for the confirmation message to appear.
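To make the steps above concrete, this sketch shows the fields the console fills in for the sink: a name, a filter, and a Pub/Sub destination. It is written in Python for illustration only. It assumes the Cloud Logging destination format `pubsub.googleapis.com/projects/PROJECT/topics/TOPIC` and uses a dictionary shaped like a LogSink resource. The sink name, project, and topic are hypothetical. The filter is assumed to be the query that View in Logs Explorer opened in step 2.

```python
"""Sketch of the log sink created in the console steps above.

Only builds data; it does not call the Cloud Logging API.
"""


def pubsub_destination(topic_project: str, topic_id: str) -> str:
    # Cloud Logging addresses a Pub/Sub topic with this destination format.
    # topic_project may differ from the project that owns the logs.
    return f"pubsub.googleapis.com/projects/{topic_project}/topics/{topic_id}"


def log_sink(name: str, log_filter: str, topic_project: str, topic_id: str) -> dict:
    # Dictionary shaped like a LogSink resource (name, destination, filter).
    # log_filter is assumed to be the Logs Explorer query from step 2.
    return {
        "name": name,
        "destination": pubsub_destination(topic_project, topic_id),
        "filter": log_filter,
    }


if __name__ == "__main__":
    sink = log_sink(
        "datadog-cloud-run-anthos",            # hypothetical sink name
        "<query copied from Logs Explorer>",   # placeholder, not a real filter
        "logs-project",                        # hypothetical topic project
        "export-logs-to-datadog",              # hypothetical topic ID
    )
    print(sink)
```

If the topic lives in a different project, the sink's writer identity typically needs the Pub/Sub Publisher role (`roles/pubsub.publisher`) on that topic before logs start flowing.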

Tracing and Custom Metrics

Use the Datadog Admission Controller to configure APM tracers and DogStatsD clients automatically. Inject the DD_AGENT_HOST and DD_ENTITY_ID environment variables by using one of the following methods:

  • Add the admission.datadoghq.com/enabled: "true" label to your pod.
  • Configure the Cluster Agent admission controller by setting mutateUnlabelled: true.

To prevent pods from receiving the injected environment variables, add the admission.datadoghq.com/enabled: "false" label. This works even if you set mutateUnlabelled: true. For more information, see the Datadog Admission Controller documentation.
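The opt-in and opt-out rules above can be summarized as a small decision function. This sketch only models the behavior described in this section and is not the Cluster Agent's code. Treating label values other than "true" or "false" as unlabelled is an assumption. The helper `with_admission_label` is hypothetical. It sets the label on a Knative Service's revision template, on the assumption that Knative copies template labels onto the pods it creates.

```python
"""Sketch of the documented Admission Controller label rules.

Models the behavior described in the docs; not Datadog's implementation.
"""

import copy
from typing import Mapping

ADMISSION_LABEL = "admission.datadoghq.com/enabled"


def should_inject(pod_labels: Mapping[str, str], mutate_unlabelled: bool) -> bool:
    value = pod_labels.get(ADMISSION_LABEL)
    if value == "true":
        return True   # explicit opt-in always injects DD_AGENT_HOST / DD_ENTITY_ID
    if value == "false":
        return False  # explicit opt-out wins even when mutateUnlabelled is true
    # Unlabelled pods (assumed: any other value too) follow the Cluster Agent setting.
    return mutate_unlabelled


def with_admission_label(knative_service: dict, enabled: bool = True) -> dict:
    # Hypothetical helper: return a copy of a Knative Service manifest with the
    # label set on spec.template.metadata.labels (the revision template).
    svc = copy.deepcopy(knative_service)
    template = svc.setdefault("spec", {}).setdefault("template", {})
    labels = template.setdefault("metadata", {}).setdefault("labels", {})
    labels[ADMISSION_LABEL] = "true" if enabled else "false"
    return svc


if __name__ == "__main__":
    service = {"apiVersion": "serving.knative.dev/v1", "kind": "Service",
               "metadata": {"name": "hello"}}  # hypothetical service
    print(with_admission_label(service)["spec"]["template"]["metadata"]["labels"])
```

Labeling the revision template rather than editing pods directly matters with Knative, because each new revision creates fresh pods from that template.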

Data Collected

Metrics

gcp.knative.eventing.broker.event_count
(count)
Number of events received by a broker.
gcp.knative.eventing.trigger.event_count
(count)
Number of events received by a trigger.
gcp.knative.eventing.trigger.event_dispatch_latencies.avg
(gauge)
Average time spent dispatching an event to a trigger subscriber.
Shown as millisecond
gcp.knative.eventing.trigger.event_dispatch_latencies.p99
(gauge)
99th percentile of time spent dispatching an event to a trigger subscriber.
Shown as millisecond
gcp.knative.eventing.trigger.event_dispatch_latencies.p95
(gauge)
95th percentile of time spent dispatching an event to a trigger subscriber.
Shown as millisecond
gcp.knative.eventing.trigger.event_processing_latencies.avg
(gauge)
Average time spent processing an event before it is dispatched to a trigger subscriber.
Shown as millisecond
gcp.knative.eventing.trigger.event_processing_latencies.p99
(gauge)
99th percentile of time spent processing an event before it is dispatched to a trigger subscriber.
Shown as millisecond
gcp.knative.eventing.trigger.event_processing_latencies.p95
(gauge)
95th percentile of time spent processing an event before it is dispatched to a trigger subscriber.
Shown as millisecond
gcp.knative.serving.activator.request_count
(count)
The number of requests that are routed to the activator.
Shown as request
gcp.knative.serving.activator.request_latencies.avg
(gauge)
Average service request time for requests that go through the activator.
Shown as millisecond
gcp.knative.serving.activator.request_latencies.p99
(gauge)
99th percentile of service request time for requests that go through the activator.
Shown as millisecond
gcp.knative.serving.activator.request_latencies.p95
(gauge)
95th percentile of service request time for requests that go through the activator.
Shown as millisecond
gcp.knative.serving.autoscaler.actual_pods
(gauge)
Number of pods currently allocated.
gcp.knative.serving.autoscaler.desired_pods
(gauge)
Number of pods the autoscaler wants to allocate.
gcp.knative.serving.autoscaler.panic_mode
(gauge)
Set to 1 if the autoscaler is in panic mode for the revision, otherwise 0.
gcp.knative.serving.autoscaler.panic_request_concurrency
(gauge)
Average request concurrency observed per pod during the shorter panic autoscaling window.
Shown as request
gcp.knative.serving.autoscaler.requested_pods
(gauge)
Number of pods the autoscaler requested from Kubernetes.
gcp.knative.serving.autoscaler.stable_request_concurrency
(gauge)
Average request concurrency observed per pod during the stable autoscaling window.
Shown as request
gcp.knative.serving.autoscaler.target_concurrency_per_pod
(gauge)
The desired average request concurrency per pod during the stable autoscaling window.
Shown as request
gcp.knative.serving.revision.request_count
(count)
The number of requests reaching the revision.
Shown as request
gcp.knative.serving.revision.request_latencies.avg
(gauge)
Average service request time for requests reaching the revision.
Shown as millisecond
gcp.knative.serving.revision.request_latencies.p99
(gauge)
99th percentile of service request time for requests reaching the revision.
Shown as millisecond
gcp.knative.serving.revision.request_latencies.p95
(gauge)
95th percentile of service request time for requests reaching the revision.
Shown as millisecond

Events

The Google Cloud Run for Anthos integration does not include any events.

Service Checks

The Google Cloud Run for Anthos integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.
