Google Cloud Platform

Overview

Connect to Google Cloud Platform to see all your Google Compute Engine (GCE) hosts in Datadog. Hosts appear in the infrastructure overview, and you can sort through them, since Datadog automatically tags them with GCE host tags and any GCE labels you may have added.

Related integrations include:

App Engine: PaaS (platform as a service) to build scalable applications
BigQuery: Enterprise data warehouse
Cloud SQL: MySQL database service
Cloud Run: Managed compute platform that runs stateless containers through HTTP
Compute Engine: High-performance virtual machines
Container Engine: Kubernetes, managed by Google
Datastore: NoSQL database
Firebase: Mobile platform for application development
Functions: Serverless platform for building event-based microservices
Machine Learning: Machine learning services
Pub/Sub: Real-time messaging service
Spanner: Horizontally scalable, globally consistent, relational database service
Stackdriver Logging: Real-time log management and analysis
Storage: Unified object storage
VPN: Managed network functionality

Setup

Metric Collection

Installation

The Datadog <> Google Cloud integration uses Service Accounts to create an API connection between Google Cloud and Datadog. Below are instructions for creating a service account and providing Datadog with service account credentials to begin making API calls on your behalf.

  1. Navigate to the Google Cloud credentials page for the Google Cloud project where you would like to set up the Datadog integration.
  2. Press Create credentials and then select Service account key.

  3. In the Service account dropdown, select New service account.

  4. Give the service account a unique name.

  5. For Role, select Compute Engine > Compute Viewer and Monitoring > Monitoring Viewer.

    Note: these roles allow Datadog to collect metrics, tags, events, and GCE labels on your behalf.

  6. Select JSON as the key type, and press Create.

  7. Take note of where this file is saved, as it is needed to complete the integration.

  8. Navigate to the Datadog Google Cloud Integration tile.

  9. Select Upload Key File to integrate this project with Datadog.

  10. Optionally, use tags to exclude hosts from this integration. Detailed instructions can be found in the Configuration section below.

  11. Press Install/Update.

  12. For each project you want to monitor, repeat this process.

Google Cloud billing, the Stackdriver Monitoring API, and the Compute Engine API must all be enabled for the project(s) you wish to monitor.
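
The same setup can also be scripted with the gcloud CLI. Below is a minimal sketch, assuming a placeholder project my-project and a placeholder service account name datadog-integration; adjust both to your environment:

    # Enable the required APIs for the placeholder project
    gcloud services enable monitoring.googleapis.com compute.googleapis.com --project my-project

    # Create the service account
    gcloud iam service-accounts create datadog-integration \
        --display-name "Datadog integration" --project my-project

    # Grant the Compute Viewer and Monitoring Viewer roles
    gcloud projects add-iam-policy-binding my-project \
        --member "serviceAccount:datadog-integration@my-project.iam.gserviceaccount.com" \
        --role "roles/compute.viewer"
    gcloud projects add-iam-policy-binding my-project \
        --member "serviceAccount:datadog-integration@my-project.iam.gserviceaccount.com" \
        --role "roles/monitoring.viewer"

    # Create and download the JSON key file that is uploaded in step 9
    gcloud iam service-accounts keys create datadog-key.json \
        --iam-account datadog-integration@my-project.iam.gserviceaccount.com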

Configuration

Optionally, you can limit the GCE instances that are pulled into Datadog by entering tags in the Limit Metric Collection textbox. Only hosts that match one of the defined tags are imported into Datadog. You can use wildcards (? for a single character, * for multiple characters) to match many hosts, or ! to exclude certain hosts. This example includes all c1*-sized instances but excludes staging hosts:

datadog:monitored,env:production,!env:staging,instance-type:c1.*

Log Collection

Log collection only works with the Datadog US site.

For applications running in GCE or GKE, the Datadog Agent can be used to collect logs locally. GCP service logs are collected via Stackdriver and sent to a Cloud Pub/Sub topic with an HTTP push forwarder. Log collection requires four steps:

  1. Create a new Cloud Pub/Sub.
  2. Validate your Datadog domain so that logs can be pushed from GCP to Datadog.
  3. Set up the Pub/Sub to forward logs to Datadog.
  4. Configure exports from Stackdriver logs to the Pub/Sub.

Create a Cloud Pub/Sub

  1. Go to the Cloud Pub/Sub console and create a new topic.

  2. Give that topic an explicit name, such as export-logs-to-datadog, and press Save.
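
If you prefer the command line, the topic can also be created with a one-line gcloud CLI sketch, using the topic name from step 2:

    gcloud pubsub topics create export-logs-to-datadog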

Validate the Datadog Domain

To validate the domain, ask Google to generate an HTML file that is used as a unique identifier. This allows Google to validate the Datadog endpoint and forward logs to it.

  1. Connect to the Google Search Console.
  2. In the URL section, add https://gcp-intake.logs.datadoghq.com/v1/input/<DATADOG_API_KEY> (your Datadog API key can be found on the Datadog API settings page).
  3. Download the HTML file locally.

  4. Push this HTML file to Datadog with the following command:

    curl -X POST -H "Content-type: application/json" -d '{"file_contents": "google-site-verification: <GOOGLE_FILE_NAME>.html"}' "https://app.datadoghq.com/api/latest/integration/gcp_logs_site_verification?api_key=<DATADOG_API_KEY>&application_key=<DATADOG_APPLICATION_KEY>"
    

    <DATADOG_API_KEY> and <DATADOG_APPLICATION_KEY> can be found in the Datadog API settings section. The expected result of this command is {}.

  5. Click Verify on the Google console and wait until the confirmation message shows that it worked.

  6. Go to the API credentials page in the GCP console and click Add domain.

  7. Enter the same endpoint as earlier and click Add.

Once this is done, click the Search Console link in the pop-up to confirm that the property was properly enabled.

The GCP project is now ready to forward logs from the Pub/Sub to Datadog.

Configure the Pub/Sub to forward logs to Datadog

  1. Go back to the Pub/Sub that was previously created and add a new subscription.

  2. Select the Push delivery method and enter the following endpoint: https://gcp-intake.logs.datadoghq.com/v1/input/<DATADOG_API_KEY>/
  3. Hit Create at the bottom.

The Pub/Sub is now ready to receive logs from Stackdriver and forward them to Datadog.
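
As an alternative to the console, the push subscription can also be created with the gcloud CLI. A sketch, where the subscription name datadog-export-sub is a placeholder:

    gcloud pubsub subscriptions create datadog-export-sub \
        --topic export-logs-to-datadog \
        --push-endpoint "https://gcp-intake.logs.datadoghq.com/v1/input/<DATADOG_API_KEY>/"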

Note: If you see an error at step 3, it means that the Datadog site was not validated. Refer to the domain validation steps and make sure the domain is validated.

Export logs from Stackdriver to the Pub/Sub

  1. Go to the Stackdriver page and filter the logs that need to be exported.
  2. Hit Create Export and name the sink accordingly.
  3. Choose Cloud Pub/Sub as the destination and select the Pub/Sub that was created for that purpose. Note that the Pub/Sub can be located in a different project.

  4. Hit Create and wait for the confirmation message to show up.

Note: It is possible to create several exports from Stackdriver to the same Pub/Sub with different sinks.
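
The export can also be scripted with the gcloud CLI. A sketch, assuming a placeholder sink name datadog-sink, the placeholder project my-project, and a filter matching GCE instance logs:

    # Create a sink that routes matching logs to the Pub/Sub topic
    gcloud logging sinks create datadog-sink \
        pubsub.googleapis.com/projects/my-project/topics/export-logs-to-datadog \
        --log-filter 'resource.type="gce_instance"'

A sink created this way prints a writer identity in its output; grant that identity the Pub/Sub Publisher role on the topic before expecting logs to flow.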

Data Collected

Metrics

gcp.bigtable.cluster.cpu_load
(gauge)
CPU load of a cluster.
shown as percent
gcp.bigtable.cluster.cpu_load_hottest_node
(gauge)
CPU load of the busiest node in a cluster.
shown as percent
gcp.bigtable.cluster.disk_load
(gauge)
Utilization of HDD disks in a cluster.
shown as percent
gcp.bigtable.cluster.node_count
(gauge)
Number of nodes in a cluster.
shown as node
gcp.bigtable.cluster.storage_utilization
(gauge)
Storage used as a fraction of total storage capacity.
shown as percent
gcp.bigtable.disk.bytes_used
(gauge)
Amount of compressed data for tables stored in a cluster.
shown as byte
gcp.bigtable.server.error_count
(gauge)
Number of server requests for a table that failed with an error.
shown as error
gcp.bigtable.server.latencies.avg
(gauge)
Average server request latency for a table.
shown as millisecond
gcp.bigtable.server.latencies.samplecount
(gauge)
Sample count of server request latencies for a table.
shown as millisecond
gcp.bigtable.server.latencies.sumsqdev
(gauge)
Sum of squared deviation of server request latencies for a table.
shown as millisecond
gcp.bigtable.server.modified_rows_count
(gauge)
Number of rows modified by server requests for a table.
gcp.bigtable.server.received_bytes_count
(gauge)
Number of uncompressed bytes of request data received by servers for a table.
shown as byte
gcp.bigtable.server.request_count
(gauge)
Number of server requests for a table.
shown as request
gcp.bigtable.server.returned_rows_count
(gauge)
Number of rows returned by server requests for a table.
gcp.bigtable.server.sent_bytes_count
(gauge)
Number of uncompressed bytes of response data sent by servers for a table.
shown as byte
gcp.bigtable.table.bytes_used
(gauge)
Amount of compressed data stored in a table.
shown as byte
gcp.loadbalancing.https.backend_latencies.avg
(gauge)
Average latency of request sent by the proxy to backend until proxy receives last byte of response from backend.
shown as millisecond
gcp.loadbalancing.https.backend_latencies.samplecount
(count)
Sample Count of latency of request sent by the proxy to backend until proxy receives last byte of response from backend.
shown as millisecond
gcp.loadbalancing.https.backend_latencies.sumsqdev
(gauge)
Sum of Squared Deviation for latency of request sent by the proxy to backend until proxy receives last byte of response from backend.
shown as millisecond
gcp.loadbalancing.https.frontend_tcp_rtt.avg
(gauge)
Average RTT for each connection between client and proxy.
shown as millisecond
gcp.loadbalancing.https.frontend_tcp_rtt.samplecount
(count)
Sample Count of RTT for each connection between client and proxy.
shown as millisecond
gcp.loadbalancing.https.frontend_tcp_rtt.sumsqdev
(gauge)
Sum of Squared Deviation of RTT for each connection between client and proxy.
shown as millisecond
gcp.loadbalancing.https.request_bytes_count
(count)
Bytes sent as requests from clients to L7 load balancer.
shown as byte
gcp.loadbalancing.https.request_count
(count)
Number of requests served by L7 load balancer.
shown as request
gcp.loadbalancing.https.response_bytes_count
(count)
Bytes sent as responses from L7 load balancer to clients.
shown as byte
gcp.loadbalancing.https.total_latencies.avg
(gauge)
Average latency calculated from request received by proxy until proxy sees ACK from client on last response byte.
shown as millisecond
gcp.loadbalancing.https.total_latencies.samplecount
(count)
Sample Count of latency calculated from request received by proxy until proxy sees ACK from client on last response byte.
shown as millisecond
gcp.loadbalancing.https.total_latencies.sumsqdev
(gauge)
Sum of Squared Deviation of latency calculated from request received by proxy until proxy sees ACK from client on last response byte.
shown as millisecond
gcp.loadbalancing.tcp_ssl_proxy.closed_connections
(count)
Number of connections that were terminated over TCP/SSL proxy.
shown as connection
gcp.loadbalancing.tcp_ssl_proxy.egress_bytes_count
(count)
Number of bytes sent from VM to client using proxy.
shown as byte
gcp.loadbalancing.tcp_ssl_proxy.frontend_tcp_rtt.avg
(gauge)
Average smoothed RTT measured by the proxy's TCP stack, each minute that application-layer bytes pass from proxy to client.
shown as millisecond
gcp.loadbalancing.tcp_ssl_proxy.frontend_tcp_rtt.samplecount
(count)
Sample count of smoothed RTT measured by the proxy's TCP stack, each minute that application-layer bytes pass from proxy to client.
shown as millisecond
gcp.loadbalancing.tcp_ssl_proxy.frontend_tcp_rtt.sumsqdev
(gauge)
Sum of squared deviation of smoothed RTT measured by the proxy's TCP stack, each minute that application-layer bytes pass from proxy to client.
shown as millisecond
gcp.loadbalancing.tcp_ssl_proxy.ingress_bytes_count
(count)
Number of bytes sent from client to VM using proxy.
shown as byte
gcp.loadbalancing.tcp_ssl_proxy.new_connections
(count)
Number of connections that were created over TCP/SSL proxy.
shown as connection
gcp.loadbalancing.tcp_ssl_proxy.open_connections
(count)
Current number of outstanding connections through the TCP/SSL proxy.
shown as connection
gcp.interconnect.network.attachment.received_bytes_count
(count)
Number of inbound bytes received.
shown as byte
gcp.interconnect.network.attachment.received_packets_count
(count)
Number of inbound packets received.
shown as packet
gcp.interconnect.network.attachment.sent_bytes_count
(count)
Number of outbound bytes sent.
shown as byte
gcp.interconnect.network.attachment.sent_packets_count
(count)
Number of outbound packets sent.
shown as packet
gcp.interconnect.network.interconnect.capacity
(gauge)
Active capacity of the interconnect.
shown as byte
gcp.interconnect.network.interconnect.dropped_packets_count
(count)
Number of outbound packets dropped due to link congestion.
shown as packet
gcp.interconnect.network.interconnect.link.operational
(gauge)
Whether the operational status of the circuit is up.
gcp.interconnect.network.interconnect.link.rx_power
(gauge)
Light level received over physical circuit.
gcp.interconnect.network.interconnect.link.tx_power
(gauge)
Light level transmitted over physical circuit.
gcp.interconnect.network.interconnect.operational
(gauge)
Whether the operational status of the interconnect is up.
gcp.interconnect.network.interconnect.receive_errors_count
(count)
Number of errors encountered while receiving packets.
shown as error
gcp.interconnect.network.interconnect.received_bytes_count
(count)
Number of inbound bytes received.
shown as byte
gcp.interconnect.network.interconnect.received_unicast_packets_count
(count)
Number of inbound unicast packets received.
shown as packet
gcp.interconnect.network.interconnect.send_errors_count
(count)
Number of errors encountered while sending packets.
shown as error
gcp.interconnect.network.interconnect.sent_bytes_count
(count)
Number of outbound bytes sent.
shown as byte
gcp.interconnect.network.interconnect.sent_unicast_packets_count
(count)
Number of outbound unicast packets sent.
shown as packet

Events

All service events generated by your Google Cloud Platform projects are forwarded to your Datadog event stream. Other events captured in Stackdriver are not currently available, but will be with the Datadog Log Management product in the future.

Service Checks

The Google Cloud Platform integration does not include any service checks.

Troubleshooting

Incorrect metadata for user-defined gcp.logging metrics?

For non-standard gcp.logging metrics (that is, metrics beyond Datadog's out-of-the-box logging metrics), the applied metadata may not be consistent with Stackdriver.

In these cases, the metadata should be manually set by navigating to the metric summary page, searching for and selecting the metric in question, and clicking the Pencil icon next to the metadata.
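
The metadata can also be set programmatically through Datadog's edit-metric-metadata API endpoint. A minimal curl sketch, where the metric name gcp.logging.my_custom_metric and the gauge/byte values are placeholders:

    curl -X PUT -H "Content-type: application/json" \
        -d '{"type": "gauge", "unit": "byte", "description": "Custom logging metric"}' \
        "https://app.datadoghq.com/api/v1/metrics/gcp.logging.my_custom_metric?api_key=<DATADOG_API_KEY>&application_key=<DATADOG_APPLICATION_KEY>"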

Need help? Contact Datadog support.

Further Reading

Knowledge Base

Tags Assigned

Tags are automatically assigned based on a variety of Google Cloud Platform and Google Compute Engine configuration options. The following tags are automatically assigned:

  • zone
  • instance-type
  • instance-id
  • automatic-restart
  • on-host-maintenance
  • project
  • numeric_project_id
  • name

Also, any hosts with <key>:<value> labels are tagged accordingly.