Temporal Cloud - OpenMetrics

Supported OS: Linux, Windows, macOS

Integration version: 1.0.0

Overview

Temporal Cloud is a scalable platform for orchestrating complex workflows. It lets developers focus on building applications without worrying about fault tolerance and consistency.

This integration brings Temporal Cloud metrics into Datadog, offering insights into system health, workflow efficiency, task execution, and performance bottlenecks. It fetches the metrics from the Temporal Cloud OpenMetrics endpoint.
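To illustrate what an OpenMetrics payload looks like once it is read, here is a minimal Python sketch that parses the text exposition format into (name, labels, value) tuples. It is not the integration's actual implementation. The sample payload, the `temporal_namespace` label, and the `example-ns` value are made up for illustration, and the underscore-separated metric names are an assumption about how the endpoint names its series.

```python
import re

# Matches one sample line: metric name, optional {labels}, then a value.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# Matches one label pair such as key="value" (escaped characters allowed).
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_openmetrics(text):
    """Parse OpenMetrics text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and metadata lines (# TYPE, # HELP, # EOF).
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Made-up payload for illustration only; not real Temporal Cloud output.
SAMPLE = """\
# TYPE temporal_cloud_v1_total_action gauge
temporal_cloud_v1_total_action{temporal_namespace="example-ns"} 12.5
temporal_cloud_v1_action_limit{temporal_namespace="example-ns"} 500
# EOF
"""

for name, labels, value in parse_openmetrics(SAMPLE):
    print(name, labels, value)
```

The parser deliberately ignores timestamps, exemplars, and metadata lines; a production client would normally rely on an existing OpenMetrics library instead.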

Setup

Generate an API Key

  1. Log in to Temporal Cloud with an account owner or global admin role.
  2. Create a Service Account with the “Metrics Read-Only” role.
  3. Generate a new API Key for the newly created Service Account.

Connect your Temporal Cloud account to Datadog

  1. Add the newly created API Key to the Datadog integration form.

  2. Optionally, add a comma-separated list of the namespaces to fetch metrics for. If left empty, Datadog tries to fetch metrics for all available namespaces.

    Parameters               Description
    API Key                  API Key for a Service account with “Metrics Read-Only” role
    Namespaces to include    Comma-separated list of namespaces to filter the metrics
  3. Save the changes.
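As a quick check before pasting a value into the "Namespaces to include" field, the following Python sketch normalizes a comma-separated namespace string. How Datadog parses the field internally is not documented here, so the trimming and de-duplication rules are assumptions, and the namespace names are hypothetical.

```python
def parse_namespaces(raw):
    """Split a comma-separated string into a trimmed, de-duplicated list.

    An empty result corresponds to leaving the field empty, which the
    integration treats as "all available namespaces".
    """
    namespaces = []
    for part in raw.split(","):
        name = part.strip()
        # Drop empty entries (for example from a trailing comma) and repeats.
        if name and name not in namespaces:
            namespaces.append(name)
    return namespaces


# Hypothetical namespace names, for illustration only.
print(parse_namespaces("prod.a1b2c, staging.a1b2c, ,prod.a1b2c"))
```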

Data Collected

Metrics

temporal.cloud.v1_action_limit
(gauge)
The current action per second limit for a namespace
temporal.cloud.v1_approximate_backlog
(gauge)
Approximate number of tasks in a task queue
temporal.cloud.v1_frontend_rps_limit
(gauge)
The current frontend requests per second limit for a namespace
temporal.cloud.v1_frontend_service_error
(gauge)
The number of frontend service gRPC errors per second
temporal.cloud.v1_frontend_service_pending_requests
(gauge)
The number of pollers that are waiting for a task
temporal.cloud.v1_frontend_service_request
(gauge)
The number of RPC requests received by the service per second
temporal.cloud.v1_namespace_open_workflows
(gauge)
The number of open workflows in a namespace
temporal.cloud.v1_no_poller_tasks
(gauge)
The per second rate of tasks added to the task queue with no poller
temporal.cloud.v1_operations
(gauge)
The number of operations per second for a namespace
temporal.cloud.v1_operations_limit
(gauge)
The current operations per second limit for a namespace
temporal.cloud.v1_operations_throttled
(gauge)
The number of throttled operations per second for a namespace
temporal.cloud.v1_poll_success
(gauge)
The number of successfully matched tasks per second
temporal.cloud.v1_poll_success_sync
(gauge)
The number of successfully sync matched tasks per second
temporal.cloud.v1_poll_timeout
(gauge)
The per second rate of occurrences where no tasks are available for a poller before timing out
temporal.cloud.v1_poller_limit
(gauge)
The current concurrent task poller limit for a namespace
temporal.cloud.v1_replication_lag_p50
(gauge)
The 50th percentile replication lag in seconds
temporal.cloud.v1_replication_lag_p95
(gauge)
The 95th percentile replication lag in seconds
temporal.cloud.v1_replication_lag_p99
(gauge)
The 99th percentile replication lag in seconds
temporal.cloud.v1_resource_exhausted_error
(gauge)
The number of resource exhaustion service errors per second
temporal.cloud.v1_schedule_action_success
(gauge)
The number of successfully scheduled workflow executions per second
temporal.cloud.v1_schedule_buffer_overruns
(gauge)
The per second rate at which the average schedule run length exceeds the average schedule interval while a buffer_all overlap policy is configured
temporal.cloud.v1_schedule_missed_catchup_window
(gauge)
The per second rate of scheduled executions skipped because workflows were delayed longer than the catchup window
temporal.cloud.v1_schedule_rate_limited
(gauge)
The per second rate of scheduled workflows that were delayed due to exceeding a rate limit
temporal.cloud.v1_service_latency_p50
(gauge)
The 50th percentile latency of temporal service requests in seconds
temporal.cloud.v1_service_latency_p95
(gauge)
The 95th percentile latency of temporal service requests in seconds
temporal.cloud.v1_service_latency_p99
(gauge)
The 99th percentile latency of temporal service requests in seconds
temporal.cloud.v1_state_transition
(gauge)
The number of state transitions per second
temporal.cloud.v1_total_action
(gauge)
The number of actions taken per second
temporal.cloud.v1_total_action_throttled
(gauge)
The number of throttled actions per second
temporal.cloud.v1_workflow_cancel
(gauge)
The number of workflow cancellations per second
temporal.cloud.v1_workflow_continued_as_new
(gauge)
The number of workflows continued as new per second
temporal.cloud.v1_workflow_failed
(gauge)
The number of workflow failures per second
temporal.cloud.v1_workflow_schedule_to_close_latency_p50
(gauge)
The 50th percentile workflow schedule-to-close latency in seconds
temporal.cloud.v1_workflow_schedule_to_close_latency_p95
(gauge)
The 95th percentile workflow schedule-to-close latency in seconds
temporal.cloud.v1_workflow_schedule_to_close_latency_p99
(gauge)
The 99th percentile workflow schedule-to-close latency in seconds
temporal.cloud.v1_workflow_success
(gauge)
The number of successful workflows per second
temporal.cloud.v1_workflow_terminate
(gauge)
The number of terminated workflows per second
temporal.cloud.v1_workflow_timeout
(gauge)
The number of timed out workflows per second
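Several of the gauges above come in rate and limit pairs, so one common use is to express a rate as a fraction of its limit. The Python sketch below shows that arithmetic. The pairings are a reading of the metric descriptions rather than documented guidance, the values are made up, and `limit_utilization` and `utilization_report` are hypothetical helper names.

```python
# Assumed rate/limit pairings, inferred from the metric descriptions.
RATE_LIMIT_PAIRS = [
    ("temporal.cloud.v1_total_action", "temporal.cloud.v1_action_limit"),
    ("temporal.cloud.v1_operations", "temporal.cloud.v1_operations_limit"),
    ("temporal.cloud.v1_frontend_service_request", "temporal.cloud.v1_frontend_rps_limit"),
]


def limit_utilization(rate, limit):
    """Return rate / limit, or None when either value is missing or the limit is not positive."""
    if rate is None or limit is None or limit <= 0:
        return None
    return rate / limit


def utilization_report(latest, pairs=RATE_LIMIT_PAIRS):
    """Map each rate metric to its utilization fraction, skipping unavailable pairs."""
    report = {}
    for rate_name, limit_name in pairs:
        value = limit_utilization(latest.get(rate_name), latest.get(limit_name))
        if value is not None:
            report[rate_name] = value
    return report


# Made-up latest values for one namespace, for illustration only.
latest = {
    "temporal.cloud.v1_total_action": 125.0,
    "temporal.cloud.v1_action_limit": 500.0,
    "temporal.cloud.v1_frontend_service_request": 300.0,
    "temporal.cloud.v1_frontend_rps_limit": 1200.0,
}

for metric, fraction in utilization_report(latest).items():
    print(f"{metric}: {fraction:.0%} of limit")
```

In Datadog itself, the same ratio can be expressed as a formula over two metric queries in a dashboard or monitor; the sketch only shows the calculation.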

Uninstallation

Remove any accounts created during installation. Note: Deleting an account will not remove data that was already collected by this integration, but will stop further collection of metrics for this account.

Support

Need help? Contact Datadog Support.
