Presto

Agent Check

Supported OS: Linux, Mac OS, Windows

Overview

This check collects Presto metrics, for example:

  • Overall activity metrics: completed/failed queries, data input/output size, execution time
  • Performance metrics: cluster memory, input CPU, execution CPU time

Setup

Installation

The Presto check is included in the Datadog Agent package. No additional installation is needed on your server. Install the Agent on each Coordinator and Worker node from which you wish to collect usage and performance metrics.

Configuration

  1. Edit the presto.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Presto performance data. See the sample presto.d/conf.yaml for all available configuration options.

This check has a limit of 350 metrics per instance. The number of returned metrics is indicated in the info page. You can specify the metrics you are interested in by editing the configuration. To learn how to customize the metrics to collect, see the JMX Checks documentation. If you need to monitor more metrics, contact Datadog support.

  2. Restart the Agent.
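
As a rough illustration of the metric selection described above, the sketch below limits collection to a single attribute using the include-filter layout from the JMX Checks documentation. The host, port, bean name, and attribute name are assumptions for illustration only; check them against the sample presto.d/conf.yaml and your own JMX tree before relying on them.

```yaml
# Sketch only: every value here is an assumption, not a confirmed default.
init_config:
  is_jmx: true
  # Turn off the built-in metric list so only the filters below apply.
  collect_default_metrics: false

instances:
  - host: localhost   # assumed JMX host
    port: 9999        # assumed JMX port
    conf:
      # Hypothetical filter: collect one attribute from one bean.
      - include:
          domain: com.facebook.presto.execution
          bean: com.facebook.presto.execution:name=QueryManager
          attribute:
            RunningQueries:
              alias: presto.execution.running_queries
              metric_type: gauge
```

Narrowing the filters this way is one option for staying under the 350-metric limit; leaving collect_default_metrics set to true keeps the integration's standard metric list instead.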

Metric Collection

Use the default configuration of your presto.d/conf.yaml file to activate the collection of your Presto metrics. See the sample presto.d/conf.yaml for all available configuration options.

Log Collection

Available for Agent >6.0

  • Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

      logs_enabled: true

  • Add this configuration block to your presto.d/conf.yaml file to start collecting your Presto logs:

      logs:
        - type: file
          path: /var/log/presto/*.log
          source: presto
          sourcecategory: database
          service: <SERVICE_NAME>

Change the path and service parameter values and configure them for your environment. See the sample presto.d/conf.yaml for all available configuration options.
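
If your Presto logs contain multi-line entries such as Java stack traces, a multi-line processing rule may help keep each entry together. The sketch below is one possible variant of the block above. It assumes that every new log entry starts with a date in YYYY-MM-DD form, and the rule name is a hypothetical label; adjust the pattern to your actual log format.

```yaml
# Sketch only: the pattern assumes each new entry begins with a YYYY-MM-DD date.
logs:
  - type: file
    path: /var/log/presto/*.log
    source: presto
    sourcecategory: database
    service: <SERVICE_NAME>
    log_processing_rules:
      # Lines that do not match the pattern are appended to the previous entry.
      - type: multi_line
        name: new_log_start_with_date   # hypothetical rule name
        pattern: \d{4}-\d{2}-\d{2}
```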

Restart the Agent.

Validation

Run the Agent’s status subcommand and look for presto under the Checks section.

Data Collected

Metrics

presto.execution.abandoned_queries.one_minute.count
(gauge)
Abandoned queries - one minute count.
shown as query
presto.execution.abandoned_queries.one_minute.rate
(gauge)
Abandoned queries - one minute rate.
shown as query
presto.execution.abandoned_queries.total_count
(gauge)
Abandoned queries - total count.
shown as query
presto.execution.canceled_queries.one_minute.count
(gauge)
Canceled queries - one minute count.
shown as query
presto.execution.canceled_queries.one_minute.rate
(gauge)
Canceled queries - one minute queries per second.
shown as query
presto.execution.canceled_queries.total_count
(gauge)
Canceled queries - total count.
shown as query
presto.execution.completed_queries.one_minute.count
(gauge)
Completed queries - one minute count.
shown as query
presto.execution.completed_queries.one_minute.rate
(gauge)
Completed queries - one minute queries per second.
shown as query
presto.execution.completed_queries.total_count
(gauge)
Completed queries - total count.
shown as query
presto.execution.consumed_cpu_time_secs.one_minute.count
(gauge)
CPU (processing) time consumed - one minute count (seconds).
shown as second
presto.execution.consumed_cpu_time_secs.one_minute.rate
(gauge)
CPU (processing) time consumed - one minute rate.
shown as second
presto.execution.consumed_cpu_time_secs.total_count
(gauge)
CPU (processing) time consumed - total count (seconds).
shown as second
presto.execution.cpu_input_byte_rate.all_time.avg
(gauge)
Distribution of query input data rates (cpu) - all time average bytes per second.
shown as byte
presto.execution.cpu_input_byte_rate.all_time.p75
(gauge)
Distribution of query input data rates (cpu) - all time bytes per second - p75.
shown as byte
presto.execution.cpu_input_byte_rate.all_time.p95
(gauge)
Distribution of query input data rates (cpu) - all time bytes per second - p95.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.avg
(gauge)
Distribution of query input data rates (cpu) - one minute average bytes per second.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.count
(gauge)
Distribution of query input data rates (cpu) - one minute count.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.max
(gauge)
Distribution of query input data rates (cpu) - one minute max bytes per second.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.min
(gauge)
Distribution of query input data rates (cpu) - one minute min bytes per second.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.p75
(gauge)
Distribution of query input data rates (cpu) - one minute bytes per second - p75.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.p95
(gauge)
Distribution of query input data rates (cpu) - one minute bytes per second - p95.
shown as byte
presto.execution.cpu_input_byte_rate.one_minute.total
(gauge)
Distribution of query input data rates (cpu) - one minute total bytes per second.
shown as byte
presto.execution.execution_time.all_time.avg
(gauge)
Query execution time (millisecond) - all time average.
shown as millisecond
presto.execution.execution_time.all_time.count
(gauge)
Query execution time (millisecond) - all time count.
shown as millisecond
presto.execution.execution_time.all_time.max
(gauge)
Query execution time (millisecond) - all time max.
shown as millisecond
presto.execution.execution_time.all_time.min
(gauge)
Query execution time (millisecond) - all time min.
shown as millisecond
presto.execution.execution_time.all_time.p75
(gauge)
Query execution time (millisecond) - all time - p75.
shown as millisecond
presto.execution.execution_time.all_time.p95
(gauge)
Query execution time (millisecond) - all time - p95.
shown as millisecond
presto.execution.execution_time.one_minute.avg
(gauge)
Query execution time (millisecond) - one minute average.
shown as millisecond
presto.execution.execution_time.one_minute.max
(gauge)
Query execution time (millisecond) - one minute max.
shown as millisecond
presto.execution.execution_time.one_minute.min
(gauge)
Query execution time (millisecond) - one minute min.
shown as millisecond
presto.execution.execution_time.one_minute.p75
(gauge)
Query execution time (millisecond) - one minute p75.
shown as millisecond
presto.execution.execution_time.one_minute.p95
(gauge)
Query execution time (millisecond) - one minute p95.
shown as millisecond
presto.execution.executor.active_count
(gauge)
presto.execution.executor.completed_task_count
(gauge)
shown as task
presto.execution.executor.core_pool_size
(gauge)
presto.execution.executor.task_count
(gauge)
shown as task
presto.execution.executor.pool_size
(gauge)
presto.execution.executor.queued_task_count
(gauge)
presto.execution.executor.blocked_splits
(gauge)
Blocked splits count.
shown as split
presto.execution.executor.running_splits
(gauge)
Running splits count.
shown as split
presto.execution.executor.total_splits
(gauge)
Total splits count.
shown as split
presto.execution.executor.waiting_splits
(gauge)
Waiting splits count.
shown as split
presto.execution.executor.processor_executor.queued_task_count
(gauge)
Queued task count.
shown as task
presto.execution.external_failures.one_minute.count
(gauge)
Failed queries (external) - one minute count.
shown as query
presto.execution.external_failures.one_minute.rate
(gauge)
Failed queries (external) - one minute failures per second.
shown as query
presto.execution.external_failures.total_count
(gauge)
Failed queries (external) - total count.
shown as query
presto.execution.failed_queries.one_minute.count
(gauge)
Failed queries - one minute count.
shown as query
presto.execution.failed_queries.one_minute.rate
(gauge)
Failed queries - one minute queries per second.
shown as query
presto.execution.failed_queries.total_count
(gauge)
Failed queries - total count.
shown as query
presto.execution.input_data_size.one_minute.count
(gauge)
Input data (bytes) - one minute count.
shown as byte
presto.execution.input_data_size.one_minute.rate
(gauge)
Input data (bytes) - one minute bytes per second.
shown as byte
presto.execution.input_data_size.total_count
(gauge)
Input data (bytes) - total count.
shown as byte
presto.execution.input_positions.one_minute.count
(gauge)
Input positions (rows) - one minute count.
shown as row
presto.execution.input_positions.one_minute.rate
(gauge)
Input positions (rows) - one minute rows per second.
shown as row
presto.execution.input_positions.total_count
(gauge)
Input positions (rows) - total count.
shown as row
presto.execution.internal_failures.one_minute.count
(gauge)
Failed queries (internal) - one minute count.
shown as query
presto.execution.internal_failures.one_minute.rate
(gauge)
Failed queries (internal) - one minute queries per second.
shown as query
presto.execution.internal_failures.total_count
(gauge)
Failed queries (internal) - total count.
shown as query
presto.execution.insufficient_resources_failures.one_minute.count
(gauge)
Insufficient resources failures one minute count.
presto.execution.insufficient_resources_failures.one_minute.rate
(gauge)
Insufficient resources failures one minute failures per second.
presto.execution.insufficient_resources_failures.total_count
(gauge)
Insufficient resources failures total count.
presto.execution.management_executor.active_count
(gauge)
presto.execution.management_executor.completed_task_count
(gauge)
shown as task
presto.execution.management_executor.queued_task_count
(gauge)
shown as task
presto.execution.output_data_size.one_minute.count
(gauge)
Output data (bytes) - one minute count.
shown as byte
presto.execution.output_data_size.one_minute.rate
(gauge)
Output data (bytes) - one minute bytes per second.
shown as byte
presto.execution.output_data_size.total_count
(gauge)
Output data (bytes) - total count.
shown as byte
presto.execution.output_positions.one_minute.count
(gauge)
Output positions (rows) - one minute count.
shown as row
presto.execution.output_positions.one_minute.rate
(gauge)
Output positions (rows) - one minute rows per second.
shown as row
presto.execution.output_positions.total_count
(gauge)
Output positions (rows) - total count.
shown as row
presto.execution.running_queries
(gauge)
Active queries.
shown as query
presto.execution.started_queries.one_minute.count
(gauge)
Queries started - one minute count.
shown as query
presto.execution.started_queries.one_minute.rate
(gauge)
Queries started - one minute queries per second.
shown as query
presto.execution.started_queries.total_count
(gauge)
Queries started - total count.
shown as query
presto.execution.task_notification_executor.active_count
(gauge)
presto.execution.task_notification_executor.completed_task_count
(gauge)
shown as task
presto.execution.task_notification_executor.pool_size
(gauge)
presto.execution.task_notification_executor.queued_task_count
(gauge)
shown as task
presto.execution.user_error_failures.one_minute.count
(gauge)
Failed queries (user error) - one minute count.
shown as query
presto.execution.user_error_failures.one_minute.rate
(gauge)
Failed queries (user error) - one minute queries per second.
shown as query
presto.execution.user_error_failures.total_count
(gauge)
Failed queries (user error) - total count.
shown as query
presto.execution.wall_input_bytes_rate.one_minute.avg
(gauge)
Input data rate (bytes) - one minute average.
shown as byte
presto.execution.wall_input_bytes_rate.one_minute.max
(gauge)
Input data rate (bytes) - one minute max.
shown as byte
presto.execution.wall_input_bytes_rate.one_minute.min
(gauge)
Input data rate (bytes) - one minute min.
shown as byte
presto.execution.wall_input_bytes_rate.one_minute.p75
(gauge)
Input data rate (bytes) - one minute p75.
shown as byte
presto.execution.wall_input_bytes_rate.one_minute.p95
(gauge)
Input data rate (bytes) - one minute p95.
shown as byte
presto.failure_detector.active_count
(gauge)
Active node count.
shown as node
presto.memory.assigned_queries
(gauge)
Memory (assigned queries).
shown as byte
presto.memory.cluster_memory_bytes
(gauge)
Cluster memory (bytes).
shown as byte
presto.memory.blocked_nodes
(gauge)
Memory (blocked nodes).
shown as byte
presto.memory.free_bytes
(gauge)
Memory (free bytes).
shown as byte
presto.memory.free_distributed_bytes
(gauge)
Memory (free distributed bytes).
shown as byte
presto.memory.max_bytes
(gauge)
Memory (max bytes).
shown as byte
presto.memory.nodes
(gauge)
Memory (nodes).
shown as byte
presto.memory.reserved_bytes
(gauge)
Memory (reserved bytes).
shown as byte
presto.memory.reserved_distributed_bytes
(gauge)
Memory (reserved distributed bytes).
shown as byte
presto.memory.reserved_revocable_bytes
(gauge)
Memory (reserved revocable bytes).
shown as byte
presto.memory.reserved_revocable_distributed_bytes
(gauge)
Memory (reserved revocable distributed bytes).
shown as byte
presto.memory.total_distributed_bytes
(gauge)
Memory (total distributed bytes).
shown as byte

Service Checks

presto.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Presto instance. Returns OK otherwise.

Events

Presto does not include any events.

Troubleshooting

Need help? Contact Datadog support.