Amazon Elasticsearch

Overview

Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud.

Enable this integration to see custom tags and metrics for your ES clusters in Datadog.

Setup

Installation

If you haven’t already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, ensure that ES is checked under metric collection.
  2. Add the following permissions to your Datadog IAM policy to collect Amazon ES metrics:

    • es:ListTags: Adds custom ES domain tags to ES metrics.
    • es:ListDomainNames: Lists all Amazon ES domains owned by the current user in the active region.
    • es:DescribeElasticsearchDomains: Collects the domain ID, domain service endpoint, and domain ARN for all domains as tags.

    For more information on ES policies, review the documentation on the AWS website.

  3. Install the Datadog - AWS ES integration.
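The permissions listed in step 2 can be sketched as an IAM policy document. The snippet below is a minimal illustration, not the official Datadog policy: the helper name `build_es_policy` is hypothetical, and `"Resource": "*"` is an assumption that you want to cover every domain in the account. Merge the resulting statement into your existing Datadog IAM policy rather than replacing it.

```python
import json

# The three actions named in step 2 of this section.
ES_ACTIONS = [
    "es:ListTags",
    "es:ListDomainNames",
    "es:DescribeElasticsearchDomains",
]


def build_es_policy(actions=ES_ACTIONS, resource="*"):
    """Return an IAM policy document (as a dict) that allows the given actions.

    build_es_policy is a hypothetical helper for illustration only.
    resource="*" is an assumption; narrow it to specific ARNs if needed.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you could paste into the IAM console.
    print(json.dumps(build_es_policy(), indent=2))
```

Running the script prints the policy as JSON, which can be pasted into the IAM policy editor.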

Log collection

Enable logging

Configure Amazon Elasticsearch to send logs either to an S3 bucket or to CloudWatch.

Note: If you log to an S3 bucket, make sure that amazon_elasticsearch is set as the Target prefix.

Send logs to Datadog

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the S3 bucket or CloudWatch log group that contains your Amazon Elasticsearch logs.

Data Collected

Metrics

aws.es.2xx
(count)
The number of requests to a domain with the HTTP response code 2xx.
Shown as request
aws.es.2xx.average
(gauge)
The average number of requests to a domain with the HTTP response code 2xx.
Shown as request
aws.es.3xx
(count)
The number of requests to a domain with the HTTP response code 3xx.
Shown as request
aws.es.3xx.average
(gauge)
The average number of requests to a domain with the HTTP response code 3xx.
Shown as request
aws.es.4xx
(count)
The number of requests to a domain with the HTTP response code 4xx.
Shown as request
aws.es.4xx.average
(gauge)
The average number of requests to a domain with the HTTP response code 4xx.
Shown as request
aws.es.5xx
(count)
The number of requests to a domain with the HTTP response code 5xx.
Shown as request
aws.es.5xx.average
(gauge)
The average number of requests to a domain with the HTTP response code 5xx.
Shown as request
aws.es.alerting_degraded
(gauge)
Indicates whether the ES alerting service is degraded. A value of 0 means 'No'. A value of 1 means 'Yes'.
aws.es.alerting_index_exists
(gauge)
A value of 1 means the .opendistro-alerting-config index exists. This will be 0 until you use the alerting feature for the first time.
aws.es.alerting_index_statusgreen
(gauge)
The health of the index. A value of 1 means green. A value of 0 means that the index either doesn't exist or isn't green.
aws.es.alerting_index_statusred
(gauge)
The health of the index. A value of 1 means red. A value of 0 means that the index either doesn't exist or isn't red.
aws.es.alerting_index_statusyellow
(gauge)
The health of the index. A value of 1 means yellow. A value of 0 means that the index either doesn't exist or isn't yellow.
aws.es.alerting_nodes_not_on_schedule
(gauge)
A value of 1 means some jobs are not running on schedule.
aws.es.alerting_nodes_on_schedule
(gauge)
A value of 1 means that all alerting jobs are running on schedule (or that no alerting jobs exist).
aws.es.alerting_scheduled_job_enabled
(gauge)
A value of 1 means that the opendistro.scheduled_jobs.enabled cluster setting is true. A value of 0 means it is false, and scheduled jobs are disabled.
aws.es.anomaly_detection_failure_count
(count)
The number of failed requests to detect anomalies.
Shown as error
aws.es.anomaly_detection_plugin_unhealthy
(gauge)
A value of 1 means that the anomaly detection plugin is not functioning properly.
aws.es.anomaly_detection_request_count
(count)
The number of requests to detect anomalies.
Shown as request
aws.es.anomaly_detectors_index_status_index_exists
(gauge)
A value of 1 means that the .opendistro-anomaly-detectors index exists. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.anomaly_detectors_index_statusred
(gauge)
A value of 1 means that the .opendistro-anomaly-detectors index is red. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.anomaly_results_index_status_index_exists
(gauge)
A value of 1 means the index that the .opendistro-anomaly-results alias points to exists. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.anomaly_results_index_statusred
(gauge)
A value of 1 means the index that the .opendistro-anomaly-results alias points to is red. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.automated_snapshot_failure
(gauge)
The number of failed automated snapshots for the cluster.
Shown as error
aws.es.automated_snapshot_failure.maximum
(gauge)
The maximum number of failed automated snapshots for the cluster.
Shown as error
aws.es.automated_snapshot_failure.minimum
(gauge)
The minimum number of failed automated snapshots for the cluster.
Shown as error
aws.es.cluster_index_writes_blocked
(gauge)
Indicates whether your cluster is accepting or blocking incoming write requests. A value of 0 means that the cluster is accepting requests. A value of 1 means that it is blocking requests.
aws.es.cluster_statusgreen
(gauge)
Indicates whether all index shards are allocated to nodes in the cluster.
aws.es.cluster_statusgreen.maximum
(gauge)
The maximum value of the indicator for whether all index shards are allocated to nodes in the cluster.
aws.es.cluster_statusgreen.minimum
(gauge)
The minimum value of the indicator for whether all index shards are allocated to nodes in the cluster.
aws.es.cluster_statusred
(gauge)
Indicates whether both primary and replica shards of at least one index are not allocated to nodes in a cluster.
aws.es.cluster_statusred.maximum
(gauge)
The maximum value of the indicator for whether both primary and replica shards of at least one index are not allocated to nodes in a cluster.
aws.es.cluster_statusred.minimum
(gauge)
The minimum value of the indicator for whether both primary and replica shards of at least one index are not allocated to nodes in a cluster.
aws.es.cluster_statusyellow
(gauge)
Indicates whether replica shards are not allocated to nodes in a cluster.
aws.es.cluster_statusyellow.maximum
(gauge)
The maximum value of the indicator for whether replica shards are not allocated to nodes in a cluster.
aws.es.cluster_statusyellow.minimum
(gauge)
The minimum value of the indicator for whether replica shards are not allocated to nodes in a cluster.
aws.es.cluster_used_space
(gauge)
The total used space, in MiB, for the cluster.
Shown as mebibyte
aws.es.cluster_used_space.maximum
(gauge)
The maximum used space, in MiB, for the cluster.
Shown as mebibyte
aws.es.cluster_used_space.minimum
(gauge)
The minimum used space, in MiB, for the cluster.
Shown as mebibyte
aws.es.cpucredit_balance
(gauge)
The remaining CPU credits available for data nodes in the cluster.
aws.es.cpuutilization
(gauge)
The average percentage of CPU resources used across all nodes in the cluster.
Shown as percent
aws.es.cpuutilization.maximum
(gauge)
The maximum percentage of CPU resources used by any node in the cluster.
Shown as percent
aws.es.cpuutilization.minimum
(gauge)
The minimum percentage of CPU resources used by any node in the cluster.
Shown as percent
aws.es.cross_cluster_inbound_requests
(count)
Destination domain metric. Number of incoming connection requests received from the source domain.
Shown as request
aws.es.cross_cluster_outbound_connections
(gauge)
Source domain metric. Number of connected nodes. If this number drops to 0, then the connection is unhealthy.
aws.es.cross_cluster_outbound_requests
(count)
Source domain metric. Number of search requests sent to the destination domain.
Shown as request
aws.es.deleted_documents
(gauge)
The total number of documents marked for deletion across all indices in the cluster.
Shown as document
aws.es.deleted_documents.maximum
(gauge)
The maximum number of documents marked for deletion across all indices in the cluster.
Shown as document
aws.es.deleted_documents.minimum
(gauge)
The minimum number of documents marked for deletion across all indices in the cluster.
Shown as document
aws.es.disk_queue_depth
(gauge)
The average number, across all nodes in the cluster, of pending input and output (I/O) requests for an EBS volume.
Shown as request
aws.es.disk_queue_depth.maximum
(gauge)
The maximum number for any node in the cluster of pending input and output (I/O) requests for an EBS volume.
Shown as request
aws.es.disk_queue_depth.minimum
(gauge)
The minimum number for any node in the cluster of pending input and output (I/O) requests for an EBS volume.
Shown as request
aws.es.elasticsearch_requests
(count)
The number of requests made to the Elasticsearch cluster.
Shown as request
aws.es.elasticsearch_requests.average
(gauge)
The average number of requests made to the Elasticsearch cluster.
Shown as request
aws.es.free_storage_space
(gauge)
The average free space, in megabytes, across all the data nodes in a cluster.
Shown as mebibyte
aws.es.free_storage_space.maximum
(gauge)
The free space, in megabytes, for the single data node with the most available free space in a cluster.
Shown as mebibyte
aws.es.free_storage_space.minimum
(gauge)
The free space, in megabytes, for the single data node with the least available free space in a cluster.
Shown as mebibyte
aws.es.free_storage_space.sum
(gauge)
The free space, in megabytes, for all data nodes in the cluster.
Shown as mebibyte
aws.es.hot_storage_space_utilization
(gauge)
The total amount of hot storage space that the cluster is using.
Shown as mebibyte
aws.es.hot_to_warm_migration_queue_size
(gauge)
The number of indices currently migrating from hot to warm storage.
aws.es.indexing_latency
(gauge)
The average time, in milliseconds, that it takes a shard to complete an indexing operation.
Shown as millisecond
aws.es.indexing_rate
(count)
The number of indexing operations per minute.
Shown as operation
aws.es.invalid_host_header_requests
(count)
The number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header.
Shown as request
aws.es.invalid_host_header_requests.average
(gauge)
The average number of HTTP requests made to the Elasticsearch cluster that included an invalid (or missing) host header.
Shown as request
aws.es.jvmgcold_collection_count
(count)
The number of times that 'old generation' garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently.
Shown as garbage collection
aws.es.jvmgcold_collection_time
(gauge)
The amount of time, in milliseconds, that the cluster has spent performing 'old generation' garbage collection.
Shown as millisecond
aws.es.jvmgcyoung_collection_count
(count)
The number of times that 'young generation' garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations.
Shown as garbage collection
aws.es.jvmgcyoung_collection_time
(gauge)
The amount of time, in milliseconds, that the cluster has spent performing 'young generation' garbage collection.
Shown as millisecond
aws.es.jvmmemory_pressure
(gauge)
The average percentage of the Java heap used for all data nodes in the cluster.
Shown as percent
aws.es.jvmmemory_pressure.maximum
(gauge)
The maximum percentage of the Java heap used by any data node in the cluster.
Shown as percent
aws.es.jvmmemory_pressure.minimum
(gauge)
The minimum percentage of the Java heap used by any data node in the cluster.
Shown as percent
aws.es.kibana_healthy_nodes
(gauge)
A health check for Kibana. A value of 1 indicates normal behavior. A value of 0 indicates that Kibana is inaccessible.
aws.es.kmskey_error
(gauge)
A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been disabled. Only available for domains that encrypt data at rest.
aws.es.kmskey_inaccessible
(gauge)
A value of 1 indicates that the KMS customer master key used to encrypt data at rest has been deleted or has had its grants to Amazon ES revoked. Only available for domains that encrypt data at rest.
aws.es.master_cpucredit_balance
(gauge)
The remaining CPU credits available for dedicated master nodes in the cluster.
aws.es.master_cpuutilization
(gauge)
The maximum percentage of CPU resources used by the dedicated master nodes.
Shown as percent
aws.es.master_free_storage_space
(gauge)
This metric is not relevant and can be ignored. The service does not use master nodes as data nodes.
Shown as mebibyte
aws.es.master_jvmmemory_pressure
(gauge)
The maximum percentage of the Java heap used for all dedicated master nodes in the cluster.
Shown as percent
aws.es.master_reachable_from_node
(gauge)
A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that /_cluster/health/ is failing.
aws.es.master_sys_memory_utilization
(gauge)
The percentage of the instance's memory that is in use.
Shown as percent
aws.es.models_checkpoint_index_status_index_exists
(gauge)
A value of 1 means that the .opendistro-anomaly-checkpoints index exists. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.models_checkpoint_index_statusred
(gauge)
A value of 1 means that the .opendistro-anomaly-checkpoints index is red. Until you use the anomaly detection feature for the first time, this value remains 0.
aws.es.nodes
(gauge)
The number of nodes in the Amazon ES cluster.
Shown as node
aws.es.nodes.maximum
(gauge)
The maximum number of nodes in the Amazon ES cluster.
Shown as node
aws.es.nodes.minimum
(gauge)
The minimum number of nodes in the Amazon ES cluster.
Shown as node
aws.es.read_iops
(gauge)
The number of input and output (I/O) operations per second for read operations on EBS volumes.
Shown as operation
aws.es.read_iops.maximum
(gauge)
The maximum number for any node of input and output (I/O) operations per second for read operations on EBS volumes.
Shown as operation
aws.es.read_iops.minimum
(gauge)
The minimum number for any node of input and output (I/O) operations per second for read operations on EBS volumes.
Shown as operation
aws.es.read_latency
(gauge)
The latency, in seconds, for read operations on EBS volumes.
Shown as second
aws.es.read_latency.maximum
(gauge)
The maximum latency for any node, in seconds, for read operations on EBS volumes.
Shown as second
aws.es.read_latency.minimum
(gauge)
The minimum latency for any node, in seconds, for read operations on EBS volumes.
Shown as second
aws.es.read_throughput
(gauge)
The throughput, in bytes per second, for read operations on EBS volumes.
Shown as byte
aws.es.read_throughput.maximum
(gauge)
The maximum throughput for any node, in bytes per second, for read operations on EBS volumes.
Shown as byte
aws.es.read_throughput.minimum
(gauge)
The minimum throughput for any node, in bytes per second, for read operations on EBS volumes.
Shown as byte
aws.es.search_latency
(gauge)
The average time, in milliseconds, that it takes a shard to complete a search operation.
Shown as millisecond
aws.es.search_rate
(count)
The total number of search requests per minute for all shards on a node.
Shown as request
aws.es.searchable_documents
(gauge)
The total number of searchable documents across all indices in the cluster.
Shown as document
aws.es.searchable_documents.maximum
(gauge)
The maximum number of searchable documents across all indices in the cluster.
Shown as document
aws.es.searchable_documents.minimum
(gauge)
The minimum number of searchable documents across all indices in the cluster.
Shown as document
aws.es.sqldefault_cursor_request_count
(count)
The number of pagination requests to the _opendistro/_sql API.
Shown as request
aws.es.sqlfailed_request_count_by_cus_err
(count)
The number of requests to the _opendistro/_sql API that failed due to a client issue.
Shown as request
aws.es.sqlfailed_request_count_by_sys_err
(count)
The number of requests to the _opendistro/_sql API that failed due to a server problem or feature limitation.
Shown as request
aws.es.sqlrequest_count
(count)
The number of requests to the _opendistro/_sql API.
Shown as request
aws.es.sqlunhealthy
(gauge)
A value of 1 indicates that, in response to certain requests, the SQL plugin is returning 5xx response codes or passing invalid query DSL to Elasticsearch.
aws.es.sys_memory_utilization
(gauge)
The percentage of the instance's memory that is in use.
Shown as percent
aws.es.sys_memory_utilization.maximum
(gauge)
The maximum percentage of the instance's memory that is in use.
Shown as percent
aws.es.sys_memory_utilization.minimum
(gauge)
The minimum percentage of the instance's memory that is in use.
Shown as percent
aws.es.threadpool_bulk_queue
(count)
The number of queued tasks in the bulk thread pool.
Shown as task
aws.es.threadpool_bulk_rejected
(count)
The number of rejected tasks in the bulk thread pool.
Shown as task
aws.es.threadpool_bulk_threads
(gauge)
The size of the bulk thread pool.
aws.es.threadpool_forcemerge_queue
(count)
The number of queued tasks in the force merge thread pool.
Shown as task
aws.es.threadpool_forcemerge_rejected
(count)
The number of rejected tasks in the force merge thread pool.
Shown as task
aws.es.threadpool_forcemerge_threads
(gauge)
The size of the force merge thread pool.
aws.es.threadpool_index_queue
(count)
The number of queued tasks in the index thread pool.
Shown as task
aws.es.threadpool_index_rejected
(count)
The number of rejected tasks in the index thread pool.
Shown as task
aws.es.threadpool_index_threads
(gauge)
The size of the index thread pool.
aws.es.threadpool_merge_queue
(count)
The number of queued tasks in the merge thread pool.
Shown as task
aws.es.threadpool_merge_rejected
(count)
The number of rejected tasks in the merge thread pool.
Shown as task
aws.es.threadpool_merge_threads
(gauge)
The size of the merge thread pool.
aws.es.threadpool_search_queue
(count)
The number of queued tasks in the search thread pool.
Shown as task
aws.es.threadpool_search_rejected
(count)
The number of rejected tasks in the search thread pool.
Shown as task
aws.es.threadpool_search_threads
(gauge)
The size of the search thread pool.
aws.es.threadpool_write_queue
(count)
The number of queued tasks in the write thread pool.
Shown as task
aws.es.threadpool_write_rejected
(count)
The number of rejected tasks in the write thread pool.
Shown as task
aws.es.threadpool_write_threads
(gauge)
The size of the write thread pool.
aws.es.warm_cpuutilization
(gauge)
The percentage of CPU usage for UltraWarm nodes in the cluster.
Shown as percent
aws.es.warm_free_storage_space
(gauge)
The amount of free warm storage space in MiB.
Shown as mebibyte
aws.es.warm_jvmmemory_pressure
(gauge)
The maximum percentage of the Java heap used for the UltraWarm nodes.
Shown as percent
aws.es.warm_search_latency
(gauge)
The average time, in milliseconds, that it takes a shard on an UltraWarm node to complete a search operation.
Shown as millisecond
aws.es.warm_search_rate
(count)
The total number of search requests per minute for all shards on an UltraWarm node.
Shown as request
aws.es.warm_searchable_documents
(gauge)
The total number of searchable documents across all warm indices in the cluster.
Shown as document
aws.es.warm_storage_space_utilization
(gauge)
The total amount of warm storage space that the cluster is using.
Shown as mebibyte
aws.es.warm_sys_memory_utilization
(gauge)
The percentage of the warm node's memory that is in use.
Shown as percent
aws.es.warm_to_hot_migration_queue_size
(gauge)
The number of indices currently migrating from warm to hot storage.
aws.es.write_iops
(gauge)
The number of input and output (I/O) operations per second for write operations on EBS volumes.
Shown as operation
aws.es.write_iops.maximum
(gauge)
The maximum number for any node of input and output (I/O) operations per second for write operations on EBS volumes.
Shown as operation
aws.es.write_iops.minimum
(gauge)
The minimum number for any node of input and output (I/O) operations per second for write operations on EBS volumes.
Shown as operation
aws.es.write_latency
(gauge)
The latency, in seconds, for write operations on EBS volumes.
Shown as second
aws.es.write_latency.maximum
(gauge)
The maximum latency for any node, in seconds, for write operations on EBS volumes.
Shown as second
aws.es.write_latency.minimum
(gauge)
The minimum latency for any node, in seconds, for write operations on EBS volumes.
Shown as second
aws.es.write_throughput
(gauge)
The throughput, in bytes per second, for write operations on EBS volumes.
Shown as byte
aws.es.write_throughput.maximum
(gauge)
The maximum throughput for any node, in bytes per second, for write operations on EBS volumes.
Shown as byte
aws.es.write_throughput.minimum
(gauge)
The minimum throughput for any node, in bytes per second, for write operations on EBS volumes.
Shown as byte

Each of the metrics retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to host name and security groups.
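As an illustration of how the three aws.es.cluster_status* gauges listed above might be read together, the sketch below collapses them into a single label. It assumes each gauge reports 1 when its condition holds and 0 otherwise, in line with the other indicator gauges in the list. The function name `cluster_status` and the plain-dict input are hypothetical and are not part of the Datadog API.

```python
def cluster_status(metrics):
    """Collapse the cluster status gauges into one label.

    `metrics` is a plain dict of metric name -> latest value (an assumed
    input shape for this sketch). The most severe status wins when more
    than one gauge is set.
    """
    if metrics.get("aws.es.cluster_statusred", 0) >= 1:
        return "red"
    if metrics.get("aws.es.cluster_statusyellow", 0) >= 1:
        return "yellow"
    if metrics.get("aws.es.cluster_statusgreen", 0) >= 1:
        return "green"
    # None of the gauges is set, or no data was supplied.
    return "unknown"


if __name__ == "__main__":
    sample = {
        "aws.es.cluster_statusgreen": 0,
        "aws.es.cluster_statusyellow": 1,
        "aws.es.cluster_statusred": 0,
    }
    print(cluster_status(sample))
```

Checking red first means an unallocated primary shard is never masked by a less severe status.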

Events

The AWS ES integration does not include any events.

Service Checks

The AWS ES integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.