KEDA

Supported OS: Linux, Windows, macOS

Integration version 1.0.1

Overview

This check monitors KEDA through the Datadog Agent. For more information, see KEDA monitoring.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.

Installation

Starting from Agent release 7.62.0, the KEDA check is included in the Datadog Agent package. No additional installation is needed in your environment.

Configuration

KEDA consists of multiple components, including the Admissions Controller, Metrics API Server, and the Operator. Each of these components can be scraped for metrics. Prometheus-formatted metrics are available at /metrics on port 8080 for each component.

To expose these metrics, ensure that Prometheus scraping is enabled for each component. For example, when installing KEDA with Helm, enable the following configuration options:

prometheus.metricServer.enabled
prometheus.operator.enabled
prometheus.webhooks.enabled

Alternatively, you can achieve this by providing the following configuration in a values.yaml file used during the Helm installation of KEDA:

prometheus:
  metricServer:
    enabled: true
  operator:
    enabled: true
  webhooks:
    enabled: true

For the Agent to start collecting metrics, the KEDA controller pods need to be annotated. For more information about annotations, refer to the Autodiscovery Integration Templates for guidance. You can find additional configuration options by reviewing the sample keda.d/conf.yaml.

Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed. For example, the keda.scaler.detail_errors.count metric is exposed only after a scaler has encountered an error.

The only parameter required for configuring the KEDA check is:

openmetrics_endpoint: This parameter should be set to the location where the Prometheus-formatted metrics are exposed. The default port is 8080. In containerized environments, %%host%% should be used for host autodetection.
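
For an Agent running on a host, the same parameter goes into keda.d/conf.yaml. The following is a minimal sketch rather than a complete configuration; the localhost address is an assumption and should be replaced with the address at which the KEDA component's metrics are reachable from the Agent.

```yaml
init_config:

instances:
    # Assumed address: replace localhost with the host that exposes the KEDA metrics.
  - openmetrics_endpoint: http://localhost:8080/metrics
```

In containerized environments, the equivalent instance is supplied through pod annotations, as in the following manifest: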

apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: | # <CONTAINER_NAME> Needs to match the container name at the bottom. 'keda-operator-metrics-apiserver' in this example.
      {
        "keda": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME> # e.g. 'keda-operator-metrics-apiserver' in the Metrics API Server
# (...)

To collect metrics from each KEDA component, the above pod annotations need to be applied to each KEDA component pod. Example pod annotations for the Operator pod:

# Pod manifest from a basic Helm chart deployment
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: 'keda-operator'
  annotations:
    ad.datadoghq.com/keda-operator.checks: |
      {
        "keda": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }      
    # (...)
spec:
  containers:
    - name: keda-operator
# (...)
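
Pods created by a Helm release are managed by Deployments, so annotations are usually set through the chart values rather than on the pod itself. The sketch below assumes the KEDA Helm chart exposes a podAnnotations.keda value for the Operator pod; that key name is an assumption here and should be verified against the values.yaml of the chart version in use.

```yaml
# values.yaml fragment (assumption: the chart supports podAnnotations.keda for the Operator pod)
podAnnotations:
  keda:
    # 'keda-operator' must match the Operator container name.
    ad.datadoghq.com/keda-operator.checks: |
      {
        "keda": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
```

The same pattern would apply to the other components, each with its own container name in the annotation key.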

Log collection

Available for Agent versions >6.0

KEDA logs can be collected from the different KEDA pods through Kubernetes. Log collection is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.

See the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter	Value
`<LOG_CONFIG>`	`{"source": "keda", "service": "<SERVICE_NAME>"}`
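
As an illustration of how this parameter might be applied, the sketch below wraps the log configuration in an Autodiscovery pod annotation. <POD_NAME>, <CONTAINER_NAME>, and <SERVICE_NAME> are placeholders, and the manifest is abbreviated; treat it as an example under those assumptions rather than a complete pod spec.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: '<POD_NAME>'
  annotations:
    # <CONTAINER_NAME> must match the name of the KEDA container in this pod.
    ad.datadoghq.com/<CONTAINER_NAME>.logs: '[{"source": "keda", "service": "<SERVICE_NAME>"}]'
spec:
  containers:
    - name: <CONTAINER_NAME>
```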

Validation

Run the Agent’s status subcommand and look for keda under the Checks section.

Data Collected

Metrics


keda.aggregator_discovery_aggregation.count (count)	Number of times discovery was aggregated.
keda.apiserver_audit_event.count (count)	Number of audit events generated and sent to the audit backend.
keda.apiserver_audit_requests_rejected.count (count)	Number of API server requests rejected due to an error in audit logging backend. Shown as request
keda.apiserver_client_certificate_expiration_seconds.bucket (count)	Number of certificates observed in the apiserver_client_certificate_expiration_seconds histogram tagged by upper_bound tags.
keda.apiserver_client_certificate_expiration_seconds.count (count)	Number of certificates observed in the apiserver_client_certificate_expiration_seconds histogram.
keda.apiserver_client_certificate_expiration_seconds.sum (count)	The sum of the remaining lifetimes of the certificates observed in the apiserver_client_certificate_expiration_seconds histogram. Shown as second
keda.apiserver_current_inflight_requests (gauge)	Maximal number of currently used inflight request limit of this API server per request kind in last second. Shown as request
keda.apiserver_delegated_authz_request.count (count)	Number of HTTP requests partitioned by status code. Shown as request
keda.apiserver_delegated_authz_request_duration_seconds.bucket (count)	Number of observations in the apiserver_delegated_authz_request_duration_seconds histogram, broken down by status code and upper_bound duration tags.
keda.apiserver_delegated_authz_request_duration_seconds.count (count)	Number of observations in the apiserver_delegated_authz_request_duration_seconds histogram, broken down by status code.
keda.apiserver_delegated_authz_request_duration_seconds.sum (count)	The sum of duration of requests in the apiserver_delegated_authz_request_duration_seconds histogram. Shown as second
keda.apiserver_envelope_encryption_dek_cache_fill_percent (gauge)	Percent of the cache slots currently occupied by cached DEKs.
keda.apiserver_flowcontrol_read_vs_write_current_requests.bucket (count)	The number of requests (as a fraction of the relevant limit) waiting or in execution at the end of each nanosecond. Tagged by upper_bound tags. Shown as request
keda.apiserver_flowcontrol_read_vs_write_current_requests.count (count)	The number of requests (as a fraction of the relevant limit) observed at the end of each nanosecond. Shown as request
keda.apiserver_flowcontrol_read_vs_write_current_requests.sum (count)	The sum of all observed request fractions at the end of each nanosecond.
keda.apiserver_flowcontrol_seat_fair_frac (gauge)	Fair fraction of server’s concurrency to allocate to each priority level that can use it.
keda.apiserver_request.count (count)	Number of API server requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. Shown as request
keda.apiserver_request_duration_seconds.bucket (count)	The number of requests used to calculate the response time tagged by upper_bound tags. Shown as request
keda.apiserver_request_duration_seconds.count (count)	The number of requests used to calculate the response time. Shown as request
keda.apiserver_request_duration_seconds.sum (count)	The sum of response times in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. Shown as second
keda.apiserver_request_filter_duration_seconds.bucket (count)	The number of observations used to calculate the request filter latency tagged by upper_bound tags.
keda.apiserver_request_filter_duration_seconds.count (count)	The number of observations used to calculate the request filter latency.
keda.apiserver_request_filter_duration_seconds.sum (count)	Request filter latency distribution in seconds, for each filter type. Shown as second
keda.apiserver_request_sli_duration_seconds.bucket (count)	The number of observations used to calculate SLI response time tagged by upper_bound tags.
keda.apiserver_request_sli_duration_seconds.count (count)	The number of observations used to calculate SLI response time.
keda.apiserver_request_sli_duration_seconds.sum (count)	The sum of the response time (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. Shown as second
keda.apiserver_request_slo_duration_seconds.bucket (count)	The number of observations used to calculate SLO response time tagged by upper_bound tags.
keda.apiserver_request_slo_duration_seconds.count (count)	The number of observations used to calculate SLO response time.
keda.apiserver_request_slo_duration_seconds.sum (count)	The sum of the response time (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. Shown as second
keda.apiserver_response_sizes.bucket (count)	Number of responses used to calculate the response size tagged by upper_bound tags.
keda.apiserver_response_sizes.count (count)	Number of responses used to calculate the response size.
keda.apiserver_response_sizes.sum (count)	The sum of sizes of responses in bytes for each group, version, verb, resource, subresource, scope and component. Shown as byte
keda.apiserver_storage_data_key_generation_duration_seconds.bucket (count)	Number of observations used to calculate data encryption key (DEK) generation duration tagged by upper_bound tags.
keda.apiserver_storage_data_key_generation_duration_seconds.count (count)	Number of observations used to calculate data encryption key (DEK) generation duration.
keda.apiserver_storage_data_key_generation_duration_seconds.sum (count)	Time in seconds used for data encryption key (DEK) generation operations. Shown as second
keda.apiserver_storage_data_key_generation_failures.count (count)	Number of failed data encryption key (DEK) generation operations.
keda.apiserver_storage_envelope_transformation_cache_misses.count (count)	Number of cache misses while accessing key decryption key (KEK).
keda.apiserver_tls_handshake_errors.count (count)	Number of requests dropped with ‘TLS handshake error from’ error.
keda.apiserver_webhooks_x509_insecure_sha1.count (count)	Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment).
keda.apiserver_webhooks_x509_missing_san.count (count)	Counts the number of requests to servers missing the SAN extension in their serving certificate OR the number of connection failures due to the missing x509 certificate SAN extension (either/or, based on the runtime environment).
keda.authenticated_user_requests.count (count)	Number of authenticated requests broken out by username. Shown as request
keda.authentication_attempts.count (count)	Number of authentication attempts.
keda.authentication_duration_seconds.bucket (count)	Number of observations used to calculate authentication duration tagged by upper_bound bucket tags.
keda.authentication_duration_seconds.count (count)	Number of observations used to calculate authentication duration.
keda.authentication_duration_seconds.sum (count)	Authentication duration in seconds broken out by result. Shown as second
keda.authorization_attempts.count (count)	Number of authorization attempts broken down by result. The result can be ‘allowed’, ‘denied’, ‘no-opinion’, or ‘error’.
keda.authorization_duration_seconds.bucket (count)	The number of events used to calculate authorization duration tagged by upper_bound tags.
keda.authorization_duration_seconds.count (count)	The number of events used to calculate authorization duration.
keda.authorization_duration_seconds.sum (count)	Authorization duration in seconds broken out by result. Shown as second
keda.build_info (gauge)	Info metric, with static information about KEDA build like: version, git commit and Golang runtime info.
keda.cardinality_enforcement_unexpected_categorizations.count (count)	The count of unexpected categorizations during cardinality enforcement.
keda.certwatcher.read_certificate.count (count)	Number of certificate reads.
keda.certwatcher.read_certificate_errors.count (count)	Number of certificate read errors. Shown as error
keda.controller.runtime.active_workers (gauge)	Number of currently used workers per controller.
keda.controller.runtime.max_concurrent_reconciles (gauge)	Maximum number of concurrent reconciles per controller.
keda.controller.runtime.reconcile.count (count)	Number of reconciliations per controller.
keda.controller.runtime.reconcile_errors.count (count)	Number of reconciliation errors per controller.
keda.controller.runtime.reconcile_panics.count (count)	Number of reconciliation panics per controller.
keda.controller.runtime.reconcile_time.seconds.bucket (count)	The number of events observed to calculate reconciliation time tagged by upper_bound tags.
keda.controller.runtime.reconcile_time.seconds.count (count)	The number of events observed to calculate reconciliation time.
keda.controller.runtime.reconcile_time.seconds.sum (count)	The time per reconciliation per controller. Shown as second
keda.controller.runtime.terminal_reconcile_errors.count (count)	Number of terminal reconciliation errors per controller.
keda.controller.runtime.webhook_panics.count (count)	Number of webhook panics.
keda.controller.runtime.webhook_requests.count (count)	Number of admission requests by HTTP status code.
keda.controller.runtime.webhook_requests_in_flight (gauge)	Current number of admission requests being served.
keda.disabled_metrics.count (count)	The count of disabled metrics.
keda.field_validation_request_duration_seconds.bucket (count)	The number of observations used to calculate the field validation response time tagged by upper_bound tags.
keda.field_validation_request_duration_seconds.count (count)	The number of observations used to calculate the field validation response time.
keda.field_validation_request_duration_seconds.sum (count)	The response time in seconds for each field validation value. Shown as second
keda.go.gc.duration.seconds.count (count)	The summary count of garbage collection cycles in the KEDA instance.
keda.go.gc.duration.seconds.quantile (gauge)	A summary of the pause duration of garbage collection cycles in the KEDA instance.
keda.go.gc.duration.seconds.sum (count)	The sum of the pause duration of garbage collection cycles in the KEDA instance.
keda.go.goroutines (gauge)	Number of goroutines that currently exist.
keda.go.info (gauge)	Information about the Go environment.
keda.go.memstats.alloc_bytes (gauge)	Number of bytes allocated and still in use. Shown as byte
keda.go.memstats.alloc_bytes.count (count)	Number of bytes allocated, even if freed. Shown as byte
keda.go.memstats.buck_hash.sys_bytes (gauge)	Number of bytes used by the profiling bucket hash table. Shown as byte
keda.go.memstats.frees.count (count)	Number of frees.
keda.go.memstats.gc.sys_bytes (gauge)	Number of bytes used for garbage collection system metadata. Shown as byte
keda.go.memstats.heap.alloc_bytes (gauge)	Number of heap bytes allocated and still in use. Shown as byte
keda.go.memstats.heap.idle_bytes (gauge)	Number of heap bytes waiting to be used. Shown as byte
keda.go.memstats.heap.inuse_bytes (gauge)	Number of heap bytes that are in use. Shown as byte
keda.go.memstats.heap.objects (gauge)	Number of allocated objects.
keda.go.memstats.heap.released_bytes (gauge)	Number of heap bytes released to OS. Shown as byte
keda.go.memstats.heap.sys_bytes (gauge)	Number of heap bytes obtained from system. Shown as byte
keda.go.memstats.lookups.count (count)	Number of pointer lookups.
keda.go.memstats.mallocs.count (count)	Number of mallocs.
keda.go.memstats.mcache.inuse_bytes (gauge)	Number of bytes in use by mcache structures. Shown as byte
keda.go.memstats.mcache.sys_bytes (gauge)	Number of bytes used for mcache structures obtained from system. Shown as byte
keda.go.memstats.mspan.inuse_bytes (gauge)	Number of bytes in use by mspan structures. Shown as byte
keda.go.memstats.mspan.sys_bytes (gauge)	Number of bytes used for mspan structures obtained from system. Shown as byte
keda.go.memstats.next.gc_bytes (gauge)	Number of heap bytes when next garbage collection will take place. Shown as byte
keda.go.memstats.other.sys_bytes (gauge)	Number of bytes used for other system allocations. Shown as byte
keda.go.memstats.stack.inuse_bytes (gauge)	Number of bytes in use by the stack allocator. Shown as byte
keda.go.memstats.stack.sys_bytes (gauge)	Number of bytes obtained from system for stack allocator. Shown as byte
keda.go.memstats.sys_bytes (gauge)	Number of bytes obtained from system. Shown as byte
keda.go.memstats.time_since_last_gc.seconds (gauge)	Number of seconds since 1970 of last garbage collection. Shown as second
keda.go.threads (gauge)	Number of OS threads created.
keda.hidden_metrics.count (count)	The count of hidden metrics.
keda.internal_metricsservice.grpc_client_handled.count (count)	The number of RPCs completed by the client, regardless of success or failure.
keda.internal_metricsservice.grpc_client_handling_seconds.bucket (count)	The number of events used to calculate response time tagged by upper_bound bucket tags.
keda.internal_metricsservice.grpc_client_handling_seconds.count (count)	The number of events used to calculate response time.
keda.internal_metricsservice.grpc_client_handling_seconds.sum (count)	Response time (seconds) of the gRPC until it is finished by the application.
keda.internal_metricsservice.grpc_client_msg_received.count (count)	The number of RPC stream messages received by the client.
keda.internal_metricsservice.grpc_client_msg_sent.count (count)	The number of gRPC stream messages sent by the client.
keda.internal_metricsservice.grpc_client_started.count (count)	The number of RPCs started on the client.
keda.internal_scale.loop_latency (gauge)	(KEDA <v2.16) The deviation (in seconds) between the expected execution time and the actual execution time for the scaling loop. Shown as second
keda.internal_scale.loop_latency_seconds (gauge)	(KEDA >=v2.16) The deviation (in seconds) between the expected execution time and the actual execution time for the scaling loop. Shown as second
keda.leader_election.master_status (gauge)	Whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. ‘name’ is the string used to identify the lease. Make sure to group by name.
keda.process.cpu.seconds.count (count)	Total user and system CPU time spent, in seconds.
keda.process.max_fds (gauge)	Maximum number of open file descriptors.
keda.process.open_fds (gauge)	Number of open file descriptors.
keda.process.resident_memory.bytes (gauge)	Resident memory size in bytes.
keda.process.uptime.seconds (gauge)	How long in seconds the process has been up. Shown as second
keda.process.virtual_memory.bytes (gauge)	Virtual memory size in bytes. Shown as byte
keda.process.virtual_memory.max_bytes (gauge)	Maximum amount of virtual memory available in bytes. Shown as byte
keda.registered_metrics.count (count)	The count of registered metrics broken down by stability level and deprecation version.
keda.resource_registered (gauge)	(KEDA >=v2.16) Number of KEDA custom resources per namespace for each custom resource type (CRD) registered.
keda.resource_totals (gauge)	(KEDA <v2.16) Number of KEDA custom resources per namespace for each custom resource type (CRD) registered.
keda.rest.client.requests.count (count)	Number of HTTP requests, partitioned by status code, method, and host. Shown as request
keda.scaled_job.errors.count (count)	Number of scaled job errors.
keda.scaler.active (gauge)	Indicates whether a scaler is active (1), or not (0).
keda.scaler.detail_errors.count (count)	(KEDA >=v2.16) The number of errors encountered for each scaler.
keda.scaler.errors.count (count)	(KEDA <v2.16) The number of errors encountered for each scaler.
keda.scaler.metrics_latency (gauge)	(KEDA <v2.16) The latency of retrieving the current metric from each scaler, in seconds. Shown as second
keda.scaler.metrics_latency_seconds (gauge)	(KEDA >=v2.16) The latency of retrieving the current metric from each scaler, in seconds. Shown as second
keda.scaler.metrics_value (gauge)	The current value for each scaler’s metric that would be used by the HPA in computing the target average.
keda.trigger_registered (gauge)	(KEDA >=v2.16) Number of triggers per trigger type registered.
keda.trigger_totals (gauge)	(KEDA <v2.16) Number of triggers per trigger type registered.
keda.workqueue.adds.count (count)	Number of adds handled by workqueue.
keda.workqueue.depth (gauge)	Current depth of workqueue.
keda.workqueue.longest.running_processor.seconds (gauge)	How many seconds has the longest running processor for workqueue been running. Shown as second
keda.workqueue.queue.duration.seconds.bucket (count)	The histogram bucket of how long in seconds an item stays in the workqueue before being requested.
keda.workqueue.queue.duration.seconds.count (count)	The total number of events in the workqueue duration histogram.
keda.workqueue.queue.duration.seconds.sum (count)	The cumulative sum of time (in seconds) that items have spent in the workqueue. Shown as second
keda.workqueue.retries.count (count)	Number of retries handled by workqueue.
keda.workqueue.unfinished_work.seconds (gauge)	How many seconds of work has been done that is in progress and hasn’t been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. Shown as second
keda.workqueue.work.duration.seconds.bucket (count)	Number of work items that have been processed in the workqueue tagged by upper_bound buckets.
keda.workqueue.work.duration.seconds.count (count)	The total number of work items that have been processed in the workqueue.
keda.workqueue.work.duration.seconds.sum (count)	The cumulative sum of time spent processing all work items in the workqueue. Shown as second

Events

The KEDA integration does not include any events.

Service Checks

keda.openmetrics.health

Returns CRITICAL if the Agent is unable to connect to the KEDA OpenMetrics endpoint; otherwise, returns OK.

Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.