This check monitors KEDA through the Datadog Agent. For more information, see KEDA monitoring.
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Starting from Agent release 7.62.0, the KEDA check is included in the Datadog Agent package. No additional installation is needed in your environment.
KEDA consists of multiple components, including the Admissions Controller, Metrics API Server, and the Operator. Each of these components can be scraped for metrics. Prometheus-formatted metrics are available at /metrics on port 8080 for each component.
To expose these metrics, ensure that Prometheus scraping is enabled for each component. For example, with Helm, enable the following configuration options: prometheus.metricServer.enabled, prometheus.operator.enabled, and prometheus.webhooks.enabled.
Alternatively, provide the following configuration in a values.yaml file used during the Helm installation of KEDA:
prometheus:
  metricServer:
    enabled: true
  operator:
    enabled: true
  webhooks:
    enabled: true
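The values.yaml settings above can also be passed on the command line. The following is a minimal sketch, not part of the KEDA or Datadog tooling, that flattens the same nested settings into `--set key=value` arguments for Helm. The function name `helm_set_flags` and the constant `KEDA_PROMETHEUS_VALUES` are hypothetical names chosen for illustration.

```python
def helm_set_flags(values, prefix=""):
    """Flatten a nested values mapping into Helm --set arguments."""
    flags = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            flags.extend(helm_set_flags(value, path))
        else:
            # Render booleans as lowercase true/false, as in YAML.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={rendered}"])
    return flags


# Same settings as the values.yaml fragment shown in the text.
KEDA_PROMETHEUS_VALUES = {
    "prometheus": {
        "metricServer": {"enabled": True},
        "operator": {"enabled": True},
        "webhooks": {"enabled": True},
    }
}

if __name__ == "__main__":
    print(" ".join(helm_set_flags(KEDA_PROMETHEUS_VALUES)))
```

The resulting arguments can be appended to the Helm install or upgrade command used for KEDA. The release and chart names are not specified here and depend on your deployment.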
For the Agent to start collecting metrics, the KEDA controller pods must be annotated. For more information about annotations, see the Autodiscovery Integration Templates. Additional configuration options are listed in the sample keda.d/conf.yaml.
Note: The listed metrics can only be collected if they are available. Some metrics are generated only when certain actions are performed. For example, the keda.scaler.detail_errors.count metric is exposed only after a scaler has encountered an error.
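To see which metrics a component currently exposes, you can inspect its /metrics output directly. The sketch below is an illustration rather than part of the check: it parses Prometheus text-format output and returns the metric names present. The helper names, the sample payload, and the raw metric name `keda_scaler_detail_errors_total` (assumed here to be the source of keda.scaler.detail_errors.count) are assumptions.

```python
from urllib.request import urlopen


def fetch_exposition(url, timeout=5.0):
    """Download the raw Prometheus text exposed by one KEDA component."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def exposed_metric_names(exposition_text):
    """Return the set of metric names present in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The sample name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


# Made-up sample payload for illustration only; real output is much longer.
SAMPLE = """\
# HELP keda_build_info Info metric with static build information.
# TYPE keda_build_info gauge
keda_build_info{version="0.0.0"} 1
# TYPE keda_scaler_active gauge
keda_scaler_active{scaler="example"} 0
"""

if __name__ == "__main__":
    names = exposed_metric_names(SAMPLE)
    # Assumed raw name; absent until a scaler has reported an error.
    print("keda_scaler_detail_errors_total" in names)
```

Against a live component, pass the result of `fetch_exposition("http://<HOST>:8080/metrics")` to `exposed_metric_names` instead of the sample text.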
The only parameter required for configuring the KEDA check is:
openmetrics_endpoint: This parameter should be set to the location where the Prometheus-formatted metrics are exposed. The default port is 8080. In containerized environments, %%host%% should be used for host autodetection.
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: '<POD_NAME>'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: | # <CONTAINER_NAME> needs to match the container name at the bottom. 'keda-operator-metrics-apiserver' in this example.
      {
        "keda": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: <CONTAINER_NAME> # e.g. 'keda-operator-metrics-apiserver' in the Metrics API Server
# (...)
To collect metrics from each KEDA component, the above pod annotations need to be applied to each KEDA component pod. Example pod annotations for the Operator pod:
# Pod manifest from a basic Helm chart deployment
apiVersion: v1
kind: Pod
# (...)
metadata:
  name: 'keda-operator'
  annotations:
    ad.datadoghq.com/<CONTAINER_NAME>.checks: |
      {
        "keda": {
          "init_config": {},
          "instances": [
            {
              "openmetrics_endpoint": "http://%%host%%:8080/metrics"
            }
          ]
        }
      }
    # (...)
spec:
  containers:
    - name: keda-operator
# (...)
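Because the same annotation is repeated for every KEDA component pod, it can help to generate it rather than copy it by hand. The following sketch builds the annotation key and JSON value for a given container name. The function name `keda_check_annotation` is hypothetical, and the port defaults to 8080, the port stated above for each component's /metrics endpoint. Only the two container names that appear in the examples above are used; the container name for the Admissions Controller is not given here and must be looked up in your deployment.

```python
import json


def keda_check_annotation(container_name, port=8080):
    """Return (annotation_key, annotation_value) for one KEDA container."""
    key = f"ad.datadoghq.com/{container_name}.checks"
    config = {
        "keda": {
            "init_config": {},
            "instances": [
                # %%host%% is resolved by Autodiscovery at runtime.
                {"openmetrics_endpoint": f"http://%%host%%:{port}/metrics"}
            ],
        }
    }
    return key, json.dumps(config, indent=2)


if __name__ == "__main__":
    # Container names taken from the pod examples in this page.
    for name in ("keda-operator-metrics-apiserver", "keda-operator"):
        key, value = keda_check_annotation(name)
        print(f"{key}: |")
        print(value)
```

The printed value can be pasted under `metadata.annotations` in the pod template of each component, indented to match the surrounding manifest.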
Available for Agent versions >6.0
KEDA logs can be collected from the different KEDA pods through Kubernetes. Collecting logs is disabled by default in the Datadog Agent. To enable it, see Kubernetes Log Collection.
See the Autodiscovery Integration Templates for guidance on applying the parameters below.
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "keda", "service": "<SERVICE_NAME>"} |
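As a rough illustration of how the `<LOG_CONFIG>` value might be applied, the sketch below builds a log annotation for one container. It assumes the annotation key follows the same `ad.datadoghq.com/<CONTAINER_NAME>` pattern as the checks annotation, with a `.logs` suffix, and that the value is a JSON list of log configurations, as described in Datadog's Kubernetes log collection documentation. The function name `keda_log_annotation` is hypothetical.

```python
import json


def keda_log_annotation(container_name, service_name):
    """Return (annotation_key, annotation_value) for KEDA log collection."""
    key = f"ad.datadoghq.com/{container_name}.logs"
    # The log configuration from the table, wrapped in a JSON list.
    value = json.dumps([{"source": "keda", "service": service_name}])
    return key, value


if __name__ == "__main__":
    key, value = keda_log_annotation("keda-operator", "<SERVICE_NAME>")
    print(f"{key}: '{value}'")
```

Replace `<SERVICE_NAME>` with the service name you want attached to the logs.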
Run the Agent's status subcommand and look for keda under the Checks section.
keda.aggregator_discovery_aggregation.count (count) | Number of times discovery was aggregated. |
keda.apiserver_audit_event.count (count) | Number of audit events generated and sent to the audit backend. |
keda.apiserver_audit_requests_rejected.count (count) | Number of API server requests rejected due to an error in audit logging backend. Shown as request |
keda.apiserver_client_certificate_expiration_seconds.bucket (count) | Number of certificates observed in the apiserver_client_certificate_expiration_seconds histogram, tagged by upper_bound. |
keda.apiserver_client_certificate_expiration_seconds.count (count) | Number of certificates observed in the apiserver_client_certificate_expiration_seconds histogram. |
keda.apiserver_client_certificate_expiration_seconds.sum (count) | The sum of the remaining time until expiration of the certificates observed in the apiserver_client_certificate_expiration_seconds histogram. Shown as second |
keda.apiserver_current_inflight_requests (gauge) | Maximum number of in-flight requests counted against this API server's in-flight request limit, per request kind, in the last second. Shown as request |
keda.apiserver_delegated_authz_request.count (count) | Number of HTTP requests partitioned by status code. Shown as request |
keda.apiserver_delegated_authz_request_duration_seconds.bucket (count) | Number of observations in the apiserver_delegated_authz_request_duration_seconds histogram. Broken down by status code and upper_bound duration tags. |
keda.apiserver_delegated_authz_request_duration_seconds.count (count) | Number of observations in the apiserver_delegated_authz_request_duration_seconds histogram. Broken down by status code. |
keda.apiserver_delegated_authz_request_duration_seconds.sum (count) | The sum of the duration of requests in the apiserver_delegated_authz_request_duration_seconds histogram. Shown as second |
keda.apiserver_envelope_encryption_dek_cache_fill_percent (gauge) | Percent of the cache slots currently occupied by cached DEKs. |
keda.apiserver_flowcontrol_read_vs_write_current_requests.bucket (count) | The number of requests (as a fraction of the relevant limit) waiting or in execution at the end of each nanosecond. Tagged by upper_bound tags. Shown as request |
keda.apiserver_flowcontrol_read_vs_write_current_requests.count (count) | The number of requests (as a fraction of the relevant limit) observed at the end of each nanosecond. Shown as request |
keda.apiserver_flowcontrol_read_vs_write_current_requests.sum (count) | The sum of all observed request fractions at the end of each nanosecond. |
keda.apiserver_flowcontrol_seat_fair_frac (gauge) | Fair fraction of server's concurrency to allocate to each priority level that can use it. |
keda.apiserver_request.count (count) | Number of API server requests broken out for each verb, dry run value, group, version, resource, scope, component, and HTTP response code. Shown as request |
keda.apiserver_request_duration_seconds.bucket (count) | The number of requests used to calculate the response time tagged by upper_bound tags. Shown as request |
keda.apiserver_request_duration_seconds.count (count) | The number of requests used to calculate the response time. Shown as request |
keda.apiserver_request_duration_seconds.sum (count) | The sum of response times in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. Shown as second |
keda.apiserver_request_filter_duration_seconds.bucket (count) | The number of observations used to calculate the request filter latency tagged by upper_bound tags. |
keda.apiserver_request_filter_duration_seconds.count (count) | The number of observations used to calculate the request filter latency. |
keda.apiserver_request_filter_duration_seconds.sum (count) | Request filter latency distribution in seconds, for each filter type. Shown as second |
keda.apiserver_request_sli_duration_seconds.bucket (count) | The number of observations used to calculate SLI response time tagged by upper_bound tags. |
keda.apiserver_request_sli_duration_seconds.count (count) | The number of observations used to calculate SLI response time. |
keda.apiserver_request_sli_duration_seconds.sum (count) | The sum of the response time (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. Shown as second |
keda.apiserver_request_slo_duration_seconds.bucket (count) | The number of observations used to calculate SLO response time tagged by upper_bound tags. |
keda.apiserver_request_slo_duration_seconds.count (count) | The number of observations used to calculate SLO response time. |
keda.apiserver_request_slo_duration_seconds.sum (count) | The sum of the response time (not counting webhook duration and priority & fairness queue wait times) in seconds for each verb, group, version, resource, subresource, scope and component. Shown as second |
keda.apiserver_response_sizes.bucket (count) | Number of responses used to calculate the response size tagged by upper_bound tags. |
keda.apiserver_response_sizes.count (count) | Number of responses used to calculate the response size. |
keda.apiserver_response_sizes.sum (count) | The sum of sizes of responses in bytes for each group, version, verb, resource, subresource, scope and component. Shown as byte |
keda.apiserver_storage_data_key_generation_duration_seconds.bucket (count) | Number of observations used to calculate data encryption key (DEK) generation duration, tagged by upper_bound tags. |
keda.apiserver_storage_data_key_generation_duration_seconds.count (count) | Number of observations used to calculate data encryption key (DEK) generation duration. |
keda.apiserver_storage_data_key_generation_duration_seconds.sum (count) | Time in seconds used for data encryption key (DEK) generation operations. Shown as second |
keda.apiserver_storage_data_key_generation_failures.count (count) | Number of failed data encryption key (DEK) generation operations. |
keda.apiserver_storage_envelope_transformation_cache_misses.count (count) | Number of cache misses while accessing key decryption key (KEK). |
keda.apiserver_tls_handshake_errors.count (count) | Number of requests dropped with 'TLS handshake error from' error. |
keda.apiserver_webhooks_x509_insecure_sha1.count (count) | Counts the number of requests to servers with insecure SHA1 signatures in their serving certificate OR the number of connection failures due to the insecure SHA1 signatures (either/or, based on the runtime environment). |
keda.apiserver_webhooks_x509_missing_san.count (count) | Counts the number of requests to servers missing the SAN extension in their serving certificate OR the number of connection failures due to the missing x509 certificate SAN extension (either/or, based on the runtime environment). |
keda.authenticated_user_requests.count (count) | Number of authenticated requests broken out by username. Shown as request |
keda.authentication_attempts.count (count) | Number of authentication attempts. |
keda.authentication_duration_seconds.bucket (count) | Number of observations used to calculate authentication duration tagged by upper_bound bucket tags. |
keda.authentication_duration_seconds.count (count) | Number of observations used to calculate authentication duration. |
keda.authentication_duration_seconds.sum (count) | Authentication duration in seconds broken out by result. Shown as second |
keda.authorization_attempts.count (count) | Number of authorization attempts broken down by result. It can be either 'allowed', 'denied', 'no-opinion' or 'error'. |
keda.authorization_duration_seconds.bucket (count) | The number of events used to calculate authorization duration tagged by upper_bound tags. |
keda.authorization_duration_seconds.count (count) | The number of events used to calculate authorization duration. |
keda.authorization_duration_seconds.sum (count) | Authorization duration in seconds broken out by result. Shown as second |
keda.build_info (gauge) | Info metric, with static information about KEDA build like: version, git commit and Golang runtime info. |
keda.cardinality_enforcement_unexpected_categorizations.count (count) | The count of unexpected categorizations during cardinality enforcement. |
keda.certwatcher.read_certificate.count (count) | Number of certificate reads. |
keda.certwatcher.read_certificate_errors.count (count) | Number of certificate read errors. Shown as error |
keda.controller.runtime.active_workers (gauge) | Number of currently used workers per controller. |
keda.controller.runtime.max_concurrent_reconciles (gauge) | Maximum number of concurrent reconciles per controller. |
keda.controller.runtime.reconcile.count (count) | Number of reconciliations per controller. |
keda.controller.runtime.reconcile_errors.count (count) | Number of reconciliation errors per controller. |
keda.controller.runtime.reconcile_panics.count (count) | Number of reconciliation panics per controller. |
keda.controller.runtime.reconcile_time.seconds.bucket (count) | The number of events observed to calculate reconciliation time tagged by upper_bound tags. |
keda.controller.runtime.reconcile_time.seconds.count (count) | The number of events observed to calculate reconciliation time. |
keda.controller.runtime.reconcile_time.seconds.sum (count) | The time per reconciliation per controller. Shown as second |
keda.controller.runtime.terminal_reconcile_errors.count (count) | Number of terminal reconciliation errors per controller. |
keda.controller.runtime.webhook_panics.count (count) | Number of webhook panics. |
keda.controller.runtime.webhook_requests.count (count) | Number of admission requests by HTTP status code. |
keda.controller.runtime.webhook_requests_in_flight (gauge) | Current number of admission requests being served. |
keda.disabled_metrics.count (count) | The count of disabled metrics. |
keda.field_validation_request_duration_seconds.bucket (count) | The number of observations used to calculate the field validation response time tagged by upper_bound tags. |
keda.field_validation_request_duration_seconds.count (count) | The number of observations used to calculate the field validation response time. |
keda.field_validation_request_duration_seconds.sum (count) | The response time in seconds for each field validation value. Shown as second |
keda.go.gc.duration.seconds.count (count) | The summary count of garbage collection cycles in the KEDA instance. |
keda.go.gc.duration.seconds.quantile (gauge) | A summary of the pause duration of garbage collection cycles in the KEDA instance. |
keda.go.gc.duration.seconds.sum (count) | The sum of the pause duration of garbage collection cycles in the KEDA instance. |
keda.go.goroutines (gauge) | Number of goroutines that currently exist. |
keda.go.info (gauge) | Information about the Go environment. |
keda.go.memstats.alloc_bytes (gauge) | Number of bytes allocated and still in use. Shown as byte |
keda.go.memstats.alloc_bytes.count (count) | Number of bytes allocated, even if freed. Shown as byte |
keda.go.memstats.buck_hash.sys_bytes (gauge) | Number of bytes used by the profiling bucket hash table. Shown as byte |
keda.go.memstats.frees.count (count) | Number of frees. |
keda.go.memstats.gc.sys_bytes (gauge) | Number of bytes used for garbage collection system metadata. Shown as byte |
keda.go.memstats.heap.alloc_bytes (gauge) | Number of heap bytes allocated and still in use. Shown as byte |
keda.go.memstats.heap.idle_bytes (gauge) | Number of heap bytes waiting to be used. Shown as byte |
keda.go.memstats.heap.inuse_bytes (gauge) | Number of heap bytes that are in use. Shown as byte |
keda.go.memstats.heap.objects (gauge) | Number of allocated objects. |
keda.go.memstats.heap.released_bytes (gauge) | Number of heap bytes released to OS. Shown as byte |
keda.go.memstats.heap.sys_bytes (gauge) | Number of heap bytes obtained from system. Shown as byte |
keda.go.memstats.lookups.count (count) | Number of pointer lookups. |
keda.go.memstats.mallocs.count (count) | Number of mallocs. |
keda.go.memstats.mcache.inuse_bytes (gauge) | Number of bytes in use by mcache structures. Shown as byte |
keda.go.memstats.mcache.sys_bytes (gauge) | Number of bytes used for mcache structures obtained from system. Shown as byte |
keda.go.memstats.mspan.inuse_bytes (gauge) | Number of bytes in use by mspan structures. Shown as byte |
keda.go.memstats.mspan.sys_bytes (gauge) | Number of bytes used for mspan structures obtained from system. Shown as byte |
keda.go.memstats.next.gc_bytes (gauge) | Number of heap bytes when next garbage collection will take place. Shown as byte |
keda.go.memstats.other.sys_bytes (gauge) | Number of bytes used for other system allocations. Shown as byte |
keda.go.memstats.stack.inuse_bytes (gauge) | Number of bytes in use by the stack allocator. Shown as byte |
keda.go.memstats.stack.sys_bytes (gauge) | Number of bytes obtained from system for stack allocator. Shown as byte |
keda.go.memstats.sys_bytes (gauge) | Number of bytes obtained from system. Shown as byte |
keda.go.memstats.time_since_last_gc.seconds (gauge) | Time of the last garbage collection, in seconds since 1970. Shown as second |
keda.go.threads (gauge) | Number of OS threads created. |
keda.hidden_metrics.count (count) | The count of hidden metrics. |
keda.internal_metricsservice.grpc_client_handled.count (count) | The number of RPCs completed by the client, regardless of success or failure. |
keda.internal_metricsservice.grpc_client_handling_seconds.bucket (count) | The number of events used to calculate response time tagged by upper_bound bucket tags. |
keda.internal_metricsservice.grpc_client_handling_seconds.count (count) | The number of events used to calculate response time. |
keda.internal_metricsservice.grpc_client_handling_seconds.sum (count) | Response time (seconds) of the gRPC until it is finished by the application. |
keda.internal_metricsservice.grpc_client_msg_received.count (count) | The number of RPC stream messages received by the client. |
keda.internal_metricsservice.grpc_client_msg_sent.count (count) | The number of gRPC stream messages sent by the client. |
keda.internal_metricsservice.grpc_client_started.count (count) | The number of RPCs started on the client. |
keda.internal_scale.loop_latency (gauge) | (Keda |
keda.internal_scale.loop_latency_seconds (gauge) | (Keda >=v2.16) The deviation (in seconds) between the expected execution time and the actual execution time for the scaling loop. Shown as second |
keda.leader_election.master_status (gauge) | Indicates whether the reporting system is master of the relevant lease: 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease, so make sure to group by name. |
keda.process.cpu.seconds.count (count) | Total user and system CPU time spent, in seconds. |
keda.process.max_fds (gauge) | Maximum number of open file descriptors. |
keda.process.open_fds (gauge) | Number of open file descriptors. |
keda.process.resident_memory.bytes (gauge) | Resident memory size in bytes. |
keda.process.uptime.seconds (gauge) | How long in seconds the process has been up. Shown as second |
keda.process.virtual_memory.bytes (gauge) | Virtual memory size in bytes. Shown as byte |
keda.process.virtual_memory.max_bytes (gauge) | Maximum amount of virtual memory available in bytes. Shown as byte |
keda.registered_metrics.count (count) | The count of registered metrics broken by stability level and deprecation version. |
keda.resource_registered (gauge) | (Keda >=v2.16) Number of KEDA custom resources per namespace for each custom resource type (CRD) registered. |
keda.resource_totals (gauge) | (Keda |
keda.rest.client.requests.count (count) | Number of HTTP requests, partitioned by status code, method, and host. Shown as request |
keda.scaled_job.errors.count (count) | Number of scaled job errors. |
keda.scaler.active (gauge) | Indicates whether a scaler is active (1), or not (0). |
keda.scaler.detail_errors.count (count) | (Keda >=v2.16) The number of errors encountered for each scaler. |
keda.scaler.errors.count (count) | (Keda |
keda.scaler.metrics_latency (gauge) | (Keda |
keda.scaler.metrics_latency_seconds (gauge) | (Keda >=v2.16) The latency of retrieving current metric from each scaler, in seconds. Shown as second |
keda.scaler.metrics_value (gauge) | The current value for each scaler's metric that would be used by the HPA in computing the target average. |
keda.trigger_registered (gauge) | (Keda >=v2.16) Number of triggers per trigger type registered. |
keda.trigger_totals (gauge) | (Keda |
keda.workqueue.adds.count (count) | Number of adds handled by workqueue. |
keda.workqueue.depth (gauge) | Current depth of workqueue. |
keda.workqueue.longest.running_processor.seconds (gauge) | How many seconds has the longest running processor for workqueue been running. Shown as second |
keda.workqueue.queue.duration.seconds.bucket (count) | The histogram bucket of how long in seconds an item stays in the workqueue before being requested. |
keda.workqueue.queue.duration.seconds.count (count) | The total number of events in the workqueue duration histogram. |
keda.workqueue.queue.duration.seconds.sum (count) | The cumulative sum of time (in seconds) that items have spent in the workqueue. Shown as second |
keda.workqueue.retries.count (count) | Number of retries handled by workqueue. |
keda.workqueue.unfinished_work.seconds (gauge) | How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. Shown as second |
keda.workqueue.work.duration.seconds.bucket (count) | Number of work items that have been processed in the workqueue tagged by upper_bound buckets. |
keda.workqueue.work.duration.seconds.count (count) | The total number of work items that have been processed in the workqueue. |
keda.workqueue.work.duration.seconds.sum (count) | The cumulative sum of time spent processing all work items in the workqueue. Shown as second |
The KEDA integration does not include any events.
keda.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the KEDA OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical
Need help? Contact Datadog support.