Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads.

Connect Datadog with Redpanda to view key metrics and add metric groups based on your specific needs.
To install this check for an Agent running on a host, run:

```shell
datadog-agent integration install -t datadog-redpanda==<INTEGRATION_VERSION>
```
For containerized environments, the best way to use this integration with the Docker Agent is to build the Agent with the Redpanda integration installed.
To build an updated version of the Agent:

1. Use the following Dockerfile:

   ```dockerfile
   FROM gcr.io/datadoghq/agent:latest

   ARG INTEGRATION_VERSION=2.0.0

   RUN agent integration install -r -t datadog-redpanda==${INTEGRATION_VERSION}
   ```

2. Build the image and push it to your private Docker registry.

3. Upgrade the Datadog Agent container image. If you are using a Helm chart, modify the `agents.image` section in the `values.yaml` file to replace the default Agent image:

   ```yaml
   agents:
     enabled: true
     image:
       tag: <NEW_TAG>
       repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>
   ```

4. Use the new `values.yaml` file to upgrade the Agent:

   ```shell
   helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog
   ```
To start collecting your Redpanda performance data, edit the `redpanda.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent’s configuration directory. See the sample `redpanda.d/conf.yaml.example` file for all available configuration options.
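As a minimal sketch, the Python snippet below prints a starting `redpanda.d/conf.yaml`. It assumes a Linux host where the Agent's configuration directory is `/etc/datadog-agent`, that the check reads an `openmetrics_endpoint` option, and that Redpanda serves metrics at `http://localhost:9644/metrics`. Confirm the option names and defaults against `conf.yaml.example` before relying on it; the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: Linux package install; other platforms use a different directory.
AGENT_CONF_DIR = Path("/etc/datadog-agent")

# Assumption: Redpanda's admin API exposes metrics on port 9644.
DEFAULT_ENDPOINT = "http://localhost:9644/metrics"


def redpanda_conf_path(conf_dir: Path = AGENT_CONF_DIR) -> Path:
    """Location of redpanda.d/conf.yaml inside the Agent's conf.d/ folder."""
    return conf_dir / "conf.d" / "redpanda.d" / "conf.yaml"


def render_redpanda_conf(endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Render a minimal check configuration (hypothetical helper)."""
    return "\n".join([
        "init_config:",
        "",
        "instances:",
        f"  - openmetrics_endpoint: {endpoint}",
    ]) + "\n"


if __name__ == "__main__":
    # Print rather than write, so an existing file is never overwritten.
    print(f"# {redpanda_conf_path()}")
    print(render_redpanda_conf(), end="")
```

Printing instead of writing keeps the sketch safe to run; redirect the output into the file once the values match your deployment.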
Log collection is disabled by default in the Datadog Agent and is available for Agent v6.0+. To enable it, add the following to your `datadog.yaml` file:

```yaml
logs_enabled: true
```
Make sure the `dd-agent` user is a member of the `systemd-journal` group. If it is not, run the following command as root:

```shell
usermod -a -G systemd-journal dd-agent
```
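To check membership before running `usermod`, a small Python sketch using the standard `grp` and `pwd` modules (Unix only) can look the group up. Because `gr_mem` lists supplementary members only, the sketch also compares the user's primary group ID. The function name is hypothetical; from a shell, `id -nG dd-agent` gives the same information.

```python
import grp
import pwd


def user_in_group(user: str, group: str) -> bool:
    """Return True if `user` belongs to `group` (primary or supplementary)."""
    try:
        group_entry = grp.getgrnam(group)
    except KeyError:
        return False  # the group does not exist on this host
    # Supplementary members are listed explicitly in the group database.
    if user in group_entry.gr_mem:
        return True
    try:
        # Primary group membership is recorded only in the passwd entry.
        return pwd.getpwnam(user).pw_gid == group_entry.gr_gid
    except KeyError:
        return False  # the user does not exist on this host


if __name__ == "__main__":
    if user_in_group("dd-agent", "systemd-journal"):
        print("dd-agent is already in systemd-journal")
    else:
        print("run: usermod -a -G systemd-journal dd-agent")
```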
Add the following to your `redpanda.d/conf.yaml` file to start collecting your Redpanda logs:

```yaml
logs:
  - type: journald
    source: redpanda
```
For containerized environments, Autodiscovery is configured by default once the Redpanda check is included in the Datadog Agent image. Metrics are collected automatically and sent to Datadog. For more information, see Autodiscovery Integration Templates.
By default, log collection is disabled in the Datadog Agent. Log collection is available for Agent v6.0+.
To enable logs, see Kubernetes Log Collection.
Then, apply the following log integration template, for example as a pod annotation:

Parameter | Value |
---|---|
`<LOG_CONFIG>` | `{"source": "redpanda", "service": "redpanda_cluster"}` |
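In Kubernetes, a `<LOG_CONFIG>` value is typically supplied through a pod annotation of the form `ad.datadoghq.com/<CONTAINER_NAME>.logs`, whose value is a JSON list of log configurations. The Python sketch below builds that annotation from the values in the table above. The container name `redpanda` and the helper name are assumptions; paste the printed line into the pod template's `metadata.annotations`.

```python
import json

# Values from the <LOG_CONFIG> table above.
LOG_CONFIG = {"source": "redpanda", "service": "redpanda_cluster"}


def logs_annotation(container_name: str, log_config: dict) -> dict:
    """Build the Autodiscovery logs annotation for one container.

    The annotation value is a JSON-encoded list, so several log
    configurations could be attached to the same container.
    """
    key = f"ad.datadoghq.com/{container_name}.logs"
    return {key: json.dumps([log_config])}


if __name__ == "__main__":
    # Assumption: the Redpanda container in the pod spec is named "redpanda".
    for key, value in logs_annotation("redpanda", LOG_CONFIG).items():
        print(f"{key}: '{value}'")
```

Encoding with `json.dumps` rather than hand-writing the string avoids quoting mistakes when the annotation is embedded in YAML.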
Run the Agent’s `status` subcommand and look for `redpanda` under the Checks section.
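If `redpanda` is missing from the status output or reports errors, a useful first step is to confirm that the metrics endpoint responds at all. The Python sketch below fetches an endpoint (assumed here to be `http://localhost:9644/metrics`; substitute the endpoint you configured) and lists the metric names found in the Prometheus/OpenMetrics text format, skipping `#` comment lines. It is a diagnostic aid, not part of the integration, and `metric_names` is a hypothetical helper.

```python
import urllib.request

# Assumption: default Redpanda admin API metrics endpoint.
ENDPOINT = "http://localhost:9644/metrics"


def metric_names(exposition: str) -> set[str]:
    """Collect sample names from Prometheus/OpenMetrics text output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the label block "{" or at the first space.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=5) as resp:
            text = resp.read().decode("utf-8")
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print(f"cannot reach {ENDPOINT}: {err}")
    else:
        print(f"{len(metric_names(text))} metric names exposed")
```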

Metric | Description |
---|---|
redpanda.application.build (gauge) | Redpanda build information |
redpanda.application.uptime (gauge) | Redpanda uptime in seconds Shown as second |
redpanda.controller.log_limit_requests_available (gauge) | Controller log rate limiting. Available rps for group Shown as request |
redpanda.controller.log_limit_requests_dropped (count) | Controller log rate limiting. Number of requests that are dropped due to exceeding the limit in group Shown as request |
redpanda.partitions.moving_from_node (gauge) | Number of partitions that are moving from node |
redpanda.partitions.moving_to_node (gauge) | Number of partitions that are moving to node |
redpanda.partitions.node_cancelling_movements (gauge) | Number of cancelling partition movements for node |
redpanda.reactor.cpu_busy_seconds (gauge) | Total CPU busy time in seconds Shown as second |
redpanda.io_queue.total_read_ops (count) | Total read operations passed in the queue Shown as operation |
redpanda.io_queue.total_write_ops (count) | Total write operations passed in the queue Shown as operation |
redpanda.kafka.group_offset (gauge) | Consumer group committed offset |
redpanda.kafka.group_count (gauge) | Number of consumers in a group |
redpanda.kafka.group_topic_count (gauge) | Number of topics in a group |
redpanda.cluster.partitions (gauge) | Configured number of partitions for the topic |
redpanda.cluster.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_latency_seconds (gauge) | Internal latency of kafka produce requests Shown as second |
redpanda.kafka.under_replicated_replicas (gauge) | Number of under-replicated replicas (i.e. replicas that are live, but not at the latest offset) |
redpanda.memory.allocated_memory (gauge) | Allocated memory size in bytes Shown as byte |
redpanda.memory.available_memory_low_water_mark (gauge) | The low-water mark for available_memory from process start Shown as byte |
redpanda.memory.available_memory (gauge) | Total shard memory potentially available in bytes (free_memory plus reclaimable) Shown as byte |
redpanda.memory.free_memory (gauge) | Free memory size in bytes Shown as byte |
redpanda.node_status.rpcs_received (gauge) | Number of node status RPCs received by this node Shown as request |
redpanda.node_status.rpcs_sent (gauge) | Number of node status RPCs sent by this node Shown as request |
redpanda.node_status.rpcs_timed_out (gauge) | Number of timed out node status RPCs from this node Shown as request |
redpanda.raft.leadership_changes (count) | Number of leadership changes across all partitions of a given topic |
redpanda.raft.recovery_bandwidth (gauge) | Bandwidth available for partition movement, in bytes per second |
redpanda.pandaproxy.request_errors (count) | Total number of rest_proxy server errors Shown as error |
redpanda.pandaproxy.request_latency (gauge) | Internal latency of request for rest_proxy Shown as millisecond |
redpanda.rpc.active_connections (gauge) | Count of currently active connections Shown as connection |
redpanda.rpc.request_errors (count) | Number of rpc errors Shown as error |
redpanda.rpc.request_latency_seconds (gauge) | RPC latency Shown as second |
redpanda.scheduler.runtime_seconds (count) | Accumulated runtime of task queue associated with this scheduling group Shown as second |
redpanda.schema_registry.errors (count) | Total number of schema_registry server errors Shown as error |
redpanda.schema_registry_latency_seconds (gauge) | Internal latency of request for schema_registry Shown as second |
redpanda.storage.disk_free_bytes (count) | Disk storage bytes free. Shown as byte |
redpanda.storage.disk_free_space_alert (gauge) | Status of the low storage space alert: 0 = OK, 1 = Low Space, 2 = Degraded |
redpanda.storage.disk_total_bytes (count) | Total size of attached storage, in bytes. Shown as byte |
redpanda.cloud.client_backoff (count) | Total number of requests that backed off |
redpanda.cloud.client_download_backoff (count) | Total number of download requests that backed off |
redpanda.cloud.client_downloads (count) | Total number of requests that downloaded an object from cloud storage |
redpanda.cloud.client_not_found (count) | Total number of requests for which the object was not found |
redpanda.cloud.client_upload_backoff (count) | Total number of upload requests that backed off |
redpanda.cloud.client_uploads (count) | Total number of requests that uploaded an object to cloud storage |
redpanda.cloud.storage.active_segments (gauge) | Number of remote log segments currently hydrated for read |
redpanda.cloud.storage.cache_op_hit (count) | Number of get requests for objects that are already in cache. |
redpanda.cloud.storage.op_in_progress_files (gauge) | Number of files that are being put to cache. |
redpanda.cloud.storage.cache_op_miss (count) | Number of get requests that are not satisfied from the cache. |
redpanda.cloud.storage.op_put (count) | Number of objects written into cache. Shown as operation |
redpanda.cloud.storage.cache_space_files (gauge) | Number of objects in cache. |
redpanda.cloud.storage.cache_space_size_bytes (gauge) | Sum of size of cached objects. Shown as byte |
redpanda.cloud.storage.deleted_segments (count) | Number of segments that have been deleted from S3 for the topic. This may grow due to retention or non-compacted segments being replaced with their compacted equivalent. |
redpanda.cloud.storage.errors (count) | Number of transmit errors Shown as error |
redpanda.cloud.storage.housekeeping.drains (gauge) | Number of times upload housekeeping queue was drained |
redpanda.cloud.storage.housekeeping.jobs_completed (count) | Number of executed housekeeping jobs |
redpanda.cloud.storage.housekeeping.jobs_failed (count) | Number of failed housekeeping jobs Shown as error |
redpanda.cloud.storage.housekeeping.jobs_skipped (count) | Number of skipped housekeeping jobs |
redpanda.cloud.storage.housekeeping.pauses (gauge) | Number of times upload housekeeping was paused |
redpanda.cloud.storage.housekeeping.resumes (gauge) | Number of times upload housekeeping was resumed |
redpanda.cloud.storage.housekeeping.rounds (count) | Number of upload housekeeping rounds |
redpanda.cloud.storage.jobs.cloud_segment_reuploads (gauge) | Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage) |
redpanda.cloud.storage.jobs.local_segment_reuploads (gauge) | Number of segment reuploads from local data directory |
redpanda.cloud.storage.jobs.manifest_reuploads (gauge) | Number of manifest reuploads performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.metadata_syncs (gauge) | Number of archival configuration updates performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.segment_deletions (gauge) | Number of segments deleted by all housekeeping jobs |
redpanda.cloud.storage.readers (gauge) | Number of read cursors for hydrated remote log segments |
redpanda.cloud.storage.segments (gauge) | Total number of accounted segments in the cloud for the topic |
redpanda.cloud.storage.segments_pending_deletion (gauge) | Total number of segments pending deletion from the cloud for the topic |
redpanda.cloud.storage.uploaded_bytes (count) | Total number of uploaded bytes for the topic Shown as byte |
redpanda.cluster.brokers (gauge) | Number of configured brokers in the cluster |
redpanda.cluster.controller_log_limit_requests_dropped (count) | Controller log rate limiting. Number of requests that are dropped due to exceeding the limit in group |
redpanda.cluster.partition_num_with_broken_rack_constraint (gauge) | Number of partitions that don't satisfy the rack awareness constraint |
redpanda.cluster.topics (gauge) | Number of topics in the cluster |
redpanda.cluster.unavailable_partitions (gauge) | Number of partitions that lack quorum among replicas |
redpanda.kafka.partition_committed_offset (gauge) | Latest committed offset for the partition (i.e. the offset of the last message safely persisted on most replicas) |
redpanda.kafka.partitions (gauge) | Configured number of partitions for the topic |
redpanda.kafka.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_bytes (count) | Total number of bytes produced per topic Shown as byte |
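The storage metrics above can be combined when building monitors. As a sketch, the Python helpers below decode `redpanda.storage.disk_free_space_alert` using the 0/1/2 values from the table and compute the used fraction of attached storage from `redpanda.storage.disk_free_bytes` and `redpanda.storage.disk_total_bytes`. The function names and the sample byte values are illustrative only.

```python
# Meaning of redpanda.storage.disk_free_space_alert, per the metric table.
DISK_ALERT_STATES = {0: "OK", 1: "Low Space", 2: "Degraded"}


def decode_disk_alert(value: float) -> str:
    """Map the gauge value to its documented state name."""
    return DISK_ALERT_STATES.get(int(value), f"unknown ({value})")


def disk_used_fraction(free_bytes: float, total_bytes: float) -> float:
    """Fraction of attached storage in use, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - free_bytes / total_bytes


if __name__ == "__main__":
    # Illustrative numbers: 25 GiB free out of 100 GiB attached.
    gib = 1024 ** 3
    print(decode_disk_alert(1))  # Low Space
    print(f"{disk_used_fraction(25 * gib, 100 * gib):.0%}")  # 75%
```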
The Redpanda integration does not include any events.
`redpanda.openmetrics.health`

Returns `CRITICAL` if the check cannot access the metrics endpoint. Returns `OK` otherwise.

Statuses: ok, critical
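The logic of this service check can be approximated with a short Python sketch: try to read the metrics endpoint and report `critical` on any connection error, HTTP error status, or malformed URL, and `ok` otherwise. This is not the integration's actual code; the endpoint URL, the 10-second timeout, and the function name are assumptions.

```python
import urllib.request

# Assumption: the same endpoint configured as openmetrics_endpoint.
ENDPOINT = "http://localhost:9644/metrics"


def openmetrics_health(endpoint: str, timeout: float = 10.0) -> str:
    """Return "ok" if the endpoint answers successfully, else "critical"."""
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
            resp.read()  # consume the body so the scrape fully completes
    except (OSError, ValueError):
        # URLError (refused connections), HTTPError (4xx/5xx statuses) and
        # timeouts are all OSError subclasses; ValueError covers bad URLs.
        return "critical"
    return "ok"


if __name__ == "__main__":
    print(f"redpanda.openmetrics.health: {openmetrics_health(ENDPOINT)}")
```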
Need help? Contact Datadog support.