Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads.
Connect Datadog with Redpanda to view key metrics and to add extra metric groups based on specific user needs.
To install this check on an Agent running on a host, run:
datadog-agent integration install -t datadog-redpanda==<INTEGRATION_VERSION>
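In containerized environments, the best way to use this integration with the Docker Agent is to build an Agent image with the Redpanda integration installed.
To build an updated version of the Agent, use the following Dockerfile: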
FROM gcr.io/datadoghq/agent:latest
ARG INTEGRATION_VERSION=1.0.0
RUN agent integration install -r -t datadog-redpanda==${INTEGRATION_VERSION}
Build the image and push it to your private Docker registry.
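The build-and-push step can be sketched as a small helper that assembles the two `docker` commands. This is only an illustration: the registry and image name in the usage example are hypothetical placeholders, and the helper returns the commands rather than running them so that they can be reviewed first.

```python
from typing import List


def build_and_push_commands(image: str, integration_version: str,
                            context_dir: str = ".") -> List[List[str]]:
    """Return the docker commands that build the custom Agent image and push it.

    `image` is the full tag in your private registry (hypothetical in the
    example below). `integration_version` is passed to the Dockerfile's
    INTEGRATION_VERSION build argument.
    """
    build = [
        "docker", "build",
        "--build-arg", f"INTEGRATION_VERSION={integration_version}",
        "-t", image,
        context_dir,
    ]
    push = ["docker", "push", image]
    return [build, push]


if __name__ == "__main__":
    # Hypothetical registry and tag; substitute your own values.
    image = "registry.example.com/datadog-agent-redpanda:1.0.0"
    for cmd in build_and_push_commands(image, "1.0.0"):
        print(" ".join(cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)`.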
Upgrade the Datadog Agent container image. If you are using the Helm chart, modify the agents.image section of the values.yaml file to replace the default Agent image.
agents:
  enabled: true
  image:
    tag: <NEW_TAG>
    repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>
Use the updated values.yaml file to upgrade the Agent:
helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog
To start collecting Redpanda performance data:
Edit the redpanda.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample redpanda.d/conf.yaml.example for all available configuration options.
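As a rough illustration of what a minimal redpanda.d/conf.yaml might contain, the sketch below renders one as text. Two things here are assumptions rather than facts taken from this page: the option name `openmetrics_endpoint` (the usual name in OpenMetrics-based Agent checks) and the URL `http://localhost:9644/metrics` (a commonly used Redpanda metrics address). Treat redpanda.d/conf.yaml.example as the authoritative source.

```python
def render_redpanda_conf(endpoint: str = "http://localhost:9644/metrics") -> str:
    """Render a minimal redpanda.d/conf.yaml body as text.

    ASSUMPTION: the instance option is named `openmetrics_endpoint` and the
    default URL matches your Redpanda metrics endpoint. Verify both against
    redpanda.d/conf.yaml.example before using the result.
    """
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - openmetrics_endpoint: {endpoint}\n"
    )


if __name__ == "__main__":
    print(render_redpanda_conf())
```

After editing the file, restart the Agent so that the new configuration is loaded.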
Log collection is disabled by default in the Datadog Agent. It is available in Agent v6.0+.
To enable logs, add the following to your datadog.yaml file:
logs_enabled: true
Make sure that the dd-agent user is a member of the systemd-journal group. If it is not, add it:
usermod -a -G systemd-journal dd-agent
To start collecting Redpanda logs, add the following to your redpanda.d/conf.yaml file:
logs:
  - type: journald
    source: redpanda
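In containerized environments, Autodiscovery is configured by default once the Redpanda check is integrated into the Datadog Agent image.
Metrics are collected and sent to Datadog's servers automatically. For details, see Autodiscovery Integration Templates.
Log collection is disabled by default in the Datadog Agent. It is available in Agent v6.0+.
To enable logs, see Kubernetes Log Collection.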
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "redpanda", "service": "redpanda_cluster"} |
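One common way to supply `<LOG_CONFIG>` on Kubernetes is a pod annotation of the form `ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs`, whose value is a JSON list of log configurations. The sketch below builds that annotation from the values in the table. The container name `redpanda` used in the example is an assumption; use the name of the container in your own pod spec.

```python
import json
from typing import Dict


def log_annotation(container_name: str,
                   source: str = "redpanda",
                   service: str = "redpanda_cluster") -> Dict[str, str]:
    """Build the Autodiscovery log annotation for one container.

    The annotation value is a JSON list containing a single log
    configuration object, matching <LOG_CONFIG> in the table above.
    """
    key = f"ad.datadoghq.com/{container_name}.logs"
    value = json.dumps([{"source": source, "service": service}])
    return {key: value}


if __name__ == "__main__":
    # "redpanda" is a hypothetical container name.
    for key, value in log_annotation("redpanda").items():
        print(f"{key}: '{value}'")
```

The resulting key and value go under `metadata.annotations` of the pod (or the pod template of a StatefulSet or Deployment).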
Run the Agent's status subcommand and look for redpanda under the Checks section.
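The validation step can be scripted roughly as follows. This sketch assumes that the status output lists each running check on its own line beginning with the check name; the exact layout of the output can differ between Agent versions, so the text match is a heuristic and not a documented contract.

```python
import subprocess


def check_is_listed(status_output: str, check_name: str = "redpanda") -> bool:
    """Return True if any line of the status output starts with the check name.

    ASSUMPTION: running checks appear one per line, starting with their name.
    """
    return any(
        line.strip().startswith(check_name)
        for line in status_output.splitlines()
    )


def agent_status() -> str:
    """Run the Agent's status subcommand on a host and return its output."""
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("redpanda check listed:", check_is_listed(agent_status()))
```

Inside an Agent container the binary is invoked as `agent`, as in the Dockerfile earlier on this page, so the command list would need to be adjusted there.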
redpanda.alien.receive_batch_queue_length (gauge) | Current receive batch queue length |
redpanda.alien.total_received_messages (count) | Total number of received messages |
redpanda.alien.total_sent_messages (count) | Total number of sent messages |
redpanda.application.build (gauge) | Redpanda build information |
redpanda.application.uptime (gauge) | Redpanda uptime in milliseconds Shown as millisecond |
redpanda.cluster.partition_committed_offset (gauge) | Partition committed offset, i.e. safely persisted on a majority of replicas |
redpanda.cluster.partition_end_offset (gauge) | Last offset stored by current partition on this node |
redpanda.cluster.partition_high_watermark (gauge) | Partition high watermark, i.e. highest consumable offset |
redpanda.cluster.partition_last_stable_offset (gauge) | Last stable offset |
redpanda.cluster.partition_leader (gauge) | Flag indicating if this partition instance is a leader |
redpanda.cluster.partition_leader_id (gauge) | Id of current partition leader |
redpanda.cluster.partition_records_fetched (count) | Total number of records fetched Shown as record |
redpanda.cluster.partition_records_produced (count) | Total number of records produced Shown as record |
redpanda.cluster.partition_under_replicated_replicas (gauge) | Number of under replicated replicas |
redpanda.httpd.connections_current (gauge) | The current number of open connections Shown as connection |
redpanda.httpd.connections_total (count) | The total number of connections opened Shown as connection |
redpanda.httpd.read_errors (count) | The total number of errors while reading http requests Shown as error |
redpanda.httpd.reply_errors (count) | The total number of errors while replying to http Shown as error |
redpanda.httpd.requests_served (count) | The total number of http requests served Shown as request |
redpanda.internal.rpc_active_connections (gauge) | internal_rpc: Currently active connections Shown as connection |
redpanda.internal.rpc_connection_close_errors (count) | internal_rpc: Number of errors when shutting down the connection Shown as connection |
redpanda.internal.rpc_connects (count) | internal_rpc: Number of accepted connections Shown as connection |
redpanda.internal.rpc_consumed_mem_bytes (count) | internal_rpc: Memory consumed by request processing Shown as byte |
redpanda.internal.rpc_corrupted_headers (count) | internal_rpc: Number of requests with corrupted headers |
redpanda.internal.rpc_dispatch_handler_latency.count (count) | internal_rpc: Latency Shown as millisecond |
redpanda.internal.rpc_dispatch_handler_latency.sum (gauge) | internal_rpc: Latency Shown as millisecond |
redpanda.internal.rpc_latency.count (count) | Internal RPC service latency Shown as millisecond |
redpanda.internal.rpc_latency.sum (gauge) | Internal RPC service latency Shown as millisecond |
redpanda.internal.rpc_max_service_mem_bytes (count) | internal_rpc: Maximum memory allowed for RPC Shown as byte |
redpanda.internal.rpc_method_not_found_errors (count) | internal_rpc: Number of requests with not available RPC method Shown as error |
redpanda.internal.rpc_received_bytes (count) | internal_rpc: Number of bytes received from the clients in valid requests Shown as byte |
redpanda.internal.rpc_requests_blocked_memory (count) | internal_rpc: Number of requests blocked in memory backpressure Shown as request |
redpanda.internal.rpc_requests_completed (count) | internal_rpc: Number of successful requests Shown as request |
redpanda.internal.rpc_requests_pending (gauge) | internal_rpc: Number of requests being processed by server Shown as request |
redpanda.internal.rpc_sent_bytes (count) | internal_rpc: Number of bytes sent to clients Shown as byte |
redpanda.internal.rpc_service_errors (count) | internal_rpc: Number of service errors Shown as error |
redpanda.io.queue_delay (gauge) | random delay time in the queue Shown as second |
redpanda.io.queue_disk_queue_length (gauge) | Number of requests in the disk |
redpanda.io.queue_queue_length (gauge) | Number of requests in the queue |
redpanda.io.queue_shares (gauge) | current amount of shares |
redpanda.io.queue_total_bytes (count) | Total bytes passed in the queue Shown as byte |
redpanda.io.queue_total_delay_sec (count) | Total time spent in the queue Shown as second |
redpanda.io.queue_total_exec_sec (count) | Total time spent in disk Shown as second |
redpanda.io.queue_total_operations (count) | Total operations passed in the queue Shown as operation |
redpanda.kafka.fetch_sessions_cache_mem_usage_bytes (gauge) | Fetch sessions cache memory usage in bytes Shown as byte |
redpanda.kafka.fetch_sessions_cache_sessions_count (gauge) | Total number of fetch sessions |
redpanda.kafka.latency_fetch_latency_us.count (count) | Fetch Latency Shown as millisecond |
redpanda.kafka.latency_fetch_latency_us.sum (gauge) | Fetch Latency Shown as millisecond |
redpanda.kafka.latency_produce_latency_us.count (count) | Produce Latency Shown as millisecond |
redpanda.kafka.latency_produce_latency_us.sum (gauge) | Produce Latency Shown as millisecond |
redpanda.kafka.rpc_active_connections (gauge) | kafka_rpc: Currently active connections Shown as connection |
redpanda.kafka.rpc_connection_close_errors (count) | kafka_rpc: Number of errors when shutting down the connection Shown as error |
redpanda.kafka.rpc_connects (count) | kafka_rpc: Number of accepted connections Shown as connection |
redpanda.kafka.rpc_consumed_mem_bytes (count) | kafka_rpc: Memory consumed by request processing Shown as byte |
redpanda.kafka.rpc_corrupted_headers (count) | kafka_rpc: Number of requests with corrupted headers |
redpanda.kafka.rpc_dispatch_handler_latency.count (count) | kafka_rpc: Latency Shown as millisecond |
redpanda.kafka.rpc_dispatch_handler_latency.sum (gauge) | kafka_rpc: Latency Shown as millisecond |
redpanda.kafka.rpc_max_service_mem_bytes (count) | kafka_rpc: Maximum memory allowed for RPC Shown as byte |
redpanda.kafka.rpc_method_not_found_errors (count) | kafka_rpc: Number of requests with not available RPC method Shown as error |
redpanda.kafka.rpc_received_bytes (count) | kafka_rpc: Number of bytes received from the clients in valid requests Shown as byte |
redpanda.kafka.rpc_requests_blocked_memory (count) | kafka_rpc: Number of requests blocked in memory backpressure Shown as request |
redpanda.kafka.rpc_requests_completed (count) | kafka_rpc: Number of successful requests Shown as request |
redpanda.kafka.rpc_requests_pending (gauge) | kafka_rpc: Number of requests being processed by server Shown as request |
redpanda.kafka.rpc_sent_bytes (count) | kafka_rpc: Number of bytes sent to clients Shown as byte |
redpanda.kafka.rpc_service_errors (count) | kafka_rpc: Number of service errors Shown as error |
redpanda.kafka.group_offset (gauge) | consumer lag offset |
redpanda.leader.balancer_leader_transfer_error (count) | Number of errors attempting to transfer leader Shown as error |
redpanda.leader.balancer_leader_transfer_no_improvement (count) | Number of times no balance improvement was found |
redpanda.leader.balancer_leader_transfer_succeeded (count) | Number of successful leader transfers Shown as success |
redpanda.leader.balancer_leader_transfer_timeout (count) | Number of timeouts attempting to transfer leader Shown as timeout |
redpanda.memory.allocated_memory (count) | Allocated memory size in bytes Shown as byte |
redpanda.memory.cross_cpu_free_operations (count) | Total number of cross cpu free Shown as operation |
redpanda.memory.free_memory (count) | Free memory size in bytes Shown as byte |
redpanda.memory.free_operations (count) | Total number of free operations Shown as operation |
redpanda.memory.malloc_live_objects (gauge) | Number of live objects Shown as object |
redpanda.memory.malloc_operations (count) | Total number of malloc operations Shown as operation |
redpanda.memory.reclaims_operations (count) | Total reclaims operations Shown as operation |
redpanda.memory.total_memory (count) | Total memory size in bytes Shown as byte |
redpanda.pandaproxy.request_latency.count (count) | Request latency Shown as millisecond |
redpanda.pandaproxy.request_latency.sum (gauge) | Request latency Shown as millisecond |
redpanda.raft.done_replicate_requests (count) | Number of finished replicate requests Shown as request |
redpanda.raft.group_count (gauge) | Number of raft groups |
redpanda.raft.heartbeat_requests_errors (count) | Number of failed heartbeat requests Shown as error |
redpanda.raft.leader_for (gauge) | Number of groups for which node is a leader |
redpanda.raft.leadership_changes (count) | Number of leadership changes |
redpanda.raft.log_flushes (count) | Number of log flushes Shown as flush |
redpanda.raft.log_truncations (count) | Number of log truncations |
redpanda.raft.received_append_requests (count) | Number of append requests received |
redpanda.raft.received_vote_requests (count) | Number of vote requests received |
redpanda.raft.recovery_requests_errors (count) | Number of failed recovery requests Shown as error |
redpanda.raft.replicate_ack_all_requests (count) | Number of replicate requests with quorum ack consistency Shown as request |
redpanda.raft.replicate_ack_leader_requests (count) | Number of replicate requests with leader ack consistency Shown as request |
redpanda.raft.replicate_ack_none_requests (count) | Number of replicate requests with no ack consistency Shown as request |
redpanda.raft.replicate_request_errors (count) | Number of failed replicate requests Shown as error |
redpanda.raft.sent_vote_requests (count) | Number of vote requests sent Shown as request |
redpanda.reactor.abandoned_failed_futures (count) | Total number of abandoned failed futures (futures destroyed while still containing an exception) |
redpanda.reactor.aio_bytes_read (count) | Total aio-reads bytes Shown as byte |
redpanda.reactor.aio_bytes_write (count) | Total aio-writes bytes Shown as byte |
redpanda.reactor.aio_errors (count) | Total aio errors Shown as error |
redpanda.reactor.aio_reads (count) | Total aio-reads operations Shown as read |
redpanda.reactor.aio_writes (count) | Total aio-writes operations Shown as write |
redpanda.reactor.cpp_exceptions (count) | Total number of C++ exceptions Shown as exception |
redpanda.reactor.cpu_busy_ms (count) | Total cpu busy time in milliseconds Shown as millisecond |
redpanda.reactor.cpu_steal_time_ms (count) | Total steal time: the time in which some other process was running while Seastar was not trying to run (not sleeping). Because this is measured in userspace, some time that could legitimately be considered steal time is not accounted as such, for example if we are sleeping and can wake up but the kernel hasn't woken us up yet. Shown as millisecond |
redpanda.reactor.fstream_read_bytes (count) | Counts bytes read from disk file streams. A high rate indicates high disk activity. Divide by fstream_reads to determine average read size. Shown as byte |
redpanda.reactor.fstream_read_bytes_blocked (count) | Counts the number of bytes read from disk that could not be satisfied from read-ahead buffers and had to block. Indicates short streams or incorrect read ahead configuration. Shown as byte |
redpanda.reactor.fstream_reads (count) | Counts reads from disk file streams. A high rate indicates high disk activity. Contrast with other fstream_read* counters to locate bottlenecks. Shown as read |
redpanda.reactor.fstream_reads_ahead_bytes_discarded (count) | Counts the number of buffered bytes that were read ahead of time and were discarded because they were not needed, wasting disk bandwidth. Indicates over-eager read ahead configuration. Shown as byte |
redpanda.reactor.fstream_reads_aheads_discarded (count) | Counts the number of times a buffer that was read ahead of time was discarded because it was not needed, wasting disk bandwidth. Indicates over-eager read ahead configuration. Shown as read |
redpanda.reactor.fstream_reads_blocked (count) | Counts the number of times a disk read could not be satisfied from read-ahead buffers and had to block. Indicates short streams or incorrect read ahead configuration. Shown as read |
redpanda.reactor.fsyncs (count) | Total number of fsync operations |
redpanda.reactor.io_threaded_fallbacks (count) | Total number of io-threaded-fallbacks operations Shown as read |
redpanda.reactor.logging_failures (count) | Total number of logging failures |
redpanda.reactor.polls (count) | Number of times pollers were executed |
redpanda.reactor.tasks_pending (gauge) | Number of pending tasks in the queue |
redpanda.reactor.tasks_processed (count) | Total tasks processed |
redpanda.reactor.timers_pending (count) | Number of tasks in the timer-pending queue |
redpanda.reactor.utilization (gauge) | CPU utilization Shown as percent |
redpanda.rpc.client_active_connections (gauge) | Currently active connections Shown as connection |
redpanda.rpc.client_client_correlation_errors (count) | Number of errors in client correlation id Shown as error |
redpanda.rpc.client_connection_errors (count) | Number of connection errors Shown as connection |
redpanda.rpc.client_connects (count) | Connection attempts Shown as connection |
redpanda.rpc.client_corrupted_headers (count) | Number of responses with corrupted headers |
redpanda.rpc.client_in_bytes (count) | Total number of bytes sent (including headers) Shown as byte |
redpanda.rpc.client_out_bytes (count) | Total number of bytes received Shown as byte |
redpanda.rpc.client_read_dispatch_errors (count) | Number of errors while dispatching responses Shown as read |
redpanda.rpc.client_request_errors (count) | Number of request errors Shown as error |
redpanda.rpc.client_request_timeouts (count) | Number of request timeouts Shown as timeout |
redpanda.rpc.client_requests (count) | Number of requests Shown as request |
redpanda.rpc.client_requests_blocked_memory (count) | Number of requests that are blocked because of insufficient memory Shown as request |
redpanda.rpc.client_requests_pending (gauge) | Number of requests pending Shown as request |
redpanda.rpc.client_server_correlation_errors (count) | Number of responses with wrong correlation id Shown as error |
redpanda.scheduler.queue_length (gauge) | Size of backlog on this queue in tasks; indicates whether the queue is busy and/or contended |
redpanda.scheduler.runtime_ms (count) | Accumulated runtime of this task queue; an increment rate of 1000ms per second indicates full utilization Shown as millisecond |
redpanda.scheduler.shares (gauge) | Shares allocated to this queue |
redpanda.scheduler.starvetime_ms (count) | Accumulated starvation time of this task queue; an increment rate of 1000ms per second indicates the scheduler feels really bad Shown as millisecond |
redpanda.scheduler.tasks_processed (count) | Count of tasks executing on this queue; together with runtime_ms, indicates length of tasks Shown as task |
redpanda.scheduler.time_spent_on_task_quota_violations_ms (count) | Total amount in milliseconds we were in violation of the task quota Shown as millisecond |
redpanda.scheduler.waittime_ms (count) | Accumulated waittime of this task queue; an increment rate of 1000ms per second indicates queue is waiting for something (e.g. IO) Shown as millisecond |
redpanda.stall.detector_reported (count) | Total number of reported stalls; look in the traces for the exact reason |
redpanda.storage.compaction_backlog_controller_backlog_size (gauge) | controller backlog |
redpanda.storage.compaction_backlog_controller_error (gauge) | Current controller error, i.e. difference between set point and backlog size Shown as error |
redpanda.storage.compaction_backlog_controller_shares (gauge) | controller output i.e. number of shares |
redpanda.storage.kvstore_cached_bytes (count) | Size of the database in memory Shown as byte |
redpanda.storage.kvstore_entries_fetched (count) | Number of entries fetched Shown as read |
redpanda.storage.kvstore_entries_removed (count) | Number of entries removed |
redpanda.storage.kvstore_entries_written (count) | Number of entries written Shown as write |
redpanda.storage.kvstore_key_count (count) | Number of keys in the database |
redpanda.storage.kvstore_segments_rolled (count) | Number of segments rolled |
redpanda.storage.log_batch_parse_errors (count) | Number of batch parsing (reading) errors Shown as error |
redpanda.storage.log_batch_write_errors (count) | Number of batch write errors Shown as write |
redpanda.storage.log_batches_read (count) | Total number of batches read Shown as read |
redpanda.storage.log_batches_written (count) | Total number of batches written Shown as write |
redpanda.storage.log_cache_hits (count) | Reader cache hits Shown as hit |
redpanda.storage.log_cache_misses (count) | Reader cache misses Shown as miss |
redpanda.storage.log_cached_batches_read (count) | Total number of cached batches read Shown as read |
redpanda.storage.log_cached_read_bytes (count) | Total number of cached bytes read Shown as byte |
redpanda.storage.log_compacted_segment (count) | Number of compacted segments |
redpanda.storage.log_compaction_ratio (count) | Average segment compaction ratio |
redpanda.storage.log_corrupted_compaction_indices (count) | Number of times we had to re-construct the .compaction index on a segment |
redpanda.storage.log_log_segments_active (count) | Number of active log segments |
redpanda.storage.log_log_segments_created (count) | Number of created log segments |
redpanda.storage.log_log_segments_removed (count) | Number of removed log segments |
redpanda.storage.log_partition_size (gauge) | Current size of partition in bytes Shown as byte |
redpanda.storage.log_read_bytes (count) | Total number of bytes read Shown as byte |
redpanda.storage.log_readers_added (count) | Number of readers added to cache Shown as read |
redpanda.storage.log_readers_evicted (count) | Number of readers evicted from cache Shown as read |
redpanda.storage.log_written_bytes (count) | Total number of bytes written Shown as byte |
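Several of the metrics above are meant to be combined. The descriptions suggest dividing fstream_read_bytes by fstream_reads to get an average read size, and reading a scheduler.runtime_ms growth of 1000 ms per second as full utilization. The latency metrics come in `.sum` / `.count` pairs, which are typically turned into an average the same way. The helpers below are a minimal sketch of that arithmetic over one sampling interval. The function names are illustrative only, and the latency result is in whatever unit the metric actually reports.

```python
def average_latency(sum_delta: float, count_delta: float) -> float:
    """Mean latency over an interval, from the change in a .sum/.count pair."""
    return sum_delta / count_delta if count_delta else 0.0


def average_read_size(read_bytes_delta: float, reads_delta: float) -> float:
    """Mean bytes per read: fstream_read_bytes divided by fstream_reads."""
    return read_bytes_delta / reads_delta if reads_delta else 0.0


def queue_utilization(runtime_ms_delta: float, interval_seconds: float) -> float:
    """Fraction of the interval a scheduler queue spent running.

    A runtime_ms increase of 1000 ms per second corresponds to 1.0.
    """
    return runtime_ms_delta / (interval_seconds * 1000.0)


if __name__ == "__main__":
    # Made-up sample deltas over a 15 second interval.
    print(average_latency(sum_delta=500.0, count_delta=100.0))
    print(average_read_size(read_bytes_delta=40960.0, reads_delta=10.0))
    print(queue_utilization(runtime_ms_delta=7500.0, interval_seconds=15.0))
```

Each helper expects deltas between two samples, not raw cumulative values, and the first two return 0.0 when no events occurred in the interval to avoid dividing by zero.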
The Redpanda integration does not include any events.
redpanda.openmetrics.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
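The rule behind this service check can be illustrated with a short sketch: try to read the metrics endpoint, and report critical if that fails for any reason. This is not the integration's actual implementation. The `fetch` callable and the `urllib_fetcher` helper are illustrative names, and the URL in the usage example is an assumed address.

```python
import urllib.request
from typing import Callable


def openmetrics_health(fetch: Callable[[], str]) -> str:
    """Return "ok" if the endpoint can be read, "critical" otherwise."""
    try:
        fetch()
    except Exception:
        # Connection errors, timeouts and HTTP error responses all count
        # as being unable to access the endpoint.
        return "critical"
    return "ok"


def urllib_fetcher(url: str, timeout: float = 5.0) -> Callable[[], str]:
    """Build a fetch callable that reads the given URL with urllib."""
    def _fetch() -> str:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    return _fetch


if __name__ == "__main__":
    # Assumed endpoint address; adjust to your Redpanda node.
    print(openmetrics_health(urllib_fetcher("http://localhost:9644/metrics")))
```

Passing the fetch step in as a callable keeps the status logic testable without a running Redpanda node.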
Need help? Contact Datadog support.