Redpanda is a Kafka API-compatible streaming platform for mission-critical workloads.
Connect Datadog with Redpanda to view key metrics and to add further metric groups based on specific user needs.
To configure this check for an Agent running on a host, run:
datadog-agent integration install -t datadog-redpanda==<INTEGRATION_VERSION>
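In containerized environments, the best way to use this integration with the Docker Agent is to build an Agent image that has the Redpanda integration installed.
To build an updated version of the Agent, use the following Dockerfile: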
FROM gcr.io/datadoghq/agent:latest
ARG INTEGRATION_VERSION=1.0.0
RUN agent integration install -r -t datadog-redpanda==${INTEGRATION_VERSION}
Build the image and push it to your private Docker registry.
Upgrade the Datadog Agent container image. If you are using the Helm chart, modify the agents.image section of the values.yaml file to replace the default Agent image.
agents:
  enabled: true
  image:
    tag: <NEW_TAG>
    repository: <YOUR_PRIVATE_REPOSITORY>/<AGENT_NAME>
Use the updated values.yaml file to upgrade the Agent:
helm upgrade -f values.yaml <RELEASE_NAME> datadog/datadog
To start collecting your Redpanda performance data:
Edit the redpanda.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory. See the sample redpanda.d/conf.yaml.example file for all available configuration options.
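As a rough sketch, a minimal redpanda.d/conf.yaml might look like the following. It assumes that Redpanda exposes its metrics on the admin port at http://localhost:9644/public_metrics and that the check accepts the standard openmetrics_endpoint option. Both are assumptions made for illustration; redpanda.d/conf.yaml.example remains the authoritative reference.

```yaml
# Minimal sketch of redpanda.d/conf.yaml (illustrative only).
init_config:

instances:
    # Assumed endpoint: Redpanda admin API on the local host.
    # Adjust the host and port to match your deployment.
  - openmetrics_endpoint: http://localhost:9644/public_metrics
```

Options for enabling additional metric groups are listed in the sample file and are intentionally left out of this sketch.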
By default, log collection is disabled in the Datadog Agent. Log collection is available in Agent v6.0+.
To enable logs, add the following to your datadog.yaml file:
logs_enabled: true
Make sure the dd-agent user is a member of the systemd-journal group. If it is not, add it with:
usermod -a -G systemd-journal dd-agent
To start collecting your Redpanda logs, add the following to your redpanda.d/conf.yaml file:
logs:
  - type: journald
    source: redpanda
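In containerized environments, once the Redpanda check is integrated into the Datadog Agent image, Autodiscovery is configured by default.
Metrics are collected and sent to Datadog's servers automatically. For more information, see Autodiscovery Integration Templates.
By default, log collection is disabled in the Datadog Agent. Log collection is available in Agent v6.0+.
To enable logs, see Kubernetes Log Collection.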
Parameter | Value |
---|---|
<LOG_CONFIG> | {"source": "redpanda", "service": "redpanda_cluster"} |
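For illustration, the <LOG_CONFIG> value above could be attached to a Redpanda pod as an Autodiscovery annotation. This is only a sketch that assumes annotation-based Autodiscovery on Kubernetes. The pod name is hypothetical, and <CONTAINER_NAME> is a placeholder for the name of the Redpanda container in your pod spec.

```yaml
# Pod metadata fragment (illustrative only).
apiVersion: v1
kind: Pod
metadata:
  name: redpanda-0  # hypothetical pod name
  annotations:
    # <CONTAINER_NAME> must match the Redpanda container's name in the pod spec.
    ad.datadoghq.com/<CONTAINER_NAME>.logs: '[{"source": "redpanda", "service": "redpanda_cluster"}]'
```

The annotation value is a JSON list, so the <LOG_CONFIG> object is wrapped in square brackets.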
Run the Agent's status subcommand and look for redpanda under the Checks section.
redpanda.application.build (gauge) | Redpanda build information |
redpanda.application.uptime (gauge) | Redpanda uptime in seconds Shown as second |
redpanda.controller.log_limit_requests_available (gauge) | Controller log rate limiting. Available rps for group Shown as request |
redpanda.controller.log_limit_requests_dropped (count) | Controller log rate limiting. Amount of requests that are dropped due to exceeding limit in group Shown as request |
redpanda.partitions.moving_from_node (gauge) | Amount of partitions that are moving from node |
redpanda.partitions.moving_to_node (gauge) | Amount of partitions that are moving to node |
redpanda.partitions.node_cancelling_movements (gauge) | Amount of cancelling partition movements for node |
redpanda.reactor.cpu_busy_seconds (gauge) | Total CPU busy time in seconds Shown as second |
redpanda.io_queue.total_read_ops (count) | Total read operations passed in the queue Shown as operation |
redpanda.io_queue.total_write_ops (count) | Total write operations passed in the queue Shown as operation |
redpanda.kafka.group_offset (gauge) | Consumer group committed offset |
redpanda.kafka.group_count (gauge) | Number of consumers in a group |
redpanda.kafka.group_topic_count (gauge) | Number of topics in a group |
redpanda.cluster.partitions (gauge) | Configured number of partitions for the topic |
redpanda.cluster.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_latency_seconds (gauge) | Internal latency of kafka produce requests Shown as second |
redpanda.kafka.under_replicated_replicas (gauge) | Number of under-replicated replicas (that is, replicas that are live but not at the latest offset) |
redpanda.memory.allocated_memory (gauge) | Allocated memory size in bytes Shown as byte |
redpanda.memory.available_memory_low_water_mark (gauge) | The low-water mark for available_memory from process start Shown as byte |
redpanda.memory.available_memory (gauge) | Total shard memory potentially available in bytes (free_memory plus reclaimable) Shown as byte |
redpanda.memory.free_memory (gauge) | Free memory size in bytes Shown as byte |
redpanda.node_status.rpcs_received (gauge) | Number of node status RPCs received by this node Shown as request |
redpanda.node_status.rpcs_sent (gauge) | Number of node status RPCs sent by this node Shown as request |
redpanda.node_status.rpcs_timed_out (gauge) | Number of timed out node status RPCs from this node Shown as request |
redpanda.raft.leadership_changes (count) | Number of leadership changes across all partitions of a given topic |
redpanda.raft.recovery_bandwidth (gauge) | Bandwidth available for partition movement, in bytes per second |
redpanda.pandaproxy.request_errors (count) | Total number of rest_proxy server errors Shown as error |
redpanda.pandaproxy.request_latency (gauge) | Internal latency of request for rest_proxy Shown as millisecond |
redpanda.rpc.active_connections (gauge) | Count of currently active connections Shown as connection |
redpanda.rpc.request_errors (count) | Number of rpc errors Shown as error |
redpanda.rpc.request_latency_seconds (gauge) | RPC latency Shown as second |
redpanda.scheduler.runtime_seconds (count) | Accumulated runtime of task queue associated with this scheduling group Shown as second |
redpanda.schema_registry.errors (count) | Total number of schema_registry server errors Shown as error |
redpanda.schema_registry_latency_seconds (gauge) | Internal latency of request for schema_registry Shown as second |
redpanda.storage.disk_free_bytes (count) | Disk storage bytes free. Shown as byte |
redpanda.storage.disk_free_space_alert (gauge) | Status of the low storage space alert: 0 = OK, 1 = low space, 2 = degraded |
redpanda.storage.disk_total_bytes (count) | Total size of attached storage, in bytes. Shown as byte |
redpanda.cloud.client_backoff (count) | Total number of requests that backed off |
redpanda.cloud.client_download_backoff (count) | Total number of download requests that backed off |
redpanda.cloud.client_downloads (count) | Total number of requests that downloaded an object from cloud storage |
redpanda.cloud.client_not_found (count) | Total number of requests for which the object was not found |
redpanda.cloud.client_upload_backoff (count) | Total number of upload requests that backed off |
redpanda.cloud.client_uploads (count) | Total number of requests that uploaded an object to cloud storage |
redpanda.cloud.storage.active_segments (gauge) | Number of remote log segments currently hydrated for read |
redpanda.cloud.storage.cache_op_hit (count) | Number of get requests for objects that are already in cache. |
redpanda.cloud.storage.op_in_progress_files (gauge) | Number of files that are being put to cache. |
redpanda.cloud.storage.cache_op_miss (count) | Number of get requests that are not satisfied from the cache. |
redpanda.cloud.storage.op_put (count) | Number of objects written into cache. Shown as operation |
redpanda.cloud.storage.cache_space_files (gauge) | Number of objects in cache. |
redpanda.cloud.storage.cache_space_size_bytes (gauge) | Sum of size of cached objects. Shown as byte |
redpanda.cloud.storage.deleted_segments (count) | Number of segments that have been deleted from S3 for the topic. This may grow due to retention or non compacted segments being replaced with their compacted equivalent. |
redpanda.cloud.storage.errors (count) | Number of transmit errors Shown as error |
redpanda.cloud.storage.housekeeping.drains (gauge) | Number of times upload housekeeping queue was drained |
redpanda.cloud.storage.housekeeping.jobs_completed (count) | Number of executed housekeeping jobs |
redpanda.cloud.storage.housekeeping.jobs_failed (count) | Number of failed housekeeping jobs Shown as error |
redpanda.cloud.storage.housekeeping.jobs_skipped (count) | Number of skipped housekeeping jobs |
redpanda.cloud.storage.housekeeping.pauses (gauge) | Number of times upload housekeeping was paused |
redpanda.cloud.storage.housekeeping.resumes (gauge) | Number of times upload housekeeping was resumed |
redpanda.cloud.storage.housekeeping.rounds (count) | Number of upload housekeeping rounds |
redpanda.cloud.storage.jobs.cloud_segment_reuploads (gauge) | Number of segment reuploads from cloud storage sources (cloud storage cache or direct download from cloud storage) |
redpanda.cloud.storage.jobs.local_segment_reuploads (gauge) | Number of segment reuploads from local data directory |
redpanda.cloud.storage.jobs.manifest_reuploads (gauge) | Number of manifest reuploads performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.metadata_syncs (gauge) | Number of archival configuration updates performed by all housekeeping jobs |
redpanda.cloud.storage.jobs.segment_deletions (gauge) | Number of segments deleted by all housekeeping jobs |
redpanda.cloud.storage.readers (gauge) | Number of read cursors for hydrated remote log segments |
redpanda.cloud.storage.segments (gauge) | Total number of accounted segments in the cloud for the topic |
redpanda.cloud.storage.segments_pending_deletion (gauge) | Total number of segments pending deletion from the cloud for the topic |
redpanda.cloud.storage.uploaded_bytes (count) | Total number of uploaded bytes for the topic Shown as byte |
redpanda.cluster.brokers (gauge) | Number of configured brokers in the cluster |
redpanda.cluster.controller_log_limit_requests_dropped (count) | Controller log rate limiting. Amount of requests that are dropped due to exceeding limit in group |
redpanda.cluster.partition_num_with_broken_rack_constraint (gauge) | Number of partitions that don't satisfy the rack awareness constraint |
redpanda.cluster.topics (gauge) | Number of topics in the cluster |
redpanda.cluster.unavailable_partitions (gauge) | Number of partitions that lack quorum among replicas |
redpanda.kafka.partition_committed_offset (gauge) | Latest committed offset for the partition (i.e. the offset of the last message safely persisted on most replicas) |
redpanda.kafka.partitions (gauge) | Configured number of partitions for the topic |
redpanda.kafka.replicas (gauge) | Configured number of replicas for the topic |
redpanda.kafka.request_bytes (count) | Total number of bytes produced per topic Shown as byte |
The Redpanda integration does not include any events.
redpanda.openmetrics.health
Returns CRITICAL if the check cannot access the metrics endpoint. Returns OK otherwise.
Statuses: ok, critical
Need help? Contact Datadog support.