概要

Datadog サイトでは Confluent Cloud インテグレーションはサポートされていません

Confluent Cloud はフルマネージドの、クラウドホスティングのストリーミングデータサービスです。Datadog と Confluent Cloud を接続することで、Confluent Cloud リソースの主要メトリクスを視覚化し、アラートを発します。

Datadog のすぐに使える Confluent Cloud ダッシュボードには、アクティブな接続の変化率や、平均消費レコードと生成レコードの比率などの情報を含め、環境の健全性とパフォーマンスをモニタリングするための主要なクラスターメトリクスが表示されます。

推奨モニターを使用して、トピックのラグが大きくなりすぎた場合にチームに通知してアラートを出すことも、これらのメトリクスを使用して独自のメトリクスを作成することもできます。

ストリーミングデータパイプラインのトポロジーを視覚化したり、データストリームセットアップ内の局所的なボトルネックを調査したりすることが有益な場合は、Data Streams Monitoring をご覧ください。

セットアップ

インストール

Datadog の Confluent Cloud インテグレーションタイルを使用して、インテグレーションをインストールします。

構成

  1. インテグレーションタイルで、Configuration タブに移動します。
  2. Confluent Cloud API Key と API Secret を入力し、+ Add API Key をクリックします。
  3. Save をクリックします。Datadog は、これらの資格情報に関連するアカウントを検索します。
  4. Confluent Cloud の Cluster ID または Connector ID を追加します。Datadog は Confluent Cloud のメトリクスをクロールし、数分以内にメトリクスをロードします。
  5. Cloud Cost Management を使用し、コストデータの収集を有効にする場合

API Key と Secret

Confluent Cloud API Key と Secret を作成するには、UI で MetricsViewer ロールを新しいサービスアカウントに追加するを参照してください。

Cluster ID

Confluent Cloud Cluster ID を検索するには

  1. Confluent Cloud で、Environment Overview に移動し、監視したいクラスターを選択します。
  2. 左側のナビゲーションで、Cluster overview > Cluster settings をクリックします。
  3. Identification の下にある、lkc で始まる Cluster ID をコピーします。

Connector ID

Confluent Cloud Connector ID を検索するには

  1. Confluent Cloud で、Environment Overview に移動し、監視したいクラスターを選択します。
  2. 左側のナビゲーションで、Data integration > Connectors をクリックします。
  3. Connectors の下にある、lcc で始まる Connector ID をコピーします。

ダッシュボード

インテグレーションの構成後、すぐに使える Confluent Cloud ダッシュボードで Kafka クラスターとコネクタのメトリクスの概要をご覧ください。

デフォルトでは、Confluent Cloud 全体で収集されたすべてのメトリクスが表示されます。

収集データ

メトリクス

confluent_cloud.kafka.received_bytes
(count)
The delta count of bytes received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.sent_bytes
(count)
The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.received_records
(count)
The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.kafka.sent_records
(count)
The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.kafka.retained_bytes
(gauge)
The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.active_connection_count
(gauge)
The count of active authenticated connections.
Shown as connection
confluent_cloud.kafka.request_count
(count)
The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
Shown as request
confluent_cloud.kafka.partition_count
(gauge)
The number of partitions.
confluent_cloud.kafka.successful_authentication_count
(count)
The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
Shown as attempt
confluent_cloud.kafka.cluster_link_destination_response_bytes
(count)
The delta count of cluster linking response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.cluster_link_source_response_bytes
(count)
The delta count of cluster linking source response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.cluster_active_link_count
(gauge)
The current count of active cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_load_percent
(gauge)
A measure of the utilization of the cluster. The value is between 0.0 and 1.0.
Shown as percent
confluent_cloud.kafka.consumer_lag_offsets
(gauge)
The lag between a group member's committed offset and the partition's high watermark.
confluent_cloud.kafka.cluster_link_count
(gauge)
The current count of cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_mirror_topic_bytes
(count)
The delta count of cluster linking mirror topic bytes. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_count
(gauge)
The cluster linking mirror topic count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_offset_lag
(gauge)
TThe cluster linking mirror topic offset lag maximum across all partitions. The lag is sampled every 60 seconds.
confluent_cloud.kafka.request_bytes
(gauge)
The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.response_bytes
(gauge)
The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.rest_produce_request_bytes
(count)
The delta count of total request bytes from Kafka REST produce calls sent over the network requested by Kafka REST.
confluent_cloud.connect.sent_records
(count)
The delta count of total number of records sent from the transformations and written to Kafka for the source connector. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.connect.received_records
(count)
The delta count of total number of records received by the sink connector. Each sample is the number of records received since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.connect.sent_bytes
(count)
The delta count of total bytes sent from the transformations and written to Kafka for the source connector. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.connect.received_bytes
(count)
The delta count of total bytes received by the sink connector. Each sample is the number of bytes received since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.connect.dead_letter_queue_records
(count)
The delta count of dead letter queue records written to Kafka for the sink connector. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.ksql.streaming_unit_count
(gauge)
The count of Confluent Streaming Units (CSUs) for this KSQL instance. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
Shown as unit
confluent_cloud.ksql.query_saturation
(gauge)
The maximum saturation for a given ksqlDB query across all nodes. Returns a value between 0 and 1. A value close to 1 indicates that ksqlDB query processing is bottlenecked on available resources.
confluent_cloud.ksql.task_stored_bytes
(gauge)
The size of a given task's state stores in bytes.
Shown as byte
confluent_cloud.ksql.storage_utilization
(gauge)
The total storage utilization for a given ksqlDB application.
confluent_cloud.schema_registry.schema_count
(gauge)
The number of registered schemas.
confluent_cloud.schema_registry.request_count
(count)
The delta count of requests received by the schema registry server. Each sample is the number of requests received since the previous data point. The count sampled every 60 seconds.
confluent_cloud.schema_registry.schema_operations_count
(count)
The delta count of schema related operations. Each sample is the number of requests received since the previous data point. The count sampled every 60 seconds.
confluent_cloud.flink.num_records_in
(count)
Total number of records all Flink SQL statements leveraging a Flink compute pool have received.
confluent_cloud.flink.num_records_out
(count)
Total number of records all Flink SQL statements leveraging a Flink compute pool have emitted.
confluent_cloud.flink.pending_records
(gauge)
Total backlog of all Flink SQL statements leveraging a Flink compute pool.
confluent_cloud.custom.kafka.consumer_lag_offsets
(gauge)
The lag between a group member's committed offset and the partition's high watermark.

イベント

Confluent Cloud インテグレーションには、イベントは含まれません。

サービスチェック

Confluent Cloud インテグレーションには、サービスのチェック機能は含まれません。

トラブルシューティング

ご不明な点は、Datadog のサポートチームまでお問い合わせください。

参考資料