Confluent Cloud

Confluent Cloud dashboard overview

Overview

Confluent Cloud is a fully managed, cloud-hosted streaming data service. Connect Datadog with Confluent Cloud to visualize and alert on key metrics for your Confluent Cloud resources.

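Datadog's out-of-the-box Confluent Cloud dashboard shows key cluster metrics for monitoring the health and performance of your environment, including the rate of change in active connections and the ratio of average consumed records to produced records.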
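
As a rough illustration of the two figures mentioned above, the following sketch computes a rate of change for active connections and a consumed-to-produced record ratio from sampled values. The function names and sample numbers are hypothetical, and treating `sent_records` as consumed records and `received_records` as produced records is an assumption; the dashboard's actual widget queries may differ.

```python
from typing import List, Sequence


def rate_of_change(samples: Sequence[float], interval_seconds: float = 60.0) -> List[float]:
    """Per-second change between consecutive gauge samples."""
    return [(later - earlier) / interval_seconds
            for earlier, later in zip(samples, samples[1:])]


def consumed_to_produced_ratio(sent_records: Sequence[float],
                               received_records: Sequence[float]) -> float:
    """Average consumed records divided by average produced records."""
    if not sent_records or not received_records:
        raise ValueError("both series must be non-empty")
    avg_consumed = sum(sent_records) / len(sent_records)
    avg_produced = sum(received_records) / len(received_records)
    if avg_produced == 0:
        raise ValueError("no records were produced in this window")
    return avg_consumed / avg_produced


if __name__ == "__main__":
    # Hypothetical samples, one per 60-second interval.
    active_connections = [120, 126, 123]
    print(rate_of_change(active_connections))
    print(consumed_to_produced_ratio([900, 1100], [1000, 1000]))
```

A ratio well below 1 over a long window would suggest that consumers are reading fewer records than producers are writing.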

You can use recommended monitors to notify and alert your team when topic lag grows too large, or use these metrics to create your own.
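
For a custom monitor on lag, a query string might be assembled as in the sketch below. It uses the `confluent_cloud.kafka.consumer_lag_offsets` metric and its `consumer_group_id` and `topic` tags from the metrics list on this page. The threshold, evaluation window, monitor name, and message are placeholder assumptions, the helper function is hypothetical, and this is not the query used by the recommended monitor. Check the dictionary shape against the Datadog monitor documentation before using it.

```python
LAG_METRIC = "confluent_cloud.kafka.consumer_lag_offsets"


def build_lag_monitor_query(threshold: int,
                            window: str = "last_10m",
                            group_by=("consumer_group_id", "topic")) -> str:
    """Build a metric monitor query that alerts when lag exceeds a threshold."""
    groups = ",".join(group_by)
    # Doubled braces are literal braces in an f-string.
    return f"max({window}):max:{LAG_METRIC}{{*}} by {{{groups}}} > {threshold}"


# Hypothetical monitor definition; every value here is a placeholder.
monitor_definition = {
    "name": "Confluent Cloud consumer lag is high",
    "type": "query alert",
    "query": build_lag_monitor_query(threshold=1000),
    "message": "Consumer lag is above the threshold for {{topic.name}}.",
}

if __name__ == "__main__":
    print(monitor_definition["query"])
```

Grouping by `consumer_group_id` and `topic` yields one alert per consumer group and topic pair rather than a single alert for the whole account.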

Setup

Installation

Install the integration with the Datadog Confluent Cloud integration tile.

Configuration

In Confluent Cloud, click + Add API Key and enter your Confluent Cloud API Key and API Secret.
- Create a Cloud Resource Management API key and secret.
- Click Save. Datadog searches for the accounts associated with those credentials.
- In the Datadog integration configuration, add the API key and secret to the API Key and API Secret fields.
Add your Confluent Cloud Cluster ID or Connector ID. Datadog crawls the Confluent Cloud metrics and loads them within minutes.
To collect tags defined in Confluent Cloud (optional):
- Create a Schema Registry API key and secret. Read more about Schema Management on Confluent Cloud.
- Click Save. Datadog then collects the tags defined in Confluent Cloud.
- In the Datadog integration configuration, add the API key and secret to the Schema Registry API Key and Secret fields.
If you use Cloud Cost Management and want to enable cost data collection:
- Ensure that the API key has the BillingAdmin role enabled.
- Cost data appears in Cloud Cost Management within 24 hours. (collected data)

For more information about configuration resources such as clusters and connectors, see the Confluent Cloud integration documentation.

API Key and Secret

To create your Confluent Cloud API Key and Secret, see Add the MetricsViewer role to a new service account in the UI.

Cluster ID

To find your Confluent Cloud Cluster ID:

In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
In the left-hand navigation, click Cluster overview > Cluster settings.
Under Identification, copy the Cluster ID, which begins with lkc.

Connector ID

To find your Confluent Cloud Connector ID:

In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
In the left-hand navigation, click Data integration > Connectors.
Under Connectors, copy the Connector ID, which begins with lcc.
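
Because Cluster IDs begin with lkc and Connector IDs begin with lcc, a copied value can be checked before it is pasted into the integration configuration. The helper below is a hypothetical sketch, not part of Datadog or Confluent tooling, and the example IDs are made up.

```python
def classify_resource_id(resource_id: str) -> str:
    """Return "cluster" or "connector" based on the ID prefix."""
    value = resource_id.strip()
    if value.startswith("lkc"):
        return "cluster"
    if value.startswith("lcc"):
        return "connector"
    raise ValueError(
        f"{resource_id!r} does not look like a Cluster ID (lkc...) "
        "or a Connector ID (lcc...)"
    )


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    for candidate in ["lkc-abc123", " lcc-xyz789 "]:
        print(candidate.strip(), "->", classify_resource_id(candidate))
```

A check like this catches a common copy mistake, such as pasting an environment or service account identifier into the Cluster ID field.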

Dashboards

After configuring the integration, see the out-of-the-box Confluent Cloud dashboard for an overview of Kafka cluster and connector metrics.

By default, all metrics collected across Confluent Cloud are displayed.

Data Collected

Metrics


confluent_cloud.kafka.received_bytes (count)	The delta count of bytes received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.sent_bytes (count)	The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.received_records (count)	The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds. Shown as record
confluent_cloud.kafka.sent_records (count)	The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.kafka.retained_bytes (gauge)	The current count of bytes retained by the cluster. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.active_connection_count (gauge)	The count of active authenticated connections. Shown as connection
confluent_cloud.kafka.request_count (count)	The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. Shown as request
confluent_cloud.kafka.partition_count (gauge)	The number of partitions.
confluent_cloud.kafka.successful_authentication_count (count)	The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds. Shown as attempt
confluent_cloud.kafka.cluster_link_destination_response_bytes (count)	The delta count of cluster linking response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.cluster_link_source_response_bytes (count)	The delta count of cluster linking source response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.cluster_active_link_count (gauge)	The current count of active cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_load_percent (gauge)	A measure of the utilization of the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.cluster_load_percent_max (gauge)	A measure of the maximum broker utilization across the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.cluster_load_percent_avg (gauge)	A measure of the average utilization across the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.consumer_lag_offsets (gauge)	The lag between a group member’s committed offset and the partition’s high watermark. Tagged with `consumer_group_id` and topic.
confluent_cloud.kafka.cluster_link_count (gauge)	The current count of cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_task_count (gauge)	The current count of cluster link tasks. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_mirror_transition_in_error (gauge)	The cluster linking mirror topic state transition error count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_bytes (count)	The delta count of cluster linking mirror topic bytes. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_count (gauge)	The cluster linking mirror topic count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_offset_lag (gauge)	The cluster linking mirror topic offset lag maximum across all partitions. The lag is sampled every 60 seconds.
confluent_cloud.kafka.request_bytes (gauge)	The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.response_bytes (gauge)	The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.rest_produce_request_bytes (count)	The delta count of total request bytes from Kafka REST produce calls sent over the network requested by Kafka REST.
confluent_cloud.kafka.dedicated_cku_count (count)	CKU count of a dedicated cluster
confluent_cloud.kafka.producer_latency_avg_milliseconds (gauge)	The average latency of client producer request. Shown as millisecond
confluent_cloud.connect.sent_records (count)	The delta count of total number of records sent from the transformations and written to Kafka for the source connector. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.received_records (count)	The delta count of total number of records received by the sink connector. Each sample is the number of records received since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.sent_bytes (count)	The delta count of total bytes sent from the transformations and written to Kafka for the source connector. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.connect.received_bytes (count)	The delta count of total bytes received by the sink connector. Each sample is the number of bytes received since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.connect.dead_letter_queue_records (count)	The delta count of dead letter queue records written to Kafka for the sink connector. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.connector_status (count)	This metric monitors the status of a connector within the system. Its value is always set to 1 which signifies connector presence. The current operational state of the connector is identified through the status tag. Shown as record
confluent_cloud.connect.sql_server_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.sql_server_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.sql_server_cdc_source_connector_schema_history_status (gauge)	It represents the status of the schema history of the connector. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_schema_history_status (gauge)	It represents the status of the schema history of the connector. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.postgres_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.postgres_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.connector_task_status (gauge)	Monitors the status of a connector’s task within the system. Its value is always set to 1 signifying the connector task’s presence.
confluent_cloud.connect.connector_task_batch_size_avg (gauge)	Monitors the average batch size (measured by record count) per minute. For a source connector it indicates the average batch size sent to Kafka. Shown as percent
confluent_cloud.connect.connector_task_batch_size_max (gauge)	Monitors the maximum batch size (measured by record count) per minute. For a source connector it indicates the max batch size sent to Kafka. Shown as percent
confluent_cloud.ksql.streaming_unit_count (gauge)	The count of Confluent Streaming Units (CSUs) for this KSQL instance. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX. Shown as unit
confluent_cloud.ksql.query_saturation (gauge)	The maximum saturation for a given ksqlDB query across all nodes. Returns a value between 0 and 1. A value close to 1 indicates that ksqlDB query processing is bottlenecked on available resources.
confluent_cloud.ksql.task_stored_bytes (gauge)	The size of a given task’s state stores in bytes. Shown as byte
confluent_cloud.ksql.storage_utilization (gauge)	The total storage utilization for a given ksqlDB application.
confluent_cloud.schema_registry.schema_count (gauge)	The number of registered schemas.
confluent_cloud.schema_registry.request_count (count)	The delta count of requests received by the schema registry server. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.deprecated_request_count (count)	The delta count of deprecated requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. Shown as request
confluent_cloud.schema_registry.schema_operations_count (count)	The delta count of schema related operations. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.flink.num_records_in (count)	Total number of records all Flink SQL statements leveraging a Flink compute pool have received.
confluent_cloud.flink.num_records_out (count)	Total number of records all Flink SQL statements leveraging a Flink compute pool have emitted.
confluent_cloud.flink.pending_records (gauge)	Total backlog of all Flink SQL statements leveraging a Flink compute pool.
confluent_cloud.flink.compute_pool_utilization.current_cfus (gauge)	The absolute number of CFUs at a given moment.
confluent_cloud.flink.compute_pool_utilization.cfu_minutes_consumed (count)	The number of CFUs consumed since the last measurement.
confluent_cloud.flink.compute_pool_utilization.cfu_limit (gauge)	The possible max number of CFUs for the pool.
confluent_cloud.flink.current_input_watermark_ms (gauge)	The last watermark this statement has received (in milliseconds) for the given table.
confluent_cloud.flink.current_output_watermark_ms (gauge)	The last watermark this statement has produced (in milliseconds) to the given table.
confluent_cloud.custom.kafka.consumer_lag_offsets (gauge)	The lag between a group member’s committed offset and the partition’s high watermark. Tagged with `consumer_group_id`, topic, partition, and `client_id`.
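
Several of the descriptions above imply simple arithmetic: delta counts cover a 60-second sampling interval, the cluster load metrics report a value between 0.0 and 1.0, and consumer lag is the gap between the partition's high watermark and the committed offset. The sketch below works through those three conversions. The function names and input values are hypothetical, and it does not reproduce how Datadog or Confluent compute these values internally.

```python
SAMPLE_INTERVAL_SECONDS = 60  # delta counts are sampled every 60 seconds


def delta_to_per_second(delta_count: float) -> float:
    """Convert one 60-second delta count into an average per-second rate."""
    return delta_count / SAMPLE_INTERVAL_SECONDS


def load_fraction_to_percent(load: float) -> float:
    """Convert a cluster load value in the range 0.0 to 1.0 into a percentage."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("cluster load is expected to be between 0.0 and 1.0")
    return load * 100


def consumer_lag(high_watermark: int, committed_offset: int) -> int:
    """Lag in offsets between the high watermark and the committed offset."""
    return high_watermark - committed_offset


if __name__ == "__main__":
    # 6,000,000 bytes received in one sample is 100,000 bytes per second.
    print(delta_to_per_second(6_000_000))
    print(load_fraction_to_percent(0.5))
    print(consumer_lag(high_watermark=1500, committed_offset=1200))
```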

Events

The Confluent Cloud integration does not include any events.

Service Checks

The Confluent Cloud integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.