Confluent Cloud

Confluent Cloud dashboard overview

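Overview

Confluent Cloud is a fully managed, cloud-hosted streaming data service. Connect Datadog with Confluent Cloud to visualize key metrics for your Confluent Cloud resources and create alerts.

Datadog's out-of-the-box Confluent Cloud dashboard shows key cluster metrics for monitoring the health and performance of your environment, including the rate of change in active connections and the ratio of average consumed to produced records.

You can use recommended monitors to notify your team when topic lag gets too high, or use these metrics to create your own monitors.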
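
As an illustration of a custom lag monitor, the following minimal sketch builds a metric-monitor definition around `confluent_cloud.kafka.consumer_lag_offsets`. The threshold of 1000, the 5-minute evaluation window, the monitor name, and the `@your-team-handle` notification handle are all assumptions chosen for the example, and the payload shape is a sketch of a Datadog metric monitor rather than a definition taken from this integration.

```python
import json

# Hypothetical threshold: tune it to the lag your consumer groups normally show.
LAG_THRESHOLD = 1000


def build_lag_monitor(threshold: int = LAG_THRESHOLD) -> dict:
    """Build a metric-monitor definition that alerts on high consumer lag."""
    # Evaluate the 5-minute average of the maximum lag, split by the
    # consumer_group_id and topic tags attached to the metric.
    query = (
        "avg(last_5m):max:confluent_cloud.kafka.consumer_lag_offsets{*} "
        f"by {{consumer_group_id,topic}} > {threshold}"
    )
    return {
        "name": "Confluent Cloud consumer lag is high",  # hypothetical name
        "type": "metric alert",
        "query": query,
        "message": "Consumer lag exceeded the threshold. @your-team-handle",
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    print(json.dumps(build_lag_monitor(), indent=2))
```

The resulting dictionary can be pasted into the monitor JSON editor or sent with whichever Datadog client you already use; the recommended monitors remain the simpler starting point.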

Setup

Installation

Install the integration with the Datadog Confluent Cloud integration tile.

Configuration

In Confluent Cloud, click + Add API Key to enter your Confluent Cloud API Key and API Secret.
- Create a Cloud Resource Management API key and secret.
- Click Save. Datadog searches for the accounts associated with those credentials.
- In the Datadog integration configuration, add the API key and secret to the API Key and API Secret fields.
Add your Confluent Cloud Cluster ID or Connector ID. Datadog crawls the Confluent Cloud metrics and loads them within minutes.
To collect the tags defined in Confluent Cloud (optional):
- Create a Schema Registry API key and secret. Learn more about schema management on Confluent Cloud.
- Click Save. Datadog collects the tags defined in Confluent Cloud.
- In the Datadog integration configuration, add the API key and secret to the Schema Registry API Key and Secret fields.
If you use Cloud Cost Management and have enabled cost data collection:
- Ensure that the API key has the BillingAdmin role enabled.
- The collected data appears in Cloud Cost Management within 24 hours.

For more information about configuring resources such as clusters and connectors, see the Confluent Cloud integration documentation.

API Key and Secret

To create your Confluent Cloud API Key and Secret, see Add the MetricsViewer role to a new service account in the UI.

Cluster ID

To find your Confluent Cloud Cluster ID:

In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
In the left-hand navigation, click Cluster overview > Cluster settings.
Under **Identification**, copy the Cluster ID beginning with `lkc`.

Connector ID

To find your Confluent Cloud Connector ID:

In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
In the left-hand navigation, click Data integration > Connectors.
Under **Connectors**, copy the Connector ID beginning with `lcc`.
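
Because Cluster IDs begin with `lkc` and Connector IDs begin with `lcc`, a quick prefix check can catch a pasted ID that belongs to the wrong resource type before you add it to the integration. The helper below is a minimal sketch; the function name and the sample IDs are hypothetical, and it checks only the prefix, not whether the resource actually exists.

```python
def classify_resource_id(resource_id: str) -> str:
    """Return 'cluster' or 'connector' based on the ID prefix."""
    rid = resource_id.strip()  # tolerate whitespace picked up when copying
    if rid.startswith("lkc"):
        return "cluster"
    if rid.startswith("lcc"):
        return "connector"
    raise ValueError(
        f"{resource_id!r} does not start with 'lkc' (cluster) or 'lcc' (connector)"
    )


if __name__ == "__main__":
    # Made-up IDs used purely for illustration.
    for sample in ("lkc-abc123", " lcc-xyz789 "):
        print(sample.strip(), "is a", classify_resource_id(sample), "ID")
```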

Dashboards

After configuring the integration, see the out-of-the-box Confluent Cloud dashboard for an overview of Kafka cluster and connector metrics.

By default, all metrics collected across Confluent Cloud are displayed.

Data Collected

Metrics


confluent_cloud.kafka.received_bytes (count)	The delta count of bytes received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.sent_bytes (count)	The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.received_records (count)	The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds. Shown as record
confluent_cloud.kafka.sent_records (count)	The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.kafka.retained_bytes (gauge)	The current count of bytes retained by the cluster. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.active_connection_count (gauge)	The count of active authenticated connections. Shown as connection
confluent_cloud.kafka.request_count (count)	The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. Shown as request
confluent_cloud.kafka.partition_count (gauge)	The number of partitions.
confluent_cloud.kafka.successful_authentication_count (count)	The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds. Shown as attempt
confluent_cloud.kafka.cluster_link_destination_response_bytes (count)	The delta count of cluster linking response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.cluster_link_source_response_bytes (count)	The delta count of cluster linking source response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.kafka.cluster_active_link_count (gauge)	The current count of active cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_load_percent (gauge)	A measure of the utilization of the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.cluster_load_percent_max (gauge)	A measure of the maximum broker utilization across the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.cluster_load_percent_avg (gauge)	A measure of the average utilization across the cluster. The value is between 0.0 and 1.0. Shown as percent
confluent_cloud.kafka.consumer_lag_offsets (gauge)	The lag between a group member’s committed offset and the partition’s high watermark. Tagged with `consumer_group_id` and topic.
confluent_cloud.kafka.cluster_link_count (gauge)	The current count of cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_task_count (gauge)	The current count of cluster link tasks. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_mirror_transition_in_error (gauge)	The cluster linking mirror topic state transition error count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_bytes (count)	The delta count of cluster linking mirror topic bytes. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_count (gauge)	The cluster linking mirror topic count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_offset_lag (gauge)	The cluster linking mirror topic offset lag maximum across all partitions. The lag is sampled every 60 seconds.
confluent_cloud.kafka.request_bytes (gauge)	The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.response_bytes (gauge)	The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.rest_produce_request_bytes (count)	The delta count of total request bytes from Kafka REST produce calls sent over the network requested by Kafka REST.
confluent_cloud.kafka.dedicated_cku_count (count)	CKU count of a dedicated cluster
confluent_cloud.kafka.producer_latency_avg_milliseconds (gauge)	The average latency of client producer requests. Shown as millisecond
confluent_cloud.connect.sent_records (count)	The delta count of total number of records sent from the transformations and written to Kafka for the source connector. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.received_records (count)	The delta count of total number of records received by the sink connector. Each sample is the number of records received since the previous data point. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.sent_bytes (count)	The delta count of total bytes sent from the transformations and written to Kafka for the source connector. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.connect.received_bytes (count)	The delta count of total bytes received by the sink connector. Each sample is the number of bytes received since the previous data point. The count is sampled every 60 seconds. Shown as byte
confluent_cloud.connect.dead_letter_queue_records (count)	The delta count of dead letter queue records written to Kafka for the sink connector. The count is sampled every 60 seconds. Shown as record
confluent_cloud.connect.connector_status (count)	This metric monitors the status of a connector within the system. Its value is always set to 1 which signifies connector presence. The current operational state of the connector is identified through the status tag. Shown as record
confluent_cloud.connect.sql_server_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.sql_server_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.sql_server_cdc_source_connector_schema_history_status (gauge)	It represents the status of the schema history of the connector. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.mysql_cdc_source_connector_schema_history_status (gauge)	It represents the status of the schema history of the connector. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.postgres_cdc_source_connector_snapshot_running (gauge)	It represents whether the Snapshot is running. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.postgres_cdc_source_connector_snapshot_completed (gauge)	It represents whether the Snapshot is completed. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
confluent_cloud.connect.connector_task_status (gauge)	Monitors the status of a connector’s task within the system. Its value is always set to 1 signifying the connector task’s presence.
confluent_cloud.connect.connector_task_batch_size_avg (gauge)	Monitors the average batch size (measured by record count) per minute. For a source connector it indicates the average batch size sent to Kafka. Shown as percent
confluent_cloud.connect.connector_task_batch_size_max (gauge)	Monitors the maximum batch size (measured by record count) per minute. For a source connector it indicates the max batch size sent to Kafka. Shown as percent
confluent_cloud.ksql.streaming_unit_count (gauge)	The count of Confluent Streaming Units (CSUs) for this KSQL instance. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX. Shown as unit
confluent_cloud.ksql.query_saturation (gauge)	The maximum saturation for a given ksqlDB query across all nodes. Returns a value between 0 and 1. A value close to 1 indicates that ksqlDB query processing is bottlenecked on available resources.
confluent_cloud.ksql.task_stored_bytes (gauge)	The size of a given task’s state stores in bytes. Shown as byte
confluent_cloud.ksql.storage_utilization (gauge)	The total storage utilization for a given ksqlDB application.
confluent_cloud.schema_registry.schema_count (gauge)	The number of registered schemas.
confluent_cloud.schema_registry.request_count (count)	The delta count of requests received by the schema registry server. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.deprecated_request_count (count)	The delta count of deprecated requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. Shown as request
confluent_cloud.schema_registry.schema_operations_count (count)	The delta count of schema related operations. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.flink.num_records_in (count)	Total number of records all Flink SQL statements leveraging a Flink compute pool have received.
confluent_cloud.flink.num_records_out (count)	Total number of records all Flink SQL statements leveraging a Flink compute pool have emitted.
confluent_cloud.flink.pending_records (gauge)	Total backlog of all Flink SQL statements leveraging a Flink compute pool.
confluent_cloud.flink.compute_pool_utilization.current_cfus (gauge)	The absolute number of CFUs at a given moment.
confluent_cloud.flink.compute_pool_utilization.cfu_minutes_consumed (count)	The number of CFUs consumed since the last measurement.
confluent_cloud.flink.compute_pool_utilization.cfu_limit (gauge)	The possible max number of CFUs for the pool.
confluent_cloud.flink.current_input_watermark_ms (gauge)	The last watermark this statement has received (in milliseconds) for the given table.
confluent_cloud.flink.current_output_watermark_ms (gauge)	The last watermark this statement has produced (in milliseconds) to the given table.
confluent_cloud.custom.kafka.consumer_lag_offsets (gauge)	The lag between a group member’s committed offset and the partition’s high watermark. Tagged with `consumer_group_id`, topic, partition, and `client_id`.
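
Many of the count metrics above are delta counts sampled every 60 seconds, so each data point is the amount accumulated since the previous sample rather than a running total. The sketch below shows one way to turn such samples into a per-second rate and a window total. The function names and sample values are hypothetical, and it assumes every sample covers exactly one 60-second interval.

```python
# Sampling interval stated in the metric descriptions above.
SAMPLE_INTERVAL_SECONDS = 60


def per_second_rate(delta_samples: list[float]) -> list[float]:
    """Convert 60-second delta counts into per-second rates."""
    return [delta / SAMPLE_INTERVAL_SECONDS for delta in delta_samples]


def window_total(delta_samples: list[float]) -> float:
    """Sum delta counts to get the total over the whole window."""
    return sum(delta_samples)


if __name__ == "__main__":
    # Made-up confluent_cloud.kafka.received_bytes samples, one per minute.
    received_bytes = [600, 1200, 0]
    print(per_second_rate(received_bytes))  # bytes per second for each minute
    print(window_total(received_bytes))     # bytes received over three minutes
```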

Events

The Confluent Cloud integration does not include any events.

Service Checks

The Confluent Cloud integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.