Confluent Cloud

Overview

The Confluent Cloud integration is not supported on the selected Datadog site.

Confluent Cloud is a fully managed, cloud-hosted streaming data service. Connect Datadog with Confluent Cloud to visualize and alert on key metrics for your Confluent Cloud resources.

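Datadog's out-of-the-box Confluent Cloud dashboard shows key cluster metrics for monitoring the health and performance of your environment, including the rate of change in active connections and the ratio of average consumed records to produced records.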
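As a rough illustration of the consumed-to-produced ratio mentioned above, the sketch below averages 60-second delta samples of confluent_cloud.kafka.sent_records and confluent_cloud.kafka.received_records and divides one by the other. It assumes that sent_records reflects records delivered to consumers and received_records reflects records written by producers. The exact formula the dashboard widget uses is not documented here, and the function name and sample values are hypothetical.

```python
from statistics import mean

def consumed_to_produced_ratio(sent_records, received_records):
    """Hypothetical helper: average consumed records / average produced records.

    Both arguments are lists of 60-second delta samples. Assumption for this
    sketch: sent_records ~ consumed, received_records ~ produced.
    Returns None when there is no data or nothing was produced.
    """
    if not sent_records or not received_records:
        return None
    produced = mean(received_records)
    if produced == 0:
        return None  # avoid dividing by zero
    return mean(sent_records) / produced

# Hypothetical samples: five 60-second data points per metric.
sent = [1200, 1150, 1300, 1250, 1100]
received = [600, 575, 650, 625, 550]
print(consumed_to_produced_ratio(sent, received))  # 2.0
```

A ratio above 1 can occur when several consumer groups read the same records.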

Use the recommended monitors to notify your team when topic lag gets too high, or use these metrics to create your own.
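The metric reference below describes confluent_cloud.kafka.consumer_lag_offsets as the lag between a group member's committed offset and the partition's high watermark. The sketch computes that value per partition and flags when any partition crosses a threshold. The function names, the 1,000-message threshold, and the offsets are hypothetical; this is not the query used by Datadog's recommended monitor.

```python
from typing import Dict

def partition_lag(high_watermarks: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: high watermark minus the group's committed offset."""
    lag = {}
    for partition, hw in high_watermarks.items():
        # Assumption for this sketch: a partition with no committed offset
        # is treated as lagging from offset 0. Negative values are clamped to 0.
        lag[partition] = max(hw - committed.get(partition, 0), 0)
    return lag

def should_alert(lag: Dict[int, int], threshold: int) -> bool:
    """Alert when any partition's lag exceeds the (hypothetical) threshold."""
    return any(value > threshold for value in lag.values())

hw = {0: 10_500, 1: 9_800, 2: 12_000}
committed = {0: 10_450, 1: 9_800, 2: 10_000}
lag = partition_lag(hw, committed)
print(lag)                       # {0: 50, 1: 0, 2: 2000}
print(should_alert(lag, 1_000))  # True
```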

If you want to visualize the topology of your streaming data pipelines or investigate localized bottlenecks in your data streams setup, see Data Streams Monitoring.

Setup

Installation

Install the integration with the Datadog Confluent Cloud integration tile.

Configuration

  1. In the integration tile, navigate to the Configuration tab.
  2. Click + Add API Key to enter your Confluent Cloud API Key and API Secret.
  3. Click Save. Datadog searches for accounts associated with those credentials.
  4. Add your Confluent Cloud Cluster ID or Connector ID. Datadog crawls the Confluent Cloud metrics and loads them within a few minutes.

API Key and Secret

To create your Confluent Cloud API key and secret, see Add the MetricsViewer role to a new service account in the UI.
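Datadog uses the key and secret you enter in the tile, so you do not need to encode them yourself. If you want to check a key against the Confluent Cloud REST APIs directly, those APIs typically accept the key and secret as HTTP Basic credentials (key as username, secret as password); treat this as an assumption to verify against the Confluent Cloud API reference. The sketch only builds such a header; the function name and placeholder values are illustrative.

```python
import base64

def basic_auth_header(api_key: str, api_secret: str) -> dict:
    """Hypothetical helper: build an HTTP Basic Authorization header from a key/secret pair."""
    credentials = f"{api_key}:{api_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}

# Placeholder credentials, not real values.
print(basic_auth_header("MY_API_KEY", "MY_API_SECRET"))
```

Keep real secrets out of source code; read them from a secret store or environment variable instead.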

Cluster ID

To find your Confluent Cloud Cluster ID:

  1. In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
  2. In the left-hand navigation, click Cluster overview > Cluster settings.
  3. Under Identification, copy the Cluster ID beginning with lkc.

Connector ID

To find your Confluent Cloud Connector ID:

  1. In Confluent Cloud, navigate to Environment Overview and select the cluster you want to monitor.
  2. In the left-hand navigation, click Data integration > Connectors.
  3. Under Connectors, copy the Connector ID beginning with lcc.
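Because Cluster IDs begin with lkc and Connector IDs begin with lcc, a list of IDs copied from Confluent Cloud can be sorted by prefix before you add them in step 4 of the configuration. The helper below is a hypothetical sketch that checks only these prefixes; it does not confirm that the resources exist or that your API key can read them. The sample IDs are made up.

```python
def classify_resource_ids(ids):
    """Hypothetical helper: split Confluent Cloud resource IDs by prefix.

    Cluster IDs start with "lkc" and Connector IDs start with "lcc";
    anything else is returned as unrecognized.
    """
    clusters, connectors, unknown = [], [], []
    for raw in ids:
        resource_id = raw.strip()  # tolerate stray whitespace from copy/paste
        if resource_id.startswith("lkc"):
            clusters.append(resource_id)
        elif resource_id.startswith("lcc"):
            connectors.append(resource_id)
        else:
            unknown.append(resource_id)
    return clusters, connectors, unknown

print(classify_resource_ids(["lkc-abc123", " lcc-xyz789 ", "env-abc123"]))
```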

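Dashboards

After configuring the integration, see the out-of-the-box Confluent Cloud dashboard for an overview of Kafka cluster and connector metrics.

By default, all metrics collected from Confluent Cloud are displayed.

Data Collected

Metrics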

confluent_cloud.kafka.received_bytes
(count)
The delta count of bytes received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.sent_bytes
(count)
The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.received_records
(count)
The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.kafka.sent_records
(count)
The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.kafka.retained_bytes
(gauge)
The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.active_connection_count
(gauge)
The count of active authenticated connections.
Shown as connection
confluent_cloud.kafka.request_count
(count)
The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
Shown as request
confluent_cloud.kafka.partition_count
(gauge)
The number of partitions.
confluent_cloud.kafka.successful_authentication_count
(count)
The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
Shown as attempt
confluent_cloud.kafka.cluster_link_destination_response_bytes
(count)
The delta count of cluster linking response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.cluster_link_source_response_bytes
(count)
The delta count of cluster linking source response bytes from all request types. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.kafka.cluster_active_link_count
(gauge)
The current count of active cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_load_percent
(gauge)
A measure of the utilization of the cluster. The value is between 0.0 and 1.0.
Shown as percent
confluent_cloud.kafka.cluster_load_percent_max
(gauge)
A measure of the maximum broker utilization across the cluster. The value is between 0.0 and 1.0.
Shown as percent
confluent_cloud.kafka.cluster_load_percent_avg
(gauge)
A measure of the average utilization across the cluster. The value is between 0.0 and 1.0.
Shown as percent
confluent_cloud.kafka.consumer_lag_offsets
(gauge)
The lag between a group member's committed offset and the partition's high watermark.
confluent_cloud.kafka.cluster_link_count
(gauge)
The current count of cluster links. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
confluent_cloud.kafka.cluster_link_mirror_topic_bytes
(count)
The delta count of cluster linking mirror topic bytes. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_count
(gauge)
The cluster linking mirror topic count for a link. The count is sampled every 60 seconds.
confluent_cloud.kafka.cluster_link_mirror_topic_offset_lag
(gauge)
The maximum cluster linking mirror topic offset lag across all partitions. The lag is sampled every 60 seconds.
confluent_cloud.kafka.request_bytes
(gauge)
The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.response_bytes
(gauge)
The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.kafka.rest_produce_request_bytes
(count)
The delta count of total request bytes from Kafka REST produce calls sent over the network requested by Kafka REST.
confluent_cloud.connect.sent_records
(count)
The delta count of total number of records sent from the transformations and written to Kafka for the source connector. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.connect.received_records
(count)
The delta count of total number of records received by the sink connector. Each sample is the number of records received since the previous data point. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.connect.sent_bytes
(count)
The delta count of total bytes sent from the transformations and written to Kafka for the source connector. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.connect.received_bytes
(count)
The delta count of total bytes received by the sink connector. Each sample is the number of bytes received since the previous data point. The count is sampled every 60 seconds.
Shown as byte
confluent_cloud.connect.dead_letter_queue_records
(count)
The delta count of dead letter queue records written to Kafka for the sink connector. The count is sampled every 60 seconds.
Shown as record
confluent_cloud.ksql.streaming_unit_count
(gauge)
The count of Confluent Streaming Units (CSUs) for this KSQL instance. The count is sampled every 60 seconds. The implicit time aggregation for this metric is MAX.
Shown as unit
confluent_cloud.ksql.query_saturation
(gauge)
The maximum saturation for a given ksqlDB query across all nodes. Returns a value between 0 and 1. A value close to 1 indicates that ksqlDB query processing is bottlenecked on available resources.
confluent_cloud.ksql.task_stored_bytes
(gauge)
The size of a given task's state stores in bytes.
Shown as byte
confluent_cloud.ksql.storage_utilization
(gauge)
The total storage utilization for a given ksqlDB application.
confluent_cloud.schema_registry.schema_count
(gauge)
The number of registered schemas.
confluent_cloud.schema_registry.request_count
(count)
The delta count of requests received by the schema registry server. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.schema_registry.schema_operations_count
(count)
The delta count of schema-related operations. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_cloud.flink.num_records_in
(count)
Total number of records all Flink SQL statements leveraging a Flink compute pool have received.
confluent_cloud.flink.num_records_out
(count)
Total number of records all Flink SQL statements leveraging a Flink compute pool have emitted.
confluent_cloud.flink.pending_records
(gauge)
Total backlog of all Flink SQL statements leveraging a Flink compute pool.
confluent_cloud.flink.compute_pool_utilization.current_cfus
(gauge)
The absolute number of CFUs at a given moment.
confluent_cloud.flink.compute_pool_utilization.cfu_minutes_consumed
(count)
The number of CFUs consumed since the last measurement.
confluent_cloud.flink.compute_pool_utilization.cfu_limit
(gauge)
The maximum possible number of CFUs for the pool.
confluent_cloud.custom.kafka.consumer_lag_offsets
(gauge)
The lag between a group member's committed offset and the partition's high watermark.
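Many of the count metrics above are delta counts sampled every 60 seconds, so each sample is the amount since the previous data point rather than a running total. A minimal sketch, assuming exactly one sample per 60-second interval, of turning such samples into per-second rates and of scaling the 0.0 to 1.0 cluster_load_percent values to a 0 to 100 scale. The function names and sample values are hypothetical.

```python
SAMPLE_INTERVAL_SECONDS = 60  # sampling interval stated in the metric descriptions

def per_second_rate(delta_samples):
    """Convert delta-count samples (one per 60 s, assumed) into per-second rates."""
    return [delta / SAMPLE_INTERVAL_SECONDS for delta in delta_samples]

def load_as_percent(load_fraction):
    """Scale a cluster_load_percent value (0.0 to 1.0) to 0 to 100."""
    if not 0.0 <= load_fraction <= 1.0:
        raise ValueError("expected a value between 0.0 and 1.0")
    return load_fraction * 100

received_bytes = [6_000_000, 3_000_000, 0]  # hypothetical 60-second samples
print(per_second_rate(received_bytes))      # [100000.0, 50000.0, 0.0]
print(sum(received_bytes))                  # 9000000 bytes over the 3-minute window
print(load_as_percent(0.5))                 # 50.0
```

Summing delta samples gives the total over a window; dividing by the interval gives a rate. Avoid summing gauge metrics such as retained_bytes, which report a current level rather than a change.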

Events

The Confluent Cloud integration does not include any events.

Service Checks

The Confluent Cloud integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

Further Reading