Amazon Managed Streaming for Apache Kafka

Overview

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data.

This integration uses a crawler that collects metrics from CloudWatch. For information about monitoring MSK through the Datadog Agent, read the Amazon MSK (Agent) page.

Setup

Enable the Amazon MSK crawler to see MSK metrics from CloudWatch in Datadog.

Installation

If you haven't already, set up the Amazon Web Services integration first.

Metric collection

  1. On the AWS integration page, ensure that Kafka is enabled under the Metric Collection tab.

  2. Install the Amazon MSK integration.

Log collection

Enable logging

Configure Amazon MSK to send logs to either an S3 bucket or CloudWatch.

Notes:

  • If you log to an S3 bucket, make sure that the Target prefix is set to amazon_msk.
  • If you log to a CloudWatch log group, make sure that its name contains the substring msk.
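
The two naming rules above can be checked before log forwarding is wired up. The following is a minimal sketch: the helper names `s3_prefix_ok` and `log_group_name_ok` are hypothetical and not part of any Datadog or AWS API, and the exact comparison rules (tolerating a trailing slash, case-sensitive substring match) are assumptions.

```python
# Hypothetical helpers; the names are illustrative, not a Datadog or AWS API.
EXPECTED_S3_PREFIX = "amazon_msk"
EXPECTED_LOG_GROUP_SUBSTRING = "msk"


def s3_prefix_ok(target_prefix: str) -> bool:
    """Return True if the S3 target prefix is amazon_msk.

    Assumption: a trailing slash on the prefix is tolerated.
    """
    return target_prefix.rstrip("/") == EXPECTED_S3_PREFIX


def log_group_name_ok(log_group_name: str) -> bool:
    """Return True if the log group name contains the substring msk.

    Assumption: the match is case-sensitive.
    """
    return EXPECTED_LOG_GROUP_SUBSTRING in log_group_name


if __name__ == "__main__":
    print(s3_prefix_ok("amazon_msk"))
    print(log_group_name_ok("/aws/msk/broker-logs"))
```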

Send logs to Datadog

  1. If you haven't already, set up the Datadog Forwarder Lambda function.

  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the S3 bucket or CloudWatch log group that contains your Amazon MSK logs.
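
For a CloudWatch log group, one way to add the trigger from step 2 programmatically is a subscription filter that targets the Forwarder. The sketch below only builds the arguments for the CloudWatch Logs `put_subscription_filter` call and does not contact AWS. Treat it as an assumption about how you might script the step rather than the documented procedure: the function name, the filter name, and the ARN used in the example are hypothetical placeholders, and the Forwarder must separately be granted permission to be invoked by CloudWatch Logs.

```python
def build_subscription_filter_args(log_group_name: str, forwarder_arn: str) -> dict:
    """Build keyword arguments for the CloudWatch Logs put_subscription_filter call.

    Raises ValueError if the log group name lacks the substring "msk",
    since the integration expects that substring in the name.
    """
    if "msk" not in log_group_name:
        raise ValueError("log group name should contain the substring 'msk'")
    return {
        "logGroupName": log_group_name,
        "filterName": "datadog-forwarder-msk",  # hypothetical filter name
        "filterPattern": "",  # an empty pattern matches every log event
        "destinationArn": forwarder_arn,
    }


# With boto3 installed and AWS credentials configured, the arguments
# could be applied like this (not executed here):
#   import boto3
#   args = build_subscription_filter_args(group_name, forwarder_arn)
#   boto3.client("logs").put_subscription_filter(**args)
```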

Data Collected

Metrics

aws.kafka.zookeeper_request_latency_ms_mean
(gauge)
Mean latency in milliseconds for ZooKeeper requests from broker.
aws.kafka.active_controller_count
(gauge)
Only one controller per cluster should be active at any given time.
aws.kafka.active_controller_count.maximum
(gauge)
Only one controller per cluster should be active at any given time.
aws.kafka.global_partition_count
(gauge)
Total number of partitions across all brokers in the cluster.
aws.kafka.global_partition_count.maximum
(gauge)
Maximum total number of partitions across all brokers in the cluster.
aws.kafka.global_topic_count
(gauge)
Total number of topics averaged by the number of brokers in the cluster.
aws.kafka.global_topic_count.maximum
(gauge)
Maximum total number of topics averaged by the number of brokers in the cluster.
aws.kafka.offline_partitions_count
(gauge)
Total number of partitions that are offline in the cluster.
aws.kafka.swap_used
(gauge)
The size in bytes of swap memory that is in use for the broker.
Shown as byte
aws.kafka.swap_free
(gauge)
The size in bytes of swap memory that is available for the broker.
Shown as byte
aws.kafka.memory_used
(gauge)
The size in bytes of memory that is in use for the broker.
Shown as byte
aws.kafka.memory_buffered
(gauge)
The size in bytes of buffered memory for the broker.
Shown as byte
aws.kafka.memory_free
(gauge)
The size in bytes of memory that is free and available for the broker.
Shown as byte
aws.kafka.memory_cached
(gauge)
The size in bytes of cached memory for the broker.
Shown as byte
aws.kafka.cpu_user
(gauge)
The percentage of CPU in user space.
Shown as percent
aws.kafka.cpu_system
(gauge)
The percentage of CPU in kernel space.
Shown as percent
aws.kafka.cpu_idle
(gauge)
The percentage of CPU idle time.
Shown as percent
aws.kafka.root_disk_used
(gauge)
The percentage of the root disk used by the broker.
Shown as percent
aws.kafka.kafka_app_logs_disk_used
(gauge)
The percentage of disk space used for application logs.
Shown as percent
aws.kafka.kafka_data_logs_disk_used
(gauge)
The percentage of disk space used for data logs.
Shown as percent
aws.kafka.network_rx_errors
(count)
The number of network receive errors for the broker.
aws.kafka.network_tx_errors
(count)
The number of network transmit errors for the broker.
aws.kafka.network_rx_dropped
(count)
The number of dropped receive packets.
aws.kafka.network_tx_dropped
(count)
The number of dropped transmit packets.
aws.kafka.network_rx_packets
(count)
The number of packets received by the broker.
aws.kafka.network_tx_packets
(count)
The number of packets transmitted by the broker.
aws.kafka.messages_in_per_sec
(gauge)
The number of incoming messages per second for the broker.
aws.kafka.network_processor_avg_idle_percent
(gauge)
The average percentage of the time the network processors are idle.
Shown as percent
aws.kafka.request_handler_avg_idle_percent
(gauge)
The average percentage of the time the request handler threads are idle.
aws.kafka.leader_count
(gauge)
The number of leader replicas.
aws.kafka.partition_count
(gauge)
The number of partitions for the broker.
aws.kafka.produce_local_time_ms_mean
(gauge)
The mean time in milliseconds that the produce request is processed at the leader.
Shown as millisecond
aws.kafka.produce_message_conversions_time_ms_mean
(gauge)
The mean time in milliseconds spent on message format conversions.
Shown as millisecond
aws.kafka.produce_request_queue_time_ms_mean
(gauge)
The mean time in milliseconds that request messages spend in the queue.
Shown as millisecond
aws.kafka.produce_response_queue_time_ms_mean
(gauge)
The mean time in milliseconds that response messages spend in the queue.
Shown as millisecond
aws.kafka.produce_response_send_time_ms_mean
(gauge)
The mean time in milliseconds spent on sending response messages.
Shown as millisecond
aws.kafka.produce_total_time_ms_mean
(gauge)
The mean produce time in milliseconds.
Shown as millisecond
aws.kafka.request_bytes_mean
(gauge)
The mean number of request bytes for the broker.
aws.kafka.under_minlsr_partition_count
(gauge)
The number of under-minIsr partitions for the broker.
aws.kafka.under_replicated_partitions
(gauge)
The number of under-replicated partitions for the broker.
aws.kafka.bytes_in_per_sec
(rate)
The number of bytes per second received from clients.
Shown as byte
aws.kafka.bytes_out_per_sec
(rate)
The number of bytes per second sent to clients.
Shown as byte
aws.kafka.messages_in_per_sec
(rate)
The number of messages received from clients per second.
aws.kafka.fetch_message_conversions_per_sec
(rate)
The number of fetch message conversions per second for the broker.
aws.kafka.produce_message_conversions_per_sec
(rate)
The number of produce message conversions per second for the broker.
aws.kafka.fetch_consumer_total_time_ms_mean
(gauge)
The mean total time in milliseconds that consumers spend on fetching data from the broker.
Shown as millisecond
aws.kafka.fetch_follower_total_time_ms_mean
(gauge)
The mean total time in milliseconds that followers spend on fetching data from the broker.
Shown as millisecond
aws.kafka.fetch_consumer_request_queue_time_ms_mean
(gauge)
The mean time in milliseconds that the consumer request waits in the request queue.
Shown as millisecond
aws.kafka.fetch_follower_request_queue_time_ms_mean
(gauge)
The mean time in milliseconds that the follower request waits in the request queue.
Shown as millisecond
aws.kafka.fetch_consumer_local_time_ms_mean
(gauge)
The mean time in milliseconds that the consumer request is processed at the leader.
Shown as millisecond
aws.kafka.fetch_follower_local_time_ms_mean
(gauge)
The mean time in milliseconds that the follower request is processed at the leader.
Shown as millisecond
aws.kafka.fetch_consumer_response_queue_time_ms_mean
(gauge)
The mean time in milliseconds that the consumer request waits in the response queue.
Shown as millisecond
aws.kafka.fetch_follower_response_queue_time_ms_mean
(gauge)
The mean time in milliseconds that the follower request waits in the response queue.
Shown as millisecond
aws.kafka.consumer_response_send_time_ms_mean
(gauge)
The mean time in milliseconds for the consumer to send a response.
Shown as millisecond
aws.kafka.fetch_follower_response_send_time_ms_mean
(gauge)
The mean time in milliseconds for the follower to send a response.
Shown as millisecond
aws.kafka.produce_throttle_time
(gauge)
The average produce throttle time in milliseconds.
Shown as millisecond
aws.kafka.produce_throttle_byte_rate
(gauge)
The number of throttled bytes per second.
aws.kafka.produce_throttle_queue_size
(gauge)
The number of messages in the throttle queue.
aws.kafka.fetch_throttle_time
(gauge)
The average fetch throttle time in milliseconds.
Shown as millisecond
aws.kafka.fetch_throttle_byte_rate
(gauge)
The number of throttled bytes per second.
aws.kafka.fetch_throttle_queue_size
(gauge)
The number of messages in the throttle queue.
aws.kafka.request_throttle_time
(gauge)
The average request throttle time in milliseconds.
Shown as millisecond
aws.kafka.request_time
(gauge)
The average time spent in broker network and I/O threads to process requests.
aws.kafka.request_throttle_queue_size
(gauge)
The number of messages in the throttle queue.
aws.kafka.request_exempt_from_throttle_time
(gauge)
The average time spent in broker network and I/O threads to process requests that are exempt from throttling.
aws.kafka.estimated_max_time_lag
(gauge)
Time estimate (in seconds) to drain MaxOffsetLag.
Shown as second
aws.kafka.estimated_time_lag
(gauge)
Time estimate (in seconds) to drain the partition offset lag.
Shown as second
aws.kafka.max_offset_lag
(gauge)
The maximum offset lag across all partitions in a topic.
aws.kafka.offset_lag
(gauge)
Partition-level consumer lag in number of offsets.
aws.kafka.sum_offset_lag
(gauge)
The aggregated offset lag for all the partitions in a topic.
aws.kafka.cpu_credit_balance
(gauge)
This metric can help you monitor CPU credit balance on the brokers.
aws.kafka.memory_heap_after_gc
(gauge)
The percentage of total heap memory available after garbage collection.
Shown as percent
aws.kafka.bw_in_allowance_exceeded
(count)
The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the broker.
aws.kafka.bw_out_allowance_exceeded
(count)
The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the broker.
aws.kafka.conn_track_allowance_exceeded
(count)
The number of packets shaped because the connection tracking exceeded the maximum for the broker. Connection tracking is related to security groups that track each connection established to ensure that return packets are delivered as expected.
aws.kafka.connection_close_rate
(rate)
The number of connections closed per second per listener. This number is aggregated per listener and filtered for the client listeners.
aws.kafka.connection_creation_rate
(rate)
The number of new connections established per second per listener. This number is aggregated per listener and filtered for the client listeners.
aws.kafka.cpu_credit_usage
(gauge)
This metric can help you monitor CPU credit usage on the instances. If your CPU usage is sustained above the baseline level of 20% you can run out of the CPU credit balance which can have a negative impact on cluster performance. You can monitor and alarm on this metric to take corrective actions.
aws.kafka.pps_allowance_exceeded
(count)
The number of packets shaped because the bidirectional PPS exceeded the maximum for the broker.
aws.kafka.replication_bytes_in_per_sec
(rate)
The number of bytes per second received from other brokers.
Shown as byte
aws.kafka.replication_bytes_out_per_sec
(rate)
The number of bytes per second sent to other brokers.
Shown as byte
aws.kafka.tcp_connections
(gauge)
Shows number of incoming and outgoing TCP segments with the SYN flag set.
aws.kafka.traffic_bytes
(gauge)
Shows network traffic in overall bytes between clients (producers and consumers) and brokers. Traffic between brokers isn't reported.
Shown as byte
aws.kafka.volume_queue_length
(gauge)
The number of read and write operation requests waiting to be completed in a specified time period.
aws.kafka.volume_read_bytes
(gauge)
The number of bytes read in a specified time period.
Shown as byte
aws.kafka.volume_read_ops
(gauge)
The number of read operations in a specified time period.
aws.kafka.volume_total_read_time
(gauge)
The total number of seconds spent by all read operations that completed in a specified time period.
Shown as second
aws.kafka.volume_total_write_time
(gauge)
The total number of seconds spent by all write operations that completed in a specified time period.
Shown as second
aws.kafka.volume_write_bytes
(gauge)
The number of bytes written in a specified time period.
Shown as byte
aws.kafka.volume_write_ops
(gauge)
The number of write operations in a specified time period.
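
Some of the descriptions above imply simple health rules, for example that exactly one controller should be active and that no partitions should be offline or under-replicated. The following is a minimal sketch of such checks applied to metric values that have already been fetched. The function name, the shape of the input dictionary, and the choice of rules are assumptions for illustration; this is not a Datadog monitor definition.

```python
from typing import Dict, List


def find_msk_issues(latest: Dict[str, float]) -> List[str]:
    """Return human-readable issues found in the latest cluster-level values.

    `latest` maps metric names from the list above to their most recent
    value. Metrics missing from the mapping are skipped.
    """
    issues: List[str] = []

    # Only one controller per cluster should be active at any given time.
    controllers = latest.get("aws.kafka.active_controller_count")
    if controllers is not None and controllers != 1:
        issues.append(f"expected 1 active controller, found {controllers:g}")

    # Any offline partition is treated as a problem.
    offline = latest.get("aws.kafka.offline_partitions_count", 0)
    if offline > 0:
        issues.append(f"{offline:g} offline partition(s)")

    # Under-replicated partitions indicate replicas that are falling behind.
    under_replicated = latest.get("aws.kafka.under_replicated_partitions", 0)
    if under_replicated > 0:
        issues.append(f"{under_replicated:g} under-replicated partition(s)")

    return issues


if __name__ == "__main__":
    sample = {
        "aws.kafka.active_controller_count": 1,
        "aws.kafka.offline_partitions_count": 0,
    }
    print(find_msk_issues(sample))
```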

Events

The Amazon MSK crawler does not include any events.

Service Checks

The Amazon MSK integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.

Additional helpful documentation, links, and articles:
