Pulsar

Supported OS: Linux, Windows

Integration version: 1.2.0

Overview

This check monitors Pulsar through the Datadog Agent.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
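For example, on Kubernetes the check can be configured through Autodiscovery pod annotations. The following is a sketch under several assumptions: Agent v7.36 or later (which reads the ad.datadoghq.com/<CONTAINER>.checks annotation format), a container named pulsar running Pulsar in standalone mode, and a broker serving metrics on its default HTTP port 8080. The pod name is hypothetical. %%host%% is an Autodiscovery template variable that the Agent resolves to the container's IP address.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pulsar-standalone          # hypothetical pod name
  annotations:
    # Metrics: run the pulsar check against the broker's OpenMetrics endpoint.
    ad.datadoghq.com/pulsar.checks: |
      {
        "pulsar": {
          "init_config": {},
          "instances": [
            {"openmetrics_endpoint": "http://%%host%%:8080/metrics"}
          ]
        }
      }
    # Logs: tag this container's stdout/stderr with source "pulsar"
    # (log collection must also be enabled on the Agent).
    ad.datadoghq.com/pulsar.logs: '[{"source": "pulsar"}]'
spec:
  containers:
    - name: pulsar                 # must match the annotation prefix above
      image: apachepulsar/pulsar
      args: ["bin/pulsar", "standalone"]
      ports:
        - containerPort: 8080
```

The container name in each annotation key must match the name of the container in the pod spec; otherwise the Agent does not apply the template.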

Installation

The Pulsar check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. Edit the pulsar.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Pulsar performance data. See the sample pulsar.d/conf.yaml for all available configuration options.

  2. Restart the Agent.
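A minimal pulsar.d/conf.yaml might look like the sketch below; it is not the full sample file, so check the sample for the exact option names. It assumes the check reads Pulsar's Prometheus-format metrics through the openmetrics_endpoint option, that the broker serves /metrics on its default HTTP port 8080, and that, in a deployment where bookies run as separate processes, each bookie serves its own /metrics endpoint on port 8000 (the default prometheusStatsHttpPort in Pulsar's bookkeeper.conf). Hosts and ports are placeholders, and the pulsar_role tags are hypothetical.

```yaml
init_config:

instances:
  # Broker metrics (namespace, topic, managed ledger, load balancer, and so on).
  # Assumes the broker's HTTP port is the default 8080.
  - openmetrics_endpoint: http://localhost:8080/metrics
    tags:
      - pulsar_role:broker   # hypothetical tag, for illustration only

  # Bookie metrics (pulsar.bookie_* and pulsar.bookkeeper_*).
  # Assumes the bookie exposes Prometheus metrics on port 8000.
  - openmetrics_endpoint: http://localhost:8000/metrics
    tags:
      - pulsar_role:bookie   # hypothetical tag, for illustration only
```

Each entry under instances is scraped independently, so an unreachable endpoint affects only its own instance.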

Validation

Run the Agent’s status subcommand and look for pulsar under the Checks section.

Data Collected

Metrics

pulsar.bookie_SERVER_STATUS
(gauge)
The status of the bookie server. 1: the bookie is running in writable mode. 0: the bookie is running in read-only mode.
pulsar.bookkeeper_server_ADD_ENTRY_count.count
(count)
The total number of ADD_ENTRY requests received at the bookie. The success label is used to distinguish successes and failures.
Shown as request
pulsar.bookkeeper_server_READ_ENTRY_count.count
(count)
The total number of READ_ENTRY requests received at the bookie. The success label is used to distinguish successes and failures.
Shown as request
pulsar.bookie_WRITE_BYTES.count
(count)
The total number of bytes written to the bookie.
Shown as byte
pulsar.bookie_READ_BYTES.count
(count)
The total number of bytes read from the bookie.
Shown as byte
pulsar.bookie_journal_JOURNAL_SYNC_count.count
(count)
The total number of journal fsync operations happening at the bookie. The success label is used to distinguish successes and failures.
pulsar.bookie_journal_JOURNAL_QUEUE_SIZE
(gauge)
The total number of requests pending in the journal queue.
Shown as request
pulsar.bookie_journal_JOURNAL_FORCE_WRITE_QUEUE_SIZE
(gauge)
The total number of force write (fsync) requests pending in the force-write queue.
Shown as request
pulsar.bookie_journal_JOURNAL_CB_QUEUE_SIZE
(gauge)
The total number of callbacks pending in the callback queue.
pulsar.bookie_ledgers_count
(gauge)
The total number of ledgers stored in the bookie.
pulsar.bookie_entries_count
(gauge)
The total number of entries stored in the bookie.
pulsar.bookie_write_cache_size
(gauge)
The bookie write cache size (in bytes).
Shown as byte
pulsar.bookie_read_cache_size
(gauge)
The bookie read cache size (in bytes).
Shown as byte
pulsar.bookie_DELETED_LEDGER_COUNT.count
(count)
The total number of ledgers deleted since the bookie started.
pulsar.bookie_ledger_writable_dirs
(gauge)
The number of writable directories in the bookie.
pulsar.bookie_flush
(gauge)
The flush latency of the bookie's memory table.
pulsar.bookie_throttled_write_requests.count
(count)
The number of write requests that have been throttled.
Shown as request
pulsar.bookkeeper_server_BOOKIE_QUARANTINE_count.count
(count)
The number of bookie clients that have been quarantined.
pulsar.topics_count
(gauge)
The number of Pulsar topics of the namespace owned by this broker.
pulsar.subscriptions_count
(gauge)
The number of Pulsar subscriptions of the namespace served by this broker.
pulsar.producers_count
(gauge)
The number of active producers of the namespace connected to this broker.
pulsar.consumers_count
(gauge)
The number of active consumers of the namespace connected to this broker.
pulsar.rate_in
(gauge)
The total message rate of the namespace coming into this broker (messages/second).
pulsar.rate_out
(gauge)
The total message rate of the namespace going out from this broker (messages/second).
pulsar.throughput_in
(gauge)
The total throughput of the namespace coming into this broker (bytes/second).
pulsar.throughput_out
(gauge)
The total throughput of the namespace going out from this broker (bytes/second).
pulsar.storage_size
(gauge)
The total storage size of the topics in this namespace owned by this broker (bytes).
pulsar.storage_logical_size
(gauge)
The storage size of topics in the namespace owned by the broker without replicas (in bytes).
pulsar.storage_backlog_size
(gauge)
The total backlog size of the topics of this namespace owned by this broker (messages).
pulsar.storage_offloaded_size
(gauge)
The total amount of data in this namespace offloaded to tiered storage (bytes).
pulsar.storage_write_rate
(gauge)
The total message batches (entries) written to the storage for this namespace (message batches / second).
pulsar.storage_read_rate
(gauge)
The total message batches (entries) read from the storage for this namespace (message batches / second).
pulsar.subscription_delayed
(gauge)
The total number of message batches (entries) delayed for dispatching.
pulsar.replication_rate_in
(gauge)
The total message rate of the namespace replicating from the remote cluster (messages/second).
pulsar.replication_rate_out
(gauge)
The total message rate of the namespace replicating to the remote cluster (messages/second).
pulsar.replication_throughput_in
(gauge)
The total throughput of the namespace replicating from the remote cluster (bytes/second).
pulsar.replication_throughput_out
(gauge)
The total throughput of the namespace replicating to the remote cluster (bytes/second).
pulsar.replication_backlog
(gauge)
The total backlog of the namespace replicating to the remote cluster (messages).
Shown as message
pulsar.replication_rate_expired
(gauge)
Total rate of messages expired (messages/second).
pulsar.replication_connected_count
(gauge)
The number of replication subscribers up and running to replicate to the remote cluster.
pulsar.replication_delay_in_seconds
(gauge)
Time in seconds from the time a message was produced to the time when it is about to be replicated.
Shown as second
pulsar.storage_backlog_quota_limit
(gauge)
The backlog quota limit of this topic (bytes).
Shown as byte
pulsar.in_bytes_total
(gauge)
The total number of bytes received for this topic.
Shown as byte
pulsar.in_messages_total
(gauge)
The total number of messages received for this topic.
Shown as message
pulsar.out_bytes_total
(gauge)
The total number of bytes read from this topic.
Shown as byte
pulsar.out_messages_total
(gauge)
The total number of messages read from this topic.
Shown as message
pulsar.compaction_removed_event_count
(gauge)
The total number of events removed by compaction.
pulsar.compaction_succeed_count
(gauge)
The total number of successful compactions.
pulsar.compaction_failed_count
(gauge)
The total number of failed compactions.
pulsar.compaction_duration_time_in_mills
(gauge)
The duration of the compaction (in milliseconds).
pulsar.compaction_read_throughput
(gauge)
The read throughput of the compaction.
pulsar.compaction_write_throughput
(gauge)
The write throughput of the compaction.
pulsar.compaction_compacted_entries_count
(gauge)
The total number of compacted entries.
pulsar.compaction_compacted_entries_size
(gauge)
The total size of the compacted entries.
pulsar.broker_load_manager_bundle_assignment
(gauge)
A summary of the latency of bundle ownership operations.
pulsar.broker_lookup.count
(count)
Number of samples of the latency of all lookup operations.
pulsar.broker_lookup.sum
(count)
Total latency of all lookup operations.
pulsar.broker_lookup.quantle
(count)
Latency of all lookup operations.
pulsar.broker_lookup_redirects.count
(count)
The number of redirected lookup requests.
Shown as request
pulsar.broker_lookup_answers.count
(count)
The number of lookup responses (that is, requests that were not redirected).
Shown as response
pulsar.broker_lookup_failures.count
(count)
The number of lookup failures.
pulsar.broker_lookup_pending_requests
(gauge)
The number of pending lookups in the broker. When this number reaches the threshold, new requests are rejected.
Shown as request
pulsar.broker_topic_load_pending_requests
(gauge)
The number of pending topic load operations.
pulsar.ml_cache_evictions
(gauge)
The number of cache evictions during the last minute.
Shown as eviction
pulsar.ml_cache_hits_rate
(gauge)
The number of cache hits per second on the broker side.
Shown as hit
pulsar.ml_cache_hits_throughput
(gauge)
The amount of data retrieved from the cache on the broker side (bytes/second).
pulsar.ml_cache_misses_rate
(gauge)
The number of cache misses per second on the broker side.
Shown as miss
pulsar.ml_cache_misses_throughput
(gauge)
The amount of data not retrieved from the cache on the broker side (bytes/second).
pulsar.ml_cache_pool_active_allocations
(gauge)
The number of currently active allocations in the direct arena.
pulsar.ml_cache_pool_active_allocations_huge
(gauge)
The number of currently active huge allocations in the direct arena.
pulsar.ml_cache_pool_active_allocations_normal
(gauge)
The number of currently active normal allocations in the direct arena.
pulsar.ml_cache_pool_active_allocations_small
(gauge)
The number of currently active small allocations in the direct arena.
pulsar.ml_cache_pool_allocated
(gauge)
The total allocated memory of chunk lists in the direct arena.
pulsar.ml_cache_pool_used
(gauge)
The total used memory of chunk lists in the direct arena.
pulsar.ml_cache_used_size
(gauge)
The size in bytes used to store entry payloads.
Shown as byte
pulsar.ml_count
(gauge)
The number of currently open managed ledgers.
pulsar.ml_AddEntryBytesRate
(gauge)
The rate of messages added, in bytes/second.
pulsar.ml_AddEntryWithReplicasBytesRate
(gauge)
The rate of messages added, including replicas, in bytes/second.
pulsar.ml_AddEntryErrors
(gauge)
The number of addEntry requests that failed.
Shown as request
pulsar.ml_AddEntryMessagesRate
(gauge)
The rate of messages added, in messages/second.
pulsar.ml_AddEntrySucceed
(gauge)
The number of addEntry requests that succeeded.
Shown as request
pulsar.ml_MarkDeleteRate
(gauge)
The rate of mark-delete operations per second.
pulsar.ml_NumberOfMessagesInBacklog
(gauge)
The number of backlog messages for all the consumers.
Shown as message
pulsar.ml_ReadEntriesBytesRate
(gauge)
The rate of messages read, in bytes/second.
pulsar.ml_ReadEntriesErrors
(gauge)
The number of readEntries requests that failed.
Shown as request
pulsar.ml_ReadEntriesRate
(gauge)
The rate of messages read, in messages/second.
pulsar.ml_ReadEntriesSucceeded
(gauge)
The number of readEntries requests that succeeded.
Shown as request
pulsar.ml_StoredMessagesSize
(gauge)
The total size of the messages in active ledgers, accounting for the multiple copies stored.
pulsar.brk_ml_cursor_persistLedgerSucceed
(gauge)
The number of acknowledgment states persisted to a ledger.
pulsar.brk_ml_cursor_persistLedgerErrors
(gauge)
The number of ledger errors that occurred when acknowledgment states failed to be persisted to the ledger.
Shown as error
pulsar.brk_ml_cursor_persistZookeeperSucceed
(gauge)
The number of acknowledgment states persisted to ZooKeeper.
pulsar.brk_ml_cursor_persistZookeeperErrors
(gauge)
The number of ledger errors that occurred when acknowledgment states failed to be persisted to ZooKeeper.
Shown as error
pulsar.brk_ml_cursor_nonContiguousDeletedMessagesRange
(gauge)
The number of non-contiguous ranges of deleted messages.
pulsar.brk_ml_cursor_writeLedgerSize
(gauge)
The size of data written to the ledger.
pulsar.brk_ml_cursor_writeLedgerLogicalSize
(gauge)
The size of data written to the ledger, excluding replicas.
pulsar.brk_ml_cursor_readLedgerSize
(gauge)
The size of data read from the ledger.
pulsar.lb_bandwidth_in_usage
(gauge)
The broker inbound bandwidth usage (in percent).
pulsar.lb_bandwidth_out_usage
(gauge)
The broker outbound bandwidth usage (in percent).
pulsar.lb_cpu_usage
(gauge)
The broker CPU usage (in percent).
pulsar.lb_directMemory_usage
(gauge)
The broker process direct memory usage (in percent).
pulsar.lb_memory_usage
(gauge)
The broker process memory usage (in percent).
pulsar.lb_unload_broker_count.count
(count)
The number of brokers unloaded in this bundle unloading.
pulsar.lb_unload_bundle_count.count
(count)
The number of bundles unloaded in this bundle unloading.
pulsar.lb_bundles_split_count.count
(count)
The number of bundles split in this bundle-splitting check interval.
pulsar.bundle_msg_rate_in
(gauge)
The total message rate coming into the topics in this bundle (messages/second).
pulsar.bundle_msg_rate_out
(gauge)
The total message rate going out from the topics in this bundle (messages/second).
pulsar.bundle_topics_count
(gauge)
The topic count in this bundle.
pulsar.bundle_consumer_count
(gauge)
The consumer count of the topics in this bundle.
pulsar.bundle_producer_count
(gauge)
The producer count of the topics in this bundle.
pulsar.bundle_msg_throughput_in
(gauge)
The total throughput coming into the topics in this bundle (bytes/second).
pulsar.bundle_msg_throughput_out
(gauge)
The total throughput going out from the topics in this bundle (bytes/second).
pulsar.subscription_back_log
(gauge)
The total backlog of a subscription (messages).
Shown as message
pulsar.subscription_msg_rate_redeliver
(gauge)
The total message rate for messages being redelivered (messages/second).
pulsar.subscription_unacked_messages
(gauge)
The total number of unacknowledged messages of a subscription (messages).
Shown as message
pulsar.subscription_blocked_on_unacked_messages
(gauge)
Indicates whether a subscription is blocked on unacknowledged messages. 1: the subscription is blocked, waiting for unacknowledged messages to be acknowledged. 0: the subscription is not blocked.
pulsar.subscription_msg_rate_out
(gauge)
The total message dispatch rate for a subscription (messages/second).
pulsar.subscription_msg_throughput_out
(gauge)
The total message dispatch throughput for a subscription (bytes/second).
pulsar.consumer_msg_rate_redeliver
(gauge)
The total message rate for messages being redelivered (messages/second).
pulsar.consumer_unacked_messages
(gauge)
The total number of unacknowledged messages of a consumer (messages).
Shown as message
pulsar.consumer_blocked_on_unacked_messages
(gauge)
Indicates whether a consumer is blocked on unacknowledged messages. 1: the consumer is blocked, waiting for unacknowledged messages to be acknowledged. 0: the consumer is not blocked.
pulsar.consumer_msg_rate_out
(gauge)
The total message dispatch rate for a consumer (messages/second).
pulsar.consumer_msg_throughput_out
(gauge)
The total message dispatch throughput for a consumer (bytes/second).
pulsar.consumer_available_permits
(gauge)
The available permits for a consumer.
pulsar.expired_token_count.count
(count)
The number of expired tokens in Pulsar.
pulsar.authentication_success_count.count
(count)
The number of successful authentication operations.
pulsar.authentication_failures_count.count
(count)
The number of failed authentication operations.
pulsar.active_connections
(gauge)
The number of active connections.
Shown as connection
pulsar.connection_created_total_count
(gauge)
The total number of connections.
Shown as connection
pulsar.connection_create_success_count
(gauge)
The number of successfully created connections.
Shown as connection
pulsar.connection_create_fail_count
(gauge)
The number of failed connections.
Shown as connection
pulsar.connection_closed_total_count
(gauge)
The total number of closed connections.
Shown as connection
pulsar.broker_throttled_connections
(gauge)
The number of throttled connections.
Shown as connection
pulsar.broker_throttled_connections_global_limit
(gauge)
The number of connections throttled because of the per-connection limit.
Shown as connection
pulsar.jetty_requests_total.count
(count)
Number of requests.
Shown as request
pulsar.jetty_requests_active
(gauge)
Number of requests currently active.
Shown as request
pulsar.jetty_requests_active_max
(gauge)
Maximum number of requests that have been active at once.
Shown as request
pulsar.jetty_request_time_max_seconds
(gauge)
Maximum time spent handling requests.
Shown as second
pulsar.jetty_request_time_seconds_total.count
(count)
Total time spent in all request handling.
Shown as second
pulsar.jetty_dispatched_total.count
(count)
Number of dispatches.
pulsar.jetty_dispatched_active
(gauge)
Number of dispatches currently active.
pulsar.jetty_dispatched_active_max
(gauge)
Maximum number of active dispatches being handled.
pulsar.jetty_dispatched_time_max
(gauge)
Maximum time spent in dispatch handling.
pulsar.jetty_dispatched_time_seconds_total.count
(count)
Total time spent in dispatch handling.
Shown as second
pulsar.jetty_async_requests_total.count
(count)
Total number of async requests.
Shown as request
pulsar.jetty_async_requests_waiting
(gauge)
Currently waiting async requests.
Shown as request
pulsar.jetty_async_requests_waiting_max
(gauge)
Maximum number of waiting async requests.
Shown as request
pulsar.jetty_async_dispatches_total.count
(count)
Number of requests that have been asynchronously dispatched.
pulsar.jetty_expires_total.count
(count)
Number of async requests that have expired.
Shown as request
pulsar.jetty_responses_total.count
(count)
Number of responses, labeled by status code. The code label can be "1xx", "2xx", "3xx", "4xx", or "5xx".
Shown as response
pulsar.jetty_stats_seconds
(gauge)
Time in seconds for which stats have been collected.
Shown as second
pulsar.jetty_responses_bytes_total.count
(count)
Total number of bytes across all responses.
Shown as byte
pulsar.function_processed_successfully_total.count
(count)
The total number of messages processed successfully.
Shown as message
pulsar.function_processed_successfully_total_1min.count
(count)
The total number of messages processed successfully in the last 1 minute.
Shown as message
pulsar.function_system_exceptions_total.count
(count)
The total number of system exceptions.
pulsar.function_system_exceptions_total_1min.count
(count)
The total number of system exceptions in the last 1 minute.
pulsar.function_user_exceptions_total.count
(count)
The total number of user exceptions.
pulsar.function_user_exceptions_total_1min.count
(count)
The total number of user exceptions in the last 1 minute.
pulsar.function_last_invocation
(gauge)
The timestamp of the last invocation of the function.
pulsar.function_received_total.count
(count)
The total number of messages received from source.
Shown as message
pulsar.function_received_total_1min.count
(count)
The total number of messages received from source in the last 1 minute.
Shown as message
pulsar.source_written_total.count
(count)
The total number of records written to a Pulsar topic.
pulsar.source_written_total_1min.count
(count)
The total number of records written to a Pulsar topic in the last 1 minute.
pulsar.source_received_total.count
(count)
The total number of records received from source.
pulsar.source_received_total_1min.count
(count)
The total number of records received from source in the last 1 minute.
pulsar.source_last_invocation
(gauge)
The timestamp of the last invocation of the source.
pulsar.source_source_exception
(gauge)
The exception from a source.
pulsar.source_source_exceptions_total.count
(count)
The total number of source exceptions.
pulsar.source_source_exceptions_total_1min.count
(count)
The total number of source exceptions in the last 1 minute.
pulsar.source_system_exception
(gauge)
The exception from system code.
pulsar.source_system_exceptions_total.count
(count)
The total number of system exceptions.
pulsar.source_system_exceptions_total_1min.count
(count)
The total number of system exceptions in the last 1 minute.
pulsar.sink_written_total.count
(count)
The total number of records processed by a sink.
pulsar.sink_written_total_1min.count
(count)
The total number of records processed by a sink in the last 1 minute.
pulsar.sink_received_total_1min.count
(count)
The total number of messages that a sink has received from Pulsar topics in the last 1 minute.
Shown as message
pulsar.sink_received_total.count
(count)
The total number of records that a sink has received from Pulsar topics.
pulsar.sink_last_invocation
(gauge)
The timestamp of the last invocation of the sink.
pulsar.sink_sink_exception
(gauge)
The exception from a sink.
pulsar.sink_sink_exceptions_total.count
(count)
The total number of sink exceptions.
pulsar.sink_sink_exceptions_total_1min.count
(count)
The total number of sink exceptions in the last 1 minute.
pulsar.sink_system_exception
(gauge)
The exception from system code.
pulsar.sink_system_exceptions_total.count
(count)
The total number of system exceptions.
pulsar.sink_system_exceptions_total_1min.count
(count)
The total number of system exceptions in the last 1 minute.
pulsar.proxy_active_connections
(gauge)
Number of connections currently active in the proxy.
Shown as connection
pulsar.proxy_new_connections.count
(count)
Counter of connections being opened in the proxy.
Shown as connection
pulsar.proxy_rejected_connections.count
(count)
Counter for connections rejected due to throttling.
Shown as connection
pulsar.proxy_binary_ops.count
(count)
Counter of proxy operations.
pulsar.proxy_binary_bytes.count
(count)
Counter of proxy bytes.
Shown as byte
pulsar.split_bytes_read.count
(count)
Number of bytes read from BookKeeper.
Shown as byte
pulsar.split_num_messages_deserialized.count
(count)
Number of messages deserialized.
Shown as message
pulsar.split_num_record_deserialized.count
(count)
Number of records deserialized.
pulsar.txn_active_count
(gauge)
Number of active transactions.
Shown as transaction
pulsar.txn_created_count.count
(count)
Number of created transactions.
Shown as transaction
pulsar.txn_committed_count.count
(count)
Number of committed transactions.
Shown as transaction
pulsar.txn_aborted_count.count
(count)
Number of aborted transactions of this coordinator.
Shown as transaction
pulsar.txn_timeout_count.count
(count)
Number of timed-out transactions.
Shown as transaction
pulsar.txn_append_log_count.count
(count)
Number of transaction log entries appended.

Log collection

  1. The Pulsar log integration supports Pulsar’s default log format. Clone and edit the integration pipeline if you have a different format.

  2. Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  3. Uncomment and edit the logs configuration block in your pulsar.d/conf.yaml file. Change the path parameter value based on your environment. See the sample pulsar.d/conf.yaml for all available configuration options.

     logs:
       - type: file
         path: /pulsar/logs/pulsar.log
         source: pulsar
    
  4. Restart the Agent.
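Java stack traces in pulsar.log span several lines. If those lines appear as separate log entries, a multi_line processing rule can attach each continuation line to the record it belongs to. This sketch assumes every new log record begins with a date such as 2024-01-31, optionally preceded by an opening square bracket; check the pattern in your Pulsar log4j2 configuration and adjust the regular expression to match. The rule name is arbitrary.

```yaml
logs:
  - type: file
    path: /pulsar/logs/pulsar.log
    source: pulsar
    log_processing_rules:
      # A line matching the pattern starts a new log record; any other line
      # (for example, a stack trace frame) is appended to the previous record.
      - type: multi_line
        name: new_log_start_with_date
        pattern: '\[?\d{4}-\d{2}-\d{2}'
```

Single quotes keep the backslashes in the regular expression literal when the YAML is parsed.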

Events

The Pulsar integration does not include any events.

Service Checks

pulsar.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint, otherwise returns OK.
Statuses: ok, critical

Troubleshooting

Need help? Contact Datadog support.