---
title: Kafka Consumer
description: Collect metrics for Kafka consumers.
---

# Kafka Consumer
**Integration version:** 8.1.0
{% callout %}
# Important note for users on the following Datadog sites: us2.ddog-gov.com

{% alert level="info" %}
To find out if this integration is available in your organization, see your [Datadog Integrations](https://app.datadoghq.com/integrations) page or ask your organization administrator.

To initiate an exception request to enable this integration for your organization, email [support@ddog-gov.com](mailto:support@ddog-gov.com).
{% /alert %}

{% /callout %}


## Overview{% #overview %}

This Agent integration collects message offset metrics from your Kafka consumers. The check fetches the high-water mark offsets from the Kafka brokers and the consumer offsets stored in Kafka (or in ZooKeeper for old-style consumers), then calculates consumer lag: the difference between the broker offset and the consumer offset.

**Note:**

- This integration checks consumer offsets before broker offsets, so in the worst case consumer lag is slightly overstated. Checking the offsets in the reverse order can understate consumer lag to the point of producing negative values, a result that usually indicates messages are being skipped.
- If you want to collect JMX metrics from your Kafka brokers or Java-based consumers/producers, see the [Kafka Broker integration](https://app.datadoghq.com/integrations/kafka?search=kafka).
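
The lag calculation and the effect of read order described above can be sketched in a few lines of Python. This is an illustrative model only, not the check's actual implementation: the function name `compute_lag`, the `(topic, partition)` keys, and the sample offsets are assumptions made for the example.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def compute_lag(
    highwater_offsets: Dict[TopicPartition, int],
    consumer_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return broker offset minus consumer offset for partitions present in both maps."""
    return {
        tp: highwater_offsets[tp] - consumer_offsets[tp]
        for tp in consumer_offsets
        if tp in highwater_offsets
    }


tp = ("orders", 0)  # hypothetical topic and partition

# Consumer offset read first, broker offset read later: messages produced
# in between make the lag look slightly larger than it is.
consumer_first = compute_lag({tp: 105}, {tp: 100})

# Broker offset read first, consumer offset read later: if the consumer
# advanced in between, the computed lag goes negative.
broker_first = compute_lag({tp: 100}, {tp: 103})

print(consumer_first[tp], broker_first[tp])  # prints "5 -3"
```

In this model, reading the consumer side first errs toward overstating lag, which is the safer direction for alerting.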

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

The Agent's Kafka consumer check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your Kafka nodes.

### Configuration{% #configuration %}

{% tab title="Containerized" %}
#### Containerized{% #containerized %}

Configure this check on a container running the Kafka Consumer. See the [Autodiscovery Integration Templates](https://docs.datadoghq.com/containers/kubernetes/integrations.md) for guidance on applying the parameters below. In Kubernetes, if a single consumer is running on many containers, you can set up this check as a [Cluster Check](https://app.datadoghq.com/containers/cluster_agent/clusterchecks/) to avoid having multiple checks collecting the same metrics.

| Parameter            | Value                                                                                                                                                                                       |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `<INTEGRATION_NAME>` | `kafka_consumer`                                                                                                                                                                            |
| `<INIT_CONFIG>`      | blank or `{}`                                                                                                                                                                               |
| `<INSTANCE_CONFIG>`  | `{"kafka_connect_str": "<KAFKA_CONNECT_STR>", "consumer_groups": {"<CONSUMER_NAME>": {}}}`. For example, `{"kafka_connect_str": "server:9092", "consumer_groups": {"my_consumer_group": {}}}` |
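
Because `<INSTANCE_CONFIG>` must be valid JSON, it can help to generate the string rather than write it by hand. The following is a minimal sketch under that assumption; the helper name `build_instance_config` is hypothetical and not part of the Agent or the integration.

```python
import json
from typing import Dict, List


def build_instance_config(connect_str: str, consumer_groups: List[str]) -> str:
    """Serialize a kafka_consumer instance as a JSON string for <INSTANCE_CONFIG>."""
    instance: Dict[str, object] = {
        "kafka_connect_str": connect_str,
        # An empty mapping monitors all topics for that consumer group.
        "consumer_groups": {group: {} for group in consumer_groups},
    }
    return json.dumps(instance)


print(build_instance_config("server:9092", ["my_consumer_group"]))
# prints {"kafka_connect_str": "server:9092", "consumer_groups": {"my_consumer_group": {}}}
```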

{% /tab %}

{% tab title="Host" %}
Configure this check on a host running the Kafka Consumer. Avoid having multiple Agents running with the same check configuration, as this puts additional pressure on your Kafka cluster.

1. Edit the `kafka_consumer.d/conf.yaml` file, in the `conf.d/` folder at the root of your [Agent's configuration directory](https://docs.datadoghq.com/agent/guide/agent-configuration-files.md#agent-configuration-directory). See the [sample kafka_consumer.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/kafka_consumer/datadog_checks/kafka_consumer/data/conf.yaml.example) for all available configuration options. A minimal setup is:

   ```yaml
   instances:
     - kafka_connect_str: <KAFKA_CONNECT_STR>
       consumer_groups:
         # Monitor all topics for consumer <CONSUMER_NAME>
         <CONSUMER_NAME>: {}
   ```

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).
{% /tab %}

### Cluster Monitoring (Preview){% #cluster-monitoring-preview %}

When `enable_cluster_monitoring` is enabled, the integration collects cluster-wide metrics for [Data Streams Monitoring](https://app.datadoghq.com/data-streams) in addition to consumer lag:

- **Brokers**: Configuration and health metrics.
- **Topics and partitions**: Sizes, offsets, and replication status.
- **Consumer groups**: Member details, group state, rebalance detection, membership-change counting, and metadata exposed as tags (`partition_assignor`, `consumer_group_type`, `is_simple_consumer_group`, and `group_instance_id`). Empty groups are visible through the `consumer_group_state:EMPTY` tag on `kafka.consumer_group.members`.
- **Schema registry**: Schema metadata (requires `schema_registry_url`).

#### Batched collection{% #batched-collection %}

Broker configurations, topic configurations, and schema registry version checks are collected in batches spread over successive Agent check runs rather than all at once. On a cluster with many brokers, topics, or schema subjects, this keeps each run fast and reduces load on the cluster, but it also means that not every metric is emitted in every check run.
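
One way to picture batched collection is a cursor that walks through the list of items a few at a time and wraps around. The sketch below is only a model of that idea: the function `next_batch`, the batch size of 2, and the round-robin cursor are assumptions for illustration, and the integration's real batching logic may differ.

```python
from typing import List, Sequence, Tuple


def next_batch(items: Sequence[str], cursor: int, batch_size: int) -> Tuple[List[str], int]:
    """Return (batch, new_cursor), wrapping around at the end of items."""
    if not items:
        return [], 0
    start = cursor % len(items)
    count = min(batch_size, len(items))
    batch = [items[(start + i) % len(items)] for i in range(count)]
    return batch, (start + count) % len(items)


topics = ["t0", "t1", "t2", "t3", "t4"]  # hypothetical topic names
cursor = 0
for run in range(3):
    batch, cursor = next_batch(topics, cursor, batch_size=2)
    print(f"run {run}: {batch}")
# run 0: ['t0', 't1']
# run 1: ['t2', 't3']
# run 2: ['t4', 't0']
```

With five topics and a batch size of two, every topic is visited within three runs, which is why a given configuration metric may be absent from any single run.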

Example configuration:

```yaml
instances:
  - kafka_connect_str: localhost:9092
    enable_cluster_monitoring: true
    schema_registry_url: http://localhost:8081  # optional
```

### Kafka ACL Permissions{% #kafka-acl-permissions %}

**Cluster** (`kafka-cluster`)

- DESCRIBE
- DESCRIBE_CONFIGS (cluster monitoring only)

**Topic** (`*`)

- DESCRIBE
- DESCRIBE_CONFIGS (cluster monitoring only)
- READ, WRITE ([Kafka messages](https://app.datadoghq.com/data_streams/messages/) only)

**Consumer group** (`*`)

- DESCRIBE
- READ
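
The lists above can be summarized as a small lookup, which is handy when scripting ACL provisioning for different feature combinations. This is a sketch only: the function `required_acls` and its flag names are hypothetical, and the operations are copied from this section rather than from any Datadog API.

```python
from typing import Dict, List


def required_acls(cluster_monitoring: bool = False, kafka_messages: bool = False) -> Dict[str, List[str]]:
    """Map each resource type to the operations listed in this section."""
    acls = {
        "cluster": ["DESCRIBE"],
        "topic": ["DESCRIBE"],
        "consumer_group": ["DESCRIBE", "READ"],
    }
    if cluster_monitoring:
        # DESCRIBE_CONFIGS is needed only when cluster monitoring is enabled.
        acls["cluster"].append("DESCRIBE_CONFIGS")
        acls["topic"].append("DESCRIBE_CONFIGS")
    if kafka_messages:
        # READ and WRITE on topics are needed only for the Kafka messages feature.
        acls["topic"] += ["READ", "WRITE"]
    return acls


for resource, operations in required_acls(cluster_monitoring=True).items():
    print(f"{resource}: {', '.join(operations)}")
```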

### Validation{% #validation %}

1. [Run the Agent's status subcommand](https://docs.datadoghq.com/agent/guide/agent-commands.md#agent-status-and-information) and look for `kafka_consumer` under the Checks section.
1. Ensure the metric `kafka.consumer_lag` is generated for the appropriate `consumer_group`.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **kafka.broker.config.default\_replication\_factor** (gauge) | Broker configuration for default replication factor. (DSM only) *Shown as item* |
| **kafka.broker.config.log\_retention\_bytes** (gauge) | Broker configuration for log retention in bytes. (DSM only) *Shown as byte* |
| **kafka.broker.config.log\_retention\_ms** (gauge) | Broker configuration for log retention in milliseconds. (DSM only) *Shown as millisecond* |
| **kafka.broker.config.log\_segment\_bytes** (gauge) | Broker configuration for log segment size in bytes. (DSM only) *Shown as byte* |
| **kafka.broker.config.min\_insync\_replicas** (gauge) | Broker configuration for minimum in-sync replicas. (DSM only) *Shown as item* |
| **kafka.broker.config.num\_io\_threads** (gauge) | Broker configuration for number of I/O threads. (DSM only) *Shown as thread* |
| **kafka.broker.config.num\_network\_threads** (gauge) | Broker configuration for number of network threads. (DSM only) *Shown as thread* |
| **kafka.broker.config.num\_partitions** (gauge) | Broker configuration for default number of partitions. (DSM only) *Shown as item* |
| **kafka.broker.count** (gauge) | Total number of brokers in the cluster. (DSM only) *Shown as instance* |
| **kafka.broker.leader\_count** (gauge) | Number of partitions for which this broker is the leader. (DSM only) *Shown as item* |
| **kafka.broker.partition\_count** (gauge) | Total number of partitions on this broker including replicas. (DSM only) *Shown as item* |
| **kafka.broker\_offset** (gauge) | Current message offset on broker. *Shown as offset* |
| **kafka.cluster.controller\_id** (gauge) | ID of the broker acting as the cluster controller. (DSM only) *Shown as instance* |
| **kafka.connector.count** (gauge) | Total number of Kafka Connect connectors. (DSM only) *Shown as item* |
| **kafka.connector.task.count** (gauge) | Number of tasks configured for this connector. (DSM only) *Shown as item* |
| **kafka.connector.task.running** (gauge) | Whether this connector task is running (1) or not (0). (DSM only) |
| **kafka.connector.tasks** (gauge) | Number of tasks for this connector in the given state (tag: `task_state`). (DSM only) *Shown as item* |
| **kafka.consumer\_group.count** (gauge) | Total number of consumer groups. (DSM only) *Shown as item* |
| **kafka.consumer\_group.member.partitions** (gauge) | Number of partitions assigned to this consumer group member. (DSM only) *Shown as item* |
| **kafka.consumer\_group.members** (gauge) | Number of members in the consumer group. (DSM only) *Shown as item* |
| **kafka.consumer\_group.membership\_changes** (count) | Number of times the consumer group membership changed between check runs. (DSM only) |
| **kafka.consumer\_group.rebalancing** (gauge) | Whether the consumer group is rebalancing (1) or stable (0). (DSM only) |
| **kafka.consumer\_lag** (gauge) | Lag in messages between consumer and broker. *Shown as message* |
| **kafka.consumer\_offset** (gauge) | Current message offset on consumer. *Shown as offset* |
| **kafka.estimated\_consumer\_lag** (gauge) | Lag in seconds between consumer and broker. This metric is provided through Data Streams Monitoring. Additional charges may apply. *Shown as second* |
| **kafka.partition.beginning\_offset** (gauge) | The earliest offset in the partition. (DSM only) *Shown as offset* |
| **kafka.partition.isr** (gauge) | Number of in-sync replicas for this partition. (DSM only) *Shown as item* |
| **kafka.partition.offline** (gauge) | Whether this partition is offline (1) or not (0). (DSM only) |
| **kafka.partition.replicas** (gauge) | Number of replicas for this partition. (DSM only) *Shown as item* |
| **kafka.partition.size** (gauge) | Number of messages in the partition. (DSM only) *Shown as message* |
| **kafka.partition.under\_replicated** (gauge) | Whether this partition is under-replicated (1) or not (0). (DSM only) |
| **kafka.schema\_registry.subjects** (gauge) | Total number of schema subjects in the registry. (DSM only) *Shown as item* |
| **kafka.topic.config.max\_message\_bytes** (gauge) | Topic configuration for maximum message size in bytes. (DSM only) *Shown as byte* |
| **kafka.topic.config.retention\_bytes** (gauge) | Topic configuration for retention size in bytes. (DSM only) *Shown as byte* |
| **kafka.topic.config.retention\_ms** (gauge) | Topic configuration for retention time in milliseconds. (DSM only) *Shown as millisecond* |
| **kafka.topic.count** (gauge) | Total number of topics in the cluster. (DSM only) *Shown as item* |
| **kafka.topic.message\_rate** (gauge) | Message production rate for this topic. (DSM only) *Shown as message* |
| **kafka.topic.partitions** (gauge) | Number of partitions for this topic. (DSM only) *Shown as item* |
| **kafka.topic.size** (gauge) | Total number of messages in the topic. (DSM only) *Shown as message* |

### Kafka messages{% #kafka-messages %}

This integration is used by [Data Streams Monitoring](https://app.datadoghq.com/data-streams) to [retrieve messages from Kafka on demand](https://app.datadoghq.com/data_streams/messages/).

### Events{% #events %}

**consumer\_lag**: The Datadog Agent emits an event when the value of the `consumer_lag` metric goes below 0, tagging it with `topic`, `partition` and `consumer_group`.

### Service Checks{% #service-checks %}

The Kafka consumer check does not include any service checks.

## Troubleshooting{% #troubleshooting %}

- [Troubleshooting and Deep Dive for Kafka](https://docs.datadoghq.com/integrations/faq/troubleshooting-and-deep-dive-for-kafka.md)
- [Agent failed to retrieve RMIServer stub](https://docs.datadoghq.com/integrations/guide/agent-failed-to-retrieve-rmiserver-stub.md)

### Kerberos GSSAPI Authentication{% #kerberos-gssapi-authentication %}

Depending on your Kafka cluster's Kerberos setup, you may need to configure the following:

- A Kafka client that the Datadog Agent uses to connect to the Kafka broker. Add the Kafka client as a Kerberos principal, add it to a Kerberos keytab, and make sure it has a valid Kerberos ticket.
- A TLS certificate to authenticate a secure connection to the Kafka broker.
  - If a JKS keystore is used, export a certificate from the keystore and set its file path with the applicable `tls_cert` and/or `tls_ca_cert` options.
  - If a private key is required to authenticate the certificate, set it with the `tls_private_key` option. If applicable, set the private key password with the `tls_private_key_password` option.
- The `KRB5_CLIENT_KTNAME` environment variable, pointing to the Kafka client's Kerberos keytab location if it differs from the default path (for example, `KRB5_CLIENT_KTNAME=/etc/krb5.keytab`).
- The `KRB5CCNAME` environment variable, pointing to the Kafka client's Kerberos credentials ticket cache if it differs from the default path (for example, `KRB5CCNAME=/tmp/krb5cc_xxx`).
- If the Datadog Agent is unable to access the environment variables, configure them in a Datadog Agent service configuration override file for your operating system. The procedure for modifying the Datadog Agent service unit file varies between Linux operating systems; the following example covers a Linux `systemd` environment.

#### Linux systemd example{% #linux-systemd-example %}

1. Configure the environment variables in an environment file. For example: `/path/to/environment/file`

   ```
   KRB5_CLIENT_KTNAME=/etc/krb5.keytab
   KRB5CCNAME=/tmp/krb5cc_xxx
   ```

1. Create a Datadog Agent service configuration override file: `sudo systemctl edit datadog-agent.service`

1. Configure the following in the override file:

   ```
   [Service]
   EnvironmentFile=/path/to/environment/file
   ```

1. Run the following commands to reload the systemd daemon and restart the Datadog Agent service:

   ```shell
   sudo systemctl daemon-reload
   sudo systemctl restart datadog-agent.service
   sudo service datadog-agent restart
   ```
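
To confirm that the two Kerberos variables are visible to a process and point at existing files, a quick diagnostic along these lines may help. It is a sketch under stated assumptions: the helper `kerberos_env_report` is hypothetical, it only understands plain paths and the `FILE:` prefix, and it is meaningful only when run in the same environment and as the same user as the Agent.

```python
import os
from typing import List, Mapping

KERBEROS_VARS = ("KRB5_CLIENT_KTNAME", "KRB5CCNAME")


def kerberos_env_report(env: Mapping[str, str]) -> List[str]:
    """Return a message for each variable that is unset or points to a missing path."""
    messages = []
    for name in KERBEROS_VARS:
        value = env.get(name)
        if not value:
            # Unset is not necessarily an error: the default path is used instead.
            messages.append(f"{name} is not set")
            continue
        # Accept the FILE: prefix; other cache types (for example KEYRING:) are not handled.
        path = value[len("FILE:"):] if value.startswith("FILE:") else value
        if not os.path.exists(path):
            messages.append(f"{name} points to a missing path: {path}")
    return messages


for message in kerberos_env_report(os.environ):
    print(message)
```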

## Further Reading{% #further-reading %}

- [Monitoring Kafka performance metrics](https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics)
- [Collecting Kafka performance metrics](https://www.datadoghq.com/blog/collecting-kafka-performance-metrics)
- [Monitoring Kafka with Datadog](https://www.datadoghq.com/blog/monitor-kafka-with-datadog)