---
title: Hudi
description: Track metrics for your Hudi configuration.
---

# Hudi

**Integration version:** 4.3.0

## Overview{% #overview %}

This check monitors [Hudi](https://hudi.apache.org/). It is compatible with Hudi [versions](https://github.com/apache/hudi/releases) `0.10.0` and above.

**Minimum Agent version:** 7.32.0

## Setup{% #setup %}

### Installation{% #installation %}

The Hudi check is included in the [Datadog Agent](https://app.datadoghq.com/account/settings/agent/latest) package. No additional installation is needed on your server.

### Configuration{% #configuration %}

1. [Configure](https://hudi.apache.org/docs/configurations#Metrics-Configurations) the [JMX Metrics Reporter](https://hudi.apache.org/docs/metrics/#jmxmetricsreporter) in Hudi:

   ```
   hoodie.metrics.on=true
   hoodie.metrics.reporter.type=JMX
   hoodie.metrics.jmx.host=<JMX_HOST>
   hoodie.metrics.jmx.port=<JMX_PORT>
   ```

1. Edit the `hudi.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your Hudi performance data. See the [sample hudi.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/hudi/datadog_checks/hudi/data/conf.yaml.example) for all available configuration options.

   This check has a limit of 350 metrics per instance. The number of returned metrics is indicated when running the Datadog Agent [status command](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information). You can specify the metrics you are interested in by editing the [configuration](https://github.com/DataDog/integrations-core/blob/master/hudi/datadog_checks/hudi/data/conf.yaml.example). To learn how to customize the metrics to collect, see the [JMX Checks documentation](https://docs.datadoghq.com/integrations/java/). If you need to monitor more metrics, contact [Datadog support](https://docs.datadoghq.com/help/).

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands/#start-stop-and-restart-the-agent).
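
As a rough illustration of the Agent-side configuration step, a minimal `hudi.d/conf.yaml` might look like the sketch below. It assumes the usual layout of Datadog JMX-based checks (an `init_config` section with `is_jmx`, and one `instances` entry per JMX endpoint). `<JMX_HOST>` and `<JMX_PORT>` are the same placeholders used in the Hudi reporter settings. Treat the sample `hudi.d/conf.yaml` as the authoritative list of options.

```yaml
# Sketch only: option names are assumed from the common JMX check layout.
# Verify them against the sample hudi.d/conf.yaml before use.
init_config:
  is_jmx: true
  collect_default_metrics: true

instances:
    # Should match hoodie.metrics.jmx.host and hoodie.metrics.jmx.port in Hudi.
  - host: <JMX_HOST>
    port: <JMX_PORT>
```

Keeping the host and port identical on both sides is what lets the Agent reach the JMX reporter that Hudi exposes.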

### Validation{% #validation %}

[Run the Agent's `status` subcommand](https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information) and look for `hudi` under the Checks section.

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric (type) | Description |
| --- | --- |
| **hudi.action.bytes\_written** (rate) | The total number of bytes written in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as byte* |
| **hudi.action.commit\_time** (gauge) | The commit time of an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as millisecond* |
| **hudi.action.compacted\_records\_updated** (rate) | The number of compacted records updated in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as record* |
| **hudi.action.create\_time** (rate) | The creation time of an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as millisecond* |
| **hudi.action.duration** (gauge) | The amount of time it took to successfully perform an action on a batch of records (commit, deltacommit, replacecommit, compaction, etc.). *Shown as millisecond* |
| **hudi.action.files\_inserted** (rate) | The number of files inserted in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as file* |
| **hudi.action.files\_updated** (rate) | The number of files updated in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as file* |
| **hudi.action.insert\_records\_written** (rate) | The number of insert records written in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as record* |
| **hudi.action.log\_files\_compacted** (rate) | The number of log files compacted in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as file* |
| **hudi.action.log\_files\_size** (rate) | The size of all the log files in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as byte* |
| **hudi.action.partitions\_written** (rate) | The number of partitions written in an action (commit, deltacommit, replacecommit, compaction, etc.). |
| **hudi.action.records\_written** (rate) | The number of records written in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as record* |
| **hudi.action.scan\_time** (rate) | The total time spent scanning in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as millisecond* |
| **hudi.action.time.50th\_percentile** (gauge) | Measures the 50th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.75th\_percentile** (gauge) | Measures the 75th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.95th\_percentile** (gauge) | Measures the 95th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.98th\_percentile** (gauge) | Measures the 98th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.999th\_percentile** (gauge) | Measures the 99.9th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.99th\_percentile** (gauge) | Measures the 99th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.count** (rate) | Measures the count of times to complete an action (commit, deltacommit, replacecommit, compaction, etc.). |
| **hudi.action.time.max** (gauge) | Measures the maximum amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.mean** (gauge) | Measures the mean amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.min** (gauge) | Measures the minimum amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.time.std\_dev** (gauge) | Measures the standard deviation of time to complete an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as nanosecond* |
| **hudi.action.update\_records\_written** (rate) | The number of update records written in an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as record* |
| **hudi.action.upsert\_time** (rate) | The upsert time of an action (commit, deltacommit, replacecommit, compaction, etc.). *Shown as millisecond* |
| **hudi.clean.duration** (gauge) | The total time spent cleaning. *Shown as millisecond* |
| **hudi.clean.files\_deleted** (gauge) | The number of files deleted in cleans. *Shown as file* |
| **hudi.finalize.duration** (gauge) | The total time spent finalizing. *Shown as millisecond* |
| **hudi.finalize.files\_finalized** (gauge) | The number of files finalized. *Shown as file* |
| **hudi.index.command.duration** (gauge) | The time spent performing an index command (UPSERT, INSERT_OVERWRITE, etc.). *Shown as millisecond* |
| **hudi.rollback.duration** (gauge) | The total time spent in rollback. *Shown as millisecond* |
| **hudi.rollback.files\_deleted** (gauge) | The number of files deleted in rollback. *Shown as file* |

### Log collection{% #log-collection %}

*Available for Agent versions >6.0*

1. Hudi uses the `log4j` logger by default. To customize the format, edit the `log4j.properties` file in either your [Flink](https://github.com/apache/flink/tree/release-1.11.4/flink-dist/src/main/flink-bin/conf) or [Spark](https://github.com/apache/spark/tree/v3.1.2/conf) `conf` directory. An example `log4j.properties` file is:

   ```properties
   log4j.rootCategory=INFO, file
   log4j.appender.file=org.apache.log4j.FileAppender
   log4j.appender.file.File=/var/log/hudi.log
   log4j.appender.file.append=false
   log4j.appender.file.layout=org.apache.log4j.PatternLayout
   log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
   ```

1. By default, Datadog's integration pipeline supports the following conversion pattern:

   ```text
   %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
   ```

   An example of a valid timestamp is: `2020-02-03 18:43:12,251`.

   Clone and edit the [integration pipeline](https://docs.datadoghq.com/logs/processing/#integration-pipelines) if you have a different format.

1. Log collection is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Uncomment and edit the logs configuration block in your `hudi.d/conf.yaml` file. Change the `path` and `service` parameter values based on your environment. See the [sample hudi.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/hudi/datadog_checks/hudi/data/conf.yaml.example) for all available configuration options.

   ```yaml
   logs:
     - type: file
       path: /var/log/hudi.log
       source: hudi
       service: <SERVICE_NAME>
       log_processing_rules:
         - type: multi_line
           pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
           name: new_log_start_with_date
   ```

### Events{% #events %}

The Hudi integration does not include any events.

### Service Checks{% #service-checks %}

**hudi.can\_connect**

Returns `CRITICAL` if the Agent is unable to connect to and collect metrics from the monitored Hudi instance, `WARNING` if no metrics are collected, and `OK` otherwise.

*Statuses: ok, critical, warning*

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).
