Hudi

Supported OS: Linux, Windows, Mac OS

Integration version: 4.0.0

Overview

This check monitors Hudi. It is compatible with Hudi versions 0.10.0 and above.

Setup

Installation

The Hudi check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. Configure the JMX Metrics Reporter in Hudi:

    hoodie.metrics.on=true
    hoodie.metrics.reporter.type=JMX
    hoodie.metrics.jmx.host=<JMX_HOST>
    hoodie.metrics.jmx.port=<JMX_PORT>
    
  2. Edit the hudi.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Hudi performance data. See the sample hudi.d/conf.yaml for all available configuration options.

    This check has a limit of 350 metrics per instance. The number of returned metrics is indicated when running the Datadog Agent status command. You can specify the metrics you are interested in by editing the configuration. For detailed instructions on customizing the metrics to collect, see the JMX Checks documentation. If you need to monitor more metrics, contact Datadog support.

  3. Restart the Agent.
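
Before restarting the Agent, it can help to confirm that the JMX port configured in step 1 accepts connections from the Agent host. The following is a minimal sketch using only the Python standard library; the function name `jmx_port_reachable` is hypothetical and not part of the integration. A successful TCP connection only shows that the port is open, not that the JMX reporter is serving Hudi metrics.

```python
import socket


def jmx_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper: this only checks that the port is open. It does not
    speak the JMX protocol or verify that any Hudi metrics are exposed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeout.
        return False
```

Call it with the values used for hoodie.metrics.jmx.host and hoodie.metrics.jmx.port, for example `jmx_port_reachable("<JMX_HOST>", <JMX_PORT>)`.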

Validation

Run the Agent’s status subcommand and look for hudi under the Checks section.
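
The validation step can also be scripted. The sketch below assumes the `datadog-agent` binary is on the PATH and that the current user is allowed to run it (on some platforms this requires elevated privileges). The helper names are hypothetical, and the search is deliberately format-agnostic: it only looks for a line that begins with the check name rather than relying on a specific status layout.

```python
import subprocess


def agent_status() -> str:
    """Run the Agent's status subcommand and return its standard output."""
    result = subprocess.run(
        ["datadog-agent", "status"],
        capture_output=True,
        text=True,
        check=False,  # inspect the output even if the command exits non-zero
    )
    return result.stdout


def check_listed(status_text: str, check_name: str = "hudi") -> bool:
    """Return True if any line of the status output starts with the check name."""
    return any(
        line.strip().startswith(check_name) for line in status_text.splitlines()
    )
```

For example, `check_listed(agent_status())` returns True when a line starting with hudi appears in the output.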

Data Collected

Metrics

hudi.action.bytes_written
(rate)
The total number of bytes written in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as byte
hudi.action.commit_time
(gauge)
The commit time of an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as millisecond
hudi.action.compacted_records_updated
(rate)
The number of compacted records updated in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as record
hudi.action.create_time
(rate)
The creation time of an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as millisecond
hudi.action.duration
(gauge)
The amount of time it took to successfully perform an action on a batch of records (commit, deltacommit, replacecommit, compaction, etc)
Shown as millisecond
hudi.action.files_inserted
(rate)
The number of files inserted in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as file
hudi.action.files_updated
(rate)
The number of files updated in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as file
hudi.action.insert_records_written
(rate)
The number of insert records written in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as record
hudi.action.log_files_compacted
(rate)
The number of log files compacted in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as file
hudi.action.log_files_size
(rate)
The size of all the log files in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as byte
hudi.action.partitions_written
(rate)
The number of partitions written in an action (commit, deltacommit, replacecommit, compaction, etc)
hudi.action.records_written
(rate)
The number of records written in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as record
hudi.action.scan_time
(rate)
The total time spent scanning in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as millisecond
hudi.action.time.50th_percentile
(gauge)
Measures 50th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.75th_percentile
(gauge)
Measures 75th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.95th_percentile
(gauge)
Measures 95th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.98th_percentile
(gauge)
Measures 98th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.999th_percentile
(gauge)
Measures 99.9th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.99th_percentile
(gauge)
Measures 99th percentile of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.count
(rate)
Measures count of times to complete an action (commit, deltacommit, replacecommit, compaction, etc)
hudi.action.time.max
(gauge)
Measures maximum amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.mean
(gauge)
Measures mean amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.min
(gauge)
Measures minimum amount of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.time.std_dev
(gauge)
Measures standard deviation of time to complete an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as nanosecond
hudi.action.update_records_written
(rate)
The number of update records written in an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as record
hudi.action.upsert_time
(rate)
The upsert time of an action (commit, deltacommit, replacecommit, compaction, etc)
Shown as millisecond
hudi.clean.duration
(gauge)
The total time spent cleaning
Shown as millisecond
hudi.clean.files_deleted
(gauge)
The number of files deleted in cleans
Shown as file
hudi.finalize.duration
(gauge)
The total time spent finalizing
Shown as millisecond
hudi.finalize.files_finalized
(gauge)
The number of files finalized
Shown as file
hudi.index.command.duration
(gauge)
The time spent performing an index command (UPSERT, INSERT_OVERWRITE, etc)
Shown as millisecond
hudi.rollback.duration
(gauge)
The total time spent in rollback
Shown as millisecond
hudi.rollback.files_deleted
(gauge)
The number of files deleted in rollback
Shown as file
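
Note that the units differ across the list above: the hudi.action.time.* statistics are reported in nanoseconds, while the duration metrics such as hudi.action.duration are reported in milliseconds. When comparing the two outside of Datadog, convert to a common unit first. A minimal sketch, with a hypothetical helper name:

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def ns_to_ms(value_ns: float) -> float:
    """Convert a nanosecond value (e.g. hudi.action.time.mean) to milliseconds."""
    return value_ns / NS_PER_MS
```

For instance, a hudi.action.time.mean value of 2,500,000 ns corresponds to 2.5 ms, which can then be compared directly with hudi.action.duration.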

Log collection

Available for Agent versions >6.0

  1. Hudi uses the log4j logger by default. To customize the format, edit the log4j.properties file in either your Flink or Spark conf directory. An example log4j.properties file is:

     log4j.rootCategory=INFO, file
     log4j.appender.file=org.apache.log4j.FileAppender
     log4j.appender.file.File=/var/log/hudi.log
     log4j.appender.file.append=false
     log4j.appender.file.layout=org.apache.log4j.PatternLayout
     log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    
  2. By default, Datadog’s integration pipeline supports the following conversion pattern:

    %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    

    An example of a valid timestamp is: 2020-02-03 18:43:12,251.

    Clone and edit the integration pipeline if you have a different format.

  3. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  4. Uncomment and edit the logs configuration block in your hudi.d/conf.yaml file. Change the path and service parameter values based on your environment. See the sample hudi.d/conf.yaml for all available configuration options.

    logs:
      - type: file
        path: /var/log/hudi.log
        source: hudi
        log_processing_rules:
          - type: multi_line
            pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
            name: new_log_start_with_date
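
The multi_line rule above treats any line that begins with a date as the start of a new log entry and appends lines that do not match (for example, stack trace lines) to the previous entry. The sketch below illustrates that grouping with Python's `re` module and the same pattern. It is an illustration of the idea, not the Agent's implementation; the function name and the sample lines (including the logger name) are hypothetical, and the logger field is not padded to 60 characters as the documented conversion pattern would produce.

```python
import re

# Same pattern as in the log_processing_rules block above.
NEW_LOG_START = re.compile(r"\d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])")


def group_multiline(lines):
    """Group raw lines into entries: a line starting with a date opens a new entry."""
    entries = []
    for line in lines:
        # re.match anchors at the beginning of the line.
        if NEW_LOG_START.match(line) or not entries:
            entries.append(line)
        else:
            # Continuation line: attach it to the entry that precedes it.
            entries[-1] += "\n" + line
    return entries


# Hypothetical sample lines for illustration only.
sample = [
    "2020-02-03 18:43:12,251 ERROR org.apache.hudi.Example  - Commit failed",
    "java.lang.RuntimeException: boom",
    "\tat org.apache.hudi.Example.run(Example.java:42)",
    "2020-02-03 18:43:13,007 INFO  org.apache.hudi.Example  - Retrying",
]
entries = group_multiline(sample)
```

Here the four raw lines collapse into two entries: the ERROR line together with its stack trace, and the INFO line on its own. If your log4j pattern does not start each line with a yyyy-MM-dd date, adjust the pattern accordingly.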
    

Events

The Hudi integration does not include any events.

Service Checks

hudi.can_connect
Returns CRITICAL if the Agent is unable to connect to and collect metrics from the monitored Hudi instance, WARNING if no metrics are collected, and OK otherwise.
Statuses: ok, critical, warning

Troubleshooting

Need help? Contact Datadog support.