Amazon Glue

Overview

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize, clean, enrich, and reliably move your data from one data store to another.

Enable this integration to see all your Glue metrics in Datadog.

Setup

Installation

If you haven't already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, make sure that Glue is checked in the metric collection section.
  2. Install the Datadog - Amazon Glue integration.

Log collection

Enable logging

Configure Amazon Glue to send its logs either to an S3 bucket or to CloudWatch.

Note: If you send your logs to an S3 bucket, make sure that amazon_glue is set as the Target prefix.

Send logs to Datadog

  1. If you haven't already, set up the Datadog log collection AWS Lambda function.

  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the S3 bucket or CloudWatch log group that contains your Amazon Glue logs.
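The trigger in step 2 is added by hand in the AWS console. For readers who script their setup instead, the following is a minimal sketch of the request payloads that an equivalent trigger might use with boto3. The function names, the filter name, the log group name, and the example ARN are hypothetical placeholders, and the sketch assumes the Lambda function already permits invocation by S3 or CloudWatch Logs.

```python
from typing import Any, Dict


def s3_trigger_config(forwarder_arn: str, prefix: str = "amazon_glue") -> Dict[str, Any]:
    """Build an S3 NotificationConfiguration that invokes the Lambda
    function for every object created under the given key prefix."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": forwarder_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def cloudwatch_subscription_args(
    log_group: str,
    forwarder_arn: str,
    filter_name: str = "datadog-glue-logs",  # hypothetical filter name
) -> Dict[str, str]:
    """Build keyword arguments for a CloudWatch Logs subscription filter
    that forwards every event of the log group to the Lambda function."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": "",  # an empty pattern matches all log events
        "destinationArn": forwarder_arn,
    }


if __name__ == "__main__":
    # Placeholder ARN and log group name, for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:datadog-forwarder"
    print(s3_trigger_config(arn))
    print(cloudwatch_subscription_args("my-glue-log-group", arn))
```

With boto3, the first payload would be passed as NotificationConfiguration to the S3 client's put_bucket_notification_configuration call, and the second unpacked into the CloudWatch Logs client's put_subscription_filter call. Note that put_bucket_notification_configuration replaces the bucket's entire existing notification configuration, so any existing triggers would need to be merged into the payload first.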

Data Collected

Metrics

aws.glue.driver.executor_allocation_manager.executors.number_all_executors
(gauge)
The number of actively running job executors.
aws.glue.driver.executor_allocation_manager.executors.number_max_needed_executors
(gauge)
The maximum number of job executors (actively running and pending) needed to satisfy the current load.
aws.glue.glue_alljvm_heap_usage
(gauge)
The average fraction of memory used by the JVM heap (scale: 0-1) across all executors.
Shown as percent
aws.glue.glue_alljvm_heap_used
(gauge)
The number of memory bytes used by the JVM heap for all executors.
Shown as byte
aws.glue.glue_alls_3filesystem_readbytes
(gauge)
The average number of bytes read from Amazon S3 by all executors since the previous report.
aws.glue.glue_allsystem_cpu_system_load
(gauge)
The average fraction of CPU system load used (scale: 0-1) by all executors.
Shown as percent
aws.glue.glue_driver_aggregate_bytes_read
(count)
The number of bytes read from all data sources by all completed Spark tasks running in all executors.
Shown as byte
aws.glue.glue_driver_aggregate_elapsed_time
(count)
The ETL elapsed time in milliseconds (does not include the job bootstrap times).
Shown as millisecond
aws.glue.glue_driver_aggregate_num_completed_stages
(count)
The number of completed stages in the job.
aws.glue.glue_driver_aggregate_num_completed_tasks
(count)
The number of completed tasks in the job.
aws.glue.glue_driver_aggregate_num_failed_tasks
(count)
The number of failed tasks.
aws.glue.glue_driver_aggregate_num_killed_tasks
(count)
The number of tasks killed.
aws.glue.glue_driver_aggregate_records_read
(count)
The number of records read from all data sources by all completed Spark tasks running in all executors.
aws.glue.glue_driver_aggregate_shuffle_bytes_written
(count)
The number of bytes written by all executors to shuffle data between them since the previous report.
aws.glue.glue_driver_aggregate_shuffle_local_bytes_read
(count)
The number of bytes read by all executors to shuffle data between them since the previous report.
aws.glue.glue_driver_block_manager_disk_disk_space_used_mb
(gauge)
The average number of megabytes of disk space used across all executors.
aws.glue.glue_driver_jvm_heap_usage
(gauge)
The average fraction of memory used by the JVM heap (scale: 0-1) for the driver.
Shown as percent
aws.glue.glue_driver_jvm_heap_used
(gauge)
The number of memory bytes used by the JVM heap for the driver.
Shown as byte
aws.glue.glue_driver_s3_filesystem_readbytes
(gauge)
The average number of bytes read from Amazon S3 by the driver since the previous report.
aws.glue.glue_driver_s3_filesystem_writebytes
(gauge)
The average number of bytes written to Amazon S3 by the driver since the previous report.
aws.glue.glue_driver_system_cpu_system_load
(gauge)
The average fraction of CPU system load used (scale: 0-1) by the driver.
Shown as percent
aws.glue.glue_executor_id_jvm_heap_usage
(gauge)
The average fraction of memory used by the JVM heap (scale: 0-1) for the identified executor.
Shown as percent
aws.glue.glue_executor_id_jvm_heap_used
(gauge)
The number of memory bytes used by the JVM heap for the identified executor.
Shown as byte
aws.glue.glue_executor_id_system_cpu_system_load
(gauge)
The average fraction of CPU system load used (scale: 0-1) by the identified executor.
Shown as percent
aws.glue.glue_executor_ids_3_filesystem_readbytes
(gauge)
The average number of bytes read from Amazon S3 by the identified executor since the previous report.
aws.glue.glue_executor_ids_3_filesystem_writebytes
(gauge)
The average number of bytes written to Amazon S3 by the identified executor since the previous report.

Events

The Amazon Glue integration does not include any events.

Service Checks

The Amazon Glue integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.