Supported OS: Linux, Windows, Mac OS

Integration version: 7.1.0

Hadoop Yarn

Overview

This check collects metrics from your YARN ResourceManager, including (but not limited to):

  • Cluster-wide metrics, such as the number of running apps, running containers, and unhealthy nodes.
  • Per-application metrics, such as app progress, elapsed running time, running containers, and memory use.
  • Node metrics, such as available vCores and the time of the last health update.

Deprecation notice

The yarn.apps.<METRIC> metrics are deprecated in favor of the yarn.apps.<METRIC>_gauge metrics, because the yarn.apps metrics are incorrectly reported as RATE instead of GAUGE.
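
If you have dashboards or monitors that still reference the deprecated names, the renaming rule above can be applied mechanically. The following is a minimal sketch, not part of the integration: the helper names gauge_replacement and migrate_query are hypothetical, and the set of names is copied from the metrics list on this page.

```python
import re

# Deprecated RATE metrics, as listed in the Metrics section of this page.
DEPRECATED_APP_METRICS = {
    "yarn.apps.allocated_mb",
    "yarn.apps.allocated_vcores",
    "yarn.apps.elapsed_time",
    "yarn.apps.finished_time",
    "yarn.apps.memory_seconds",
    "yarn.apps.progress",
    "yarn.apps.running_containers",
    "yarn.apps.started_time",
    "yarn.apps.vcore_seconds",
}

# Matches a full yarn.apps metric name, including any existing _gauge suffix.
_APP_METRIC = re.compile(r"yarn\.apps\.[a-z_]+")


def gauge_replacement(metric_name: str) -> str:
    """Return the _gauge name for a deprecated metric; other names pass through."""
    if metric_name in DEPRECATED_APP_METRICS:
        return metric_name + "_gauge"
    return metric_name


def migrate_query(query: str) -> str:
    """Rewrite every deprecated yarn.apps metric found in a query string."""
    return _APP_METRIC.sub(lambda m: gauge_replacement(m.group(0)), query)


if __name__ == "__main__":
    print(migrate_query("avg:yarn.apps.progress{*}"))
```

Names that already end in _gauge, and metrics outside the yarn.apps namespace, are left unchanged, so the rewrite can be run more than once on the same query.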

Setup

Installation

The YARN check is included in the Datadog Agent package, so you don't need to install anything else on your YARN ResourceManager.

Configuration

Host

To configure this check for an Agent running on a host:

  1. Edit the yarn.d/conf.yaml file, in the conf.d/ folder at the root of your Agent's configuration directory.

    init_config:
    
    instances:
      ## @param resourcemanager_uri - string - required
      ## The YARN check retrieves metrics from YARN's ResourceManager. This
      ## check must be run from the Master Node and the ResourceManager URI must
      ## be specified below. The ResourceManager URI is composed of the
      ## ResourceManager's hostname and port.
      ## The ResourceManager hostname can be found in the yarn-site.xml conf file
      ## under the property yarn.resourcemanager.address
      ##
      ## The ResourceManager port can be found in the yarn-site.xml conf file under
      ## the property yarn.resourcemanager.webapp.address
      #
      - resourcemanager_uri: http://localhost:8088
    
        ## @param cluster_name - string - required - default: default_cluster
        ## A friendly name for the cluster.
        #
        cluster_name: default_cluster
    

    See the sample check configuration for a comprehensive list and description of other check options.

  2. Restart the Agent to begin sending YARN metrics to Datadog.
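
The comments in the configuration above describe where to find the ResourceManager hostname and port in yarn-site.xml. As a rough sketch of that lookup, the snippet below reads both properties from a Hadoop-style configuration document and assembles a resourcemanager_uri. The function name, the sample XML, and the rm.example.internal hostname are illustrative assumptions, and the sketch does not resolve ${...} variable references that a real yarn-site.xml may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical yarn-site.xml content, used only for illustration.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm.example.internal:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.internal:8088</value>
  </property>
</configuration>"""


def resourcemanager_uri(xml_text: str, scheme: str = "http") -> str:
    """Build a ResourceManager URI from the two properties named in conf.yaml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value:
            props[name.strip()] = value.strip()

    # Hostname: taken from yarn.resourcemanager.address (host:port form).
    host = props["yarn.resourcemanager.address"].split(":", 1)[0]

    # Port: taken from yarn.resourcemanager.webapp.address (host:port form).
    _, sep, port = props["yarn.resourcemanager.webapp.address"].rpartition(":")
    if not sep:
        raise ValueError("yarn.resourcemanager.webapp.address has no port")

    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(resourcemanager_uri(SAMPLE_YARN_SITE))
```

A missing property raises KeyError rather than falling back to a default, so a misread file does not silently produce a wrong URI.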

Containerized

For containerized environments, see the Autodiscovery Integration Templates for guidance on applying the parameters below.

Parameter            Value
<INTEGRATION_NAME>   yarn
<INIT_CONFIG>        blank or {}
<INSTANCE_CONFIG>    {"resourcemanager_uri": "http://%%host%%:%%port%%", "cluster_name": "<CLUSTER_NAME>"}
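
One way to apply these parameters is through container labels. The sketch below builds the three values as JSON strings. It assumes the Docker label form of Autodiscovery (com.datadoghq.ad.check_names, com.datadoghq.ad.init_configs, com.datadoghq.ad.instances); check the Autodiscovery documentation for the form that matches your Agent version and orchestrator. The function name is hypothetical.

```python
import json


def yarn_autodiscovery_labels(cluster_name: str) -> dict:
    """Return Docker labels carrying the YARN template parameters from the table."""
    instance = {
        # %%host%% and %%port%% are template variables resolved by the Agent.
        "resourcemanager_uri": "http://%%host%%:%%port%%",
        "cluster_name": cluster_name,
    }
    return {
        "com.datadoghq.ad.check_names": json.dumps(["yarn"]),   # <INTEGRATION_NAME>
        "com.datadoghq.ad.init_configs": json.dumps([{}]),      # <INIT_CONFIG>
        "com.datadoghq.ad.instances": json.dumps([instance]),   # <INSTANCE_CONFIG>
    }


if __name__ == "__main__":
    for key, value in yarn_autodiscovery_labels("default_cluster").items():
        print(f"{key}={value}")
```

Each label value is a JSON list so that several checks can be attached to one container; here every list has a single entry for the yarn check.
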

Log collection

  1. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
    
  2. Uncomment and edit the logs configuration block in your yarn.d/conf.yaml file. Change the type, path, and service parameter values based on your environment. See the sample yarn.d/conf.yaml for all available configuration options.

    logs:
      - type: file
        path: <LOG_FILE_PATH>
        source: yarn
        service: <SERVICE_NAME>
        # To handle multi line that starts with yyyy-mm-dd use the following pattern
        # log_processing_rules:
        #   - type: multi_line
        #     pattern: \d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}
        #     name: new_log_start_with_date
    
  3. Restart the Agent.
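
To see what the commented-out multi_line rule would do, you can try its pattern on a few lines before enabling it. The sketch below only imitates the idea (a line that starts with a timestamp opens a new log event, and any other line is appended to the previous event); it is not the Agent's implementation, it assumes the pattern is tested against the beginning of each line, and the sample log lines are invented.

```python
import re

# Same pattern as in the log_processing_rules example above.
NEW_LOG_START = re.compile(r"\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def group_multiline(lines):
    """Merge continuation lines (for example stack traces) into the preceding event."""
    events = []
    for line in lines:
        if NEW_LOG_START.match(line) or not events:
            events.append(line)          # a timestamp starts a new event
        else:
            events[-1] += "\n" + line    # anything else continues the last one
    return events


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    "2024-01-15 10:32:01,123 INFO Application submitted",
    "2024-01-15 10:32:02,456 ERROR Container launch failed",
    "java.lang.RuntimeException: boom",
    "\tat com.example.Foo.bar(Foo.java:42)",
]

if __name__ == "__main__":
    for event in group_multiline(SAMPLE_LINES):
        print(repr(event))
```

With the sample input, the two stack-trace lines are folded into the ERROR event, so four raw lines become two events.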

To enable logs for Docker environments, see Docker Log Collection.

Validation

Run the Agent's status subcommand and look for yarn under the Checks section.

Data Collected

Metrics

yarn.apps.allocated_mb
(rate)
Deprecated; use yarn.apps.allocated_mb_gauge instead
Shown as mebibyte
yarn.apps.allocated_mb_gauge
(gauge)
The sum of memory in MB allocated to the application's running containers
Shown as mebibyte
yarn.apps.allocated_vcores
(rate)
Deprecated; use yarn.apps.allocated_vcores_gauge instead
Shown as core
yarn.apps.allocated_vcores_gauge
(gauge)
The sum of virtual cores allocated to the application's running containers
Shown as core
yarn.apps.elapsed_time
(rate)
Deprecated; use yarn.apps.elapsed_time_gauge instead
Shown as second
yarn.apps.elapsed_time_gauge
(gauge)
The elapsed time since the application started (in ms)
Shown as millisecond
yarn.apps.finished_time
(rate)
Deprecated; use yarn.apps.finished_time_gauge instead
Shown as second
yarn.apps.finished_time_gauge
(gauge)
The time in which the application finished (in ms since epoch)
Shown as millisecond
yarn.apps.memory_seconds
(rate)
Deprecated; use yarn.apps.memory_seconds_gauge instead
Shown as second
yarn.apps.memory_seconds_gauge
(gauge)
The amount of memory the application has allocated (megabyte-seconds)
Shown as mebibyte
yarn.apps.progress
(rate)
Deprecated; use yarn.apps.progress_gauge instead
Shown as percent
yarn.apps.progress_gauge
(gauge)
The progress of the application, displayed as 0, 10, and 100, which represent the three states: not started, in progress, and completed
Shown as percent
yarn.apps.running_containers
(rate)
Deprecated; use yarn.apps.running_containers_gauge instead
yarn.apps.running_containers_gauge
(gauge)
The number of containers currently running for the application
Shown as container
yarn.apps.started_time
(rate)
Deprecated; use yarn.apps.started_time_gauge instead
Shown as second
yarn.apps.started_time_gauge
(gauge)
The time at which the application started (in ms since epoch)
Shown as millisecond
yarn.apps.vcore_seconds
(rate)
Deprecated; use yarn.apps.vcore_seconds_gauge instead
Shown as second
yarn.apps.vcore_seconds_gauge
(gauge)
The amount of CPU resources the application has allocated (virtual core-seconds)
Shown as core
yarn.metrics.active_nodes
(gauge)
The number of active nodes
Shown as node
yarn.metrics.allocated_mb
(gauge)
The amount of allocated memory
Shown as mebibyte
yarn.metrics.allocated_virtual_cores
(gauge)
The number of allocated virtual cores
Shown as core
yarn.metrics.apps_completed
(gauge)
The number of completed apps
Shown as task
yarn.metrics.apps_failed
(gauge)
The number of failed apps
Shown as task
yarn.metrics.apps_killed
(gauge)
The number of killed apps
Shown as task
yarn.metrics.apps_pending
(gauge)
The number of pending apps
Shown as task
yarn.metrics.apps_running
(gauge)
The number of running apps
Shown as task
yarn.metrics.apps_submitted
(gauge)
The number of submitted apps
Shown as task
yarn.metrics.available_mb
(gauge)
The amount of available memory
Shown as mebibyte
yarn.metrics.available_virtual_cores
(gauge)
The number of available virtual cores
Shown as core
yarn.metrics.containers_allocated
(gauge)
The number of containers allocated
yarn.metrics.containers_pending
(gauge)
The number of containers pending
yarn.metrics.containers_reserved
(gauge)
The number of containers reserved
yarn.metrics.decommissioned_nodes
(gauge)
The number of decommissioned nodes
Shown as node
yarn.metrics.decommissioning_nodes
(gauge)
The number of decommissioning nodes
Shown as node
yarn.metrics.lost_nodes
(gauge)
The number of lost nodes
Shown as node
yarn.metrics.rebooted_nodes
(gauge)
The number of rebooted nodes
Shown as node
yarn.metrics.reserved_mb
(gauge)
The size of reserved memory
Shown as mebibyte
yarn.metrics.reserved_virtual_cores
(gauge)
The number of reserved virtual cores
Shown as core
yarn.metrics.total_mb
(gauge)
The amount of total memory
Shown as mebibyte
yarn.metrics.total_nodes
(gauge)
The total number of nodes
Shown as node
yarn.metrics.total_virtual_cores
(gauge)
The total number of virtual cores
Shown as core
yarn.metrics.unhealthy_nodes
(gauge)
The number of unhealthy nodes
Shown as node
yarn.node.avail_memory_mb
(gauge)
The total amount of memory currently available on the node (in MB)
Shown as mebibyte
yarn.node.available_virtual_cores
(gauge)
The total number of vCores available on the node
Shown as core
yarn.node.last_health_update
(gauge)
The last time the node reported its health (in ms since epoch)
Shown as millisecond
yarn.node.num_containers
(gauge)
The total number of containers currently running on the node
yarn.node.used_memory_mb
(gauge)
The total amount of memory currently used on the node (in MB)
Shown as mebibyte
yarn.node.used_virtual_cores
(gauge)
The total number of vCores currently used on the node
Shown as core
yarn.queue.absolute_capacity
(gauge)
The absolute capacity percentage this queue can use of the entire cluster
Shown as percent
yarn.queue.absolute_max_capacity
(gauge)
The absolute maximum capacity percentage this queue can use of the entire cluster
Shown as percent
yarn.queue.absolute_used_capacity
(gauge)
The absolute used capacity percentage this queue is using of the entire cluster
Shown as percent
yarn.queue.am_resource_limit.memory
(gauge)
The maximum memory resources this queue can use for Application Masters (in MB)
Shown as mebibyte
yarn.queue.am_resource_limit.vcores
(gauge)
The maximum vCPUs this queue can use for Application Masters
Shown as core
yarn.queue.capacity
(gauge)
The configured queue capacity in percentage relative to its parent queue
Shown as percent
yarn.queue.max_active_applications
(gauge)
The maximum number of active applications this queue can have
Shown as task
yarn.queue.max_active_applications_per_user
(gauge)
The maximum number of active applications per user this queue can have
Shown as task
yarn.queue.max_applications
(gauge)
The maximum number of applications this queue can have
Shown as task
yarn.queue.max_applications_per_user
(gauge)
The maximum number of applications per user this queue can have
Shown as task
yarn.queue.max_capacity
(gauge)
The configured maximum queue capacity in percentage relative to its parent queue
Shown as percent
yarn.queue.num_active_applications
(gauge)
The number of active applications in this queue
Shown as task
yarn.queue.num_applications
(gauge)
The number of applications currently in the queue
Shown as task
yarn.queue.num_containers
(gauge)
The number of containers being used
yarn.queue.num_pending_applications
(gauge)
The number of pending applications in this queue
Shown as task
yarn.queue.resources_used.memory
(gauge)
The total memory resources this queue is using (in MB)
Shown as mebibyte
yarn.queue.resources_used.vcores
(gauge)
The total vCPUs this queue is using
Shown as core
yarn.queue.root.capacity
(gauge)
The configured queue capacity in percentage for root queue
Shown as percent
yarn.queue.root.max_capacity
(gauge)
The configured maximum queue capacity in percentage for root queue
Shown as percent
yarn.queue.root.used_capacity
(gauge)
The used queue capacity in percentage for root queue
Shown as percent
yarn.queue.used_am_resource.memory
(gauge)
The memory resources used for Application Masters (in MB)
Shown as mebibyte
yarn.queue.used_am_resource.vcores
(gauge)
The vCPUs used for Application Masters
Shown as core
yarn.queue.used_capacity
(gauge)
The used queue capacity in percentage
Shown as percent
yarn.queue.user_am_resource_limit.memory
(gauge)
The maximum memory resources a user can use for Application Masters (in MB)
Shown as mebibyte
yarn.queue.user_am_resource_limit.vcores
(gauge)
The maximum vCPUs a user can use for Application Masters
Shown as core
yarn.queue.user_limit
(gauge)
The minimum user limit percent set in the configuration
yarn.queue.user_limit_factor
(gauge)
The user limit factor set in the configuration
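
As an illustration of how the cluster-wide gauges relate to the ResourceManager's data, the sketch below maps a few fields of a clusterMetrics payload (the shape returned by the ResourceManager REST API under /ws/v1/cluster/metrics) to the yarn.metrics.* names listed above. The mapping is a hand-picked subset written for this example and may not match the check's internal mapping; the function names and the sample values are invented.

```python
# Illustrative subset: REST API field name -> metric name from the list above.
CLUSTER_METRIC_NAMES = {
    "appsRunning": "yarn.metrics.apps_running",
    "appsPending": "yarn.metrics.apps_pending",
    "allocatedMB": "yarn.metrics.allocated_mb",
    "totalMB": "yarn.metrics.total_mb",
    "activeNodes": "yarn.metrics.active_nodes",
    "unhealthyNodes": "yarn.metrics.unhealthy_nodes",
}


def to_gauges(payload: dict) -> dict:
    """Translate a clusterMetrics payload into {metric_name: value}."""
    cluster = payload.get("clusterMetrics", {})
    return {
        name: cluster[field]
        for field, name in CLUSTER_METRIC_NAMES.items()
        if field in cluster
    }


def memory_utilization_percent(gauges: dict):
    """Allocated memory as a percentage of total memory, or None if total is unknown."""
    total = gauges.get("yarn.metrics.total_mb", 0)
    if not total:
        return None
    return 100.0 * gauges.get("yarn.metrics.allocated_mb", 0) / total


# Invented sample payload, for illustration only.
SAMPLE_PAYLOAD = {
    "clusterMetrics": {
        "appsRunning": 3,
        "appsPending": 1,
        "allocatedMB": 2048,
        "totalMB": 8192,
        "activeNodes": 4,
        "unhealthyNodes": 0,
    }
}

if __name__ == "__main__":
    gauges = to_gauges(SAMPLE_PAYLOAD)
    print(gauges)
    print(memory_utilization_percent(gauges))
```

A derived value such as memory utilization is not emitted by the check itself; in Datadog you would typically compute it in a query from yarn.metrics.allocated_mb and yarn.metrics.total_mb.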

Events

The YARN check does not include any events.

Service Checks

yarn.can_connect

Returns CRITICAL if the Agent cannot connect to the ResourceManager URI to collect metrics, otherwise OK.

Statuses: ok, critical

yarn.application.status

By default, returns OK if the Yarn application state is NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, or FINISHED; UNKNOWN if the application state is ALL; and CRITICAL if the Yarn application state is FAILED or KILLED.

Statuses: ok, unknown, critical
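
The default mapping described for yarn.application.status can be written out as a lookup table. This is a sketch of the rule as stated above, not the check's code: the names are hypothetical, and treating an unlisted state as unknown is an assumption made for this example.

```python
# Default application-state to service-check-status mapping, as described above.
DEFAULT_STATUS_BY_STATE = {
    "NEW": "ok",
    "NEW_SAVING": "ok",
    "SUBMITTED": "ok",
    "ACCEPTED": "ok",
    "RUNNING": "ok",
    "FINISHED": "ok",
    "ALL": "unknown",
    "FAILED": "critical",
    "KILLED": "critical",
}


def application_status(state: str, mapping: dict = None) -> str:
    """Return the service check status for a YARN application state."""
    if mapping is None:
        mapping = DEFAULT_STATUS_BY_STATE
    # Assumption: a state missing from the mapping is reported as unknown.
    return mapping.get(state.upper(), "unknown")


if __name__ == "__main__":
    for state in ("RUNNING", "KILLED", "ALL"):
        print(state, "->", application_status(state))
```

Passing a different mapping shows how an override would change the result, for example treating KILLED as ok for jobs that are routinely cancelled.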

Troubleshooting

Need help? Contact Datadog support.

Further Reading