Yarn

Agent Check

Supported OS: Linux Mac OS Windows

Hadoop Yarn

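Overview

This check collects metrics from your YARN ResourceManager. Examples include:

  • Cluster-wide metrics (such as the number of running apps, running containers, and unhealthy nodes)
  • Per-application metrics (such as app progress, elapsed running time, running containers, and memory use)
  • Node metrics (such as available vCores and the time of the last health update)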
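
As a rough illustration of where the cluster-wide values come from, the sketch below flattens a payload shaped like the `clusterMetrics` object returned by the ResourceManager REST API into `yarn.metrics.*` names. The helper names, the camelCase-to-snake_case conversion, and the sample payload are assumptions made for illustration; this is not the check's actual implementation.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase key such as 'appsRunning' to 'apps_running'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def flatten_cluster_metrics(payload, prefix="yarn.metrics"):
    """Map a {'clusterMetrics': {...}} payload to {metric_name: value}."""
    cluster = payload.get("clusterMetrics", {})
    return {
        f"{prefix}.{camel_to_snake(key)}": value
        for key, value in cluster.items()
        # Keep numeric values only; bool is a subclass of int, so exclude it.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Hypothetical sample payload (values are made up).
sample = {
    "clusterMetrics": {
        "appsRunning": 3,
        "unhealthyNodes": 1,
        "containersAllocated": 12,
    }
}
metrics = flatten_cluster_metrics(sample)
```

A naive conversion like this would mangle keys with consecutive capitals (for example `totalMB`), so a real mapping would more likely be an explicit table of field names.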

Deprecation notice

The yarn.apps.<METRIC> metrics are deprecated because they are incorrectly reported as RATE instead of GAUGE. Use the yarn.apps.<METRIC>_gauge metrics instead.
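
To see why the submission type matters, the sketch below contrasts the two interpretations of the same pair of samples. It assumes that a rate is derived as the change in value divided by the elapsed time between samples, and that a gauge reports the latest sample as-is; the function names and sample values are hypothetical.

```python
def as_rate(prev_value, curr_value, interval_seconds):
    """Rate interpretation: change in value per second between two samples."""
    return (curr_value - prev_value) / interval_seconds


def as_gauge(prev_value, curr_value, interval_seconds):
    """Gauge interpretation: the latest sampled value, unchanged."""
    return curr_value


# Two hypothetical samples of an application's progress, taken 15 seconds apart.
prev_progress, curr_progress, interval = 40.0, 55.0, 15

rate_value = as_rate(prev_progress, curr_progress, interval)    # percent per second
gauge_value = as_gauge(prev_progress, curr_progress, interval)  # percent
```

Under these assumptions the rate interpretation yields 1.0 while the gauge yields 55.0, so only the `_gauge` variant reflects the application's actual progress.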

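Setup

Installation

The YARN check is included in the Datadog Agent package, so you don't need to install anything else on your YARN ResourceManager.

Configuration

Host

To configure this check for an Agent running on a host, follow the steps below. For containerized environments, see the Containerized section.

  1. Edit the yarn.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory.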

    init_config:
    
    instances:
     ## @param resourcemanager_uri - string - required
     ## The YARN check retrieves metrics from YARN's ResourceManager. This
     ## check must be run from the Master Node and the ResourceManager URI must
     ## be specified below. The ResourceManager URI is composed of the
     ## ResourceManager's hostname and port.
     ## The ResourceManager hostname can be found in the yarn-site.xml conf file
     ## under the property yarn.resourcemanager.address
     ##
     ## The ResourceManager port can be found in the yarn-site.xml conf file under
     ## the property yarn.resourcemanager.webapp.address
     #
     - resourcemanager_uri: http://localhost:8088
    
       ## @param cluster_name - string - required - default: default_cluster
       ## A friendly name for the cluster.
       #
       cluster_name: default_cluster

    For a list and description of all available check options, see the sample check configuration.

  2. Restart the Agent to begin sending YARN metrics to Datadog.
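
If you are unsure which value to use for resourcemanager_uri, the following sketch reads the yarn.resourcemanager.webapp.address property from the contents of a yarn-site.xml file. It assumes the property value has the form host:port and that the web UI is served over plain HTTP; the function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

WEBAPP_PROPERTY = "yarn.resourcemanager.webapp.address"


def resourcemanager_uri_from_site_xml(xml_text, scheme="http"):
    """Return '<scheme>://<host:port>' from the webapp address property, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == WEBAPP_PROPERTY:
            value = prop.findtext("value", "").strip()
            return f"{scheme}://{value}" if value else None
    return None  # property not set in this file


# Hypothetical yarn-site.xml contents.
sample_xml = """
<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
</configuration>
"""
uri = resourcemanager_uri_from_site_xml(sample_xml)
```

With the sample above, `uri` matches the `http://localhost:8088` value used in the configuration example.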

Containerized

For containerized environments, see the Autodiscovery Integration Templates guide and apply the following parameters.

Parameter               Value
<INTEGRATION_NAME>      yarn
<INIT_CONFIG>           blank or {}
<INSTANCE_CONFIG>       {"resourcemanager_uri": "http://%%host%%:%%port%%", "cluster_name": "<CLUSTER_NAME>"}
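
The %%host%% and %%port%% tokens in the instance configuration are template variables that are filled in per container. The sketch below only illustrates the idea with plain string substitution; the function name, the IP address, and the cluster name are hypothetical, and the Agent's own resolution logic is not shown here.

```python
import json


def render_instance(template, host, port):
    """Substitute %%host%% and %%port%% in a JSON instance template, then parse it."""
    rendered = template.replace("%%host%%", host).replace("%%port%%", str(port))
    return json.loads(rendered)


template = (
    '{"resourcemanager_uri": "http://%%host%%:%%port%%", '
    '"cluster_name": "example_cluster"}'
)
# Hypothetical container address and port.
instance = render_instance(template, "10.0.0.12", 8088)
```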

Log collection

  1. Log collection is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file:

    logs_enabled: true
  2. Uncomment and edit the logs configuration block in your yarn.d/conf.yaml file. Change the type, path, and service parameter values based on your environment. See the sample yarn.d/conf.yaml for all available configuration options.

    logs:
      - type: file
        path: <LOG_FILE_PATH>
        source: yarn
        service: <SERVICE_NAME>
        # To handle multi-line logs that start with yyyy-mm-dd, use the following pattern
        # log_processing_rules:
        #   - type: multi_line
        #     pattern: \d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}
        #     name: new_log_start_with_date
  3. Restart the Agent.

For additional information on configuring the Agent to collect logs in Docker environments, see the [Datadog documentation][7].
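
The commented-out multi_line rule in the logs block groups continuation lines (such as stack traces) with the timestamped line that precedes them. The sketch below applies the same pattern in Python to show the intended grouping. It assumes the pattern is matched at the start of each line; the grouping function and the log lines are hypothetical, not the Agent's implementation.

```python
import re

# Same pattern as in the log_processing_rules example above.
NEW_LOG_START = re.compile(r"\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def group_multiline(lines):
    """Start a new entry at each timestamped line; append other lines to the current entry."""
    entries = []
    for line in lines:
        if NEW_LOG_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\n" + line
    return entries


# Hypothetical log lines.
raw = [
    "2023-01-05 10:15:30,123 ERROR Container launch failed",
    "java.io.IOException: example",
    "    at org.example.Launcher.run(Launcher.java:42)",
    "2023-01-05 10:15:31,456 INFO Retrying",
]
entries = group_multiline(raw)
```

Here the four raw lines collapse into two entries: the error with its stack trace, and the retry message.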

Validation

Run the Agent's status subcommand and look for yarn under the Checks section.

Data collected

Metrics

yarn.metrics.apps_submitted
(gauge)
The number of submitted apps
Shown as task
yarn.metrics.apps_completed
(gauge)
The number of completed apps
Shown as task
yarn.metrics.apps_pending
(gauge)
The number of pending apps
Shown as task
yarn.metrics.apps_running
(gauge)
The number of running apps
Shown as task
yarn.metrics.apps_failed
(gauge)
The number of failed apps
Shown as task
yarn.metrics.apps_killed
(gauge)
The number of killed apps
Shown as task
yarn.metrics.reserved_mb
(gauge)
The size of reserved memory
Shown as mebibyte
yarn.metrics.available_mb
(gauge)
The amount of available memory
Shown as mebibyte
yarn.metrics.allocated_mb
(gauge)
The amount of allocated memory
Shown as mebibyte
yarn.metrics.total_mb
(gauge)
The amount of total memory
Shown as mebibyte
yarn.metrics.reserved_virtual_cores
(gauge)
The number of reserved virtual cores
Shown as core
yarn.metrics.available_virtual_cores
(gauge)
The number of available virtual cores
Shown as core
yarn.metrics.allocated_virtual_cores
(gauge)
The number of allocated virtual cores
Shown as core
yarn.metrics.total_virtual_cores
(gauge)
The total number of virtual cores
Shown as core
yarn.metrics.containers_allocated
(gauge)
The number of containers allocated
yarn.metrics.containers_reserved
(gauge)
The number of containers reserved
yarn.metrics.containers_pending
(gauge)
The number of containers pending
yarn.metrics.total_nodes
(gauge)
The total number of nodes
Shown as node
yarn.metrics.active_nodes
(gauge)
The number of active nodes
Shown as node
yarn.metrics.lost_nodes
(gauge)
The number of lost nodes
Shown as node
yarn.metrics.unhealthy_nodes
(gauge)
The number of unhealthy nodes
Shown as node
yarn.metrics.decommissioned_nodes
(gauge)
The number of decommissioned nodes
Shown as node
yarn.metrics.rebooted_nodes
(gauge)
The number of rebooted nodes
Shown as node
yarn.apps.progress_gauge
(gauge)
The progress of the application as a percent
Shown as percent
yarn.apps.started_time_gauge
(gauge)
The time at which the application started (in ms since epoch)
Shown as millisecond
yarn.apps.finished_time_gauge
(gauge)
The time at which the application finished (in ms since epoch)
Shown as millisecond
yarn.apps.elapsed_time_gauge
(gauge)
The elapsed time since the application started (in ms)
Shown as millisecond
yarn.apps.allocated_mb_gauge
(gauge)
The sum of memory in MB allocated to the application's running containers
Shown as mebibyte
yarn.apps.allocated_vcores_gauge
(gauge)
The sum of virtual cores allocated to the application's running containers
Shown as core
yarn.apps.running_containers_gauge
(gauge)
The number of containers currently running for the application
Shown as container
yarn.apps.memory_seconds_gauge
(gauge)
The amount of memory the application has allocated (megabyte-seconds)
Shown as mebibyte
yarn.apps.vcore_seconds_gauge
(gauge)
The amount of CPU resources the application has allocated (virtual core-seconds)
Shown as core
yarn.apps.progress
(rate)
Deprecated; use yarn.apps.progress_gauge instead
Shown as percent
yarn.apps.started_time
(rate)
Deprecated; use yarn.apps.started_time_gauge instead
Shown as second
yarn.apps.finished_time
(rate)
Deprecated; use yarn.apps.finished_time_gauge instead
Shown as second
yarn.apps.elapsed_time
(rate)
Deprecated; use yarn.apps.elapsed_time_gauge instead
Shown as second
yarn.apps.allocated_mb
(rate)
Deprecated; use yarn.apps.allocated_mb_gauge instead
Shown as mebibyte
yarn.apps.allocated_vcores
(rate)
Deprecated; use yarn.apps.allocated_vcores_gauge instead
Shown as core
yarn.apps.running_containers
(rate)
Deprecated; use yarn.apps.running_containers_gauge instead
yarn.apps.memory_seconds
(rate)
Deprecated; use yarn.apps.memory_seconds_gauge instead
Shown as second
yarn.apps.vcore_seconds
(rate)
Deprecated; use yarn.apps.vcore_seconds_gauge instead
Shown as second
yarn.node.last_health_update
(gauge)
The last time the node reported its health (in ms since epoch)
Shown as millisecond
yarn.node.used_memory_mb
(gauge)
The total amount of memory currently used on the node (in MB)
Shown as mebibyte
yarn.node.avail_memory_mb
(gauge)
The total amount of memory currently available on the node (in MB)
Shown as mebibyte
yarn.node.used_virtual_cores
(gauge)
The total number of vCores currently used on the node
Shown as core
yarn.node.available_virtual_cores
(gauge)
The total number of vCores available on the node
Shown as core
yarn.node.num_containers
(gauge)
The total number of containers currently running on the node
yarn.queue.root.max_capacity
(gauge)
The configured maximum queue capacity in percentage for root queue
Shown as percent
yarn.queue.root.used_capacity
(gauge)
The used queue capacity in percentage for root queue
Shown as percent
yarn.queue.root.capacity
(gauge)
The configured queue capacity in percentage for root queue
Shown as percent
yarn.queue.num_pending_applications
(gauge)
The number of pending applications in this queue
Shown as task
yarn.queue.user_am_resource_limit.memory
(gauge)
The maximum memory resources a user can use for Application Masters (in MB)
Shown as mebibyte
yarn.queue.user_am_resource_limit.vcores
(gauge)
The maximum vCPUs a user can use for Application Masters
Shown as core
yarn.queue.absolute_capacity
(gauge)
The absolute capacity percentage this queue can use of entire cluster
Shown as percent
yarn.queue.user_limit_factor
(gauge)
The user limit factor set in the configuration
yarn.queue.user_limit
(gauge)
The minimum user limit percent set in the configuration
yarn.queue.num_applications
(gauge)
The number of applications currently in the queue
Shown as task
yarn.queue.used_am_resource.memory
(gauge)
The memory resources used for Application Masters (in MB)
Shown as mebibyte
yarn.queue.used_am_resource.vcores
(gauge)
The vCPUs used for Application Masters
Shown as core
yarn.queue.absolute_used_capacity
(gauge)
The absolute used capacity percentage this queue is using of the entire cluster
Shown as percent
yarn.queue.resources_used.memory
(gauge)
The total memory resources this queue is using (in MB)
Shown as mebibyte
yarn.queue.resources_used.vcores
(gauge)
The total vCPUs this queue is using
Shown as core
yarn.queue.am_resource_limit.vcores
(gauge)
The maximum vCPUs this queue can use for Application Masters
Shown as core
yarn.queue.am_resource_limit.memory
(gauge)
The maximum memory resources this queue can use for Application Masters (in MB)
Shown as mebibyte
yarn.queue.capacity
(gauge)
The configured queue capacity in percentage relative to its parent queue
Shown as percent
yarn.queue.num_active_applications
(gauge)
The number of active applications in this queue
Shown as task
yarn.queue.absolute_max_capacity
(gauge)
The absolute maximum capacity percentage this queue can use of the entire cluster
Shown as percent
yarn.queue.used_capacity
(gauge)
The used queue capacity in percentage
Shown as percent
yarn.queue.num_containers
(gauge)
The number of containers being used
yarn.queue.max_capacity
(gauge)
The configured maximum queue capacity in percentage relative to its parent queue
Shown as percent
yarn.queue.max_applications
(gauge)
The maximum number of applications this queue can have
Shown as task
yarn.queue.max_applications_per_user
(gauge)
The maximum number of active applications per user this queue can have
Shown as task
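
Several of the gauges above come in allocated/total pairs, which makes it straightforward to derive a utilization figure. The sketch below shows the arithmetic for memory (yarn.metrics.allocated_mb over yarn.metrics.total_mb) and virtual cores; the function name and the sample values are hypothetical.

```python
def utilization_percent(allocated, total):
    """Return allocated/total as a percentage, or None when total is not positive."""
    if total <= 0:
        return None
    return 100.0 * allocated / total


# Hypothetical readings of allocated_mb / total_mb and
# allocated_virtual_cores / total_virtual_cores.
memory_pct = utilization_percent(allocated=6144, total=8192)
vcore_pct = utilization_percent(allocated=12, total=16)
```

Both sample pairs work out to 75 percent; returning None for an empty cluster avoids a division by zero.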

Events

The YARN check does not include any events.

Service checks

yarn.can_connect:

Returns CRITICAL if the Agent cannot connect to the ResourceManager URI to collect metrics. Otherwise, returns OK.
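
The logic of this service check can be sketched as a try/except around a fetch. The version below takes the fetch function as a parameter so that it runs without a live ResourceManager; the function names are hypothetical, and the numeric codes follow the usual service check convention of 0 for OK and 2 for CRITICAL.

```python
OK, CRITICAL = 0, 2  # conventional service check status codes


def can_connect_status(fetch, uri):
    """Return OK if fetch(uri) succeeds, CRITICAL if it raises an OSError."""
    try:
        fetch(uri)
    except OSError:  # covers connection errors such as ConnectionRefusedError
        return CRITICAL
    return OK


def failing_fetch(uri):
    """Stand-in for an unreachable ResourceManager."""
    raise ConnectionRefusedError(f"cannot reach {uri}")


def working_fetch(uri):
    """Stand-in for a reachable ResourceManager returning an empty JSON body."""
    return b"{}"


down = can_connect_status(failing_fetch, "http://localhost:8088")
up = can_connect_status(working_fetch, "http://localhost:8088")
```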

yarn.application.status:

Returns a status for each application, according to the mapping specified in the conf.yaml file.
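
Conceptually, the mapping translates each YARN application state into a service check status. The sketch below shows one way such a lookup could behave. The mapping contents, the fallback to unknown, and the function name are assumptions for illustration; consult the sample conf.yaml for the actual option and its defaults.

```python
# Conventional service check status codes.
STATUS_CODES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}

# Hypothetical mapping from YARN application states to statuses.
example_mapping = {
    "RUNNING": "ok",
    "FINISHED": "ok",
    "FAILED": "critical",
    "KILLED": "critical",
}


def application_status(state, mapping, default="unknown"):
    """Look up the status code for an application state, falling back to `default`."""
    return STATUS_CODES[mapping.get(state, default)]


running = application_status("RUNNING", example_mapping)
failed = application_status("FAILED", example_mapping)
unmapped = application_status("ACCEPTED", example_mapping)
```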

Troubleshooting

Need help? Contact Datadog support.

Further reading

[7]: