This check monitors Spark through the Datadog Agent and collects the Spark metrics listed below.
The Spark check is included in the Datadog Agent package, so no additional installation is needed on your Mesos master (for Spark on Mesos), YARN ResourceManager (for Spark on YARN), or Spark master (for Spark Standalone).
To configure this check for an Agent running on a host:
Edit the `spark.d/conf.yaml` file in the `conf.d/` folder at the root of your Agent's configuration directory. The following parameters may require updating. See the sample `spark.d/conf.yaml` for all available configuration options.
```yaml
init_config:

instances:
  - spark_url: http://localhost:8080 # Spark master web UI
    # spark_url: http://<Mesos_master>:5050 # Mesos master web UI
    # spark_url: http://<YARN_ResourceManager_address>:8088 # YARN ResourceManager address

    spark_cluster_mode: spark_yarn_mode # default
    # spark_cluster_mode: spark_mesos_mode
    # spark_cluster_mode: spark_yarn_mode
    # spark_cluster_mode: spark_driver_mode

    # required; adds a tag 'cluster_name:<CLUSTER_NAME>' to all metrics
    cluster_name: "<CLUSTER_NAME>"

    # spark_pre_20_mode: true # if you use Standalone Spark < v2.0
    # spark_proxy_enabled: true # if you have enabled the spark UI proxy
```
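After editing the file, restart the Agent so the new configuration takes effect. As a minimal sketch, on most Linux hosts this looks like the following (the exact service-management command depends on your platform):

```shell
# Restart the Datadog Agent to load the updated Spark configuration
# (command varies by platform; systemctl is shown as one common case)
sudo systemctl restart datadog-agent
```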
For containerized environments, see the Autodiscovery Integration Templates guide and apply the following parameters:
Parameter | Value |
---|---|
`<INTEGRATION_NAME>` | `spark` |
`<INIT_CONFIG>` | blank or `{}` |
`<INSTANCE_CONFIG>` | `{"spark_url": "%%host%%:8080", "cluster_name":"<CLUSTER_NAME>"}` |
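For illustration, here is a minimal sketch of how these parameters might map onto Kubernetes pod annotations for Autodiscovery; the pod and container names (`spark`) are assumptions for this example:

```yaml
# Hypothetical pod excerpt: Autodiscovery annotations carrying the
# integration name, init config, and instance config from the table above.
apiVersion: v1
kind: Pod
metadata:
  name: spark
  annotations:
    ad.datadoghq.com/spark.check_names: '["spark"]'
    ad.datadoghq.com/spark.init_configs: '[{}]'
    ad.datadoghq.com/spark.instances: '[{"spark_url": "%%host%%:8080", "cluster_name":"<CLUSTER_NAME>"}]'
spec:
  containers:
    - name: spark
```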
Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:
```yaml
logs_enabled: true
```
Uncomment and edit the logs configuration block in your `spark.d/conf.yaml` file. Change the values of the `type`, `path`, and `service` parameters based on your environment. See the sample `spark.d/conf.yaml` for all available configuration options.
```yaml
logs:
  - type: file
    path: <LOG_FILE_PATH>
    source: spark
    service: <SERVICE_NAME>
    # To handle multi-line logs that start with yyyy-mm-dd, use the following pattern:
    # log_processing_rules:
    #   - type: multi_line
    #     pattern: \d{4}\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])
    #     name: new_log_start_with_date
```
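For illustration, with the `multi_line` rule enabled, a stack trace that does not start with a date is appended to the preceding date-prefixed line, so the whole trace arrives as one log event. The sample below is invented to show the shape of such input:

```text
2024-05-01 12:00:00 ERROR Executor: Exception in task 0.0 in stage 1.0
java.lang.OutOfMemoryError: Java heap space
    at org.apache.spark.memory.MemoryConsumer.allocateArray(...)
2024-05-01 12:00:01 INFO TaskSetManager: Starting task 0.1 in stage 1.0
```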
To enable logs for Docker environments, see Docker Log Collection.
Run the Agent's status subcommand and look for `spark` under the Checks section.
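For example, on a Linux host (the invocation may differ on other platforms):

```shell
# Validate that the Spark check is running and reporting without errors
sudo datadog-agent status
```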
Metric | Description |
---|---|
spark.job.count (count) | Number of jobs Shown as task |
spark.job.num_tasks (count) | Number of tasks in the application Shown as task |
spark.job.num_active_tasks (count) | Number of active tasks in the application Shown as task |
spark.job.num_skipped_tasks (count) | Number of skipped tasks in the application Shown as task |
spark.job.num_failed_tasks (count) | Number of failed tasks in the application Shown as task |
spark.job.num_completed_tasks (count) | Number of completed tasks in the application Shown as task |
spark.job.num_active_stages (count) | Number of active stages in the application Shown as stage |
spark.job.num_completed_stages (count) | Number of completed stages in the application Shown as stage |
spark.job.num_skipped_stages (count) | Number of skipped stages in the application Shown as stage |
spark.job.num_failed_stages (count) | Number of failed stages in the application Shown as stage |
spark.stage.count (count) | Number of stages Shown as task |
spark.stage.num_active_tasks (count) | Number of active tasks in the application's stages Shown as task |
spark.stage.num_complete_tasks (count) | Number of complete tasks in the application's stages Shown as task |
spark.stage.num_failed_tasks (count) | Number of failed tasks in the application's stages Shown as task |
spark.stage.executor_run_time (count) | Time spent by the executor in the application's stages Shown as millisecond |
spark.stage.input_bytes (count) | Input bytes in the application's stages Shown as byte |
spark.stage.input_records (count) | Input records in the application's stages Shown as record |
spark.stage.output_bytes (count) | Output bytes in the application's stages Shown as byte |
spark.stage.output_records (count) | Output records in the application's stages Shown as record |
spark.stage.shuffle_read_bytes (count) | Number of bytes read during a shuffle in the application's stages Shown as byte |
spark.stage.shuffle_read_records (count) | Number of records read during a shuffle in the application's stages Shown as record |
spark.stage.shuffle_write_bytes (count) | Number of shuffled bytes in the application's stages Shown as byte |
spark.stage.shuffle_write_records (count) | Number of shuffled records in the application's stages Shown as record |
spark.stage.memory_bytes_spilled (count) | Number of bytes spilled to disk in the application's stages Shown as byte |
spark.stage.disk_bytes_spilled (count) | Max size on disk of the spilled bytes in the application's stages Shown as byte |
spark.driver.rdd_blocks (count) | Number of RDD blocks in the driver Shown as block |
spark.driver.memory_used (count) | Amount of memory used in the driver Shown as byte |
spark.driver.disk_used (count) | Amount of disk used in the driver Shown as byte |
spark.driver.active_tasks (count) | Number of active tasks in the driver Shown as task |
spark.driver.failed_tasks (count) | Number of failed tasks in the driver Shown as task |
spark.driver.completed_tasks (count) | Number of completed tasks in the driver Shown as task |
spark.driver.total_tasks (count) | Number of total tasks in the driver Shown as task |
spark.driver.total_duration (count) | Time spent in the driver Shown as millisecond |
spark.driver.total_input_bytes (count) | Number of input bytes in the driver Shown as byte |
spark.driver.total_shuffle_read (count) | Number of bytes read during a shuffle in the driver Shown as byte |
spark.driver.total_shuffle_write (count) | Number of shuffled bytes in the driver Shown as byte |
spark.driver.max_memory (count) | Maximum memory used in the driver Shown as byte |
spark.executor.count (count) | Number of executors Shown as task |
spark.executor.rdd_blocks (count) | Number of persisted RDD blocks in the application's executors Shown as block |
spark.executor.memory_used (count) | Amount of memory used for cached RDDs in the application's executors Shown as byte |
spark.executor.max_memory (count) | Max memory across all executors working for a particular application Shown as byte |
spark.executor.disk_used (count) | Amount of disk space used by persisted RDDs in the application's executors Shown as byte |
spark.executor.active_tasks (count) | Number of active tasks in the application's executors Shown as task |
spark.executor.failed_tasks (count) | Number of failed tasks in the application's executors Shown as task |
spark.executor.completed_tasks (count) | Number of completed tasks in the application's executors Shown as task |
spark.executor.total_tasks (count) | Total number of tasks in the application's executors Shown as task |
spark.executor.total_duration (count) | Time spent by the application's executors executing tasks Shown as millisecond |
spark.executor.total_input_bytes (count) | Total number of input bytes in the application's executors Shown as byte |
spark.executor.total_shuffle_read (count) | Total number of bytes read during a shuffle in the application's executors Shown as byte |
spark.executor.total_shuffle_write (count) | Total number of shuffled bytes in the application's executors Shown as byte |
spark.executor_memory (count) | Maximum memory available for caching RDD blocks in the application's executors Shown as byte |
spark.executor.id.rdd_blocks (count) | Number of persisted RDD blocks in this executor Shown as block |
spark.executor.id.memory_used (count) | Amount of memory used for cached RDDs in this executor. Shown as byte |
spark.executor.id.max_memory (count) | Total amount of memory available for storage for this executor Shown as byte |
spark.executor.id.disk_used (count) | Amount of disk space used by persisted RDDs in this executor Shown as byte |
spark.executor.id.active_tasks (count) | Number of active tasks in this executor Shown as task |
spark.executor.id.failed_tasks (count) | Number of failed tasks in this executor Shown as task |
spark.executor.id.completed_tasks (count) | Number of completed tasks in this executor Shown as task |
spark.executor.id.total_tasks (count) | Total number of tasks in this executor Shown as task |
spark.executor.id.total_duration (count) | Time spent by the executor executing tasks Shown as millisecond |
spark.executor.id.total_input_bytes (count) | Total number of input bytes in the executor Shown as byte |
spark.executor.id.total_shuffle_read (count) | Total number of bytes read during a shuffle in the executor Shown as byte |
spark.executor.id.total_shuffle_write (count) | Total number of shuffled bytes in the executor Shown as byte |
spark.rdd.count (count) | Number of RDDs |
spark.rdd.num_partitions (count) | Number of persisted RDD partitions in the application |
spark.rdd.num_cached_partitions (count) | Number of in-memory cached RDD partitions in the application |
spark.rdd.memory_used (count) | Amount of memory used in the application's persisted RDDs Shown as byte |
spark.rdd.disk_used (count) | Amount of disk space used by persisted RDDs in the application Shown as byte |
spark.streaming.statistics.avg_input_rate (gauge) | Average streaming input data rate Shown as byte |
spark.streaming.statistics.avg_processing_time (gauge) | Average application's streaming batch processing time Shown as millisecond |
spark.streaming.statistics.avg_scheduling_delay (gauge) | Average application's streaming batch scheduling delay Shown as millisecond |
spark.streaming.statistics.avg_total_delay (gauge) | Average application's streaming batch total delay Shown as millisecond |
spark.streaming.statistics.batch_duration (gauge) | Application's streaming batch duration Shown as millisecond |
spark.streaming.statistics.num_active_batches (gauge) | Number of active streaming batches Shown as job |
spark.streaming.statistics.num_active_receivers (gauge) | Number of active streaming receivers Shown as object |
spark.streaming.statistics.num_inactive_receivers (gauge) | Number of inactive streaming receivers Shown as object |
spark.streaming.statistics.num_processed_records (count) | Number of processed streaming records Shown as record |
spark.streaming.statistics.num_received_records (count) | Number of received streaming records Shown as record |
spark.streaming.statistics.num_receivers (gauge) | Number of streaming application's receivers Shown as object |
spark.streaming.statistics.num_retained_completed_batches (count) | Number of retained completed application's streaming batches Shown as job |
spark.streaming.statistics.num_total_completed_batches (count) | Total number of completed application's streaming batches Shown as job |
spark.structured_streaming.input_rate (gauge) | Average streaming input data rate Shown as record |
spark.structured_streaming.latency (gauge) | Average latency for the structured streaming application. Shown as millisecond |
spark.structured_streaming.processing_rate (gauge) | Number of received streaming records per second Shown as row |
spark.structured_streaming.rows_count (gauge) | Count of rows. Shown as row |
spark.structured_streaming.used_bytes (gauge) | Number of bytes used in memory. Shown as byte |
The Spark check does not include any events.
`spark.resource_manager.can_connect`
Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's ResourceManager. Returns `OK` otherwise.
Statuses: ok, critical
`spark.application_master.can_connect`
Returns `CRITICAL` if the Agent is unable to connect to the Spark instance's ApplicationMaster. Returns `OK` otherwise.
Statuses: ok, critical
To receive metrics for Spark on AWS EMR, use bootstrap actions to install the Datadog Agent:
For Agent v5, create a `/etc/dd-agent/conf.d/spark.yaml` configuration file with the correct values on each EMR node.
For Agent v6/7, create a `/etc/datadog-agent/conf.d/spark.d/conf.yaml` configuration file with the correct values on each EMR node.
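As a rough sketch, a bootstrap action for Agent v7 might install the Agent and write a minimal Spark configuration as shown below; the API-key placeholder, the YARN endpoint, and the heredoc approach are illustrative assumptions, not a prescribed script:

```shell
#!/usr/bin/env bash
# Illustrative EMR bootstrap action: install the Datadog Agent (v7)
# and write a minimal Spark check configuration on each node.
DD_API_KEY=<YOUR_API_KEY> DD_SITE="datadoghq.com" bash -c \
  "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

sudo tee /etc/datadog-agent/conf.d/spark.d/conf.yaml > /dev/null <<'EOF'
init_config:

instances:
  - spark_url: http://<YARN_ResourceManager_address>:8088
    spark_cluster_mode: spark_yarn_mode
    cluster_name: "<CLUSTER_NAME>"
EOF

sudo systemctl restart datadog-agent
```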
The Spark integration only collects metrics for running applications. If no applications are currently running, the check simply submits its health checks.
Useful documentation, links, and articles: