To find out if this integration is available in your organization, see your Datadog Integrations page or ask your organization administrator.
To initiate an exception request to enable this integration for your organization, email support@ddog-gov.com.
Overview
This check monitors Control-M through the Datadog Agent.
Control-M is a workload automation platform that orchestrates batch jobs, file transfers, and application workflows across on-premises and cloud environments. This integration connects to the Control-M Automation API to collect server health, job execution metrics, and completion events, giving you visibility into your scheduling infrastructure from within Datadog.
The integration provides:
- Server health monitoring: Track which Control-M servers are up or disconnected.
- Job rollup metrics: Total, active, waiting, and per-status breakdowns across all servers.
- Per-job completion tracking: Run counts and durations for terminal jobs (ended OK, ended not OK, cancelled), with deduplication across check cycles.
- Events: Optional Datadog events for job failures, cancellations, slow runs, and (opt-in) successes.
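The sketch below illustrates how the rollup metrics relate to the raw status feed. It counts jobs per status, non-terminal (active) jobs, and waiting jobs per server. It is not the check's implementation. The entry fields (status, ctm), the status strings, and the rule that waiting statuses start with "Wait" are all assumptions made for the example.

```python
from collections import Counter

# Assumed terminal status strings; the check's normalized values may differ.
TERMINAL_STATUSES = {"Ended OK", "Ended Not OK", "Cancelled"}


def rollup(jobs):
    """Summarize job-status entries into rollup counts.

    Each entry is assumed to be a dict with "status" and "ctm" (server name) keys.
    """
    by_status = Counter(job["status"] for job in jobs)
    # Active means non-terminal, so waiting jobs count as active too.
    active = sum(1 for job in jobs if job["status"] not in TERMINAL_STATUSES)
    # Assumption: every waiting status starts with "Wait" (e.g. "Wait Condition").
    waiting_by_server = Counter(
        job["ctm"] for job in jobs if job["status"].startswith("Wait")
    )
    return {
        "jobs.returned": len(jobs),
        "jobs.active": active,
        "jobs.by_status": dict(by_status),
        "jobs.waiting.by_server": dict(waiting_by_server),
        "jobs.waiting.total": sum(waiting_by_server.values()),
    }


# Hypothetical sample feed spanning two Control-M servers.
sample = [
    {"status": "Ended OK", "ctm": "ctm-prod"},
    {"status": "Executing", "ctm": "ctm-prod"},
    {"status": "Wait Condition", "ctm": "ctm-prod"},
    {"status": "Wait Condition", "ctm": "ctm-dev"},
]
print(rollup(sample))
```

The sketch does not compute control_m.jobs.total. That value comes from the API response's total field rather than from counting returned entries.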
Setup
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Installation
The Control-M check is included in the Datadog Agent package.
No additional installation is needed on your server.
Configuration
Edit the control_m.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory.
Minimum configuration (static token)
instances:
  - control_m_api_endpoint: https://your-controlm-host:8443/automation-api
    headers:
      Authorization: Bearer <YOUR_API_TOKEN>
Session-login authentication
If your environment uses username and password authentication instead of a static token:
instances:
  - control_m_api_endpoint: https://your-controlm-host:8443/automation-api
    control_m_username: <USERNAME>
    control_m_password: <PASSWORD>
When both headers (with an Authorization key) and credentials are configured, the check tries the static token first. If the API responds with a 401, it falls back to session login automatically.
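The fallback order can be sketched as follows: try the static token first, and log in for a session token on a 401. The get and session_login callables are hypothetical stand-ins for the check's HTTP calls. In practice, session_login would exchange control_m_username and control_m_password for a session token. Only the fallback order comes from the description above.

```python
def fetch_with_fallback(get, static_token, session_login):
    """Try the static token first; on HTTP 401 (or with no token), use a session token.

    get(token) -> (status_code, body) and session_login() -> token are
    hypothetical callables standing in for the check's HTTP requests.
    """
    if static_token is not None:
        status, body = get(static_token)
        if status != 401:
            return status, body
    # No static token configured, or it was rejected: log in and retry once.
    return get(session_login())


def fake_get(token):
    """Fake API that only accepts the session token."""
    return (200, "servers") if token == "session-tok" else (401, None)


print(fetch_with_fallback(fake_get, "static-tok", lambda: "session-tok"))
```

A non-401 error with the static token is returned as-is rather than triggering a login. This mirrors the statement that only a 401 causes the fallback.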
Histogram: control_m.job.run.duration_ms
The job.run.duration_ms metric is submitted as a histogram. The Datadog Agent expands it into multiple aggregated metrics based on the histogram_aggregates and histogram_percentiles settings in the main datadog.yaml file:
| Generated metric | Type | Default |
|---|---|---|
| control_m.job.run.duration_ms.avg | gauge | Enabled |
| control_m.job.run.duration_ms.count | rate | Enabled |
| control_m.job.run.duration_ms.max | gauge | Enabled |
| control_m.job.run.duration_ms.median | gauge | Enabled |
| control_m.job.run.duration_ms.95percentile | gauge | Enabled |
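The sketch below makes the generated metrics concrete by summarizing one flush interval's duration samples into the five default aggregates. It uses a simple nearest-rank percentile, which is an assumption for illustration and may not match the Agent's internal computation. Also, the Agent submits the count aggregate as a rate, not as the raw sample count shown here.

```python
import math


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile for 0 < p <= 1 (illustrative method)."""
    rank = math.ceil(p * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def aggregate(samples_ms):
    """Summarize one flush interval of duration samples into the default aggregates."""
    if not samples_ms:
        return {}  # no samples, nothing to submit
    values = sorted(samples_ms)
    return {
        "avg": sum(values) / len(values),
        "count": len(values),  # the Agent submits this aggregate as a rate
        "max": values[-1],
        "median": nearest_rank(values, 0.5),
        "95percentile": nearest_rank(values, 0.95),
    }


print(aggregate([1000, 2000, 3000, 4000, 60000]))
```

In this sample, the single 60,000 ms run sets both max and 95percentile, while the median stays at 3,000 ms.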
To customize which aggregations are produced, edit the histogram_aggregates and histogram_percentiles options in your datadog.yaml file:
histogram_aggregates:
- max
- median
- avg
- count
histogram_percentiles:
- "0.95"
These settings are Agent-level and apply to all histograms from all integrations.
Events
When emit_job_events is enabled, the check emits Datadog events for terminal job completions:
| Event type | Alert type | Trigger |
|---|---|---|
| control_m.job.completion | error | Job ended not OK. |
| control_m.job.completion | warning | Job cancelled. |
| control_m.job.completion | success | Job ended OK (only when emit_success_events: true). |
| control_m.job.slow_run | warning | Job duration exceeds slow_run_threshold_ms. |
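The mapping in the table can be expressed as a small decision function. This is a sketch, not the check's code. The status strings are assumed, and the sketch applies the slow-run test to every terminal run regardless of status, which the table does not specify.

```python
def classify_event(status, duration_ms, emit_job_events=False,
                   emit_success_events=False, slow_run_threshold_ms=None):
    """Return (event_type, alert_type) pairs for one terminal job run.

    Status strings ("Ended OK", "Ended Not OK", "Cancelled") are assumptions.
    """
    if not emit_job_events:
        return []  # all job events are opt-in
    events = []
    if status == "Ended Not OK":
        events.append(("control_m.job.completion", "error"))
    elif status == "Cancelled":
        events.append(("control_m.job.completion", "warning"))
    elif status == "Ended OK" and emit_success_events:
        events.append(("control_m.job.completion", "success"))
    # Assumption: slow runs are flagged independently of the completion status.
    if (slow_run_threshold_ms is not None and duration_ms is not None
            and duration_ms > slow_run_threshold_ms):
        events.append(("control_m.job.slow_run", "warning"))
    return events


print(classify_event("Ended OK", 7_200_000, emit_job_events=True,
                     slow_run_threshold_ms=3_600_000))
```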
Events include high-cardinality details in the body: job ID, run number, folder, type, start time, and duration.
Events respect deduplication: a given job and run combination fires an event only on the first check cycle in which it appears.
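Here is a minimal sketch of the deduplication idea. It assumes runs are keyed by (job ID, run number) and forgotten after a TTL such as finalized_ttl_seconds. The SeenRuns class and its clock parameter are hypothetical, and the real check also keeps a separate TTL for active jobs (active_ttl_seconds).

```python
import time


class SeenRuns:
    """Remember (job_id, run_number) pairs for a TTL so each run is reported once."""

    def __init__(self, ttl_seconds=86400, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable for testing
        self._seen = {}  # (job_id, run_number) -> first-seen timestamp

    def first_time(self, job_id, run_number):
        """Return True the first time a run is observed, False on later cycles."""
        now = self.clock()
        # Drop entries older than the TTL so memory stays bounded.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl_seconds}
        key = (job_id, run_number)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


seen = SeenRuns()
print(seen.first_time("job-123", 1), seen.first_time("job-123", 1))
```

A new run number for the same job is a new key, so each rerun is reported on its own.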
Optional settings
instances:
  - control_m_api_endpoint: https://your-controlm-host:8443/automation-api
    headers:
      Authorization: Bearer <YOUR_API_TOKEN>
    # Events
    emit_job_events: true # Emit Datadog events for job completions (default: false)
    emit_success_events: false # Include success events, not just failures/cancellations (default: false)
    slow_run_threshold_ms: 3600000 # Flag jobs slower than this as slow_run events (default: none)
    # Job filtering
    job_status_limit: 10000 # Max jobs per API call (default: 10000, server max)
    job_name_filter: '*' # Wildcard filter for job names (default: *)
    # Session token tuning
    token_lifetime_seconds: 1800 # Assumed token lifetime (default: 1800)
    token_refresh_buffer_seconds: 300 # Refresh this many seconds before expiry (default: 300)
    # Deduplication TTLs
    finalized_ttl_seconds: 86400 # How long to remember completed jobs (default: 24h)
    active_ttl_seconds: 21600 # How long to track active jobs (default: 6h)
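The two session-token settings combine into a single refresh deadline. The check assumes the token expires token_lifetime_seconds after login and renews it token_refresh_buffer_seconds earlier. With the defaults, that is 1800 - 300 = 1500 seconds (25 minutes) after login. The helper below is a hypothetical illustration of that arithmetic.

```python
def needs_refresh(issued_at, now, token_lifetime_seconds=1800,
                  token_refresh_buffer_seconds=300):
    """Return True once the session token is within the buffer of its assumed expiry."""
    expires_at = issued_at + token_lifetime_seconds
    return now >= expires_at - token_refresh_buffer_seconds


print(needs_refresh(issued_at=0, now=1499), needs_refresh(issued_at=0, now=1500))
```

If your Control-M server issues shorter-lived session tokens, lower token_lifetime_seconds to match. Otherwise the check may try to use a token that has already expired.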
See the sample control_m.d/conf.yaml for all available configuration options.
Restart the Agent after making changes.
Validation
Run the Agent’s status subcommand and look for control_m under the Checks section.
$ datadog-agent status
...
control_m (1.0.0)
-----------------
Instance ID: control_m:abc1234 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/control_m.d/conf.yaml
Total Runs: 42
Metric Samples: Last Run: 15, Total: 630
Events: Last Run: 0, Total: 3
Service Checks: Last Run: 1, Total: 42
Average Execution Time: 245ms
Troubleshooting
The can_connect metric reports 0
- Verify that control_m_api_endpoint is reachable from the Agent host:
  curl -s -o /dev/null -w '%{http_code}' https://your-host:8443/automation-api/config/servers -H 'Authorization: Bearer <TOKEN>'
- Check that the API token or credentials are valid.
- If TLS verification is failing, set tls_verify: false temporarily to confirm, then fix the certificate chain.
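If curl is not available on the Agent host, you can run the same probe with Python's standard library. This is a hypothetical helper, not part of the check. It treats only an HTTP 200 from /config/servers as success, which may be stricter than how the check computes can_connect.

```python
import http.client
import urllib.request


def build_servers_request(endpoint, token):
    """GET <endpoint>/config/servers with a bearer token, like the curl command."""
    url = endpoint.rstrip("/") + "/config/servers"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def probe(endpoint, token, timeout=10):
    """Return 1 if the endpoint answers HTTP 200, else 0."""
    try:
        with urllib.request.urlopen(build_servers_request(endpoint, token),
                                    timeout=timeout) as resp:
            return 1 if resp.status == 200 else 0
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (non-2xx), TLS failures, and timeouts.
        return 0


# Example (performs a network call):
# probe("https://your-host:8443/automation-api", "<TOKEN>")
```

urlopen verifies certificates by default, so a broken certificate chain shows up here as a 0. That matches the tls_verify symptom described above.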
Metrics show fewer jobs than expected
The Control-M API enforces a maximum of 10,000 jobs per request. If control_m.jobs.total exceeds control_m.jobs.returned, the response was truncated, and the jobs beyond the limit were not returned. Use job_name_filter to narrow the set of jobs the check requests.
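The truncation condition is a simple comparison of the two gauges. The sketch also shows how a wildcard pattern such as PAYROLL_* narrows the set of job names. It uses Python's fnmatch as a stand-in and assumes job_name_filter follows shell-style * semantics. It does not model how the filter is sent to the API.

```python
import fnmatch


def truncated_count(jobs_total, jobs_returned):
    """Number of jobs the API reported but did not return (0 if nothing was cut)."""
    return max(jobs_total - jobs_returned, 0)


def matches_filter(job_name, job_name_filter="*"):
    """Illustrative wildcard match; assumes shell-style '*' semantics."""
    return fnmatch.fnmatchcase(job_name, job_name_filter)


print(truncated_count(12500, 10000))
print([n for n in ["PAYROLL_DAILY", "PAYROLL_EOM", "HR_SYNC"] if matches_filter(n, "PAYROLL_*")])
```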
Events are not appearing
Verify emit_job_events: true is set in the instance configuration. Success events require both emit_job_events: true and emit_success_events: true.
Events respect deduplication: a job reported in a previous check cycle does not fire again.
Duplicate metrics after Agent restart
The check persists dedup state to the Agent’s cache. If the cache was cleared (for example, after a clean reinstall), previously reported terminal jobs may be re-emitted once. Increase finalized_ttl_seconds if completed jobs remain visible in the Control-M status feed for longer than 24 hours.
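Persisting deduplication state amounts to serializing the seen (job ID, run number) pairs and pruning expired entries on reload. In this sketch, a plain dict stands in for the Agent's cache, and the key name is hypothetical. An empty cache yields empty state, which is why previously reported runs can be emitted once more after the cache is cleared.

```python
import json


def save_state(cache, seen, key="control_m_dedup"):
    """Serialize {(job_id, run_number): first_seen_ts} into a string-valued cache."""
    payload = [[job_id, run, ts] for (job_id, run), ts in seen.items()]
    cache[key] = json.dumps(payload)


def load_state(cache, now, finalized_ttl_seconds=86400, key="control_m_dedup"):
    """Restore dedup state, dropping entries older than the TTL."""
    raw = cache.get(key)
    if not raw:
        return {}  # cache cleared: every run looks new again
    return {
        (job_id, run): ts
        for job_id, run, ts in json.loads(raw)
        if now - ts < finalized_ttl_seconds
    }


cache = {}
save_state(cache, {("job1", 1): 50000.0, ("job2", 3): 0.0})
print(load_state(cache, now=90000.0))
```

Raising finalized_ttl_seconds keeps entries longer on reload. That is what prevents re-emission when completed jobs stay in the status feed for more than 24 hours.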
Data Collected
Metrics
| Metric | Description |
|---|---|
| control_m.can_connect (gauge) | Control-M API connectivity status (1 when API is reachable, 0 otherwise). |
| control_m.can_login (gauge) | Control-M session login status (1 when authentication succeeds, 0 otherwise). Only emitted in session-login mode. |
| control_m.job.overrun_ms (gauge) | How far past its estimated end time an actively executing job is running. Only emitted for jobs with an estimatedEndTime that have exceeded it. Shown as millisecond |
| control_m.job.run.count (count) | Count of terminal job runs observed in the status feed. Shown as job |
| control_m.job.run.duration_ms (gauge) | Submitted as a histogram. The Agent expands this into aggregated metrics (avg, count, max, median, 95percentile) controlled by histogram_aggregates and histogram_percentiles in datadog.yaml. Shown as millisecond |
| control_m.job.run.overrun_ms (gauge) | Submitted as a histogram at job completion. How far past its estimated end time the job ran. Only emitted for terminal jobs that exceeded their estimatedEndTime. Shown as millisecond |
| control_m.jobs.active (gauge) | Current number of active (non-terminal) jobs. Shown as job |
| control_m.jobs.by_status (gauge) | Current number of jobs per normalized Control-M status. Shown as job |
| control_m.jobs.returned (gauge) | Number of job entries returned in the current status response. Shown as job |
| control_m.jobs.total (gauge) | Total number of jobs reported by the Control-M API (from the response total field). Shown as job |
| control_m.jobs.waiting.by_server (gauge) | Number of waiting jobs per Control-M server. Shown as job |
| control_m.jobs.waiting.total (gauge) | Total number of waiting jobs across all servers. Shown as job |
| control_m.server.up (gauge) | Whether the Control-M server is up (1) or down (0). |
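Both overrun metrics measure the same quantity: time elapsed past estimatedEndTime, emitted only when positive. The sketch makes three assumptions. Timestamps use Control-M's compact yyyyMMddHHmmss form, both times are in the same timezone, and the caller has already extracted a single estimatedEndTime string from the job entry. Verify these against your API responses.

```python
from datetime import datetime

# Assumed timestamp layout; confirm against your Control-M API responses.
CTM_TIME_FORMAT = "%Y%m%d%H%M%S"


def overrun_ms(estimated_end, now):
    """Milliseconds past estimatedEndTime, or None if not exceeded (no metric emitted)."""
    if not estimated_end:
        return None  # jobs without an estimate produce no overrun metric
    estimated = datetime.strptime(estimated_end, CTM_TIME_FORMAT)
    delta_ms = (now - estimated).total_seconds() * 1000
    return delta_ms if delta_ms > 0 else None


print(overrun_ms("20240101120000", datetime(2024, 1, 1, 12, 0, 30)))
```

For control_m.job.overrun_ms, now is the current time. For control_m.job.run.overrun_ms, pass the job's actual end time instead.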
Uninstallation
To uninstall the Control-M integration, remove the control_m.d/conf.yaml file from your Agent’s conf.d/ directory and restart the Agent.
Support
Need help? Contact Datadog Support.