---
title: Mesos Master
description: >-
  Track cluster resource usage, master and slave counts, task statuses, and
  more.
---

# Mesos Master

**Integration version:** 6.4.0

This check collects metrics for Mesos masters. For Mesos slave metrics, see the [Mesos Slave integration](https://docs.datadoghq.com/integrations/mesos.md#mesos-slave-integration).



## Overview{% #overview %}

This check collects metrics from Mesos masters for:

- Cluster resources
- Slaves registered, active, inactive, connected, disconnected, etc.
- Number of tasks failed, finished, staged, running, etc.
- Number of frameworks active, inactive, connected, and disconnected

And many more.

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

The installation is the same on Mesos with and without DC/OS. Run the datadog-agent container on each of your Mesos master nodes:

```shell
docker run -d --name datadog-agent \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e DD_API_KEY=<YOUR_DATADOG_API_KEY> \
  -e MESOS_MASTER=true \
  -e MARATHON_URL=http://leader.mesos:8080 \
  datadog/agent:latest
```

Substitute your Datadog API key and your Mesos master's API URL into the command above.

### Configuration{% #configuration %}

If you passed the correct Master URL when starting datadog-agent, the Agent is already using a default `mesos_master.d/conf.yaml` to collect metrics from your masters. See the [sample mesos_master.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/mesos_master/datadog_checks/mesos_master/data/conf.yaml.example) for all available configuration options.

However, if your masters' API uses a self-signed certificate, set `disable_ssl_validation: true` in `mesos_master.d/conf.yaml`.
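
As a minimal sketch of the self-signed certificate case, the following writes a `mesos_master.d/conf.yaml` with certificate validation disabled. The `CONF_DIR` location and the `https://localhost:5050` URL are assumptions for illustration (5050 is the default Mesos master port); substitute the path and address from your own deployment, and check the sample file for the full list of options.

```shell
# Assumption: CONF_DIR is a directory you will make available to the Agent.
CONF_DIR="${CONF_DIR:-./conf.d}"
mkdir -p "$CONF_DIR/mesos_master.d"

# Write a minimal instance config. The URL is a placeholder for your master's API.
cat > "$CONF_DIR/mesos_master.d/conf.yaml" <<'EOF'
init_config:

instances:
  - url: https://localhost:5050
    disable_ssl_validation: true
EOF
```

How the file reaches the Agent depends on your deployment; with the container shown earlier, one option is to mount it into the Agent's configuration directory.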

#### Log collection{% #log-collection %}

1. Collecting logs is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Add this configuration block to your `mesos_master.d/conf.yaml` file to start collecting your Mesos logs:

   ```yaml
   logs:
     - type: file
       path: /var/log/mesos/*
       source: mesos
   ```

   Change the `path` parameter value based on your environment, or use the default Docker stdout:

   ```yaml
   logs:
     - type: docker
       source: mesos
   ```

   See the [sample mesos_master.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/mesos_master/datadog_checks/mesos_master/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).

To enable logs for Kubernetes environments, see [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log.md).

### Validation{% #validation %}

In Datadog, search for `mesos.cluster` in the Metrics Explorer.
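
You can also confirm from the host that the check is running. The sketch below assumes the container name `datadog-agent` from the `docker run` command above and uses the Agent's `status` subcommand. `check_mesos_master_status` is a hypothetical helper name, and the number of context lines passed to `grep` is arbitrary.

```shell
# Hypothetical helper: show the mesos_master section of the Agent status output.
# Assumption: the first argument (default datadog-agent) is your Agent container name.
check_mesos_master_status() {
  container="${1:-datadog-agent}"
  # Print the matching line plus a few lines of context after it.
  docker exec "$container" agent status | grep -A 10 'mesos_master'
}
```

Run `check_mesos_master_status` on a master node. No output (and a non-zero exit status) suggests the check is not loaded.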

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **mesos.cluster.cpus\_percent** (gauge) | Percentage of allocated CPUs *Shown as percent* |
| **mesos.cluster.cpus\_total** (gauge) | Number of CPUs |
| **mesos.cluster.cpus\_used** (gauge) | Number of allocated CPUs |
| **mesos.cluster.disk\_percent** (gauge) | Percentage of allocated disk space *Shown as percent* |
| **mesos.cluster.disk\_total** (gauge) | Disk space *Shown as mebibyte* |
| **mesos.cluster.disk\_used** (gauge) | Allocated disk space *Shown as mebibyte* |
| **mesos.cluster.dropped\_messages** (gauge) | Number of dropped messages *Shown as message* |
| **mesos.cluster.event\_queue\_dispatches** (gauge) | Number of dispatches in the event queue |
| **mesos.cluster.event\_queue\_http\_requests** (gauge) | Number of HTTP requests in the event queue *Shown as request* |
| **mesos.cluster.event\_queue\_messages** (gauge) | Number of messages in the event queue *Shown as message* |
| **mesos.cluster.frameworks\_active** (gauge) | Number of active frameworks |
| **mesos.cluster.frameworks\_connected** (gauge) | Number of connected frameworks |
| **mesos.cluster.frameworks\_disconnected** (gauge) | Number of disconnected frameworks |
| **mesos.cluster.frameworks\_inactive** (gauge) | Number of inactive frameworks |
| **mesos.cluster.gpus\_percent** (gauge) | Percentage of allocated GPUs *Shown as percent* |
| **mesos.cluster.gpus\_total** (gauge) | Number of GPUs |
| **mesos.cluster.gpus\_used** (gauge) | Number of allocated GPUs |
| **mesos.cluster.invalid\_framework\_to\_executor\_messages** (gauge) | Number of invalid framework messages *Shown as message* |
| **mesos.cluster.invalid\_status\_update\_acknowledgements** (gauge) | Number of invalid status update acknowledgements |
| **mesos.cluster.invalid\_status\_updates** (gauge) | Number of invalid status updates |
| **mesos.cluster.mem\_percent** (gauge) | Percentage of allocated memory *Shown as percent* |
| **mesos.cluster.mem\_total** (gauge) | Total memory *Shown as mebibyte* |
| **mesos.cluster.mem\_used** (gauge) | Allocated memory *Shown as mebibyte* |
| **mesos.cluster.outstanding\_offers** (gauge) | Number of outstanding resource offers |
| **mesos.cluster.slave\_registrations** (gauge) | Number of slaves that were able to cleanly re-join the cluster and connect back to the master after the master is disconnected. |
| **mesos.cluster.slave\_removals** (gauge) | Number of slaves removed for various reasons, including maintenance |
| **mesos.cluster.slave\_reregistrations** (gauge) | Number of slave re-registrations |
| **mesos.cluster.slave\_shutdowns\_canceled** (gauge) | Number of canceled slave shutdowns |
| **mesos.cluster.slave\_shutdowns\_scheduled** (gauge) | Number of slaves which have failed their health check and are scheduled to be removed |
| **mesos.cluster.slaves\_active** (gauge) | Number of active slaves |
| **mesos.cluster.slaves\_connected** (gauge) | Number of connected slaves |
| **mesos.cluster.slaves\_disconnected** (gauge) | Number of disconnected slaves |
| **mesos.cluster.slaves\_inactive** (gauge) | Number of inactive slaves |
| **mesos.cluster.tasks\_error** (gauge) | Number of tasks that were invalid *Shown as task* |
| **mesos.cluster.tasks\_failed** (count) | Number of failed tasks *Shown as task* |
| **mesos.cluster.tasks\_finished** (count) | Number of finished tasks *Shown as task* |
| **mesos.cluster.tasks\_killed** (count) | Number of killed tasks *Shown as task* |
| **mesos.cluster.tasks\_lost** (count) | Number of lost tasks *Shown as task* |
| **mesos.cluster.tasks\_running** (gauge) | Number of running tasks *Shown as task* |
| **mesos.cluster.tasks\_staging** (gauge) | Number of staging tasks *Shown as task* |
| **mesos.cluster.tasks\_starting** (gauge) | Number of starting tasks *Shown as task* |
| **mesos.cluster.valid\_framework\_to\_executor\_messages** (gauge) | Number of valid framework messages *Shown as message* |
| **mesos.cluster.valid\_status\_update\_acknowledgements** (gauge) | Number of valid status update acknowledgements |
| **mesos.cluster.valid\_status\_updates** (gauge) | Number of valid status updates |
| **mesos.framework.cpu** (gauge) | Framework CPU |
| **mesos.framework.disk** (gauge) | Framework disk *Shown as mebibyte* |
| **mesos.framework.mem** (gauge) | Framework mem *Shown as mebibyte* |
| **mesos.registrar.log.recovered** (gauge) | Registrar log recovered |
| **mesos.registrar.queued\_operations** (gauge) | Number of queued operations |
| **mesos.registrar.registry\_size\_bytes** (gauge) | Registry size *Shown as byte* |
| **mesos.registrar.state\_fetch\_ms** (gauge) | Registry read latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms** (gauge) | Registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.count** (gauge) | Registry write count |
| **mesos.registrar.state\_store\_ms.max** (gauge) | Maximum registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.min** (gauge) | Minimum registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p50** (gauge) | Median registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p90** (gauge) | 90th percentile registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p95** (gauge) | 95th percentile registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p99** (gauge) | 99th percentile registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p999** (gauge) | 99.9th percentile registry write latency *Shown as millisecond* |
| **mesos.registrar.state\_store\_ms.p9999** (gauge) | 99.99th percentile registry write latency *Shown as millisecond* |
| **mesos.role.cpu** (gauge) | Role CPU |
| **mesos.role.disk** (gauge) | Role disk *Shown as mebibyte* |
| **mesos.role.mem** (gauge) | Role mem *Shown as mebibyte* |
| **mesos.stats.elected** (gauge) | Whether this is the elected master |
| **mesos.stats.registered** (gauge) | Whether this slave is registered with a master |
| **mesos.stats.system.cpus\_total** (gauge) | Number of CPUs available |
| **mesos.stats.system.load\_15min** (gauge) | Load average for the past 15 minutes |
| **mesos.stats.system.load\_1min** (gauge) | Load average for the past minute |
| **mesos.stats.system.load\_5min** (gauge) | Load average for the past 5 minutes |
| **mesos.stats.system.mem\_free\_bytes** (gauge) | Free memory *Shown as byte* |
| **mesos.stats.system.mem\_total\_bytes** (gauge) | Total memory *Shown as byte* |
| **mesos.stats.uptime\_secs** (gauge) | Uptime *Shown as second* |

### Events{% #events %}

The Mesos Master check does not include any events.

### Service Checks{% #service-checks %}

**mesos\_master.can\_connect**

Returns CRITICAL if the Agent cannot connect to the Mesos Master API to collect metrics, UNKNOWN if the master is not detected as the leader, otherwise OK.

*Statuses: ok, critical, unknown*
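
To investigate a CRITICAL or UNKNOWN status by hand, you can query a master directly. The following is a rough sketch, not the check's own logic: it assumes the master is reachable at `http://localhost:5050` (the default Mesos master port), reads the standard Mesos `/metrics/snapshot` endpoint, and extracts the `master/elected` value, which Mesos reports as 1 on the leading master. `mesos_master_elected` is a hypothetical helper name.

```shell
# Hypothetical helper: print the master/elected entry from a master's metrics snapshot.
# Assumption: the first argument (default http://localhost:5050) is your master's URL.
mesos_master_elected() {
  master_url="${1:-http://localhost:5050}"
  # -s silences progress output, -f fails on HTTP errors, --max-time bounds the wait.
  # The pattern tolerates JSON output that escapes "/" as "\/".
  curl -sf --max-time 5 "$master_url/metrics/snapshot" \
    | grep -Eo '"master\\?/elected":[0-9.]+'
}
```

A non-zero exit status means the request failed or the key was absent from the response, which roughly corresponds to the connection failure the service check reports as CRITICAL. A value of 0 indicates a master that is not the leader.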

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Installing Datadog on Mesos with DC/OS](https://www.datadoghq.com/blog/deploy-datadog-dcos)
