---
title: Mesos Slave
description: >-
  Track cluster resource usage, master and slave counts, task statuses, and
  more.
---

# Mesos Slave
Integration version: 6.4.0


## Overview{% #overview %}

This Agent check collects metrics from Mesos slaves, including:

- System load
- Number of tasks failed, finished, staged, running, and so on
- Number of executors running, terminated, and so on

This check also creates a service check for every executor task.

**Minimum Agent version:** 6.0.0

## Setup{% #setup %}

### Installation{% #installation %}

See [Installing Datadog on Mesos with DC/OS](https://www.datadoghq.com/blog/deploy-datadog-dcos) to install the Datadog Agent on each Mesos agent node with the DC/OS web UI.

### Configuration{% #configuration %}

#### DC/OS{% #dcos %}

1. In the DC/OS web UI, click the **Universe** tab. Find the **datadog** package and click **Install**.
1. Click **Advanced Installation**.
1. Enter your Datadog API key in the first field.
1. In the **Instances** field, enter the number of slave nodes in your cluster. To find this number, click the **Nodes** tab on the left side of the DC/OS web UI.
1. Click **Review and Install**, then **Install**.
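
The slave node count used for the Instances field can also be read programmatically. The sketch below is a hedged illustration, not part of the integration: it assumes a Mesos master whose `/slaves` endpoint is reachable without authentication and returns a `slaves` array with an `active` flag per entry. The helper names `count_active_agents` and `fetch_slaves` are hypothetical.

```python
import json
import urllib.request


def count_active_agents(slaves_response):
    """Count entries in a decoded /slaves response that are marked active."""
    agents = slaves_response.get("slaves", [])
    return sum(1 for agent in agents if agent.get("active", False))


def fetch_slaves(master_url, timeout=5.0):
    """Fetch and decode the /slaves response from a Mesos master.

    Assumption: master_url is the master's base URL and needs no auth.
    """
    url = master_url.rstrip("/") + "/slaves"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `count_active_agents(fetch_slaves("http://<mesos-master>:5050"))` would give the value to enter, where the host is a placeholder and 5050 is assumed to be the master port.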

#### Marathon{% #marathon %}

If you are not using DC/OS, define the Datadog Agent by submitting the following JSON through the Marathon web UI or by posting it to the Marathon API URL. Replace `<YOUR_DATADOG_API_KEY>` with your API key, and set `instances` to the number of slave nodes in your cluster. You may also need to update the Docker image to a more recent tag. You can find the latest tags [on Docker Hub](https://hub.docker.com/r/datadog/agent/tags).

```json
{
  "id": "/datadog-agent",
  "cmd": null,
  "cpus": 0.05,
  "mem": 256,
  "disk": 0,
  "instances": 1,
  "constraints": [
    ["hostname", "UNIQUE"],
    ["hostname", "GROUP_BY"]
  ],
  "acceptedResourceRoles": ["slave_public", "*"],
  "container": {
    "type": "DOCKER",
    "volumes": [
      {
        "containerPath": "/var/run/docker.sock",
        "hostPath": "/var/run/docker.sock",
        "mode": "RO"
      },
      { "containerPath": "/host/proc", "hostPath": "/proc", "mode": "RO" },
      {
        "containerPath": "/host/sys/fs/cgroup",
        "hostPath": "/sys/fs/cgroup",
        "mode": "RO"
      }
    ],
    "docker": {
      "image": "datadog/agent:latest",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 8125,
          "hostPort": 8125,
          "servicePort": 10000,
          "protocol": "udp",
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [
        { "key": "name", "value": "datadog-agent" },
        { "key": "env", "value": "DD_API_KEY=<YOUR_DATADOG_API_KEY>" },
        { "key": "env", "value": "MESOS_SLAVE=true" }
      ],
      "forcePullImage": false
    }
  },
  "healthChecks": [
    {
      "protocol": "COMMAND",
      "command": { "value": "/probe.sh" },
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "timeoutSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ],
  "portDefinitions": [
    { "port": 10000, "protocol": "tcp", "name": "default", "labels": {} },
    { "port": 10001, "protocol": "tcp", "labels": {} }
  ]
}
```
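
Posting the definition to the API can be scripted. The following is a minimal sketch, assuming Marathon's app-creation endpoint is `/v2/apps` under the Marathon base URL and that no authentication is required. The function name `build_marathon_request` and the host in the usage note are hypothetical. It fills in the API key and instance count, then builds (but does not send) the POST request.

```python
import json
import urllib.request


def build_marathon_request(marathon_url, app_definition, api_key, node_count):
    """Return a POST request that would create the Datadog Agent app in Marathon."""
    # Deep-copy through JSON so the caller's dictionary is left untouched.
    app = json.loads(json.dumps(app_definition))

    # One Agent instance per slave node.
    app["instances"] = node_count

    # Replace the API key placeholder in the Docker "env" parameters.
    for param in app["container"]["docker"]["parameters"]:
        if param["key"] == "env" and param["value"].startswith("DD_API_KEY="):
            param["value"] = "DD_API_KEY=" + api_key

    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",  # assumed Marathon endpoint
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To submit it, load the JSON above into a dictionary, call `build_marathon_request("http://<marathon-host>:8080", app, api_key, node_count)`, and pass the result to `urllib.request.urlopen`. Building the request separately from sending it makes it possible to inspect the payload first.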

You don't need to do anything else after installing the Agent unless you want to configure a custom `mesos_slave.d/conf.yaml`, for example to set `disable_ssl_validation: true`.

#### Log collection{% #log-collection %}

1. Log collection is disabled by default in the Datadog Agent. Enable it in your `datadog.yaml` file:

   ```yaml
   logs_enabled: true
   ```

1. Add this configuration block to your `mesos_slave.d/conf.yaml` file to start collecting your Mesos logs:

   ```yaml
   logs:
     - type: file
       path: /var/log/mesos/*
       source: mesos
   ```

   Change the `path` parameter value based on your environment, or use the default Docker stdout:

   ```yaml
   logs:
     - type: docker
       source: mesos
   ```

   See the [sample mesos_slave.d/conf.yaml](https://github.com/DataDog/integrations-core/blob/master/mesos_slave/datadog_checks/mesos_slave/data/conf.yaml.example) for all available configuration options.

1. [Restart the Agent](https://docs.datadoghq.com/agent/guide/agent-commands.md#start-stop-and-restart-the-agent).

To enable logs for Kubernetes environments, see [Kubernetes Log Collection](https://docs.datadoghq.com/agent/kubernetes/log.md).

### Validation{% #validation %}

#### DC/OS{% #dcos-1 %}

In the DC/OS web UI, the Datadog Agent should be listed under the **Services** tab. In Datadog, search for `mesos.slave` in the Metrics Explorer.

#### Marathon{% #marathon-1 %}

If you are not using DC/OS, verify that `datadog-agent` appears in the list of running applications with a healthy status. In Datadog, search for `mesos.slave` in the Metrics Explorer.
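
The Marathon status check can also be automated. The sketch below assumes the response shape of Marathon's `GET /v2/apps/datadog-agent` endpoint: an `app` object carrying `instances`, `tasksRunning`, `tasksHealthy`, and `tasksUnhealthy` counters. `agent_is_healthy` is a hypothetical helper, and "healthy" is taken here to mean that every requested instance is running and passing its health check.

```python
def agent_is_healthy(app_response):
    """Return True if every requested Agent instance is running and healthy.

    app_response is the decoded JSON body of GET /v2/apps/datadog-agent
    (assumed shape: {"app": {"instances": ..., "tasksRunning": ..., ...}}).
    """
    app = app_response.get("app", {})
    wanted = app.get("instances", 0)
    return (
        wanted > 0
        and app.get("tasksRunning", 0) >= wanted
        and app.get("tasksHealthy", 0) >= wanted
        and app.get("tasksUnhealthy", 0) == 0
    )
```

A `False` result only says that the Marathon counters do not match the requested instance count. The Marathon web UI or task logs are still needed to find out why.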

## Data Collected{% #data-collected %}

### Metrics{% #metrics %}

| Metric | Description |
| --- | --- |
| **mesos.slave.cpus\_percent** (gauge) | Percentage of allocated CPUs *Shown as percent* |
| **mesos.slave.cpus\_total** (gauge) | Number of CPUs |
| **mesos.slave.cpus\_used** (gauge) | Number of allocated CPUs |
| **mesos.slave.disk\_percent** (gauge) | Percentage of allocated disk space *Shown as percent* |
| **mesos.slave.disk\_total** (gauge) | Disk space *Shown as mebibyte* |
| **mesos.slave.disk\_used** (gauge) | Allocated disk space *Shown as mebibyte* |
| **mesos.slave.executors\_registering** (gauge) | Number of executors registering |
| **mesos.slave.executors\_running** (gauge) | Number of executors running |
| **mesos.slave.executors\_terminated** (gauge) | Number of terminated executors |
| **mesos.slave.executors\_terminating** (gauge) | Number of terminating executors |
| **mesos.slave.frameworks\_active** (gauge) | Number of active frameworks |
| **mesos.slave.gpus\_percent** (gauge) | Percentage of allocated GPUs *Shown as percent* |
| **mesos.slave.gpus\_total** (gauge) | Number of GPUs |
| **mesos.slave.gpus\_used** (gauge) | Number of allocated GPUs |
| **mesos.slave.invalid\_framework\_messages** (gauge) | Number of invalid framework messages *Shown as message* |
| **mesos.slave.invalid\_status\_updates** (gauge) | Number of invalid status updates |
| **mesos.slave.mem\_percent** (gauge) | Percentage of allocated memory *Shown as percent* |
| **mesos.slave.mem\_total** (gauge) | Total memory *Shown as mebibyte* |
| **mesos.slave.mem\_used** (gauge) | Allocated memory *Shown as mebibyte* |
| **mesos.slave.recovery\_errors** (gauge) | Number of errors encountered during slave recovery *Shown as error* |
| **mesos.slave.tasks\_failed** (count) | Number of failed tasks *Shown as task* |
| **mesos.slave.tasks\_finished** (count) | Number of finished tasks *Shown as task* |
| **mesos.slave.tasks\_killed** (count) | Number of killed tasks *Shown as task* |
| **mesos.slave.tasks\_lost** (count) | Number of lost tasks *Shown as task* |
| **mesos.slave.tasks\_running** (gauge) | Number of running tasks *Shown as task* |
| **mesos.slave.tasks\_staging** (gauge) | Number of staging tasks *Shown as task* |
| **mesos.slave.tasks\_starting** (gauge) | Number of starting tasks *Shown as task* |
| **mesos.slave.valid\_framework\_messages** (gauge) | Number of valid framework messages *Shown as message* |
| **mesos.slave.valid\_status\_updates** (gauge) | Number of valid status updates |
| **mesos.state.task.cpu** (gauge) | Task CPU |
| **mesos.state.task.disk** (gauge) | Task disk *Shown as mebibyte* |
| **mesos.state.task.mem** (gauge) | Task memory *Shown as mebibyte* |
| **mesos.stats.registered** (gauge) | Whether this slave is registered with a master |
| **mesos.stats.system.cpus\_total** (gauge) | Number of CPUs available |
| **mesos.stats.system.load\_15min** (gauge) | Load average for the past 15 minutes |
| **mesos.stats.system.load\_1min** (gauge) | Load average for the past minute |
| **mesos.stats.system.load\_5min** (gauge) | Load average for the past 5 minutes |
| **mesos.stats.system.mem\_free\_bytes** (gauge) | Free memory *Shown as byte* |
| **mesos.stats.system.mem\_total\_bytes** (gauge) | Total memory *Shown as byte* |
| **mesos.stats.uptime\_secs** (gauge) | Slave uptime |

### Events{% #events %}

The Mesos-slave check does not include any events.

### Service Checks{% #service-checks %}

**mesos\_slave.can\_connect**

Returns CRITICAL if the Agent cannot connect to the Mesos slave metrics endpoint, otherwise OK.

*Statuses: ok, critical*
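
The OK/CRITICAL rule can be illustrated with a short sketch. This is not the check's actual implementation: `can_connect_status` and `http_status` are hypothetical helpers, and the example URL assumes the Mesos slave serves `/metrics/snapshot` on port 5051. Any connection failure, timeout, or non-200 response maps to `critical`.

```python
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url.

    urlopen raises URLError/HTTPError (both OSError subclasses) on
    connection failures and non-2xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def can_connect_status(fetch_status):
    """Map the outcome of a zero-argument status fetch to "ok" or "critical"."""
    try:
        return "ok" if fetch_status() == 200 else "critical"
    except OSError:
        # Refused connections, DNS failures, timeouts, and HTTP errors land here.
        return "critical"
```

For example, `can_connect_status(lambda: http_status("http://localhost:5051/metrics/snapshot"))` would report `ok` only when the endpoint answers with HTTP 200.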

## Troubleshooting{% #troubleshooting %}

Need help? Contact [Datadog support](https://docs.datadoghq.com/help/).

## Further Reading{% #further-reading %}

- [Installing Datadog on Mesos with DC/OS](https://www.datadoghq.com/blog/deploy-datadog-dcos)
