Create an Agent check for Datadog Agent 5
This documentation explains how to create an Agent check for Datadog Agent v5, which has been superseded by Agent v6. It is still possible to write your own local checks for v5; however, no new v5 integrations are considered upstream. To learn more about creating integrations for Agent v6, see Create a New Integration.
Requirements
You need a working Ruby environment. For more information on installing Ruby, see Installing Ruby.
You also need Wget. It is already installed on most Linux systems; install it with Homebrew on macOS or Chocolatey on Windows.
Setup
There is a gem and a set of scripts to help you get set up, ease development, and provide testing. To begin:
- Fork the integrations-extras repository on GitHub and clone the repository to your development environment.
- Run gem install bundler.
- Run bundle install.
Once Bundler has installed the required Ruby gems, create a Python environment:
- Run rake setup_env. This installs a Python virtual environment along with all the components necessary for integration development, including the core Agent used by the integrations. Some basic software, such as gcc and libssl-dev, may be needed to install the Python dependencies.
- Run source venv/bin/activate to activate the installed Python virtual environment. To exit the virtual environment, run deactivate. Learn more about the Python virtual environment in the Virtualenv documentation.
Building an integration
Use rake to generate the skeleton for a new integration by running rake generate:skeleton[my_integration], where my_integration is the name of your new integration (note: enclose your integration name in square brackets).
This creates a new directory, my_integration, that contains all the files required for your new integration. It also creates an entry for your new integration in the .travis.yml and circle.yml continuous integration files to ensure that your tests are run whenever new builds are created.
Integration files
New integrations should contain the following files:
README.md
The README file must provide the following sections:
- Overview (required): Let others know what they can expect to do with your integration.
- Installation (required): Provide information about how to install your integration.
- Configuration (required): Detail any steps necessary to configure your integration or the service you are integrating.
- Validation (required): How can users ensure the integration is working as intended?
- Troubleshooting: Help other users by sharing solutions to common problems they might experience.
- Compatibility (required): List the version(s) of the application or service that your integration has been tested and validated against.
- Metrics (required): Include a list of the metrics your integration provides.
- Events: Include a list of events if your integration provides any.
- Service checks: Include a list of service checks if your integration provides any.
For more information, see Create an Agent-based Integration.
check.py
The file where your check logic should reside. The skeleton generator creates a boilerplate integration class for your integration, including a check method where you should place your check logic.
For example:
# Example check.py
import time

from checks import AgentCheck

class MyIntegrationCheck(AgentCheck):
    def __init__(self, name, init_config, agentConfig, instances=None):
        AgentCheck.__init__(self, name, init_config, agentConfig, instances)

    def check(self, instance):
        # Send a custom event.
        self.event({
            'timestamp': int(time.time()),
            'source_type_name': 'my_integration',
            'msg_title': 'Custom event',
            'msg_text': 'My custom integration event occurred.',
            'host': self.hostname,
            'tags': [
                'action:my_integration_custom_event',
            ]
        })
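The skeleton example above only sends an event; checks more commonly submit metrics and service checks as well. The following is a minimal sketch (not part of the generated skeleton) using the gauge, increment, and service_check methods provided by the AgentCheck base class; the url parameter and the metric names are hypothetical.

# Example sketch: submitting metrics and a service check from check.py
import time

from checks import AgentCheck

class MyIntegrationCheck(AgentCheck):

    def check(self, instance):
        # Read per-instance settings from conf.yaml ("url" is a hypothetical parameter).
        url = instance.get('url', 'http://localhost')
        tags = instance.get('tags', []) + ['url:%s' % url]

        start = time.time()
        # ... query the monitored service here ...
        elapsed = time.time() - start

        # Report that the service responded.
        self.service_check('my_integration.can_connect', AgentCheck.OK, tags=tags)

        # A gauge records a value observed at this point in time.
        self.gauge('my_integration.response_time', elapsed, tags=tags)

        # increment() bumps an in-memory counter that is flushed with the next payload.
        self.increment('my_integration.runs', tags=tags)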
For more information about writing integrations and sending metrics with the Datadog Agent, see Introduction to Agent-based Integrations.
If you need to import any third-party libraries, add them to the requirements.txt file.
ci/my_integration.rake
If your tests require a testing environment, use the install and cleanup tasks to set up and tear down the testing environment, respectively.
For example:
# Example my_integration.rake
namespace :ci do
  namespace :my_integration do |flavor|
    task install: ['ci:common:install'] do
      # Use the Python virtual environment and install packages.
      use_venv = in_venv
      install_requirements('my_integration/requirements.txt',
                           "--cache-dir #{ENV['PIP_CACHE']}",
                           "#{ENV['VOLATILE_DIR']}/ci.log",
                           use_venv)

      # Set up a Docker testing container.
      sh %(docker run -p 80:80 --name my_int_container -d my_docker)
    end
  end
end
For more information about writing integration tests, see the documentation in the Datadog Agent repository. You can also reference the ci common library for helper functions such as install_requirements and sleep_for.
Note: You may notice the variable flavor in this file and other areas of testing. Flavor is a term used to denote variations of integrated software, typically versions. This allows you to write one set of tests, but target different flavors, variants, or versions of the software you are integrating.
conf.yaml.example
To install your integration, you need to configure it for your specific instances. To do this, copy the conf.yaml.example file into your Agent's conf.d directory, then update it with your instance-specific information.
Your conf.yaml.example file should provide two sections (illustrated in the sketch after this list):
- init_config: any globally configured parameters.
- instances: the specific instances to integrate. This often includes a server or host address with additional parameters such as authentication information, additional tags, and configuration settings.
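Inside the check, the init_config section is available as self.init_config, and each entry under instances is passed to check() as the instance argument. The following is a minimal sketch of reading both, assuming hypothetical default_timeout, server, and port parameters.

# Example sketch: reading conf.yaml values inside check.py
from checks import AgentCheck

class MyIntegrationCheck(AgentCheck):

    def check(self, instance):
        # Globally configured parameter from init_config (hypothetical key).
        default_timeout = self.init_config.get('default_timeout', 5)

        # Per-instance parameters (hypothetical keys).
        server = instance.get('server', 'localhost')
        port = instance.get('port', 80)
        timeout = instance.get('timeout', default_timeout)
        tags = instance.get('tags', [])

        self.gauge('my_integration.timeout_setting', timeout,
                   tags=tags + ['server:%s' % server, 'port:%s' % port])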
manifest.json
This JSON file provides metadata about your integration and should include:
- maintainer: Provide a valid email address where you can be contacted regarding this integration.
- manifest_version: The version of this manifest file.
- max_agent_version: The maximum version of the Datadog Agent that is compatible with your integration. Datadog tries to maintain integration stability within major versions, so you should leave this at the number generated for you. If your integration breaks with a new release of the Datadog Agent, set this number and submit an issue on the Datadog Agent project.
- min_agent_version: The minimum version of the Datadog Agent that is compatible with your integration.
- name: The name of your integration.
- short_description: Provide a short description of your integration.
- support: As a community-contributed integration, this should be set to “contrib”. Only set this to another value if directed to do so by Datadog staff.
- version: The current version of your integration.
- is_public: Boolean set to true if your integration is public.
- has_logo: Boolean set to true if there is a logo for this integration in /src/images/integrations_logo.
- type: check.
- categories: Categories to classify your integration in the Datadog documentation.
Reference one of the existing integrations for an example of the manifest file.
metadata.csv
The metadata CSV contains a list of the metrics your integration provides and basic details that help inform the Datadog web application as to which graphs and alerts can be provided for each metric.
The CSV should include a header row and the following columns (an example row appears after the column descriptions):
- metric_name (required): The name of the metric as it should appear on the Datadog site when creating dashboards or monitors. Often this name is a period-delimited combination of the provider, service, and metric (for example: aws.ec2.disk_write_ops) or the application, application feature, and metric (for example: apache.net.request_per_s).
- metric_type (required): The type of metric you are reporting. This influences how the Datadog web application handles and displays your data. Accepted values are count, gauge, or rate.
  - count: A count is the number of particular events that have occurred. When reporting a count, you should only submit the number of new events (the delta) recorded since the previous submission. For example, the aws.apigateway.5xxerror metric is a count of the number of server-side errors.
  - gauge: A gauge is a metric that tracks a value at a specific point in time. For example, docker.io.read_bytes is a gauge of the number of bytes read per second.
  - rate: A rate is a metric over time (and as such, typically includes a per_unit_name value). For example, lighttpd.response.status_2xx is a rate metric capturing the number of 2xx status codes produced per second.
- interval: The interval used for conversion between rates and counts. This is required when metric_type is set to rate.
- unit_name: The label for the unit of measure you are gathering. The following units (grouped by type) are available:
  - Bytes: bit, byte, kibibyte, mebibyte, gibibyte, tebibyte, pebibyte, exbibyte
  - Cache: eviction, get, hit, miss, set
  - Database: assertion, column, command, commit, cursor, document, fetch, flush, index, key, lock, merge, object, offset, query, question, record, refresh, row, scan, shard, table, ticket, transaction, wait
  - Disk: block, file, inode, sector
  - Frequency: hertz, kilohertz, megahertz, gigahertz
  - General: buffer, check, email, error, event, garbage collection, item, location, monitor, occurrence, operation, read, resource, sample, stage, task, time, unit, worker, write
  - Memory: page, split
  - Money: cent, dollar
  - Network: connection, datagram, message, packet, payload, request, response, segment, timeout
  - Percentage: apdex, fraction, percent, percent_nano
  - System: core, fault, host, instance, node, process, service, thread
  - Time: microsecond, millisecond, second, minute, hour, day, week
  If the unit name is not listed above, leave this value blank. To add a unit to this listing, file an issue.
- per_unit_name: If you are gathering a per-unit metric, you may provide an additional unit name here and it is combined with the unit_name. For example, providing a unit_name of “request” and a per_unit_name of “second” results in a metric of “requests per second”. If provided, this must be a value from the available units listed above.
- description: A basic description (limited to 400 characters) of the information this metric represents.
- orientation (required): An integer of -1, 0, or 1.
  - -1 indicates that smaller values are better. For example, mysql.performance.slow_queries or varnish.fetch_failed, where low counts are desirable.
  - 0 indicates no intrinsic preference in values. For example, rabbitmq.queue.messages or postgresql.rows_inserted, where there is no preference for the size of the value, or the preference depends on the business objectives of the system.
  - 1 indicates that larger values are better. For example, mesos.stats.uptime_secs, where higher uptime is desired, or mysql.performance.key_cache_utilization, where more cache hits are desired.
- integration (required): This must match the name of your integration, for example: “my_integration”.
- short_name: A more human-readable and abbreviated version of the metric name. For example, postgresql.index_blocks_read might be set to idx blks read. Aim for human readability and ease of understanding over brevity. Don't repeat the integration name. If you can't make the short_name shorter and easier to understand than the metric_name, leave this field empty.
- curated_metric: Marks which metrics for an integration are noteworthy for a given type (cpu and memory are both accepted). These are displayed in the UI above the other integration metrics.
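As an illustration only, a metadata.csv row for a hypothetical my_integration metric might look like the following, assuming the columns appear in the order described above (check an existing integration for the exact header used):

metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric
my_integration.response_time,gauge,,second,,Time taken to answer a request.,-1,my_integration,response time,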
requirements.txt
If you require any additional Python libraries, list them in requirements.txt. The libraries are automatically installed using pip when others use your integration.
test_my_integration.py
Integration tests ensure that the Datadog Agent is correctly receiving and recording metrics from the software you are integrating.
Tests are not required for each of the metrics collected by your integration, but Datadog strongly encourages you to provide as much coverage as possible. Run the self.coverage_report() method in your test to see which metrics are covered.
Here’s an example test_my_integration.py:
# Example test_my_integration.py
from nose.plugins.attrib import attr

from checks import AgentCheck
from tests.checks.common import AgentCheckTest

@attr(requires='my_integration')
class TestMyIntegration(AgentCheckTest):

    def testMyIntegration(self):
        self.assertServiceCheck('my_integration.can_connect', count=1,
                                status=AgentCheck.OK,
                                tags=['host:localhost', 'port:80'])
        self.coverage_report()
For more information about tests and available test methods, reference the AgentCheckTest class in the Datadog Agent repository.
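Besides assertServiceCheck, the AgentCheckTest helper can run your check against an in-test configuration and assert on submitted metrics. A rough sketch of that flow follows; the configuration keys and metric name are hypothetical.

# Example sketch: running the check from a test and asserting on a metric
from nose.plugins.attrib import attr
from tests.checks.common import AgentCheckTest

@attr(requires='my_integration')
class TestMyIntegrationMetrics(AgentCheckTest):
    CHECK_NAME = 'my_integration'

    def test_metrics(self):
        config = {
            'init_config': {},
            'instances': [{'server': 'localhost', 'port': 80}],
        }
        # Run the check once against this configuration.
        self.run_check(config)

        # Assert that the expected metric was submitted.
        self.assertMetric('my_integration.response_time', count=1)
        self.coverage_report()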
Libraries
The Datadog Agent provides several useful libraries in the utils directory. These libraries can be helpful when building your integration, but be aware that they have been moved in Datadog Agent v6.
Testing your integration
As you build your check and test code, use the following to run your tests:
- rake lint: Lint your code for potential errors.
- rake ci:run[my_integration]: Run the tests that you have written in your test_my_integration.py file and that have an @attr(requires='my_integration') annotation.
- rake ci:run[default]: Run the tests you have written in your test_my_integration.py file (without the @attr(requires='my_integration') annotation) along with some additional generic tests.
Travis CI automatically runs tests when you create a pull request. Ensure that you have thorough test coverage and that you are passing all tests prior to submitting pull requests.
Add the @attr(requires='my_integration') annotation to the test classes or methods that require a full Docker testing environment (see the next section).
Don't add this annotation to your unit and mock tests; run those with rake ci:run[default] on Travis CI.
To iterate quickly on your unit and mock tests, instead of running all the tests with rake ci:run[default], run:
# run unit and mock tests, in the virtualenv
$ bundle exec rake exec["nosetests my_integration/test/test_*.py -A 'not requires'"]
Docker test environments
Datadog uses Docker containers for testing environments, which is the recommended approach. Containers are lightweight, easy to manage, and provide consistent, standardized environments for each test run.
For example, the ci/mysql.rake file for the Datadog MySQL integration uses the official MySQL container and involves four main tasks:
- before_install: Prior to starting the new Docker test environment, ensure that any previous Docker test environments are stopped and removed.
- install: The install task performs the docker run which starts the MySQL test server.
- before_script: This task first ensures that the MySQL server is running, then connects to the server to perform some setup tasks. Keep setup tasks in your test_integration.py file when possible, but sometimes setup and configuration need to be performed prior to the Python test script.
- cleanup: After the tests are complete, the Docker test environment is stopped and removed.
Installing your integration locally
When your integration is merged into the integrations-extras repository, Datadog generates packages so that others can easily install your integration. However, you may want to install your integration locally before it's merged.
To run locally, first copy your check.py file into the Datadog Agent's checks.d directory and rename it to my_integration.py (using the actual name of your integration).
Next, copy your conf.yaml.example file into the Datadog Agent's conf.d directory and rename it to my_integration.yaml (again, using the actual name of your integration).
See Create a New Integration for more information about the Datadog Agent directory structure.
Teardown and cleanup
When you have finished building your integration, run rake clean_env to remove the Python virtual environment.
Submitting your integration
Once you have completed the development of your integration, submit a pull request to have Datadog review your integration. After your integration is reviewed, Datadog approves and merges your pull request or provides feedback and next steps required for approval.
Other considerations
Consider the following when writing tests:
- Test clusters. Testing single instances of your software is often easier, but tests are more useful when run against setups that are representative of real-world uses. For example, MongoDB is typically used with sharding and replica set features, so the tests reflect that.
- Consider generating calculated metrics in addition to raw metrics. For example, many databases run some queries slowly but infrequently, so it's often useful to look at percentiles. The Datadog MySQL integration, for instance, includes a calculated metric for the 95th percentile query execution time (see the sketch after this list).
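As one way to approach this, a check can submit each raw sample through the histogram method so that the Agent derives aggregate series, including a 95th percentile. The following is a rough sketch; the collection helper and metric name are placeholders.

# Example sketch: deriving percentiles from raw samples with a histogram
from checks import AgentCheck

class MyIntegrationCheck(AgentCheck):

    def check(self, instance):
        for query_time in self._collect_query_times(instance):
            # Each sample is rolled up by the Agent into aggregate series
            # such as avg, max, median, count, and 95percentile.
            self.histogram('my_integration.query.time', query_time)

    def _collect_query_times(self, instance):
        # Placeholder: gather per-query execution times from the monitored service.
        return [0.02, 0.15, 1.3]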