This tutorial will walk you through instrumenting your application to send custom metrics to Datadog. If you need some help as you go, pop by #datadog on freenode, where we'll be happy to answer any questions you might have. (There's a web chat client, too.)
The easiest way to get your custom metrics into Datadog is to send them to DogStatsD, a metrics aggregation server bundled with the Datadog Agent (in versions 3.0 and above). DogStatsD implements the StatsD protocol, along with a few extensions for special Datadog features.
DogStatsD accepts custom application metric points over UDP, then periodically aggregates them and forwards the results to Datadog, where they can be graphed on dashboards. Here’s a pretty standard DogStatsD setup:
DogStatsD’s primary function is to aggregate many data points into a single metric for a given interval of time (ten seconds by default). Let’s walk through an example to understand how this works.
Suppose you want to know how many times you are running a database query. Your application can tell DogStatsD to increment a counter each time this query is executed. For example:
def query_my_database():
    dog.increment('database.query.count')
    # Run the query ...
If this function is executed one hundred times in a flush interval (ten seconds by default), it will send DogStatsD one hundred UDP packets that say “increment the ‘database.query.count’ counter”. DogStatsD will aggregate these points into a single metric value - 100 in this case - and send it to the server where it can be graphed.
This means you can expect DogStatsD to produce one point per metric per flush interval for as long as data is being submitted for that metric.
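The aggregation described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Agent's implementation: the CounterAggregator class and its method names are hypothetical, and it only handles counters.

```python
from collections import defaultdict


class CounterAggregator:
    """Toy model of a flush interval: many increments in, one point per metric out."""

    def __init__(self):
        self._counts = defaultdict(int)

    def increment(self, metric, value=1):
        # Each call stands in for one UDP packet received from the application.
        self._counts[metric] += value

    def flush(self):
        # Emit a single aggregated point per metric, then start a new interval.
        points = dict(self._counts)
        self._counts.clear()
        return points


aggregator = CounterAggregator()
for _ in range(100):
    aggregator.increment('database.query.count')

print(aggregator.flush())  # {'database.query.count': 100}
print(aggregator.flush())  # {} (nothing was submitted, so no point is produced)
```

The second flush returns nothing, which mirrors the rule that a point is only produced while data is being submitted for a metric.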
Like StatsD, DogStatsD receives points over UDP. UDP is a good fit for application instrumentation because it is a fire-and-forget protocol. This means your application won’t stop its actual work to wait for a response from the metrics server, which is very important if the metrics server is down or inaccessible.
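To make the fire-and-forget behaviour concrete, here is a minimal sketch that sends a single datagram using only Python's standard socket module. The send_datagram helper is hypothetical and is not part of any Datadog client; the default port 8125 is assumed to match the Agent's dogstatsd_port setting.

```python
import socket


def send_datagram(payload, host='127.0.0.1', port=8125):
    """Send one StatsD datagram over UDP without waiting for any reply.

    Returns True if the packet was handed to the OS, False if sending failed.
    Errors are swallowed on purpose so metrics never interrupt application work.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode('utf-8'), (host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


# The call returns immediately whether or not anything is listening on the port.
send_datagram('database.query.count:1|c')
```

Because no response is read, a metrics server that is down costs the application nothing more than the send call itself.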
Once you have the Datadog Agent up and running, grab a DogStatsD client for your language and you’ll be ready to start hacking. Any StatsD client will work just fine, but using a Datadog StatsD client will give you a few extra features (namely tags and histograms, but more on that later).
You can see the list of StatsD clients on our libraries page.
We’ll walk through the types of metrics supported by DogStatsD in Python, but the principles are easily translated into other languages. DogStatsD supports the following types of metrics:
Gauges measure the value of a particular thing at a particular time, like the amount of fuel in a car’s gas tank or the number of users connected to a system.
dog.gauge('gas_tank.level', 0.75)
dog.gauge('users.active', 1001)
Counters track how many times something happened per second, like the number of database requests or page views.
dog.increment('database.query.count')
dog.increment('page_view.count', 10)
Histograms track the statistical distribution of a set of values, like the duration of a number of database queries or the size of files uploaded by users. Each histogram will track the average, the minimum, the maximum, the median and the 95th percentile.
dog.histogram('database.query.time', 0.5)
dog.histogram('file.upload.size', file.get_size())
Histograms are an extension to StatsD, so you’ll need to use a client that supports them.
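As a rough illustration of what a histogram reports, the sketch below computes the five aggregates listed above for one flush interval's worth of values. The summarize function and its key names are hypothetical, and the nearest-rank percentile rule is an assumption of this sketch; the Agent may compute its percentile differently.

```python
import statistics


def summarize(values):
    """Return the five aggregates named above for one interval's values."""
    ordered = sorted(values)
    # Nearest-rank 95th percentile: the smallest value with at least 95% of
    # the samples at or below it (integer ceiling of 0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return {
        'avg': statistics.mean(ordered),
        'min': ordered[0],
        'max': ordered[-1],
        'median': statistics.median(ordered),
        '95percentile': ordered[rank - 1],
    }


# Five query durations, in seconds, observed during one flush interval.
query_times = [0.2, 0.3, 0.5, 0.4, 2.0]
print(summarize(query_times))
```

One slow query barely moves the median but dominates the maximum and the 95th percentile, which is why a histogram is more informative than a single average.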
Sets are used to count the number of unique elements in a group. If you want to track the number of unique visitors to your site, sets are a great way to do that.
Sets are an extension to StatsD, so you’ll need to use a client that supports them.
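Conceptually, a set collapses repeated values within a flush interval and reports how many distinct ones were seen. A minimal sketch, where count_uniques is a hypothetical helper and not a client call:

```python
def count_uniques(visitor_ids):
    """Count the distinct values seen in one flush interval."""
    return len(set(visitor_ids))


# Visitor 1234 loads three pages and visitor 5678 loads one.
print(count_uniques([1234, 5678, 1234, 1234]))  # 2
```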
StatsD only supports histograms for timing, not generic values (like the size of uploaded files or the number of rows returned from a query). Timers are essentially a special case of histograms, so they are treated in the same manner by DogStatsD for backwards compatibility.
The overhead of sending UDP packets can be too great for some performance-intensive code paths. To work around this, StatsD clients support sampling, which means sending a metric only a percentage of the time. For example:
dog.histogram('my.histogram', 1, sample_rate=0.5)
will only be sent to the server about half of the time, but the server will correct for the sample rate (scaling by 1 / sample_rate) to provide an estimate of the real data.
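The effect of sampling is easiest to see on a counter. The simulation below is a sketch under stated assumptions: the function name is hypothetical, the client drops packets at random, and the server scales what it receives by 1 / sample_rate.

```python
import random


def simulate_sampled_counter(events, sample_rate, rng):
    """Return (packets_sent, estimated_total) for `events` increments."""
    # Client side: send each increment only `sample_rate` of the time.
    sent = sum(1 for _ in range(events) if rng.random() < sample_rate)
    # Server side: scale what arrived by 1 / sample_rate to estimate the total.
    estimate = sent / sample_rate
    return sent, estimate


sent, estimate = simulate_sampled_counter(10000, 0.5, random.Random(42))
print(sent, estimate)  # roughly 5000 packets sent, estimate close to 10000
```

The estimate is noisy for small event counts, so sampling is best kept for code paths that run often enough for the randomness to average out.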
Tags are a Datadog specific extension to StatsD. They allow you to tag a metric with a dimension that’s meaningful to you and slice and dice along that dimension in your graphs. For example, if you wanted to measure the performance of two video rendering algorithms, you could tag the rendering time metric with the version of the algorithm you used.
Since tags are an extension to StatsD, you’ll need to use a client that supports them.
from random import random
from time import time

# Randomly choose which rendering function we want to use ...
if random() < 0.5:
    renderer = old_slow_renderer
    version = 'old'
else:
    renderer = new_shiny_renderer
    version = 'new'

# Time the render and report its duration, tagged with the algorithm version.
start_time = time()
renderer()
duration = time() - start_time
dog.histogram('rendering.duration', duration, tags=[version])
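Slicing along a tag amounts to grouping points by tag before aggregating them. The sketch below illustrates the idea locally; average_by_tag is a hypothetical helper, and in practice Datadog does this grouping for you when you graph the metric.

```python
from collections import defaultdict


def average_by_tag(points):
    """Group (value, tags) points by tag and return the mean value per tag."""
    grouped = defaultdict(list)
    for value, tags in points:
        for tag in tags:
            grouped[tag].append(value)
    return {tag: sum(values) / len(values) for tag, values in grouped.items()}


# Rendering durations in seconds, tagged with the algorithm version.
points = [(4.0, ['old']), (6.0, ['old']), (1.0, ['new']), (3.0, ['new'])]
print(average_by_tag(points))  # {'old': 5.0, 'new': 2.0}
```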
You can post events to your Datadog event stream. You can tag them, set priority and even aggregate them with other events.
title (String): Event title.
text (String): Event text. Supports line breaks.
Events are aggregated on the Event Stream based on:
event_type is empty, the event will be grouped with other events that don’t have an
# Post a simple message
statsd.event('There might be a storm tomorrow', 'A friend warned me earlier.')

# Cry for help
statsd.event('SO MUCH SNOW', 'The city is paralyzed!', alert_type='error', tags=['urgent', 'endoftheworld'])
date_happened (Time, None), default: None. Assign a timestamp to the event. Defaults to the current time when None.
hostname (String, None), default: None. Assign a hostname to the event.
aggregation_key (String, None), default: None. Assign an aggregation key to the event, to group it with other events.
priority (String, None), default: ‘normal’. Can be ‘normal’ or ‘low’.
source_type_name (String, None), default: None. Assign a source type to the event.
alert_type (String, None), default: ‘info’. Can be ‘error’, ‘warning’, ‘info’ or ‘success’.
tags (Array[str], None), default: None. An array of tags.
DogStatsD supports the following options, all of which can be tweaked in the Agent configuration file:
# The port DogStatsD runs on. If you change this, make sure the apps sending to
# it change as well.
dogstatsd_port: 8125

# The number of seconds to wait between flushes to the server.
dogstatsd_interval: 10
If you want to send metrics to DogStatsD in your own way, here is the format of the packets:
Here’s a breakdown of the fields:
metric.name should be a String with no colons, bars or @ characters.
value should be a number.
ms for Timer or
Here are some example datagrams and comments explaining them:
# Increment the page.views counter.
page.views:1|c

# Record that the fuel tank is half-empty.
fuel.level:0.5|g

# Sample the song length histogram half of the time.
song.length:240|h|@0.5

# Track a unique visitor to the site.
users.uniques:1234|s

# Increment the users online counter tagged by country of origin.
users.online:1|c|#country:china

# An example putting it all together.
users.online:1|c|@0.5|#country:china
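If you are building these datagrams by hand, a small formatter keeps the pieces in the right order. The sketch below is hypothetical: format_metric is not a client function, and the layout it produces (name:value|type, then an optional |@sample_rate, then an optional |#tags) is inferred from the example datagrams and from the standard StatsD sampling syntax.

```python
def format_metric(name, value, metric_type, sample_rate=None, tags=None):
    """Build one metric datagram: name:value|type[|@sample_rate][|#tags]."""
    if any(ch in name for ch in ':|@'):
        raise ValueError('metric name must not contain colons, bars or @')
    parts = ['%s:%s' % (name, value), metric_type]
    if sample_rate is not None:
        parts.append('@%s' % sample_rate)
    if tags:
        parts.append('#' + ','.join(tags))
    return '|'.join(parts)


print(format_metric('page.views', 1, 'c'))
# page.views:1|c
print(format_metric('users.online', 1, 'c', tags=['country:china']))
# users.online:1|c|#country:china
```

The resulting string can be encoded and sent as a single UDP packet to the DogStatsD port.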
If you want to send events to DogStatsD in your own way, here is the format of the packets:
title: Event title.
text: Event text. Supports line breaks.
|d:date_happened (default: None): Assign a timestamp to the event. Defaults to the current Unix epoch timestamp when not supplied.
|h:hostname (default: None): Assign a hostname to the event.
|k:aggregation_key (default: None): Assign an aggregation key to the event, to group it with other events.
|p:priority (default: ‘normal’): Can be “normal” or “low”.
|s:source_type_name (default: None): Assign a source type to the event.
|t:alert_type (default: ‘info’): Can be “error”, “warning”, “info” or “success”.
|#tag1:value1,tag2,tag3:value3 (default: None).
Note that the colon (:) in tags is part of the tag list string; unlike in the other parameters, it has no parsing purpose.
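The optional fields above can be assembled with a short helper. This is a sketch under stated assumptions: format_event is hypothetical, and the header it emits (_e{title length,text length}:title|text) is not shown in this section, so treat it as an assumption about the DogStatsD event format and verify it against your Agent version. Lengths are counted in characters here, which only matches byte lengths for ASCII text.

```python
def format_event(title, text, **fields):
    """Build an event datagram from a title, a text and optional fields."""
    # Map the option names listed above to their one-letter packet prefixes.
    prefixes = [
        ('date_happened', 'd'), ('hostname', 'h'), ('aggregation_key', 'k'),
        ('priority', 'p'), ('source_type_name', 's'), ('alert_type', 't'),
    ]
    # Assumed header: _e{title length,text length}:title|text
    packet = '_e{%d,%d}:%s|%s' % (len(title), len(text), title, text)
    for name, prefix in prefixes:
        if fields.get(name) is not None:
            packet += '|%s:%s' % (prefix, fields[name])
    if fields.get('tags'):
        packet += '|#' + ','.join(fields['tags'])
    return packet


print(format_event('SO MUCH SNOW', 'The city is paralyzed!',
                   alert_type='error', tags=['urgent', 'endoftheworld']))
# _e{12,22}:SO MUCH SNOW|The city is paralyzed!|t:error|#urgent,endoftheworld
```

Fields left unset are simply omitted from the packet, so the server-side defaults listed above apply.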
DogStatsD is open-sourced under the BSD License. Check out the source here.