Here are the components involved when submitting data via DogStatsD:
Generally, the DogStatsD client (your code) and the DogStatsD server (the Agent) run on the same host, but they can also run on different hosts: many DogStatsD clients let you configure the host and port to which the DogStatsD UDP packets are sent.
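To make the host and port setting concrete, here is a minimal sketch that builds a counter datagram and sends it over UDP using only the Python standard library. It assumes the usual DogStatsD text format (name:value|c|@sample_rate) and the default port 8125; the helper names format_counter and send_datagram are hypothetical and are not part of any Datadog library. In practice your DogStatsD client does this for you.

```python
import socket


def format_counter(name, value, sample_rate=1.0):
    """Build a counter datagram such as b'my.metric_name:1|c|@0.5'.

    Hypothetical helper: assumes the DogStatsD text format
    <name>:<value>|c|@<sample_rate>, with the rate omitted when it is 1.
    """
    datagram = f"{name}:{value}|c"
    if sample_rate < 1:
        datagram += f"|@{sample_rate}"
    return datagram.encode("utf-8")


def send_datagram(datagram, host="127.0.0.1", port=8125):
    """Send one datagram over UDP to the given DogStatsD server.

    UDP is fire-and-forget: no error is raised if nothing is listening.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (host, port))
```

Calling send_datagram(format_counter('my.metric_name', 1, 0.5)) targets an Agent on the local host; passing a different host or port targets an Agent running elsewhere.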
The sample rate is meant to reduce the traffic between your DogStatsD client and the Agent. A sample rate of 0.5 cuts the number of UDP packets sent roughly in half.
It’s not useful in all cases, but it can help if you submit many metrics and your DogStatsD client is not on the same host as the DogStatsD server. This is a trade-off: you decrease traffic but lose some precision and granularity.
DogStatsD client side: if you sample a counter metric (“increment”) with a sample rate of 0.5 in your code, the DogStatsD client sends the increment only about 50% of the time.
DogStatsD server side: when it receives the counter value, the Datadog Agent reads the sample rate and infers that each value received stands for two increments in your code. It performs a simple correction: it multiplies the value received by 2.
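The client and server behavior described above can be simulated in a few lines. This is only an illustrative sketch, not the actual client or Agent code: the function names are hypothetical, and it assumes each increment is sampled independently with probability sample_rate.

```python
import random


def client_increment(sample_rate, rng):
    """Client side: return (value, sample_rate) if the packet is sent, else None."""
    if rng.random() < sample_rate:
        return (1, sample_rate)
    return None


def server_receive(packet):
    """Server side: scale a received counter value by 1/sample_rate."""
    value, sample_rate = packet
    return value / sample_rate


def simulate(events, sample_rate, seed=0):
    """Run `events` increments and return (packets_sent, corrected_total)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    packets_sent = 0
    corrected_total = 0.0
    for _ in range(events):
        packet = client_increment(sample_rate, rng)
        if packet is not None:
            packets_sent += 1
            corrected_total += server_receive(packet)
    return packets_sent, corrected_total


if __name__ == "__main__":
    sent, total = simulate(events=100_000, sample_rate=0.5)
    print(f"packets sent: {sent}, corrected count: {total:.0f}")
```

With a sample rate of 0.5, about half of the 100,000 increments are sent, and the corrected total lands close to 100,000 without matching it exactly.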
Counter: values received are multiplied by (1/sample_rate), because it’s reasonable to assume in most cases that for each datapoint received, 1/sample_rate datapoints with the same value were actually recorded by your code.
Gauge: no correction. The value received is kept as it is.
Set: no correction. The value received is kept as it is.
Histogram: the histogram.count statistic is a counter metric, and receives the correction outlined above. The other statistics are gauge metrics and cannot be “corrected.”
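The per-type rules above can be summarized in a short sketch. The function names and the dictionary layout for histogram statistics are assumptions made for illustration; they do not reflect how the Agent is actually implemented.

```python
def apply_sample_rate_correction(metric_type, value, sample_rate):
    """Return the value after the correction described for each metric type."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if metric_type == "counter":
        # One received datapoint stands for 1/sample_rate datapoints.
        return value / sample_rate
    if metric_type in ("gauge", "set"):
        # No correction: the value is kept as it is.
        return value
    raise ValueError(f"unsupported metric type: {metric_type}")


def correct_histogram(stats, sample_rate):
    """Correct only the 'count' statistic; other statistics behave like gauges."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    corrected = dict(stats)  # copy so the caller's dict is left untouched
    if "count" in corrected:
        corrected["count"] = corrected["count"] / sample_rate
    return corrected
```

For example, a counter value of 1 received with a sample rate of 0.5 becomes 2.0, while a gauge value of 42 stays 42.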
Halving the traffic using the Datadog Python library (datadog):
    from datadog import statsd

    # Only about half of the increments are sent; the Agent compensates by
    # multiplying the value it receives by 2.
    statsd.increment('my.metric_name', 1, sample_rate=0.5)

    # Remember: for gauge metrics, only about half of the values are sent, but
    # the Agent cannot compensate for this; you simply lose granularity.
    statsd.gauge('foo', 42, sample_rate=0.5)
Note 1: Don’t change the value you send; only adjust the sample_rate.
Note 2: Using low sample rates decreases the precision of the collection. It’s not recommended unless your code submits a large volume of data.
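To give a rough sense of the precision loss mentioned in Note 2, the sketch below estimates the relative error of a corrected counter. It assumes each increment is sampled independently with probability sample_rate, so the number of packets sent follows a binomial distribution; the function name is hypothetical.

```python
import math


def relative_std_error(events, sample_rate):
    """Relative standard deviation of the corrected count.

    Under the independence assumption, the corrected count has standard
    deviation sqrt(events * (1 - sample_rate) / sample_rate); dividing by
    `events` gives the relative error returned here.
    """
    if events <= 0 or not 0 < sample_rate <= 1:
        raise ValueError("events must be positive and sample_rate in (0, 1]")
    return math.sqrt((1 - sample_rate) / (events * sample_rate))


if __name__ == "__main__":
    for events in (100, 10_000, 1_000_000):
        error = relative_std_error(events, 0.1)
        print(f"{events} events at sample rate 0.1: about {error:.1%} error")
```

Under these assumptions, a sample rate of 0.1 gives a typical error of about 30% for 100 events per flush interval, but only about 0.3% for 1,000,000 events, which is why low sample rates only make sense for high-volume metrics.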