Redis

Agent Check

Supported OS: Linux Mac OS Windows

Overview

Whether you use Redis as a database, cache, or message queue, this integration helps you track problems with your Redis servers and the parts of your infrastructure that they serve. The Datadog Agent’s Redis check collects a wealth of metrics related to performance, memory usage, blocked clients, slave connections, disk persistence, expired and evicted keys, and many more.

Setup

Installation

The Redis check is included in the Datadog Agent package, so you don’t need to install anything else on your Redis servers.

Configuration

Edit the redisdb.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory, to start collecting your Redis metrics and logs. See the sample redisdb.d/conf.yaml for all available configuration options.

Metric Collection

Add this configuration block to your redisdb.d/conf.yaml file to start gathering your Redis metrics:

init_config:

instances:
  - host: localhost
    port: 6379 # or wherever your Redis listens
#   unix_socket_path: /var/run/redis/redis.sock # if your Redis uses a socket instead of TCP
#   password: myredispassword                   # if your Redis requires auth

Configuration Options:

  • unix_socket_path - (Optional) - Can be used instead of host and port.
  • db, password, and socket_timeout - (Optional) - Additional connection options.
  • warn_on_missing_keys - (Optional) - Display a warning in the info page if the tracked keys are missing.
  • slowlog-max-len - (Optional) - Maximum number of entries to fetch from the slow query log. By default, the check reads this value from the Redis config and caps it at 128, because retrieving more than 128 slowlog entries every 15 seconds can increase latency. To fetch more entries from the slow query log, set the value here. Warning: this may impact the performance of your Redis instance.
  • command_stats - (Optional) - Collect INFO COMMANDSTATS output as metrics.
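
The slowlog-max-len rule described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the Agent's actual code; the function name effective_slowlog_max_len and its parameters are hypothetical.

```python
from typing import Optional

SLOWLOG_CAP = 128  # ceiling applied to the value read from the Redis config


def effective_slowlog_max_len(redis_config_value: int,
                              override: Optional[int] = None) -> int:
    """Return how many slow log entries would be fetched per check run.

    redis_config_value: slowlog-max-len as reported by the Redis server.
    override: slowlog-max-len set explicitly in conf.yaml, if any.
    """
    if override is not None:
        # An explicit value in conf.yaml is used as given.
        return override
    # Otherwise use the server's own setting, capped at 128.
    return min(redis_config_value, SLOWLOG_CAP)


print(effective_slowlog_max_len(64))         # 64
print(effective_slowlog_max_len(1024))       # 128
print(effective_slowlog_max_len(1024, 500))  # 500
```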

See the sample redisdb.d/conf.yaml for all available configuration options.

Restart the Agent to begin sending Redis metrics to Datadog.

Log Collection

Available for Agent >6.0

  • Collecting logs is disabled by default in the Datadog Agent. Enable it in datadog.yaml:
  logs_enabled: true
  • Add this configuration block to your redisdb.d/conf.yaml file to start collecting your Redis Logs:
    logs:
        - type: file
          path: /var/log/redis_6379.log
          source: redis
          sourcecategory: database
          service: myapplication

Change the path and service parameter values to match your environment. See the sample redisdb.d/conf.yaml for all available configuration options.

Validation

Run the Agent’s status subcommand and look for redisdb under the Checks section.

Data Collected

Metrics

redis.aof.buffer_length
(gauge)
Size of the AOF buffer.
shown as byte
redis.aof.last_rewrite_time
(gauge)
Duration of the last AOF rewrite.
shown as second
redis.aof.rewrite
(gauge)
Flag indicating an AOF rewrite operation is ongoing.
redis.aof.size
(gauge)
AOF current file size (aof_current_size).
shown as byte
redis.clients.biggest_input_buf
(gauge)
The biggest input buffer among current client connections.
redis.clients.blocked
(gauge)
The number of connections waiting on a blocking call.
shown as connection
redis.clients.longest_output_list
(gauge)
The longest output list among current client connections.
redis.cpu.sys
(gauge)
System CPU consumed by the Redis server.
shown as second
redis.cpu.sys_children
(gauge)
System CPU consumed by the background processes.
shown as second
redis.cpu.user
(gauge)
User CPU consumed by the Redis server.
shown as second
redis.cpu.user_children
(gauge)
User CPU consumed by the background processes.
shown as second
redis.expires
(gauge)
The number of keys that have expired.
shown as key
redis.expires.percent
(gauge)
Percentage of total keys that have been expired.
shown as percent
redis.info.latency_ms
(gauge)
The latency of the redis INFO command.
shown as millisecond
redis.key.length
(gauge)
The number of elements in a given key, tagged by key, e.g. 'key:mykeyname'. Enable in Agent's redisdb.yaml with the keys option.
redis.keys
(gauge)
The total number of keys.
shown as key
redis.keys.evicted
(gauge)
The total number of keys evicted due to the maxmemory limit.
shown as key
redis.keys.expired
(gauge)
The total number of keys expired from the db.
shown as key
redis.mem.fragmentation_ratio
(gauge)
Ratio between used_memory_rss and used_memory.
shown as fraction
redis.mem.lua
(gauge)
Amount of memory used by the Lua engine.
shown as byte
redis.mem.maxmemory
()
-1
redis.mem.peak
(gauge)
The peak amount of memory used by Redis.
shown as byte
redis.mem.rss
(gauge)
Amount of memory that Redis allocated as seen by the OS.
shown as byte
redis.mem.used
(gauge)
Amount of memory allocated by Redis.
shown as byte
redis.net.clients
(gauge)
The number of connected clients (excluding slaves).
shown as connection
redis.net.commands
(gauge)
The number of commands processed by the server.
shown as command
redis.net.commands.instantaneous_ops_per_sec
(gauge)
0
shown as second
redis.net.rejected
(gauge)
The number of rejected connections.
shown as connection
redis.net.slaves
(gauge)
The number of connected slaves.
shown as connection
redis.perf.latest_fork_usec
(gauge)
The duration of the latest fork.
shown as microsecond
redis.persist
(gauge)
The number of keys persisted (redis.keys - redis.expires).
shown as key
redis.persist.percent
(gauge)
Percentage of total keys that are persisted.
shown as percent
redis.pubsub.channels
(gauge)
The number of active pubsub channels.
redis.pubsub.patterns
(gauge)
The number of active pubsub patterns.
redis.rdb.bgsave
(gauge)
One if a bgsave is in progress and zero otherwise.
redis.rdb.changes_since_last
(gauge)
The number of changes since the last background save.
redis.rdb.last_bgsave_time
(gauge)
Duration of the last bg_save operation.
shown as second
redis.replication.backlog_histlen
(gauge)
The amount of data in the backlog sync buffer.
shown as byte
redis.replication.delay
(gauge)
The replication delay in offsets.
shown as offset
redis.replication.last_io_seconds_ago
(gauge)
Amount of time since the last interaction with master.
shown as second
redis.replication.master_link_down_since_seconds
(gauge)
Amount of time that the master link has been down.
shown as second
redis.replication.master_repl_offset
(gauge)
The replication offset reported by the master.
shown as offset
redis.replication.slave_repl_offset
(gauge)
The replication offset reported by the slave.
shown as offset
redis.replication.sync
(gauge)
One if a sync is in progress and zero otherwise.
redis.replication.sync_left_bytes
(gauge)
Amount of data left before syncing is complete.
shown as byte
redis.slowlog.micros.95percentile
(gauge)
The 95th percentile of the duration of queries reported in the slow log.
shown as microsecond
redis.slowlog.micros.avg
(gauge)
The average duration of queries reported in the slow log.
shown as microsecond
redis.slowlog.micros.count
(rate)
The rate of queries reported in the slow log.
shown as query
redis.slowlog.micros.max
(gauge)
The maximum duration of queries reported in the slow log.
shown as microsecond
redis.slowlog.micros.median
(gauge)
The median duration of queries reported in the slow log.
shown as microsecond
redis.stats.keyspace_hits
(gauge)
The total number of successful lookups in the db.
shown as key
redis.stats.keyspace_misses
(gauge)
The total number of missed lookups in the db.
shown as key
redis.command.calls
(gauge)
The number of times a redis command has been called, tagged by 'command', e.g. 'command:append'. Enable in Agent's redisdb.yaml with the command_stats option.
redis.command.usec_per_call
(gauge)
The CPU time consumed per redis command call, tagged by 'command', e.g. 'command:append'. Enable in Agent's redisdb.yaml with the command_stats option.
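
Several of the metrics above are derived from other values. The following Python sketch works through that arithmetic using the descriptions in the list. The function names and sample numbers are made up for illustration, and the percentage formulas are an assumption based on the metric descriptions rather than the Agent's source.

```python
def derived_key_metrics(keys: int, expires: int) -> dict:
    """Compute the persist and percentage values from redis.keys and redis.expires."""
    persist = keys - expires  # redis.persist = redis.keys - redis.expires
    if keys == 0:
        # Avoid dividing by zero on an empty database.
        return {"persist": persist, "persist_percent": 0.0, "expires_percent": 0.0}
    return {
        "persist": persist,
        "persist_percent": 100.0 * persist / keys,
        "expires_percent": 100.0 * expires / keys,
    }


def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Ratio between used_memory_rss and used_memory."""
    return used_memory_rss / used_memory


print(derived_key_metrics(keys=200, expires=50))
# {'persist': 150, 'persist_percent': 75.0, 'expires_percent': 25.0}
print(fragmentation_ratio(3_000_000, 2_000_000))  # 1.5
```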

Events

The Redis check does not include any events at this time.

Service Checks

redis.can_connect

Returns CRITICAL if the Agent cannot connect to Redis to collect metrics, otherwise OK.

redis.replication.master_link_status

Returns CRITICAL if this Redis instance is unable to connect to its master instance. Returns OK otherwise.
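
To see where a status like this can come from, the sketch below parses the text returned by the Redis INFO command and maps the master_link_status field to a check status. It is a simplified illustration under the assumption that the status is derived from that field; the helper names parse_info and master_link_check are hypothetical and this is not the Agent's implementation.

```python
from typing import Optional


def parse_info(text: str) -> dict:
    """Parse 'key:value' lines from Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as '# Replication'
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def master_link_check(info: dict) -> Optional[str]:
    """Return 'OK' or 'CRITICAL' for a replica, or None if the field is absent."""
    status = info.get("master_link_status")
    if status is None:
        return None  # not a replica, so there is nothing to report
    return "OK" if status == "up" else "CRITICAL"


sample = """# Replication
role:slave
master_host:10.0.0.1
master_link_status:down
master_link_down_since_seconds:12
"""
print(master_link_check(parse_info(sample)))  # CRITICAL
```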

Troubleshooting

Agent cannot connect

    redisdb
    -------
      - instance #0 [ERROR]: 'Error 111 connecting to localhost:6379. Connection refused.'
      - Collected 0 metrics, 0 events & 1 service check

Check that the connection info in redisdb.yaml is correct.
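
A quick way to confirm that something is listening on the configured host and port is a plain TCP connection attempt from the Agent's host. The sketch below uses only the Python standard library; the function name can_connect is hypothetical, and a successful connection only shows that the port is open, not that the server behind it is Redis.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example: can_connect("localhost", 6379)
```

If this returns False for the host and port in your configuration, the "Connection refused" error above is expected.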

Agent cannot authenticate

    redisdb
    -------
      - instance #0 [ERROR]: 'NOAUTH Authentication required.'
      - Collected 0 metrics, 0 events & 1 service check

Configure a password in redisdb.yaml.

Development

Please refer to the main documentation for more details about how to test and develop Agent based integrations.

Testing Guidelines

This check has two test matrices. The first details the test type:

  • unit tests (no need for a Redis instance running)
  • integration tests (a Redis instance must run locally)

The second matrix defines the Redis versions to be used with integration tests:

  • redis 3.2
  • redis 4.0

The first matrix is handled by pytest marks: tests that need a running Redis instance must be decorated like this:

import pytest

@pytest.mark.integration
def test_something_requiring_redis_running():
    pass

Running the tests with pytest -m"integration" runs only the integration tests, while pytest -m"not integration" runs everything that was not marked as an integration test.
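
Recent pytest versions warn about marks they do not know. One way to avoid that is to register the integration mark in a conftest.py file, as in the minimal sketch below. Whether this check's test suite registers the mark this way is an assumption; the marker description string is made up for illustration.

```python
# conftest.py (hypothetical location: any conftest.py that pytest loads works)

def pytest_configure(config):
    """Register the custom 'integration' marker with pytest."""
    config.addinivalue_line(
        "markers",
        "integration: test needs a running Redis instance",
    )
```

With the mark registered, pytest --markers lists it alongside the built-in ones.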

The second matrix is defined with tox like this:

envlist = unit, redis{32,40}, flake8

...

[testenv:redis32]
setenv = REDIS_VERSION=3.2
...

[testenv:redis40]
setenv = REDIS_VERSION=4.0
...
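
Each tox environment only sets the REDIS_VERSION environment variable. The test code then has to read that variable and hand it on to docker-compose. How the real test suite does this is not shown here, so the following is only a sketch: the helper name compose_env and the fallback version are assumptions.

```python
import os


def compose_env(default_version: str = "4.0") -> dict:
    """Build the environment to pass to docker-compose.

    Copies the current environment and makes sure REDIS_VERSION is set,
    falling back to default_version (an assumed default) when tox did not set it.
    """
    env = dict(os.environ)
    env["REDIS_VERSION"] = os.environ.get("REDIS_VERSION", default_version)
    return env


print(compose_env()["REDIS_VERSION"])
```

A dict like this is what a call such as subprocess.check_call(..., env=env) expects.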

Integration tests

Redis instances are orchestrated with docker-compose, which is now a dependency for running the integration tests. pytest is responsible for starting, stopping, and disposing of an instance through its fixture mechanism.

This is what a fixture orchestrating Redis instances looks like:

import subprocess

import pytest

@pytest.fixture(scope="session")
def redis_auth():
    # docker-compose invocation setup (defining args and env) omitted here ...
    subprocess.check_call(args + ["up", "-d"], env=env)
    yield
    subprocess.check_call(args + ["down"], env=env)

The basic idea is that docker-compose up runs before the fixture is made available to the test function; the fixture then pauses at yield while the tests run. When the tests are done, execution resumes after yield and docker-compose down is called. Note the scope="session" argument passed to the fixture decorator: it makes the fixture pause at yield only once for all the tests and resume only after the last one, which avoids calling docker-compose up and down for every test. One caveat with this approach is that if you have data in Redis, some tests might operate on a dirty database. This is not an issue in this case, but it is something to keep in mind when using scope="session".

Running the tests locally

Note: you need docker and docker-compose to be installed on your system in order to run the tests locally.

During development, tests can be run locally with tox, the same way as in the CI. In the case of Redis, there might be no need to test the whole matrix every time. For example, to run only the unit/mocked tests:

tox -e unit

To run integration tests against one Redis version only:

tox -e redis40

tox is great because it creates a virtual Python environment for each tox env, but if you don’t need this level of isolation, you can speed up development iterations by using pytest directly (which is what tox does under the hood):

REDIS_VERSION=4.0 pytest

Or, if you don’t want to run integration tests:

pytest -m"not integration"

Further Reading

Read our series of blog posts about how to monitor your Redis servers with Datadog. We detail the key performance metrics, how to collect them, and how to use Datadog to monitor Redis.

