Nginx

Agent Check

Supported OS: Linux, Mac OS, Windows

NGINX default dashboard

Overview

The Datadog Agent can collect many metrics from NGINX instances, including:

  • Total requests
  • Connections (accepted, handled, active)

For users of NGINX Plus, the commercial version of NGINX, the Agent can collect the much larger set of metrics that NGINX Plus provides, such as:

  • Errors (4xx codes, 5xx codes)
  • Upstream servers (active connections, 5xx codes, health checks, etc)
  • Caches (size, hits, misses, etc)
  • SSL (handshakes, failed handshakes, etc)

And many more.

Setup

Installation

The NGINX check is packaged with the Agent, so simply install the Agent on your NGINX servers.

NGINX status module

The NGINX check pulls metrics from a local NGINX status endpoint, so your nginx binaries need to have been compiled with one of two NGINX status modules:

  • stub status module (open source NGINX)
  • http status module (NGINX Plus only)

NGINX Plus packages always include the http status module, so if you’re a Plus user, skip to Configuration now. For NGINX Plus release 13 and above, the status module is deprecated and you should use the new Plus API instead. See the announcement for more information.

If you use open source NGINX, however, your instances may lack the stub status module. Verify that your nginx binary includes the module before proceeding to Configuration:

$ nginx -V 2>&1| grep -o http_stub_status_module
http_stub_status_module

If the command output does not include http_stub_status_module, you must install an NGINX package that includes the module. You can compile your own NGINX—enabling the module as you compile it—but most modern Linux distributions provide alternative NGINX packages with various combinations of extra modules built in. Check your operating system’s NGINX packages to find one that includes the stub status module.
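
If you want to run the same verification from a script, for example across many hosts, the following is a minimal sketch in Python. The function names are illustrative and are not part of the Agent. It relies on nginx printing its build information to stderr, which is why the shell command above redirects 2>&1.

```python
import subprocess


def has_stub_status(version_output: str) -> bool:
    """Return True if the `nginx -V` output lists the stub status module."""
    return "http_stub_status_module" in version_output


def nginx_version_output(binary: str = "nginx") -> str:
    """Run `nginx -V` and return everything it printed.

    nginx writes its build information to stderr, so both streams
    are captured and concatenated.
    """
    result = subprocess.run(
        [binary, "-V"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr
```

Calling has_stub_status(nginx_version_output()) on an NGINX server returns True when the module is compiled in. The binary name "nginx" is an assumption; pass a full path if nginx is not on the PATH.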

Configuration

Create a nginx.yaml file in the Agent’s conf.d directory.

Prepare NGINX

On each NGINX server, create a status.conf file in the directory that contains your other NGINX configuration files (e.g. /etc/nginx/conf.d/).

server {
  listen 81;
  server_name localhost;

  access_log off;
  allow 127.0.0.1;
  deny all;

  location /nginx_status {
    # Choose your status module

    # freely available with open source NGINX
    stub_status;

    # for open source NGINX < version 1.7.5
    # stub_status on;

    # available only with NGINX Plus
    # status;
  }
}

NGINX Plus can also use stub_status, but since that module provides fewer metrics, you should use status if you’re a Plus user.

You may optionally configure HTTP basic authentication in the server block, but because the endpoint only accepts requests from 127.0.0.1, it is not strictly necessary.

Reload NGINX to enable the status endpoint; a full restart is not needed.
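
To see what the endpoint exposes, it can help to parse a response by hand. With the open source stub_status module, the endpoint returns a short plain-text report. The sketch below parses that report; the function name and the dictionary keys are illustrative choices, not the names the Agent uses internally, and the sample numbers are made up for the example.

```python
import re

# Example body in the format produced by the stub_status module.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(body: str) -> dict:
    """Parse a stub_status report into a dict of integers."""
    active = re.search(r"Active connections:\s+(\d+)", body)
    totals = re.search(
        r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)", body
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", body
    )
    if not (active and totals and states):
        raise ValueError("response does not look like stub_status output")
    return {
        "active": int(active.group(1)),
        "accepted": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }
```

For the sample above, parse_stub_status(SAMPLE)["active"] is 291. The NGINX Plus status module returns JSON instead, so this parser does not apply to it.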

Metric Collection

  • Add this configuration setup to your nginx.yaml file to start gathering your NGINX metrics:
  init_config:

  instances:
    - nginx_status_url: http://localhost:81/nginx_status/
      # If you configured the endpoint with HTTP basic authentication
      # user: <USER>
      # password: <PASSWORD>

See the sample nginx.yaml for all available configuration options.
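
Before relying on the Agent, you may want to confirm by hand that the URL and credentials in nginx.yaml are accepted by the endpoint. The following is a minimal sketch using only the Python standard library; build_status_request is a hypothetical helper, not an Agent function, and it only illustrates how the optional user and password translate into an HTTP basic authentication header.

```python
import base64
import urllib.request
from typing import Optional


def build_status_request(
    url: str, user: Optional[str] = None, password: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for the status URL, adding basic auth if given."""
    request = urllib.request.Request(url)
    if user is not None and password is not None:
        # Basic auth is "user:password", base64-encoded.
        raw = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the result to urllib.request.urlopen(request, timeout=5) on the NGINX host should return the status page if the configuration is correct.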

Log Collection

Available for Agent >6.0

  • Collecting logs is disabled by default in the Datadog Agent. Enable it in datadog.yaml:
  logs_enabled: true
  • Add this configuration setup to your nginx.yaml file to start collecting your NGINX Logs:
  logs:
    - type: file
      path: /var/log/nginx/access.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

    - type: file
      path: /var/log/nginx/error.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

Change the service and path parameter values and configure them for your environment. See the sample nginx.yaml for all available configuration options.

Learn more about log collection in the log documentation.
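
A log file that is missing or unreadable cannot be tailed, so it can be worth checking the configured paths before enabling log collection. The sketch below is an illustration only: unreadable_log_paths is a hypothetical helper, and it checks permissions for the user running the script, which is an assumption you need to satisfy by running it as the same user as the Agent.

```python
import os
from typing import Iterable, List


def unreadable_log_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths that are missing or not readable by the current user."""
    return [
        path
        for path in paths
        if not (os.path.isfile(path) and os.access(path, os.R_OK))
    ]
```

For the configuration above, you would call it with ["/var/log/nginx/access.log", "/var/log/nginx/error.log"]; an empty list means both files are readable.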

Validation

Run the Agent’s status subcommand and look for nginx under the Checks section:

  Checks
  ======
    [...]

    nginx
    -----
      - instance #0 [OK]
      - Collected 7 metrics, 0 events & 1 service check

    [...]

Compatibility

The NGINX check is compatible with all major platforms.

Data Collected

Metrics

nginx.net.writing
(gauge)
The number of connections waiting on upstream responses and/or writing responses back to the client.
shown as connection
nginx.net.waiting
(gauge)
The number of keep-alive connections waiting for work.
shown as connection
nginx.net.reading
(gauge)
The number of connections reading client requests.
shown as connection
nginx.net.connections
(gauge)
The total number of active connections.
shown as connection
nginx.net.request_per_s
(gauge)
Rate of requests processed.
shown as request
nginx.net.conn_opened_per_s
(gauge)
Rate of connections opened.
shown as connection
nginx.net.conn_dropped_per_s
(gauge)
Rate of connections dropped.
shown as connection
nginx.cache.bypass.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.bypass.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.bypass.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.bypass.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.cold
(gauge)
A boolean value indicating whether the “cache loader” process is still loading data from disk into the cache
shown as response
nginx.cache.expired.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.expired.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.expired.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.expired.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.hit.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.hit.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.max_size
(gauge)
The limit on the maximum size of the cache specified in the configuration
shown as byte
nginx.cache.miss.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.miss.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.miss.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.miss.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.revalidated.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.revalidated.response
(gauge)
The total number of responses read from the cache
shown as responses
nginx.cache.size
(gauge)
The current size of the cache
shown as response
nginx.cache.stale.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.stale.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.updating.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.updating.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.connections.accepted
(gauge)
The total number of accepted client connections.
shown as connection
nginx.connections.active
(gauge)
The current number of active client connections.
shown as connection
nginx.connections.dropped
(gauge)
The total number of dropped client connections.
shown as connection
nginx.connections.idle
(gauge)
The current number of idle client connections.
shown as connection
nginx.generation
(gauge)
The total number of configuration reloads
shown as reload
nginx.load_timestamp
(gauge)
Time of the last reload of configuration (time since Epoch).
shown as millisecond
nginx.pid
(gauge)
The ID of the worker process that handled status request.
nginx.ppid
(gauge)
The ID of the master process that started the worker process
nginx.processes.respawned
(gauge)
The total number of abnormally terminated and respawned child processes.
shown as process
nginx.requests.current
(gauge)
The current number of client requests.
shown as request
nginx.requests.total
(gauge)
The total number of client requests.
shown as request
nginx.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.server_zone.requests
(gauge)
The total number of client requests received from clients.
shown as request
nginx.server_zone.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.server_zone.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.server_zone.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.server_zone.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.server_zone.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.server_zone.responses.total
(gauge)
The total number of responses sent to clients.
shown as response
nginx.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.slab.pages.free
(gauge)
The current number of free memory pages
shown as page
nginx.slab.pages.used
(gauge)
The current number of used memory pages
shown as page
nginx.slab.slots.fails
(gauge)
The number of unsuccessful attempts to allocate memory of specified size
shown as request
nginx.slab.slots.free
(gauge)
The current number of free memory slots
shown as slot
nginx.slab.slots.reqs
(gauge)
The total number of attempts to allocate memory of specified size
shown as request
nginx.slab.slots.used
(gauge)
The current number of used memory slots
shown as slot
nginx.ssl.handshakes
(gauge)
The total number of successful SSL handshakes.
nginx.ssl.handshakes_failed
(gauge)
The total number of failed SSL handshakes.
nginx.ssl.session_reuses
(gauge)
The total number of session reuses during SSL handshake.
nginx.stream.server_zone.connections
(gauge)
The total number of connections accepted from clients
shown as connection
nginx.stream.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.stream.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.stream.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.stream.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.stream.server_zone.sessions.2xx
(gauge)
The number of responses with 2xx status code.
shown as session
nginx.stream.server_zone.sessions.4xx
(gauge)
The number of responses with 4xx status code.
shown as session
nginx.stream.server_zone.sessions.5xx
(gauge)
The number of responses with 5xx status code.
shown as session
nginx.stream.server_zone.sessions.total
(gauge)
The total number of responses sent to clients.
shown as session
nginx.stream.upstream.peers.active
(gauge)
The current number of connections
shown as connection
nginx.stream.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.stream.upstream.peers.connections
(gauge)
The total number of client connections forwarded to this server.
shown as connection
nginx.stream.upstream.peers.downstart
(gauge)
The time (time since Epoch) when the server became “unavail” or “checking” or “unhealthy”
shown as millisecond
nginx.stream.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” or “checking” or “unhealthy” states.
shown as millisecond
nginx.stream.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
shown as fail
nginx.stream.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
shown as request
nginx.stream.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
shown as fail
nginx.stream.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.stream.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.stream.upstream.peers.id
(gauge)
The ID of the server.
nginx.stream.upstream.peers.received
(gauge)
The total number of bytes received from this server.
shown as byte
nginx.stream.upstream.peers.selected
(gauge)
The time (time since Epoch) when the server was last selected to process a connection.
shown as millisecond
nginx.stream.upstream.peers.sent
(gauge)
The total number of bytes sent to this server.
shown as byte
nginx.stream.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client connections (state “unavail”).
nginx.stream.upstream.peers.weight
(gauge)
Weight of the server.
nginx.stream.upstream.zombies
(gauge)
The current number of servers removed from the group but still processing active client connections.
shown as server
nginx.timestamp
(gauge)
Current time since Epoch.
shown as millisecond
nginx.upstream.keepalive
(gauge)
The current number of idle keepalive connections.
shown as connection
nginx.upstream.peers.active
(gauge)
The current number of active connections.
shown as connection
nginx.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.upstream.peers.downstart
(gauge)
The time (since Epoch) when the server became “unavail” or “unhealthy”.
shown as millisecond
nginx.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” and “unhealthy” states.
shown as millisecond
nginx.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
nginx.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
nginx.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
nginx.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.upstream.peers.id
(gauge)
The ID of the server.
nginx.upstream.peers.received
(gauge)
The total amount of data received from this server.
shown as byte
nginx.upstream.peers.requests
(gauge)
The total number of client requests forwarded to this server.
shown as request
nginx.upstream.peers.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.upstream.peers.responses.1xx_count
(count)
The number of responses with 1xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.upstream.peers.responses.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.upstream.peers.responses.3xx_count
(count)
The number of responses with 3xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.upstream.peers.responses.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.upstream.peers.responses.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.total
(gauge)
The total number of responses obtained from this server.
shown as response
nginx.upstream.peers.selected
(gauge)
The time (since Epoch) when the server was last selected to process a request (1.7.5).
shown as millisecond
nginx.upstream.peers.sent
(gauge)
The total amount of data sent to this server.
shown as byte
nginx.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold.
nginx.upstream.peers.weight
(gauge)
Weight of the server.
nginx.version
(gauge)
Version of nginx.

Not all metrics shown are available to users of open source NGINX. Compare the module reference for stub status (open source NGINX) and http status (NGINX Plus) to understand which metrics are provided by each module.

A few open source NGINX metrics are named differently in NGINX Plus, although they refer to the same metric:

NGINX                          NGINX Plus
nginx.net.connections          nginx.connections.active
nginx.net.conn_opened_per_s    nginx.connections.accepted
nginx.net.conn_dropped_per_s   nginx.connections.dropped
nginx.net.request_per_s        nginx.requests.total
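
The *_per_s names on the left are rates, while the NGINX Plus names on the right are running totals. As a rough illustration of how the two relate, the sketch below derives a per-second rate from two samples of a cumulative counter. This is an assumption about the general technique, not a description of the Agent's internal code, and per_second_rate is a hypothetical name.

```python
from typing import Optional


def per_second_rate(
    prev_value: float, prev_time: float, curr_value: float, curr_time: float
) -> Optional[float]:
    """Turn two samples of a cumulative counter into a per-second rate.

    Returns None when the counter went backwards, which typically
    means the counter was reset between samples (for example after
    NGINX restarted).
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed
```

For example, a request total that grows from 1000 to 1300 over 15 seconds corresponds to a rate of 20 requests per second.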

These metrics are not exact equivalents, but they are closely related:

NGINX                          NGINX Plus
nginx.net.waiting              nginx.connections.idle

Finally, these metrics have no good equivalent:

nginx.net.reading    The current number of connections where nginx is reading the request header.
nginx.net.writing    The current number of connections where nginx is writing the response back to the client.

Events

The NGINX check does not include any events at this time.

Service Checks

nginx.can_connect:

Returns CRITICAL if the Agent cannot connect to NGINX to collect metrics, otherwise OK.
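
The logic of this service check can be sketched as follows. This is a simplified illustration rather than the Agent's implementation: can_connect_status is a hypothetical function, and the fetch argument stands in for whatever call retrieves the status page.

```python
from typing import Callable

OK = "OK"
CRITICAL = "CRITICAL"


def can_connect_status(fetch: Callable[[], object]) -> str:
    """Return CRITICAL if fetching the status page fails, otherwise OK.

    Connection failures such as "Connection refused" are OSError
    subclasses in Python, as is urllib.error.URLError.
    """
    try:
        fetch()
    except OSError:
        return CRITICAL
    return OK
```

A possible call is can_connect_status(lambda: urllib.request.urlopen("http://localhost:81/nginx_status/", timeout=5)), with urllib.request imported by the caller.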

Troubleshooting

You may observe one of these common problems in the output of the Datadog Agent’s status subcommand (info in Agent v5).

Agent cannot connect

  Checks
  ======

    nginx
    -----
      - instance #0 [ERROR]: "('Connection aborted.', error(111, 'Connection refused'))"
      - Collected 0 metrics, 0 events & 1 service check

Either NGINX’s local status endpoint is not running, or the Agent is not configured with correct connection information for it.

Check that the main nginx.conf includes a line like the following:

http {

  ...

  include <directory_that_contains_status.conf>/*.conf;
  # e.g.: include /etc/nginx/conf.d/*.conf;
}

Otherwise, review the Configuration section.
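
The include check above can also be scripted. The following sketch scans the text of nginx.conf for include directives and tests whether a given file path matches one of them. Both function names are hypothetical, and the sketch assumes absolute paths: relative include paths, which NGINX resolves against its own prefix directory, are not handled.

```python
import fnmatch
import re
from typing import List

# Matches "include <pattern>;" at the start of a line; commented-out
# lines are skipped because "#" is not whitespace.
INCLUDE_RE = re.compile(r"^\s*include\s+([^;#]+?)\s*;", re.MULTILINE)


def include_patterns(nginx_conf: str) -> List[str]:
    """Return the path patterns of all active include directives."""
    return INCLUDE_RE.findall(nginx_conf)


def is_included(nginx_conf: str, conf_path: str) -> bool:
    """Return True if conf_path matches any include pattern."""
    return any(
        fnmatch.fnmatch(conf_path, pattern)
        for pattern in include_patterns(nginx_conf)
    )
```

For example, is_included(open("/etc/nginx/nginx.conf").read(), "/etc/nginx/conf.d/status.conf") tells you whether status.conf is picked up; the paths shown are the examples used on this page and may differ on your system.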

Further Reading

Knowledge Base

The data pulled from the NGINX Plus status page are described in the NGINX docs.

Datadog Blog

Learn more about how to monitor NGINX performance metrics in our series of posts, which details the key performance metrics, how to collect them, and how to use Datadog to monitor NGINX.