Nginx

Agent Check

Supported OS: Linux, Mac OS, Windows

NGINX default dashboard

Overview

The Datadog Agent can collect many metrics from NGINX instances, including (but not limited to):

  • Total requests
  • Connections (e.g. accepted, handled, active)

For users of NGINX Plus, the commercial version of NGINX, the Agent can collect the significantly larger set of metrics that NGINX Plus provides, such as:

  • Errors (e.g. 4xx codes, 5xx codes)
  • Upstream servers (e.g. active connections, 5xx codes, health checks, etc.)
  • Caches (e.g. size, hits, misses, etc.)
  • SSL (e.g. handshakes, failed handshakes, etc.)

Setup

The instructions below cover installing and configuring the check when the Agent runs on a host. See the Autodiscovery Integration Templates documentation to learn how to apply them to a containerized environment.

Installation

The NGINX check is included in the Datadog Agent package, so you don’t need to install anything else on your NGINX servers.

Configuration

The NGINX check pulls metrics from a local NGINX status endpoint, so your nginx binaries need to have been compiled with one of two NGINX status modules:

NGINX Open Source

If you use open source NGINX, your instances may lack the stub status module. Verify that your nginx binary includes the module before proceeding to Configuration:

$ nginx -V 2>&1| grep -o http_stub_status_module
http_stub_status_module

If the command output does not include http_stub_status_module, you must install an NGINX package that includes the module. You can compile your own NGINX (enabling the module as you compile it), but most modern Linux distributions provide alternative NGINX packages with various combinations of extra modules built in. Check your operating system’s NGINX packages to find one that includes the stub status module.

NGINX Plus

NGINX Plus packages prior to release 13 include the http status module. For NGINX Plus release 13 and above, the status module is deprecated and you must use the new Plus API instead. See the announcement for more information.

Prepare NGINX

On each NGINX server, create a status.conf file in the directory that contains your other NGINX configuration files (e.g. /etc/nginx/conf.d/).

server {
  listen 81;
  server_name localhost;

  access_log off;
  allow 127.0.0.1;
  deny all;

  location /nginx_status {
    # Choose your status module

    # freely available with open source NGINX
    stub_status;

    # for open source NGINX < version 1.7.5
    # stub_status on;

    # available only with NGINX Plus
    # status;
  }
}

NGINX Plus

NGINX Plus users can also utilize stub_status, but since that module provides fewer metrics, Datadog recommends using status.

For NGINX Plus release 13 and above, the status module is deprecated. Use the http_api_module instead. For example, enable the /api endpoint in your main NGINX configuration file (/etc/nginx/conf.d/default.conf):

  server { 
    listen 8080; 
    location /api { 
      api write=on;
    }
  } 

Reload NGINX to enable the status or API endpoint. There’s no need for a full restart.

sudo nginx -t && sudo nginx -s reload
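
After the reload, it can help to confirm that the endpoint answers before configuring the Agent. The sketch below assumes the open source stub_status module, which returns a short plain-text report (the NGINX Plus API returns JSON instead and is not covered here). The function name parse_stub_status is hypothetical, and curl is assumed to be installed.

```shell
# Sketch: turn the plain-text stub_status report (read from stdin)
# into key=value pairs. parse_stub_status is a hypothetical helper name.
parse_stub_status() {
  awk '
    /^Active connections:/           { print "active=" $3 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+/      { print "accepts=" $1; print "handled=" $2; print "requests=" $3 }
    /^Reading:/                      { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
  '
}

# Usage against the server block shown above (port 81, /nginx_status):
#   curl -s http://localhost:81/nginx_status | parse_stub_status
```

If the curl call returns nothing or a 403, check the listen port and the allow/deny rules in status.conf before moving on.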

Metric Collection

  1. Set the nginx_status_url parameter to http://localhost:81/nginx_status/ in your nginx.d/conf.yaml file to start gathering your NGINX metrics. See the sample nginx.d/conf.yaml for all available configuration options.

NGINX Plus

  • For NGINX Plus releases 13+, set the parameter use_plus_api to true in your nginx.d/conf.yaml configuration file.
  • If you are using http_api_module, set the parameter nginx_status_url to the server’s /api location in your nginx.d/conf.yaml configuration file, for example:
  nginx_status_url: http://localhost:8080/api
  2. Optional: If you are using the NGINX vhost_traffic_status module, set the parameter use_vts to true in your nginx.d/conf.yaml configuration file.

  3. Restart the Agent to start sending NGINX metrics to Datadog.
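
The steps above can be put together as a minimal file. The sketch below writes one to a scratch directory so it can be reviewed before being copied into the Agent's configuration directory. The init_config/instances layout is assumed from the usual structure of Agent check files, and write_nginx_conf and the scratch path are hypothetical; compare the result with the sample nginx.d/conf.yaml before using it.

```shell
# Sketch: write a minimal nginx.d/conf.yaml under a directory of your choice.
# write_nginx_conf is a hypothetical helper; it does not touch the Agent itself.
write_nginx_conf() {
  target_dir="$1"
  mkdir -p "$target_dir/nginx.d"
  cat > "$target_dir/nginx.d/conf.yaml" <<'EOF'
init_config:

instances:
  - nginx_status_url: http://localhost:81/nginx_status/
    # Uncomment for NGINX Plus release 13 and above:
    # use_plus_api: true
    # Uncomment if you use the vhost_traffic_status module:
    # use_vts: true
EOF
}

# Usage (scratch location, not the Agent's real configuration directory):
#   write_nginx_conf "${TMPDIR:-/tmp}/datadog-sketch"
```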

Log Collection

Available for Agent >6.0

  • Collecting logs is disabled by default in the Datadog Agent. Enable it in datadog.yaml:
  logs_enabled: true
  • Add this configuration block to your nginx.d/conf.yaml file to start collecting your NGINX Logs:
  logs:
    - type: file
      path: /var/log/nginx/access.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

    - type: file
      path: /var/log/nginx/error.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

Change the service and path parameter values and configure them for your environment. See the sample nginx.d/conf.yaml for all available configuration options.

Learn more about log collection in the log documentation.

Note: The default NGINX log format does not include the request response time. To include it in your logs, update the NGINX log format by adding the following configuration block in the http section of your NGINX configuration file (/etc/nginx/nginx.conf):

http {
	# recommended log format
	log_format nginx '$remote_addr - $remote_user [$time_local] '
	                 '"$request" $status $body_bytes_sent $request_time '
	                 '"$http_referer" "$http_user_agent"';

	# reference the format by name so that the access log actually uses it
	access_log /var/log/nginx/access.log nginx;
}
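
To check that the request time is being written, you can pull that field back out of a few log lines. This is a minimal sketch that assumes the log format shown above, where the request time is the third value after the quoted request. The function name request_times is hypothetical.

```shell
# Sketch: print the request time (in seconds) for each access-log line on stdin.
# Splitting on double quotes leaves " <status> <body_bytes_sent> <request_time> "
# as the third piece, which is then split on whitespace.
request_times() {
  awk -F'"' '{ split($3, f, " "); print f[3] }'
}

# Usage:
#   tail -n 5 /var/log/nginx/access.log | request_times
```

Lines written before the format change have no request time in that position, so only inspect entries logged after the reload.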

Validation

Run the Agent’s status subcommand and look for nginx under the Checks section.

Data Collected

Metrics

nginx.net.writing
(gauge)
The number of connections waiting on upstream responses and/or writing responses back to the client.
shown as connection
nginx.net.waiting
(gauge)
The number of keep-alive connections waiting for work.
shown as connection
nginx.net.reading
(gauge)
The number of connections reading client requests.
shown as connection
nginx.net.connections
(gauge)
The total number of active connections.
shown as connection
nginx.net.request_per_s
(gauge)
Rate of requests processed.
shown as request
nginx.net.conn_opened_per_s
(gauge)
Rate of connections opened.
shown as connection
nginx.net.conn_dropped_per_s
(gauge)
Rate of connections dropped.
shown as connection
nginx.cache.bypass.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.bypass.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.bypass.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.bypass.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.bypass.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.bypass.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.bypass.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.bypass.responses_written_count
(count)
The total number of responses written to the cache (shown as count)
shown as response
nginx.cache.cold
(gauge)
A boolean value indicating whether the “cache loader” process is still loading data from disk into the cache
shown as response
nginx.cache.expired.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.expired.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.expired.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.expired.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.expired.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.expired.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.expired.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.expired.responses_written_count
(count)
The total number of responses written to the cache (shown as count)
shown as response
nginx.cache.hit.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.hit.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.hit.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.hit.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.cache.max_size
(gauge)
The limit on the maximum size of the cache specified in the configuration
shown as byte
nginx.cache.miss.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.miss.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.miss.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.miss.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.miss.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.miss.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.miss.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.miss.responses_written_count
(count)
The total number of responses written to the cache (shown as count)
shown as response
nginx.cache.revalidated.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.revalidated.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.revalidated.response
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.revalidated.response_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.cache.size
(gauge)
The current size of the cache
shown as response
nginx.cache.stale.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.stale.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.stale.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.stale.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.cache.updating.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.updating.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.updating.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.updating.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.connections.accepted
(gauge)
The total number of accepted client connections.
shown as connection
nginx.connections.accepted_count
(count)
The total number of accepted client connections (shown as count).
shown as connection
nginx.connections.active
(gauge)
The current number of active client connections.
shown as connection
nginx.connections.dropped
(gauge)
The total number of dropped client connections.
shown as connection
nginx.connections.dropped_count
(count)
The total number of dropped client connections (shown as count).
shown as connection
nginx.connections.idle
(gauge)
The current number of idle client connections.
shown as connection
nginx.generation
(gauge)
The total number of configuration reloads
shown as refresh
nginx.generation_count
(count)
The total number of configuration reloads (shown as count)
shown as refresh
nginx.load_timestamp
(gauge)
Time of the last reload of configuration (time since Epoch).
shown as millisecond
nginx.pid
(gauge)
The ID of the worker process that handled the status request.
nginx.ppid
(gauge)
The ID of the master process that started the worker process
nginx.processes.respawned
(gauge)
The total number of abnormally terminated and respawned child processes.
shown as process
nginx.processes.respawned_count
(count)
The total number of abnormally terminated and respawned child processes (shown as count).
shown as process
nginx.requests.current
(gauge)
The current number of client requests.
shown as request
nginx.requests.total
(gauge)
The total number of client requests.
shown as request
nginx.requests.total_count
(count)
The total number of client requests (shown as count).
shown as request
nginx.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.server_zone.discarded_count
(count)
The total number of requests completed without sending a response (shown as count).
shown as request
nginx.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.server_zone.received_count
(count)
The total amount of data received from clients (shown as count).
shown as byte
nginx.server_zone.requests
(gauge)
The total number of client requests received from clients.
shown as request
nginx.server_zone.requests_count
(count)
The total number of client requests received from clients (shown as count).
shown as request
nginx.server_zone.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.server_zone.responses.1xx_count
(count)
The number of responses with 1xx status code (shown as count).
shown as response
nginx.server_zone.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.server_zone.responses.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as response
nginx.server_zone.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.server_zone.responses.3xx_count
(count)
The number of responses with 3xx status code (shown as count).
shown as response
nginx.server_zone.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.server_zone.responses.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as response
nginx.server_zone.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.server_zone.responses.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as response
nginx.server_zone.responses.total
(gauge)
The total number of responses sent to clients.
shown as response
nginx.server_zone.responses.total_count
(count)
The total number of responses sent to clients (shown as count).
shown as response
nginx.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.server_zone.sent_count
(count)
The total amount of data sent to clients (shown as count).
shown as byte
nginx.slab.pages.free
(gauge)
The current number of free memory pages
shown as page
nginx.slab.pages.used
(gauge)
The current number of used memory pages
shown as page
nginx.slab.slots.fails
(gauge)
The number of unsuccessful attempts to allocate memory of specified size
shown as request
nginx.slab.slots.fails_count
(count)
The number of unsuccessful attempts to allocate memory of specified size (shown as count)
shown as request
nginx.slab.slots.free
(gauge)
The current number of free memory slots
nginx.slab.slots.reqs
(gauge)
The total number of attempts to allocate memory of specified size
shown as request
nginx.slab.slots.reqs_count
(count)
The total number of attempts to allocate memory of specified size (shown as count)
shown as request
nginx.slab.slots.used
(gauge)
The current number of used memory slots
nginx.ssl.handshakes
(gauge)
The total number of successful SSL handshakes.
nginx.ssl.handshakes_count
(count)
The total number of successful SSL handshakes (shown as count).
nginx.ssl.handshakes_failed
(gauge)
The total number of failed SSL handshakes.
nginx.ssl.handshakes_failed_count
(count)
The total number of failed SSL handshakes (shown as count).
nginx.ssl.session_reuses
(gauge)
The total number of session reuses during SSL handshake.
nginx.ssl.session_reuses_count
(count)
The total number of session reuses during SSL handshake (shown as count).
nginx.stream.server_zone.connections
(gauge)
The total number of connections accepted from clients
shown as connection
nginx.stream.server_zone.connections_count
(count)
The total number of connections accepted from clients (shown as count)
shown as connection
nginx.stream.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.stream.server_zone.discarded_count
(count)
The total number of requests completed without sending a response (shown as count).
shown as request
nginx.stream.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.stream.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.stream.server_zone.received_count
(count)
The total amount of data received from clients (shown as count).
shown as byte
nginx.stream.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.stream.server_zone.sent_count
(count)
The total amount of data sent to clients (shown as count).
shown as byte
nginx.stream.server_zone.sessions.2xx
(gauge)
The number of responses with 2xx status code.
shown as session
nginx.stream.server_zone.sessions.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.4xx
(gauge)
The number of responses with 4xx status code.
shown as session
nginx.stream.server_zone.sessions.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.5xx
(gauge)
The number of responses with 5xx status code.
shown as session
nginx.stream.server_zone.sessions.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.total
(gauge)
The total number of responses sent to clients.
shown as session
nginx.stream.server_zone.sessions.total_count
(count)
The total number of responses sent to clients (shown as count).
shown as session
nginx.stream.upstream.peers.active
(gauge)
The current number of connections
shown as connection
nginx.stream.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.stream.upstream.peers.connections
(gauge)
The total number of client connections forwarded to this server.
shown as connection
nginx.stream.upstream.peers.connections_count
(count)
The total number of client connections forwarded to this server (shown as count).
shown as connection
nginx.stream.upstream.peers.downstart
(gauge)
The time (time since Epoch) when the server became “unavail” or “checking” or “unhealthy”
shown as millisecond
nginx.stream.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” or “checking” or “unhealthy” states.
shown as millisecond
nginx.stream.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
shown as error
nginx.stream.upstream.peers.fails_count
(count)
The total number of unsuccessful attempts to communicate with the server (shown as count).
shown as error
nginx.stream.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
shown as request
nginx.stream.upstream.peers.health_checks.checks_count
(count)
The total number of health check requests made (shown as count).
shown as request
nginx.stream.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
shown as error
nginx.stream.upstream.peers.health_checks.fails_count
(count)
The number of failed health checks (shown as count).
shown as error
nginx.stream.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.stream.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.stream.upstream.peers.health_checks.unhealthy_count
(count)
How many times the server became unhealthy (state “unhealthy”) (shown as count).
nginx.stream.upstream.peers.id
(gauge)
The ID of the server.
nginx.stream.upstream.peers.received
(gauge)
The total number of bytes received from this server.
shown as byte
nginx.stream.upstream.peers.received_count
(count)
The total number of bytes received from this server (shown as count).
shown as byte
nginx.stream.upstream.peers.selected
(gauge)
The time (time since Epoch) when the server was last selected to process a connection.
shown as millisecond
nginx.stream.upstream.peers.sent
(gauge)
The total number of bytes sent to this server.
shown as byte
nginx.stream.upstream.peers.sent_count
(count)
The total number of bytes sent to this server (shown as count).
shown as byte
nginx.stream.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client connections (state “unavail”).
nginx.stream.upstream.peers.unavail_count
(count)
How many times the server became unavailable for client connections (state “unavail”) (shown as count).
nginx.stream.upstream.peers.weight
(gauge)
Weight of the server.
nginx.stream.upstream.zombies
(gauge)
The current number of servers removed from the group but still processing active client connections.
shown as host
nginx.timestamp
(gauge)
Current time since Epoch.
shown as millisecond
nginx.upstream.keepalive
(gauge)
The current number of idle keepalive connections.
shown as connection
nginx.upstream.peers.active
(gauge)
The current number of active connections.
shown as connection
nginx.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.upstream.peers.downstart
(gauge)
The time (since Epoch) when the server became “unavail” or “unhealthy”.
shown as millisecond
nginx.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” and “unhealthy” states.
shown as millisecond
nginx.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
nginx.upstream.peers.fails_count
(count)
The total number of unsuccessful attempts to communicate with the server (shown as count).
nginx.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
nginx.upstream.peers.health_checks.checks_count
(count)
The total number of health check requests made (shown as count).
nginx.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
nginx.upstream.peers.health_checks.fails_count
(count)
The number of failed health checks (shown as count).
nginx.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.upstream.peers.health_checks.unhealthy_count
(count)
How many times the server became unhealthy (state “unhealthy”) (shown as count).
nginx.upstream.peers.id
(gauge)
The ID of the server.
nginx.upstream.peers.received
(gauge)
The total amount of data received from this server.
shown as byte
nginx.upstream.peers.received_count
(count)
The total amount of data received from this server (shown as count).
shown as byte
nginx.upstream.peers.requests
(gauge)
The total number of client requests forwarded to this server.
shown as request
nginx.upstream.peers.requests_count
(count)
The total number of client requests forwarded to this server (shown as count).
shown as request
nginx.upstream.peers.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.upstream.peers.responses.1xx_count
(count)
The number of responses with 1xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.upstream.peers.responses.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.upstream.peers.responses.3xx_count
(count)
The number of responses with 3xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.upstream.peers.responses.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.upstream.peers.responses.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.total
(gauge)
The total number of responses obtained from this server.
shown as response
nginx.upstream.peers.responses.total_count
(count)
The total number of responses obtained from this server (shown as count).
shown as response
nginx.upstream.peers.selected
(gauge)
The time (since Epoch) when the server was last selected to process a request (1.7.5).
shown as millisecond
nginx.upstream.peers.sent
(gauge)
The total amount of data sent to this server.
shown as byte
nginx.upstream.peers.sent_count
(count)
The total amount of data sent to this server (shown as count).
shown as byte
nginx.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold.
nginx.upstream.peers.unavail_count
(count)
How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold (shown as count).
nginx.upstream.peers.weight
(gauge)
Weight of the server.
nginx.version
(gauge)
Version of nginx.

Not all metrics shown are available to users of open source NGINX. Compare the module reference for stub status (open source NGINX) and http status (NGINX Plus) to understand which metrics are provided by each module.

A few open-source NGINX metrics are named differently in NGINX Plus; they refer to the exact same metric, though:

NGINX                          NGINX Plus
nginx.net.connections          nginx.connections.active
nginx.net.conn_opened_per_s    nginx.connections.accepted
nginx.net.conn_dropped_per_s   nginx.connections.dropped
nginx.net.request_per_s        nginx.requests.total
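
Going by the metric descriptions above, the open source names ending in _per_s are rates, while their NGINX Plus counterparts are running totals. The sketch below shows how a per-second rate can be derived from two samples of a running total. It illustrates the relationship only and is not necessarily how the Agent computes these values. The function name rate_per_s is hypothetical.

```shell
# Sketch: per-second rate between two samples of a cumulative counter.
# Arguments: previous value, current value, seconds between the two samples.
rate_per_s() {
  awk -v prev="$1" -v cur="$2" -v secs="$3" \
    'BEGIN { if (secs <= 0) exit 1; printf "%.2f\n", (cur - prev) / secs }'
}

# 300 new requests over 15 seconds: prints 20.00
rate_per_s 31070465 31070765 15
```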

These metrics don’t refer exactly to the same metric, but they are somewhat related:

NGINX                          NGINX Plus
nginx.net.waiting              nginx.connections.idle

Finally, these metrics have no good equivalent:

nginx.net.reading              The current number of connections where nginx is reading the request header.
nginx.net.writing              The current number of connections where nginx is writing the response back to the client.

Events

The NGINX check does not include any events.

Service Checks

nginx.can_connect:

Returns CRITICAL if the Agent cannot connect to NGINX to collect metrics, otherwise OK.

Troubleshooting

Need help? Contact Datadog support.

Further Reading

Learn more about how to monitor NGINX performance metrics in our series of posts. We detail the key performance metrics, how to collect them, and how to use Datadog to monitor NGINX.

