Datadog-PostgreSQL Integration

PostgreSQL Graph

Overview

Get metrics from postgres service in real time to:

  • Visualize and monitor postgres states
  • Be notified about postgres failovers and events.

Setup

Installation

Install the dd-check-postgres package manually or with your favorite configuration manager

To get started with the PostgreSQL integration, create at least a read-only datadog user with proper access to your PostgreSQL Server. Start psql on your PostgreSQL database and run:

    create user datadog with password '<PASSWORD>';
    grant SELECT ON pg_stat_database to datadog;

To verify the correct permissions run the following command:

    psql -h localhost -U datadog postgres -c \
    "select * from pg_stat_database LIMIT(1);"
    && echo -e "\e[0;32mPostgres connection - OK\e[0m" || \
    || echo -e "\e[0;31mCannot connect to Postgres\e[0m"

When it prompts for a password, enter the one used in the first command.

Configuration

Edit the postgres.yaml file to point to your server and port, set the masters to monitor. See the sample postgres.yaml for all available configuration options.

Configuration Options:

  • username (Optional) - The user account used to collect metrics, set in the Installation section above
  • password (Optional) - The password for the user account.
  • dbname (Optional) - The name of the database you want to monitor.
  • ssl (Optional) - Defaults to False. Indicates whether to use an SSL connection.
  • use_psycopg2 (Optional) - Defaults to False. Setting this option to True will force the Datadog Agent to collect PostgreSQL metrics using psycopg2 instead of pg8000. Note that pyscopg2 does not support SSL connections.
  • tags (Optional) - A list of tags applied to all metrics collected. Tags may be simple strings or key-value pairs.
  • relations (Optional) - By default, all schemas are included. Add specific schemas here to collect metrics for schema relations. Each relation will generate 10 metrics and an additional 10 metrics per index. Use the following structure to declare relations:
relations:
  - relation_name: my_relation
    schemas:
      - my_schema_1
      - my_schema_2
  • collect_function_metrics (Optional) - Collect metrics regarding PL/pgSQL functions from pg_stat_user_functions
  • collect_count_metrics (Optional) - Collect count metrics. The default value is True for backward compatibility, but this might be slow. The recommended value is False.

Validation

Run the Agent’s info subcommand and look for postgres under the Checks section:

Checks
======

    postgres
    -----------
      - instance #0 [OK]
      - Collected 39 metrics, 0 events & 7 service checks

Compatibility

The postgres check is compatible with all major platforms

Data Collected

Metrics

postgresql.connections
(gauge)
The number of active connections to this database.
shown as connection
postgresql.commits
(gauge)
The number of transactions that have been committed in this database.
shown as transaction
postgresql.rollbacks
(gauge)
The number of transactions that have been rolled back in this database.
shown as transaction
postgresql.disk_read
(gauge)
The number of disk blocks read in this database.
shown as block
postgresql.buffer_hit
(gauge)
The number of times disk blocks were found in the buffer cache, preventing the need to read from the database.
shown as hit
postgresql.rows_returned
(gauge)
The number of rows returned by queries in this database
shown as row
postgresql.rows_fetched
(gauge)
The number of rows fetched by queries in this database
shown as row
postgresql.rows_inserted
(gauge)
The number of rows inserted by queries in this database
shown as row
postgresql.rows_updated
(gauge)
The number of rows updated by queries in this database
shown as row
postgresql.rows_deleted
(gauge)
The number of rows deleted by queries in this database
shown as row
postgresql.database_size
(gauge)
The disk space used by this database.
shown as byte
postgresql.deadlocks
(gauge)
The number of deadlocks detected in this database
shown as
postgresql.temp_bytes
(gauge)
The amount of data written to temporary files by queries in this database.
shown as byte
postgresql.temp_files
(gauge)
The number of temporary files created by queries in this database.
shown as file
postgresql.bgwriter.checkpoints_timed
(count)
The number of scheduled checkpoints that were performed.
shown as
postgresql.bgwriter.checkpoints_requested
(count)
The number of requested checkpoints that were performed.
shown as
postgresql.bgwriter.buffers_checkpoint
(count)
The number of buffers written during checkpoints.
shown as
postgresql.bgwriter.buffers_clean
(count)
The number of buffers written by the background writer.
shown as
postgresql.bgwriter.maxwritten_clean
(count)
The number of times the background writer stopped a cleaning scan due to writing too many buffers.
shown as
postgresql.bgwriter.buffers_backend
(count)
The number of buffers written directly by a backend.
shown as buffer
postgresql.bgwriter.buffers_alloc
(count)
The number of buffers allocated
shown as
postgresql.bgwriter.buffers_backend_fsync
(count)
The of times a backend had to execute its own fsync call instead of the background writer.
shown as
postgresql.bgwriter.write_time
(count)
The total amount of checkpoint processing time spent writing files to disk.
shown as millisecond
postgresql.bgwriter.sync_time
(count)
The total amount of checkpoint processing time spent synchronizing files to disk.
shown as millisecond
postgresql.locks
(gauge)
The number of locks active for this database.
shown as lock
postgresql.seq_scans
(gauge)
The number of sequential scans initiated on this table.
shown as
postgresql.seq_rows_read
(gauge)
The number of live rows fetched by sequential scans.
shown as row
postgresql.index_scans
(gauge)
The number of index scans initiated on this table.
shown as
postgresql.index_rows_fetched
(gauge)
The number of live rows fetched by index scans.
shown as row
postgresql.rows_hot_updated
(gauge)
The number of rows HOT updated, meaning no separate index update was needed.
shown as row
postgresql.live_rows
(gauge)
The estimated number of live rows.
shown as row
postgresql.dead_rows
(gauge)
The estimated number of dead rows.
shown as row
postgresql.index_rows_read
(gauge)
The number of index entries returned by scans on this index.
shown as row
postgresql.table_size
(gauge)
The total disk space used by the specified table. Includes TOAST, free space map, and visibility map. Excludes indexes.
shown as byte
postgresql.index_size
(gauge)
The total disk space used by indexes attached to the specified table.
shown as byte
postgresql.total_size
(gauge)
The total disk space used by the table, including indexes and TOAST data.
shown as byte
postgresql.table.count
(gauge)
The number of user tables in this database.
shown as table
postgresql.max_connections
(gauge)
The maximum number of client connections allowed to this database.
shown as connection
postgresql.percent_usage_connections
(gauge)
The number of connections to this database as a fraction of the maximum number of allowed connections.
shown as fraction
postgresql.replication_delay
(gauge)
The current replication delay in seconds. Only available with postgresql 9.1 and newer
shown as second
postgres.replication_delay_bytes
(gauge)
Deprecated please use postgresql.replication_delay_bytes instead
shown as byte
postgresql.replication_delay_bytes
(gauge)
The current replication delay in bytes. Only available with postgresql 9.2 and newer
shown as byte
postgresql.heap_blocks_read
(gauge)
The number of disk blocks read from this table.
shown as block
postgresql.heap_blocks_hit
(gauge)
The number of buffer hits in this table.
shown as hit
postgresql.index_blocks_read
(gauge)
The number of disk blocks read from all indexes on this table.
shown as block
postgresql.index_blocks_hit
(gauge)
The number of buffer hits in all indexes on this table.
shown as hit
postgresql.toast_blocks_read
(gauge)
The number of disk blocks read from this table's TOAST table.
shown as block
postgresql.toast_blocks_hit
(gauge)
The number of buffer hits in this table's TOAST table.
shown as hit
postgresql.toast_index_blocks_read
(gauge)
The number of disk blocks read from this table's TOAST table index.
shown as block
postgresql.toast_index_blocks_hit
(gauge)
The number of buffer hits in this table's TOAST table index.
shown as block

Events

The Postgres check does not include any event at this time.

Service Checks

The Postgres check does not include any service check at this time.

Troubleshooting

If you have any questions about Datadog or a use case our Docs didn’t mention, we’d love to help! Here’s how you can reach out to us:

Visit the Knowledge Base

Learn more about what you can do in Datadog on the Support Knowledge Base.

Web Support

Messages in the event stream containing @support-datadog will reach our Support Team. This is a convenient channel for referencing graph snapshots or a particular event. In addition, we have a livechat service available during the day (EST) from any page within the app.

By Email

You can also contact our Support Team via email at support@datadoghq.com.

Over Slack

Reach out to our team and other Datadog users on Slack.

Further Reading

Blog Post

To get a better idea of how (or why) to have 100x faster Postgres performance by changing 1 line with Datadog, check out our series of blog posts about it.

Knowledge Base

Custom metrics

The Agent generates PostgreSQL metrics from custom query results. For each custom query, four components are required: descriptors, metrics, query, and relation.

  • query is where you’ll construct a base SELECT statement to generate your custom metrics. Each column name in your SELECT query should have a corresponding item in the descriptors section. Each item in metrics will be substituted for the first %s in the query.
  • metrics are key-value pairs where the key is the query column name or column function and the value is a tuple containing the custom metric name and metric type (RATE, GAUGE, or MONOTONIC). In the example below, the results of the sum of the idx_scan column will appear in Datadog with the metric name postgresql.idx_scan_count_by_table.
  • descriptors is used to add tags to your custom metrics. It’s a list of lists each containing 2 strings. The first string is for documentation purposes and should be used to make clear what you are getting from the query. The second string will be the tag name. For multiple tags, include additional columns in your query string and a corresponding item in the descriptors. The order of items in descriptors must match the columns in query.
  • relation indicates whether to include schema relations specified in the relations configuration option. If set to true, the second %s in query will be set to the list of schema names specified in the relations configuration option.
Example 1
custom_metrics:
  # All index scans & reads
  - descriptors:
      - [relname, table]
      - [schemaname, schema]
    metrics:
        SUM(idx_scan) as idx_scan_count: [postgresql.idx_scan_count_by_table, RATE]
        SUM(idx_tup_read) as idx_read_count: [postgresql.idx_read_count_by_table, RATE]
    query: SELECT relname, schemaname, %s FROM pg_stat_all_indexes GROUP BY relname;
    relation: false

The example above will run two queries in PostgreSQL:

  • SELECT relname, SUM(idx_scan) as idx_scan_count FROM pg_stat_all_indexes GROUP BY relname; will generate a rate metric postgresql.idx_scan_count_by_table.
  • SELECT relname, SUM(idx_tup_read) as idx_read_count FROM pg_stat_all_indexes GROUP BY relname; will generate a rate metric postgresql.idx_read_count_by_table.

Both metrics will use the tags table and schema with values from the results in the relname and schemaname columns respectively. e.g. table: <relname>

Example 2

The postgres.yaml.example file includes an example for the SkyTools 3 Londoniste replication tool:

custom_metrics:
  # Londiste 3 replication lag
  - descriptors:
      - [consumer_name, consumer_name]
    metrics:
        GREATEST(0, EXTRACT(EPOCH FROM lag)) as lag: [postgresql.londiste_lag, GAUGE]
        GREATEST(0, EXTRACT(EPOCH FROM lag)) as last_seen: [postgresql.londiste_last_seen, GAUGE]
        pending_events: [postgresql.londiste_pending_events, GAUGE]
    query:
        SELECT consumer_name, %s from pgq.get_consumer_info() where consumer_name !~ 'watermark$';
    relation: false
Debugging

If your custom metric does not work after an Agent restart, running sudo /etc/init.d/datadog-agent info can provide more information. For example:

postgres
--------
  - instance #0 [ERROR]: 'Missing relation parameter in custom metric'
  - Collected 0 metrics, 0 events & 0 service checks

You should also check the /var/log/datadog/collector.log file for more information.