Datadog-AWS RDS Integration

[Image: RDS dashboard]

Overview

Amazon Relational Database Service (RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. Enable this integration to see all your RDS metrics in Datadog.

There are three options for monitoring RDS instances: choose either the standard or the enhanced integration, and then optionally turn on the native database integration as well.

  • Standard RDS Integration - The standard integration requires selecting RDS on the left side of the AWS integration tile. You receive metrics about your instance as often as your CloudWatch integration allows. All RDS engine types are supported.

  • Enhanced RDS Integration - The enhanced integration requires additional configuration and is only available for MySQL, Aurora, PostgreSQL, and MariaDB engines. It provides additional metrics, but an AWS Lambda function is required to submit them to Datadog. The higher granularity and the additional services involved may result in additional AWS charges.

  • RDS + Native Database Integration - You can also turn on the native database integration, which is available for MySQL, Aurora, MariaDB, SQL Server, and PostgreSQL engine types. To match the metrics from RDS with those from the native integration, tag the native integration with dbinstanceidentifier set to the identifier you assigned to the RDS instance. Metrics from the RDS integration carry this tag automatically.

Setup

Installation

Standard RDS Integration

If you haven’t already, set up the Amazon Web Services integration first.

Enhanced RDS Integration

Enable Enhanced Monitoring for your RDS instance. You can do this either during instance creation or afterwards by choosing Modify under Instance Actions. We recommend setting Monitoring Granularity to 15 seconds.

[Image: RDS enhanced install]
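If you prefer to script this step, here is a minimal sketch using boto3's modify_db_instance call. It assumes boto3 is installed and that an IAM role already exists that lets RDS publish Enhanced Monitoring data; the role ARN and instance name are placeholders, and the helper names are hypothetical.

```python
# Sketch with hypothetical helpers: enable RDS Enhanced Monitoring at 15 s granularity.

# Monitoring intervals (seconds) accepted by RDS; 0 turns Enhanced Monitoring off.
VALID_INTERVALS = {0, 1, 5, 10, 15, 30, 60}


def enhanced_monitoring_params(instance_id, monitoring_role_arn, interval=15):
    """Build keyword arguments for the RDS ModifyDBInstance call."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unsupported monitoring interval: {interval}")
    return {
        "DBInstanceIdentifier": instance_id,
        "MonitoringInterval": interval,
        # Placeholder: an IAM role that lets RDS send OS metrics to CloudWatch Logs.
        "MonitoringRoleArn": monitoring_role_arn,
        "ApplyImmediately": True,
    }


def enable_enhanced_monitoring(instance_id, monitoring_role_arn, region):
    import boto3  # imported here so the helper above works without boto3

    rds = boto3.client("rds", region_name=region)
    return rds.modify_db_instance(
        **enhanced_monitoring_params(instance_id, monitoring_role_arn)
    )


if __name__ == "__main__":
    params = enhanced_monitoring_params(
        "mysqlrds", "arn:aws:iam::123456789012:role/rds-monitoring-role"
    )
    print(params)
```

ApplyImmediately makes RDS apply the change right away instead of waiting for the next maintenance window.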

Create your KMS Key
  1. Open the Encryption keys section of the AWS Identity and Access Management (IAM) console at https://console.aws.amazon.com/iam/home#encryptionKeys. For Region, choose the appropriate AWS Region. Do not use the region selector in the navigation bar (top right corner).
  2. Choose Create Key.
  3. Enter an alias for the key, such as lambda-datadog-key. Note: an alias cannot begin with aws/. Aliases that begin with aws/ are reserved by Amazon Web Services to represent AWS-managed CMKs in your account.
  4. Save your KMS key.
  5. Add the appropriate administrators and then users for the key. Make sure to select at least yourself as a key user.
  6. Use the AWS CLI to encrypt your Datadog API and application keys with the KMS key you just created, replacing <KMS_KEY_NAME> with the alias of that key:
    aws kms encrypt --key-id alias/<KMS_KEY_NAME> --plaintext '{"api_key":"<DATADOG_API_KEY>", "app_key":"<DATADOG_APP_KEY>"}'
    With AWS CLI version 2, also pass --cli-binary-format raw-in-base64-out so that the plaintext is not interpreted as base64. The command output includes two parts: a ciphertext blob (CiphertextBlob) followed by the key ID (KeyId), which starts with something similar to arn:aws:kms.

  7. Keep the base64-encoded, encrypted value (CiphertextBlob): you need it to set the <KMS_ENCRYPTED_KEYS> variable for your Lambda function.
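The same encryption can be done from Python. This sketch assumes boto3 and a KMS client created in the same region as your key (and therefore your Lambda function); encrypt_datadog_keys is a hypothetical helper, not part of any Datadog tooling. Unlike the CLI, boto3 returns CiphertextBlob as raw bytes, so the helper base64-encodes it to get the value the CLI prints.

```python
# Sketch with a hypothetical helper: produce the <KMS_ENCRYPTED_KEYS> value via boto3.
import base64
import json


def build_plaintext(api_key, app_key):
    """The same JSON document that the CLI command in step 6 encrypts."""
    return json.dumps({"api_key": api_key, "app_key": app_key}).encode("utf-8")


def encrypt_datadog_keys(kms_client, key_alias, api_key, app_key):
    """Encrypt the Datadog keys with the aliased KMS key; return base64 text."""
    response = kms_client.encrypt(
        KeyId=f"alias/{key_alias}",
        Plaintext=build_plaintext(api_key, app_key),
    )
    # boto3 returns raw bytes; the CLI prints this same blob base64-encoded.
    return base64.b64encode(response["CiphertextBlob"]).decode("ascii")
```

For example: encrypt_datadog_keys(boto3.client("kms", region_name="us-east-1"), "lambda-datadog-key", "<DATADOG_API_KEY>", "<DATADOG_APP_KEY>").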

Create your Role
  1. From the IAM Management Console, create a new role. Enter a name for the role, such as lambda-datadog-post-execution.
  2. Select AWS Lambda from the AWS Service Roles list. You do not need to attach any policies at this time. Press the appropriate buttons to complete the role creation.
  3. Click on the role you just created. Expand the Inline Policies section and click the link to create a policy. Choose Custom Policy and press the button to continue.
  4. Enter a policy name, such as lambda-datadog-policy. For Policy Document, enter the following, replacing <ENCRYPTION_KEY ARN> with the ARN of the Encryption Key you created previously:
    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "kms:Decrypt"
              ],
              "Resource": [
                  "<ENCRYPTION_KEY ARN>"
              ]
          }
      ]
    }
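If you manage IAM from scripts, the sketch below renders the same single-statement policy for a given key ARN; kms_decrypt_policy is a hypothetical helper and the ARN in the example is a placeholder. Its output can be passed as PolicyDocument to boto3's iam.put_role_policy together with the role and policy names chosen above.

```python
# Sketch with a hypothetical helper: render the inline KMS policy for one key ARN.
import json


def kms_decrypt_policy(key_arn):
    """Return the policy document as a JSON string."""
    parts = key_arn.split(":")
    # ARN layout: arn:<partition>:kms:<region>:<account-id>:key/<key-id>
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"expected a KMS key ARN, got {key_arn!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],  # the Lambda only needs to decrypt
                "Resource": [key_arn],
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(kms_decrypt_policy("arn:aws:kms:us-east-1:123456789012:key/example-key-id"))
```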
Create your Lambda function
  1. From the Lambda Management Console, create a new Lambda Function. Your Lambda function must be in the same region as the KMS key you created.
  2. On the Select blueprint screen, select the datadog-process-rds-metrics blueprint.
  3. Choose RDSOSMetrics from the Log Group dropdown.
  4. Enter anything for the Filter Name and click Next.
  5. Enter a name for your function, such as lambda-datadog-post-function.
  6. Among the Lambda function’s environment variables, look for one named <KMS_ENCRYPTED_KEYS>. Set its value to the CiphertextBlob you saved in step 7 of the previous section.
  7. Under Lambda function handler and role, choose the role you created previously. Click Next.
  8. Choose Enable Now.
  9. Choose Create Function.
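For context, the function has to reverse that encryption at runtime before it can call Datadog. The following is only a sketch of how that can work with boto3, not the blueprint's actual code; it assumes the environment variable is literally named KMS_ENCRYPTED_KEYS (check the name your blueprint uses), and load_datadog_keys is a hypothetical helper. The kms:Decrypt permission granted to the role above is what lets the decrypt call succeed.

```python
# Sketch (hypothetical helper, not the blueprint's code): decrypt the Datadog keys.
import base64
import json
import os


def load_datadog_keys(kms_client, environ=None, var_name="KMS_ENCRYPTED_KEYS"):
    """Return (api_key, app_key) from the base64 CiphertextBlob in the environment."""
    environ = os.environ if environ is None else environ
    blob = base64.b64decode(environ[var_name])
    # For symmetric keys, KMS identifies the key from the ciphertext itself;
    # this call needs the kms:Decrypt permission granted to the role.
    plaintext = kms_client.decrypt(CiphertextBlob=blob)["Plaintext"]
    keys = json.loads(plaintext)
    return keys["api_key"], keys["app_key"]
```

Inside the function this would typically run once at import time, for example api_key, app_key = load_datadog_keys(boto3.client("kms")).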

When you click the Test button for your Lambda function, you might get this error:

{ 
  "stackTrace": [ [ "/var/task/lambda_function.py", 
    109, 
    "lambda_handler", 
    "event = json.loads(gzip.GzipFile(fileobj=StringIO(event['awslogs']['data'].decode('base64'))).read())" 
    ] 
  ], 
  "errorType": "KeyError", 
  "errorMessage": "'awslogs'" 
}

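You can ignore this error. The Test button doesn’t work with this setup: the handler reads event['awslogs']['data'], which is only present when CloudWatch Logs invokes the function, so the default test event raises a KeyError.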
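If you do want to exercise the function, the test event must have the shape CloudWatch Logs uses for subscription deliveries: a base64-encoded, gzip-compressed JSON document under awslogs.data. The sketch below builds a minimal event of that shape and shows the Python 3 equivalent of the decoding line from the traceback. It includes only a subset of the fields CloudWatch Logs sends and a placeholder message, so the blueprint may still reject it; the helper names are hypothetical.

```python
# Sketch with hypothetical helpers: build and decode a CloudWatch Logs subscription event.
import base64
import gzip
import json


def make_awslogs_event(log_group, messages):
    """Wrap messages the way CloudWatch Logs does (minimal subset of fields)."""
    payload = {
        "messageType": "DATA_MESSAGE",
        "logGroup": log_group,
        "logEvents": [
            {"id": str(i), "timestamp": 0, "message": json.dumps(m)}
            for i, m in enumerate(messages)
        ],
    }
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {"awslogs": {"data": base64.b64encode(compressed).decode("ascii")}}


def decode_awslogs_event(event):
    """Python 3 equivalent of the line that raised KeyError in the traceback."""
    raw = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    return json.loads(raw)


if __name__ == "__main__":
    # Placeholder message; real RDSOSMetrics entries contain many more fields.
    event = make_awslogs_event("RDSOSMetrics", [{"instanceID": "mysqlrds"}])
    print(json.dumps(event))
```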

Native Database Integration

  1. Navigate to the AWS Console and open the RDS section to find the instance you want to monitor.
    [Image: RDS console]
  2. Copy the endpoint URL (e.g. mysqlrds.blah.us-east-1.rds.amazonaws.com:3306); you need it when you configure the Agent. Also make a note of the DB instance identifier (e.g. mysqlrds); you need it for the dbinstanceidentifier tag and to create graphs and dashboards.
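The instance identifier is also the first label of the endpoint hostname. As a sketch, assuming the usual <identifier>.<suffix>.<region>.rds.amazonaws.com form, a hypothetical helper can split the endpoint you copied into the host and port the Agent needs and the dbinstanceidentifier tag value.

```python
# Sketch with a hypothetical helper: derive Agent settings and the tag from an endpoint.


def parse_rds_endpoint(endpoint, default_port=None):
    """Split 'identifier.suffix.region.rds.amazonaws.com[:port]' into parts."""
    host, sep, port = endpoint.partition(":")
    if ".rds.amazonaws.com" not in host:
        raise ValueError(f"not an RDS endpoint: {endpoint}")
    identifier = host.split(".")[0]  # first DNS label is the DB instance identifier
    tag = f"dbinstanceidentifier:{identifier}"
    return host, (int(port) if sep else default_port), tag


if __name__ == "__main__":
    print(parse_rds_endpoint("mysqlrds.blah.us-east-1.rds.amazonaws.com:3306"))
```

The host goes under server or host in the Agent configuration, the port under port, and the tag under tags.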

Configuration

Standard RDS Integration

Ensure RDS is checked in the AWS Integration tile.

Enhanced RDS Integration

Ensure RDS is checked in the AWS Integration tile.

Native Database Integration

Configure the Agent to connect to your RDS instance by editing the appropriate YAML file in your conf.d directory, then restart the Agent:

If you are using MySQL, MariaDB, or Aurora, then edit mysql.yaml:

init_config:

instances:
  - server: mysqlrds.blah.us-east-1.rds.amazonaws.com # The endpoint hostname from the AWS console, without the port
    user: my_username
    pass: my_password
    port: 3306
    tags:
      - dbinstanceidentifier:my_own_instance

If you are using PostgreSQL, then edit postgres.yaml:

init_config:

instances:
  - host: mysqlrds.blah.us-east-1.rds.amazonaws.com
    port: 5432
    username: my_username
    password: my_password
    dbname: db_name
    tags:
      - dbinstanceidentifier:my_own_instance

If you are using Microsoft SQL Server, then edit sqlserver.yaml:

init_config:

instances:
  - host: mysqlrds.blah.us-east-1.rds.amazonaws.com,1433
    username: my_username
    password: my_password
    tags:
      - dbinstanceidentifier:my_own_instance
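The three files differ mainly in key names: mysql.yaml uses server, user, and pass; postgres.yaml uses host, username, password, and dbname; and sqlserver.yaml appends the port to host after a comma. The hypothetical helper below captures those differences in one place as a sketch for generating configurations; it prints the structure as JSON for inspection, and you can transfer the values into the YAML layout shown above.

```python
# Sketch with a hypothetical helper: build the per-engine Agent instance entries above.
import json


def rds_instance_config(engine, host, port, username, password, identifier, dbname=None):
    """Return an init_config/instances structure for the given engine."""
    if engine in ("mysql", "mariadb", "aurora"):  # all three use mysql.yaml
        instance = {"server": host, "user": username, "pass": password, "port": port}
    elif engine == "postgres":
        instance = {"host": host, "port": port, "username": username,
                    "password": password, "dbname": dbname}
    elif engine == "sqlserver":
        # sqlserver.yaml puts the port after a comma in the host value.
        instance = {"host": f"{host},{port}", "username": username, "password": password}
    else:
        raise ValueError(f"unsupported engine: {engine}")
    instance["tags"] = [f"dbinstanceidentifier:{identifier}"]
    return {"init_config": None, "instances": [instance]}


if __name__ == "__main__":
    conf = rds_instance_config("postgres", "mysqlrds.blah.us-east-1.rds.amazonaws.com",
                               5432, "my_username", "my_password", "my_own_instance",
                               dbname="db_name")
    print(json.dumps(conf, indent=2))
```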

Validation

To validate that the native database integration is working, run sudo /etc/init.d/datadog-agent info. You should see something like the following:

Checks
======

[...]

  mysql
  -----
      - instance #0 [OK]
      - Collected 8 metrics & 0 events
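To run the same check from a script, you can scan the info output for the check's section and its instance status lines. This is a rough sketch that assumes the output layout shown above (the Agent 5 init script); it is not an Agent API, other Agent versions format this output differently, and the helper names are hypothetical.

```python
# Sketch with hypothetical helpers: confirm a check reports [OK] in Agent 5 info output.
import re
import subprocess


def agent_info():
    """Run the command from the Validation step (needs sudo) and return its output."""
    result = subprocess.run(
        ["sudo", "/etc/init.d/datadog-agent", "info"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def check_instances_ok(info_text, check_name):
    """True if the check's section exists and every instance line says [OK]."""
    lines = info_text.splitlines()
    statuses, in_section = [], False
    for i, line in enumerate(lines):
        stripped = line.strip()
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A section starts with the check name underlined by dashes.
        if stripped == check_name and set(next_line) == {"-"}:
            in_section = True
            continue
        if not in_section or not stripped or set(stripped) == {"-"}:
            continue
        match = re.match(r"- instance #\d+ \[(\w+)\]", stripped)
        if match:
            statuses.append(match.group(1))
        elif not stripped.startswith("- "):
            break  # reached the next check's heading
    return bool(statuses) and all(s == "OK" for s in statuses)
```

For example, check_instances_ok(agent_info(), "mysql") returns True when every mysql instance reports [OK].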

Usage

After a few minutes, RDS metrics and metrics from MySQL, Aurora, MariaDB, SQL Server, or PostgreSQL will be accessible in Datadog in the Metrics Explorer, in Graphs and in Alerts. Here’s an example of an Aurora dashboard displaying a number of metrics from both RDS and the MySQL integration. Metrics from both integrations on the instance quicktestrds are unified using the dbinstanceidentifier tag.

[Image: Aurora dashboard combining RDS and MySQL integration metrics]
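Because both integrations attach the dbinstanceidentifier tag, a graph or monitor can scope metrics from either integration to one instance with a single tag filter. A minimal sketch that builds query strings in Datadog's <aggregator>:<metric>{<scope>} syntax; the helper names are hypothetical, and the metric names come from the Metrics list below.

```python
# Sketch with hypothetical helpers: Datadog metric queries scoped to one RDS instance.


def instance_query(metric, identifier, aggregator="avg"):
    """Return e.g. 'avg:aws.rds.cpuutilization{dbinstanceidentifier:quicktestrds}'."""
    return f"{aggregator}:{metric}{{dbinstanceidentifier:{identifier}}}"


def dashboard_queries(identifier, metrics, aggregator="avg"):
    """One query per metric, all filtered to the same instance."""
    return [instance_query(m, identifier, aggregator) for m in metrics]


if __name__ == "__main__":
    for query in dashboard_queries(
        "quicktestrds", ["aws.rds.cpuutilization", "aws.rds.database_connections"]
    ):
        print(query)
```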

Here is the default dashboard for MySQL on Amazon RDS:

[Image: RDS MySQL default dashboard]

Learn more about monitoring MySQL on Amazon RDS in our series of posts, which detail the key performance metrics, how to collect them, and how to use Datadog to monitor MySQL on Amazon RDS.

Data Collected

Metrics

In addition to the metrics you get from the database engines, you also get the following RDS metrics:

aws.rds.bin_log_disk_usage
(gauge)
Amount of disk space occupied by binary logs on the master. (Standard)
shown as byte
aws.rds.cpuutilization
(gauge)
Percentage of CPU utilization. (Standard)
shown as percent
aws.rds.cpucredit_usage
(gauge)
Number of CPU credits consumed. (Standard)
aws.rds.cpucredit_balance
(gauge)
Number of CPU credits that an instance has accumulated. (Standard)
aws.rds.database_connections
(gauge)
Number of database connections in use. (Standard)
shown as connection
aws.rds.disk_queue_depth
(gauge)
Number of outstanding IOs (read/write requests) waiting to access the disk. (Standard)
shown as request
aws.rds.freeable_memory
(gauge)
Amount of available random access memory. (Standard)
shown as byte
aws.rds.free_storage_space
(gauge)
Amount of available storage space. (Standard)
shown as byte
aws.rds.replica_lag
(gauge)
Amount of time a Read Replica DB Instance lags behind the source DB Instance. (Standard)
shown as second
aws.rds.swap_usage
(gauge)
Amount of swap space used on the DB Instance. (Standard)
shown as byte
aws.rds.read_iops
(rate)
Average number of disk read I/O operations per second. (Standard)
shown as operation
aws.rds.write_iops
(rate)
Average number of disk write I/O operations per second. (Standard)
shown as operation
aws.rds.read_latency
(gauge)
Average amount of time taken per disk read I/O operation. (Standard)
shown as second
aws.rds.write_latency
(gauge)
Average amount of time taken per disk write I/O operation. (Standard)
shown as second
aws.rds.read_throughput
(rate)
Average number of bytes read from disk per second. (Standard)
shown as byte
aws.rds.write_throughput
(rate)
Average number of bytes written to disk per second. (Standard)
shown as byte
aws.rds.network_receive_throughput
(rate)
Incoming (Receive) network traffic on the DB instance. (Standard)
shown as byte
aws.rds.network_transmit_throughput
(rate)
Outgoing (Transmit) network traffic on the DB instance. (Standard)
shown as byte
aws.rds.update_throughput
(rate)
The average rate of update queries. (Standard, Aurora only)
shown as query
aws.rds.update_latency
(gauge)
The average latency for update queries. (Standard, Aurora only)
shown as millisecond
aws.rds.transaction_logs_disk_usage
(gauge)
Amount of disk space occupied by transaction logs. (Standard, PostgreSQL only)
shown as byte
aws.rds.total_storage_space
(gauge)
Total amount of storage available on an instance. (Standard)
shown as byte
aws.rds.select_throughput
(rate)
The average rate of select queries. (Standard, Aurora only)
shown as query
aws.rds.select_latency
(gauge)
The average latency for select queries. (Standard, Aurora only)
shown as millisecond
aws.rds.result_set_cache_hit_ratio
(gauge)
The percentage of requests that are served by the Resultset cache. (Standard, Aurora only)
shown as percent
aws.rds.queries
(rate)
The average rate of queries. (Enhanced)
shown as query
aws.rds.network_throughput
(rate)
The rate of network throughput sent and received from clients by each instance in the DB cluster. (Standard)
shown as byte
aws.rds.login_failures
(count)
The average number of failed login attempts per second. (Standard, Aurora only)
shown as operation
aws.rds.insert_throughput
(rate)
The average rate of insert queries. (Standard, Aurora only)
shown as query
aws.rds.insert_latency
(gauge)
The average latency for insert queries. (Standard, Aurora only)
shown as millisecond
aws.rds.free_local_storage
(gauge)
The amount of local storage that is free on an instance. (Standard, Aurora only)
shown as byte
aws.rds.engine_uptime
(gauge)
The amount of time that the DB instance has been active. (Enhanced)
shown as second
aws.rds.dmlthroughput
(rate)
The average rate of inserts, updates, and deletes. (Standard, Aurora only)
shown as operation
aws.rds.dmllatency
(gauge)
The average latency for inserts, updates, and deletes. (Standard, Aurora only)
shown as millisecond
aws.rds.delete_throughput
(rate)
The average rate of delete queries. (Standard, Aurora only)
shown as query
aws.rds.delete_latency
(gauge)
The average latency for delete queries. (Standard, Aurora only)
shown as millisecond
aws.rds.deadlocks
(count)
The average number of deadlocks in the database per second. (Standard, Aurora only)
shown as lock
aws.rds.ddlthroughput
(rate)
The average rate of DDL requests per second. (Standard, Aurora only)
shown as request
aws.rds.ddllatency
(gauge)
The average latency for DDL requests (create/alter/drop). (Standard, Aurora only)
shown as millisecond
aws.rds.commit_throughput
(rate)
The average rate of committed transactions. (Standard, Aurora only)
shown as transaction
aws.rds.commit_latency
(gauge)
The average latency for committed transactions. (Standard, Aurora only)
shown as millisecond
aws.rds.buffer_cache_hit_ratio
(gauge)
The percentage of requests that are served by the Buffer cache. (Standard, Aurora only)
shown as percent
aws.rds.blocked_transactions
(count)
The average rate of transactions in the database that are blocked. (Standard, Aurora only)
shown as transaction
aws.rds.aurora_replica_lag_minimum
(gauge)
The minimum amount of lag between the primary instance and each Aurora instance in the DB cluster. (Standard, Aurora only)
shown as millisecond
aws.rds.aurora_replica_lag_maximum
(gauge)
The maximum amount of lag between the primary instance and each Aurora instance in the DB cluster. (Standard, Aurora only)
shown as millisecond
aws.rds.aurora_replica_lag
(gauge)
The average lag when replicating updates from the primary instance. (Standard, Aurora only)
shown as millisecond
aws.rds.active_transactions
(gauge)
The average rate of current transactions executing on a DB instance. (Standard, Aurora only)
shown as transaction
aws.rds.volume_bytes_used
(gauge)
The amount of storage in bytes used by your Aurora database. (Enhanced, Aurora only)
shown as byte
aws.rds.volume_read_iops
(count)
The number of billed read I/O operations from a cluster volume, reported at 5-minute intervals. (Standard, Aurora only)
shown as operation
aws.rds.volume_write_iops
(count)
The average number of write disk I/O operations to the cluster volume, reported at 5-minute intervals. (Standard, Aurora only)
shown as operation
aws.rds.uptime
(gauge)
RDS instance uptime. (Enhanced)
shown as second
aws.rds.cpuutilization.guest
(gauge)
The percentage of CPU in use by guest programs. (Enhanced)
shown as percent
aws.rds.cpuutilization.idle
(gauge)
The percentage of CPU that is idle. (Enhanced)
shown as percent
aws.rds.cpuutilization.irq
(gauge)
The percentage of CPU in use by software interrupts. (Enhanced)
shown as percent
aws.rds.cpuutilization.nice
(gauge)
The percentage of CPU in use by programs running at lowest priority. (Enhanced)
shown as percent
aws.rds.cpuutilization.steal
(gauge)
The percentage of CPU in use by other virtual machines. (Enhanced)
shown as percent
aws.rds.cpuutilization.system
(gauge)
The percentage of CPU in use by the kernel. (Enhanced)
shown as percent
aws.rds.cpuutilization.total
(gauge)
The total percentage of the CPU in use. This value excludes the nice value. (Enhanced)
shown as percent
aws.rds.cpuutilization.user
(gauge)
The percentage of CPU in use by user programs. (Enhanced)
shown as percent
aws.rds.cpuutilization.wait
(gauge)
The percentage of CPU unused while waiting for I/O access. (Enhanced)
shown as percent
aws.rds.load.1
(gauge)
The number of processes requesting CPU time over the last minute. (Enhanced)
shown as process
aws.rds.load.15
(gauge)
The number of processes requesting CPU time over the last 15 minutes. (Enhanced)
shown as process
aws.rds.load.5
(gauge)
The number of processes requesting CPU time over the last 5 minutes. (Enhanced)
shown as process
aws.rds.memory.active
(gauge)
The amount of assigned memory. (Enhanced)
shown as kibibyte
aws.rds.memory.buffers
(gauge)
The amount of memory used for buffering I/O requests prior to writing to the storage device. (Enhanced)
shown as kibibyte
aws.rds.memory.cached
(gauge)
The amount of memory used for caching file system–based I/O. (Enhanced)
shown as kibibyte
aws.rds.memory.dirty
(gauge)
The amount of memory pages in RAM that have been modified but not written to their related data block in storage. (Enhanced)
shown as kibibyte
aws.rds.memory.free
(gauge)
The amount of unassigned memory. (Enhanced)
shown as kibibyte
aws.rds.memory.hugePagesFree
(gauge)
The number of free huge pages. (Enhanced)
shown as page
aws.rds.memory.hugePagesRsvd
(gauge)
The number of committed huge pages. (Enhanced)
shown as page
aws.rds.memory.hugePagesSize
(gauge)
The size of each huge page unit. (Enhanced)
shown as kibibyte
aws.rds.memory.hugePagesSurp
(gauge)
The number of available surplus huge pages over the total. (Enhanced)
shown as page
aws.rds.memory.hugePagesTotal
(gauge)
The total number of huge pages for the system. (Enhanced)
shown as page
aws.rds.memory.inactive
(gauge)
The amount of inactive memory. (Enhanced)
shown as kibibyte
aws.rds.memory.mapped
(gauge)
The total amount of file-system contents that is memory mapped inside a process address space. (Enhanced)
shown as kibibyte
aws.rds.memory.pageTables
(gauge)
The amount of memory used by page tables. (Enhanced)
shown as kibibyte
aws.rds.memory.slab
(gauge)
The amount of reusable kernel data structures. (Enhanced)
shown as kibibyte
aws.rds.memory.total
(gauge)
The total amount of memory. (Enhanced)
shown as kibibyte
aws.rds.memory.writeback
(gauge)
The amount of dirty pages in RAM that are still being written to the backing storage. (Enhanced)
shown as kibibyte
aws.rds.process.cpuUsedPc
(gauge)
The percentage of CPU used by the process. (Enhanced)
shown as percent
aws.rds.process.memoryUsedPc
(gauge)
The amount of memory used by the process. (Enhanced)
shown as kibibyte
aws.rds.process.parentID
(gauge)
The process identifier of the process’s parent process. (Enhanced)
aws.rds.process.rss
(gauge)
The amount of RAM allocated to the process. (Enhanced)
shown as kibibyte
aws.rds.process.tgid
(gauge)
The thread group identifier which is a number representing the process ID to which a thread belongs. This identifier is used to group threads from the same process. (Enhanced)
aws.rds.process.vss
(gauge)
The amount of virtual memory allocated to the process. (Enhanced)
shown as kibibyte
aws.rds.diskio.avgQueueLen
(gauge)
The number of requests waiting in the I/O device's queue. This metric is not available for Amazon Aurora. (Enhanced)
shown as request
aws.rds.diskio.avgReqSz
(gauge)
The average request size. This metric is not available for Amazon Aurora. (Enhanced)
shown as kibibyte
aws.rds.diskio.await
(gauge)
The number of milliseconds required to respond to requests including queue time and service time. This metric is not available for Amazon Aurora. (Enhanced)
shown as millisecond
aws.rds.diskio.readIOsPS
(rate)
The rate of read operations. (Enhanced)
shown as operation
aws.rds.diskio.readKb
(gauge)
The total amount of data read. This metric is not available for Amazon Aurora. (Enhanced)
shown as kibibyte
aws.rds.diskio.readKbPS
(rate)
The rate that data is read. This metric is not available for Amazon Aurora. (Enhanced)
shown as kibibyte
aws.rds.diskio.rrqmPS
(rate)
The number of merged read requests queued per second. This metric is not available for Amazon Aurora. (Enhanced)
shown as request
aws.rds.diskio.tps
(rate)
The rate of I/O transactions. This metric is not available for Amazon Aurora. (Enhanced)
shown as transaction
aws.rds.diskio.util
(gauge)
The percentage of CPU time during which requests were issued. (Enhanced)
shown as percent
aws.rds.diskio.writeIOsPS
(rate)
The rate of write operations. (Enhanced)
shown as operation
aws.rds.diskio.writeKb
(gauge)
The total amount of data written. This metric is not available for Amazon Aurora. (Enhanced)
shown as kibibyte
aws.rds.diskio.writeKbPS
(rate)
The rate that data is written. This metric is not available for Amazon Aurora. (Enhanced)
shown as kibibyte
aws.rds.diskio.wrqmPS
(rate)
The number of merged write requests queued per second. This metric is not available for Amazon Aurora. (Enhanced)
shown as request
aws.rds.filesystem.maxFiles
(gauge)
The maximum number of files that can be created for the file system. (Enhanced)
shown as file
aws.rds.filesystem.total
(gauge)
The total amount of disk space available for the file system. (Enhanced)
shown as kibibyte
aws.rds.filesystem.used
(gauge)
The amount of disk space used by files in the file system. (Enhanced)
shown as kibibyte
aws.rds.filesystem.usedFilePercent
(gauge)
The percentage of available files in use. (Enhanced)
shown as percent
aws.rds.filesystem.usedFiles
(gauge)
The number of files in the file system. (Enhanced)
shown as file
aws.rds.filesystem.usedPercent
(gauge)
The percentage of the file-system disk space in use. (Enhanced)
shown as percent
aws.rds.network.rx
(gauge)
The number of packets received. (Enhanced)
shown as packet
aws.rds.network.tx
(gauge)
The number of packets uploaded. (Enhanced)
shown as packet
aws.rds.swap.cached
(gauge)
The amount of swap memory used as cache memory. (Enhanced)
shown as kibibyte
aws.rds.swap.free
(gauge)
The total amount of swap memory free. (Enhanced)
shown as kibibyte
aws.rds.swap.total
(gauge)
The total amount of swap memory available. (Enhanced)
shown as kibibyte
aws.rds.tasks.blocked
(gauge)
The number of tasks that are blocked. (Enhanced)
shown as task
aws.rds.tasks.running
(gauge)
The number of tasks that are running. (Enhanced)
shown as task
aws.rds.tasks.sleeping
(gauge)
The number of tasks that are sleeping. (Enhanced)
shown as task
aws.rds.tasks.stopped
(gauge)
The number of tasks that are stopped. (Enhanced)
shown as task
aws.rds.tasks.total
(gauge)
The total number of tasks. (Enhanced)
shown as task
aws.rds.tasks.zombie
(gauge)
The number of child tasks that are inactive with an active parent task. (Enhanced)
shown as task
aws.rds.virtual_cpus
(gauge)
The number of virtual CPUs for the DB instance. (Enhanced)
shown as cpu
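The units differ between metric families: standard metrics such as aws.rds.freeable_memory are shown in bytes while Enhanced metrics such as aws.rds.memory.free are shown in kibibytes, and aws.rds.replica_lag is in seconds while aws.rds.aurora_replica_lag is in milliseconds. If you compare them in your own tooling, convert to a common unit first. A minimal sketch covering only those units; the table and helper are hypothetical.

```python
# Sketch with a hypothetical helper: convert RDS metric values to base units.

# Maps a "shown as" unit from the list above to (base unit, multiplier).
TO_BASE = {
    "byte": ("byte", 1),
    "kibibyte": ("byte", 1024),
    "second": ("second", 1),
    "millisecond": ("second", 0.001),
}


def normalize(value, unit):
    """Return (value in base unit, base unit name); raises KeyError for other units."""
    base, factor = TO_BASE[unit]
    return value * factor, base


if __name__ == "__main__":
    # An Enhanced aws.rds.memory.free reading of 2048 KiB is 2097152 bytes.
    print(normalize(2048, "kibibyte"))
```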

Each metric retrieved from AWS is assigned the same tags that appear in the AWS console, including but not limited to host name and security groups.