Log Archives

Overview

Configure your Datadog account to forward all the logs ingested - whether indexed or not - to a cloud storage system of your own. Keep your logs in a storage-optimized archive for longer periods of time and meet compliance requirements while also keeping auditability for ad hoc investigations, with Rehydration.

This guide shows you how to set up an archive for forwarding ingested logs to your own cloud-hosted storage bucket:

  1. If you haven’t already, set up a Datadog integration for your cloud provider
  2. Create a storage bucket
  3. Set permissions to read and/or write on that archive
  4. Route your logs to and from that archive
  5. Configure advanced settings such as encryption, storage class, and tags
  6. Validate your setup and check for any misconfigurations that Datadog can detect for you

Note: Only Datadog users with the logs_write_archive permission can create, modify, or delete log archive configurations.

Configure an archive

Set up an integration

AWS Role Delegation is not supported on the Datadog for Government site. Access keys must be used.

If not already configured, set up the AWS integration for the AWS account that holds your S3 bucket.

  • In the general case, this involves creating a role that Datadog can use to integrate with AWS S3.
  • Specifically for AWS GovCloud or China accounts, use access keys as an alternative to role delegation.

Set up the Azure integration within the subscription that holds your new storage account, if you haven’t already. This involves creating an app registration that Datadog can use to integrate with Azure.

Set up the GCP integration for the project that holds your GCS storage bucket, if you haven’t already. This involves creating a GCP service account that Datadog can use to integrate with GCP.

Create a storage bucket

Go into your AWS console and create an S3 bucket to send your archives to.

Notes:

  • Do not make your bucket publicly readable.
  • Do not set Object Lock because the last data needs to be rewritten in some rare cases (typically a timeout).

Go to your Azure Portal and create a storage account to send your archives to. Give your storage account a name, any account kind, and select the hot access tier.

Create a container service in that storage account. Take note of the container name, because you need to enter it on the Datadog Archives page.

Note: Do not set immutability policies because the last data needs to be rewritten in some rare cases (typically a timeout).

Go to your GCP account and create a GCS bucket to send your archives to. Under Choose how to control access to objects, select Set object-level and bucket-level permissions.

Note: Do not add retention policy because the last data needs to be rewritten in some rare cases (typically a timeout).

Set permissions

Add the following two permission statements to your IAM policies. Edit the bucket names and, if desired, specify the paths that contain your log archives.

Notes:

  • The GetObject and ListBucket permissions allow for rehydrating from archives.
  • The PutObject permission is sufficient for uploading archives.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatadogUploadAndRehydrateLogArchives",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": [
        "arn:aws:s3:::<MY_BUCKET_NAME_1_/_MY_OPTIONAL_BUCKET_PATH_1>/*",
        "arn:aws:s3:::<MY_BUCKET_NAME_2_/_MY_OPTIONAL_BUCKET_PATH_2>/*"
      ]
    },
    {
      "Sid": "DatadogRehydrateLogArchivesListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::<MY_BUCKET_NAME_1>",
        "arn:aws:s3:::<MY_BUCKET_NAME_2>"
      ]
    }
  ]
}
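
As a rough illustration, the policy above can be generated instead of edited by hand. The following Python sketch builds the same two statements from a list of bucket names and optional paths. The helper name `build_archive_policy` and the bucket names in the usage lines are hypothetical and are not part of any Datadog or AWS tooling.

```python
import json


def build_archive_policy(targets):
    """Build the two-statement policy for (bucket, optional_path) pairs.

    Hypothetical helper for illustration; not a Datadog or AWS API.
    """
    object_arns = []
    bucket_arns = []
    for bucket, path in targets:
        # Object-level actions apply to everything under the optional path.
        root = f"{bucket}/{path.strip('/')}" if path else bucket
        object_arns.append(f"arn:aws:s3:::{root}/*")
        # ListBucket applies to the bucket itself, listed once per bucket.
        bucket_arn = f"arn:aws:s3:::{bucket}"
        if bucket_arn not in bucket_arns:
            bucket_arns.append(bucket_arn)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatadogUploadAndRehydrateLogArchives",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": object_arns,
            },
            {
                "Sid": "DatadogRehydrateLogArchivesListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arns,
            },
        ],
    }


# Placeholder bucket names and path; replace with your own.
policy = build_archive_policy([("my-logs-1", "datadog/archives"), ("my-logs-2", None)])
policy_json = json.dumps(policy, indent=2)
```

The object-level statement is scoped to the optional path, while the ListBucket statement always targets the bucket as a whole, mirroring the hand-written policy.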

Grant the Datadog app permission to write to and rehydrate from your storage account:

  1. Select your storage account from the Storage Accounts page, go to Access Control (IAM), and select Add -> Add Role Assignment.
  2. Select the Storage Blob Data Contributor role, select the Datadog app that you created to integrate with Azure, and save.

Grant your Datadog GCP service account permissions to write your archives to your bucket.

  • If you’re creating a new Service Account, this can be done from the GCP Credentials page.
  • If you’re updating an existing Service Account, this can be done from the GCP IAM Admin page.

Add the role under Storage called Storage Object Admin.

Route your logs to a bucket

Go to the Archives page in the Datadog app and select the Add a new archive option at the bottom.

Note: Only Datadog users with logs_write_archive permission can complete this and the following step.

Select the appropriate AWS account and role combination for your S3 bucket.

Input your bucket name. Optional: Input a prefix directory for all the content of your log archives.

Select the Azure Storage archive type, and the Azure tenant and client for the Datadog App that has the Storage Blob Data Contributor role on your storage account.

Input your storage account name and the container name for your archive. Optional: Input a prefix directory for all the content of your log archives.

Select the GCS archive type, and the GCS Service Account that has permissions to write on your storage bucket.

Input your bucket name. Optional: Input a prefix directory for all the content of your log archives.

Advanced settings

Datadog permissions

By default:

  • All Datadog Admin users can create, edit, and reorder archives (see Configure Multiple Archives)
  • All Datadog Admin and Standard users can rehydrate from archives
  • All users, including Datadog Read Only users, can access rehydrated logs

Use this optional configuration step to assign roles on that archive and restrict who can:

Datadog tags

Use this optional configuration step to:

  • Include all log tags in your archives (activated by default on all new archives). Note: This increases the size of resulting archives.
  • Add tags on rehydrated logs according to your Restriction Queries policy. See logs_read_data permission.

Storage class

You can set a lifecycle configuration on your S3 bucket to automatically transition your log archives to optimal storage classes.

Rehydration supports all storage classes except for Glacier and Glacier Deep Archive. If you wish to rehydrate from archives in the Glacier or Glacier Deep Archive storage classes, you must first move them to a different storage class.
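
To make the lifecycle idea concrete, here is a minimal Python sketch that builds a lifecycle configuration in the shape the S3 API expects and refuses the two storage classes that rehydration does not support. The helper name, the rule ID, the prefix, and the 30-day default are assumptions chosen for illustration.

```python
# Storage classes that rehydration does not support (per the text above),
# written with the names the S3 API uses for them.
UNSUPPORTED_FOR_REHYDRATION = {"GLACIER", "DEEP_ARCHIVE"}


def build_lifecycle_configuration(prefix, storage_class="STANDARD_IA", days=30):
    """Return a lifecycle configuration dict for objects under `prefix`.

    Hypothetical helper; the rule ID and defaults are illustrative only.
    """
    if storage_class in UNSUPPORTED_FOR_REHYDRATION:
        raise ValueError(
            f"{storage_class} archives cannot be rehydrated without first "
            "moving them to a different storage class"
        )
    return {
        "Rules": [
            {
                "ID": "datadog-log-archives-transition",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


lifecycle = build_lifecycle_configuration("datadog/archives/")
# With boto3 installed and credentials configured, this dict could be passed
# as the LifecycleConfiguration argument of
# boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., ...).
```

Rejecting the Glacier classes up front is a design choice for this sketch; you may still want them for archives that you never expect to rehydrate.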

Server side encryption (SSE)

SSE-S3

The easiest method to add server side encryption to your S3 log archives is with S3’s native server side encryption, SSE-S3.

To enable it, go to the Properties tab in your S3 bucket and select Default Encryption. Select the AES-256 option and Save.

SSE-KMS

Alternatively, Datadog supports server side encryption with a customer master key (CMK) from AWS KMS. To enable it, take the following steps:

  1. Create your CMK
  2. Attach a CMK policy to your CMK with the following content, replacing the AWS account number and Datadog IAM role name appropriately:
{
    "Id": "key-consolepolicy-3",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:role/<MY_DATADOG_IAM_ROLE_NAME>"
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:role/<MY_DATADOG_IAM_ROLE_NAME>"
            },
            "Action": [
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:RevokeGrant"
            ],
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "kms:GrantIsForAWSResource": "true"
                }
            }
        }
    ]
}
  3. Go to the Properties tab in your S3 bucket and select Default Encryption. Choose the “AWS-KMS” option, select your CMK ARN, and save.
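
The console steps for both SSE-S3 and SSE-KMS set the bucket's default encryption. As a hedged sketch, the same setting can be described as the configuration dict that the S3 API accepts. The helper name `build_default_encryption` and the key ARN below are placeholders, not values from this guide.

```python
def build_default_encryption(kms_key_arn=None):
    """Return a default-encryption configuration for an S3 bucket.

    Without a key ARN this selects SSE-S3 (AES-256); with one it selects
    SSE-KMS using that key. Hypothetical helper for illustration.
    """
    if kms_key_arn is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_arn}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


sse_s3 = build_default_encryption()
# The account number and key ID below are placeholders.
sse_kms = build_default_encryption(
    "arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000"
)
# With boto3 installed and credentials configured, either dict could be passed
# as the ServerSideEncryptionConfiguration argument of
# boto3.client("s3").put_bucket_encryption(Bucket=..., ...).
```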

Validation

Once your archive settings are successfully configured in your Datadog account, your processing pipelines begin to enrich all logs ingested into Datadog. These logs are subsequently forwarded to your archive.

However, after creating or updating your archive configurations, it can take several minutes before the next archive upload is attempted. Logs are uploaded to the archive every 15 minutes, so check back on your storage bucket in 15 minutes to make sure the archives are successfully being uploaded from your Datadog account. After that, if the archive is still in a pending state, check your inclusion filters to make sure the query is valid and matches log events in live tail.

If Datadog detects a broken configuration, the corresponding archive is highlighted on the configuration page. Click the error icon to see the actions to take to resolve the issue.

Multiple archives

If multiple archives are defined, each log enters only the first archive whose filter it matches. Therefore, it is important to order your archives carefully.

For example, if you create a first archive filtered to the env:prod tag and a second archive without any filter (the equivalent of *), all your production logs would go to one storage bucket/path, and the rest would go to the other.
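
The first-match behavior in this example can be sketched in a few lines of Python. This is only a model of the ordering rule: real archive filters are full log queries, whereas this sketch assumes each filter is a single required tag, and the `Archive` class and `route` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Archive:
    name: str
    required_tag: Optional[str]  # None models an archive without a filter (*)


def route(log_tags: List[str], archives: List[Archive]) -> Optional[str]:
    """Return the name of the first archive whose filter matches the log."""
    for archive in archives:
        if archive.required_tag is None or archive.required_tag in log_tags:
            return archive.name
    return None


ordered = [Archive("prod-archive", "env:prod"), Archive("catch-all", None)]
prod_target = route(["env:prod", "team:acme"], ordered)
other_target = route(["env:staging"], ordered)

# Reversing the order sends every log to the unfiltered archive.
reversed_target = route(["env:prod", "team:acme"], list(reversed(ordered)))
```

With the filtered archive first, production logs land in `prod-archive` and everything else in `catch-all`. With the order reversed, the unfiltered archive captures all logs and the production archive receives nothing.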

Format of the archives

The log archives that Datadog forwards to your storage bucket are in compressed JSON format (.json.gz). Using the prefix you indicate (or / if there is none), the archives are stored in a directory structure that indicates on what date and at what time the archive files were generated, like so:

/my/bucket/prefix/dt=20180515/hour=14/archive_143201.1234.7dq1a9mnSya3bFotoErfxl.json.gz
/my/bucket/prefix/dt=<YYYYMMDD>/hour=<HH>/archive_<HHmmss.SSSS>.<DATADOG_ID>.json.gz

This directory structure simplifies the process of querying your historical log archives based on their date.
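
For instance, a script that scans archives for a given hour only needs to compute the matching directory prefix, and it can recover the timestamp from a file name. The Python sketch below assumes the path layout shown above; the function names and the regular expression are illustrative, and the time zone of the path components is not stated in this guide, so none is applied.

```python
import re
from datetime import datetime

ARCHIVE_KEY = re.compile(
    r"dt=(?P<date>\d{8})/hour=(?P<hour>\d{2})/"
    r"archive_(?P<time>\d{6})\.(?P<fraction>\d{4})\.(?P<datadog_id>[^./]+)\.json\.gz$"
)


def hour_prefix(prefix: str, when: datetime) -> str:
    """Return the directory that holds archives generated in `when`'s hour."""
    base = prefix.strip("/")
    parts = [base] if base else []
    parts += [f"dt={when:%Y%m%d}", f"hour={when:%H}"]
    return "/" + "/".join(parts) + "/"


def parse_archive_key(key: str) -> dict:
    """Extract the generation time and Datadog ID from an archive path."""
    match = ARCHIVE_KEY.search(key)
    if match is None:
        raise ValueError(f"not an archive path: {key}")
    generated = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return {"generated": generated, "datadog_id": match["datadog_id"]}


example = "/my/bucket/prefix/dt=20180515/hour=14/archive_143201.1234.7dq1a9mnSya3bFotoErfxl.json.gz"
parsed = parse_archive_key(example)
directory = hour_prefix("/my/bucket/prefix", parsed["generated"])
```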

Within the compressed JSON file, each event’s content is formatted as follows:

{
    "_id": "123456789abcdefg",
    "date": "2018-05-15T14:31:16.003Z",
    "host": "i-12345abced6789efg",
    "source": "source_name",
    "service": "service_name",
    "status": "status_level",
    "message": "2018-05-15T14:31:16.003Z INFO rid='abc-123' status=403 method=PUT",
    "attributes": { "rid": "abc-123", "http": { "status_code": 403, "method": "PUT" } },
    "tags": [ "env:prod", "team:acme" ]
}
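
As an illustration of working with this format outside Datadog, the Python sketch below reads events from a gzip-compressed archive and filters them on an attribute. It assumes the file holds one JSON object per line, which this guide does not state, and it uses a small in-memory sample rather than a real archive. The function name `read_archive_events` is hypothetical.

```python
import gzip
import io
import json


def read_archive_events(fileobj):
    """Yield one dict per event from a gzip-compressed archive file object.

    Assumes one JSON object per line; adjust if your archives differ.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Build a small in-memory archive so the sketch runs without a real bucket.
sample_events = [
    {"_id": "a1", "status": "info",
     "attributes": {"http": {"status_code": 200}}, "tags": ["env:prod"]},
    {"_id": "a2", "status": "error",
     "attributes": {"http": {"status_code": 403}}, "tags": ["env:prod", "team:acme"]},
]
payload = gzip.compress(
    "\n".join(json.dumps(event) for event in sample_events).encode("utf-8")
)

# Collect the IDs of events whose HTTP status code is 403.
forbidden_ids = [
    event["_id"]
    for event in read_archive_events(io.BytesIO(payload))
    if event["attributes"]["http"]["status_code"] == 403
]
```

For a downloaded archive, the same function could be given a file opened in binary mode instead of the in-memory buffer.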
