Log Archives

Overview

Configure your Datadog account to forward all ingested logs, whether indexed or not, to a cloud storage system of your own. Keep your logs in a storage-optimized archive for longer periods of time to meet compliance requirements, while retaining auditability for ad hoc investigations with Rehydration.

Archive page view

This guide shows you how to set up an archive for forwarding ingested logs to your own cloud-hosted storage bucket:

  1. If you haven’t already, set up a Datadog integration for your cloud provider
  2. Create a storage bucket
  3. Set permissions to read and/or write on that archive
  4. Route your logs to and from that archive
  5. Configure advanced settings such as encryption, storage class, and tags
  6. Validate your setup by checking for misconfigurations that Datadog can detect for you

Note: Only Datadog users with the logs_write_archive permission can create, modify, or delete log archive configurations.

Configure an archive

Set up an integration

AWS Role Delegation is not supported on the Datadog for Government site. Access keys must be used.

If not already configured, set up the AWS integration for the AWS account that holds your S3 bucket.

  • In the general case, this involves creating a role that Datadog can use to integrate with AWS S3.
  • For AWS GovCloud or China accounts, use access keys instead of role delegation.

Set up the Azure integration within the subscription that holds your new storage account, if you haven’t already. This involves creating an app registration that Datadog can use to integrate with.

Note: Archiving to Azure ChinaCloud is not supported.

Set up the GCP integration for the project that holds your GCS storage bucket, if you haven’t already. This involves creating a GCP service account that Datadog can use to integrate with.

Create a storage bucket

Go into your AWS console and create an S3 bucket to send your archives to.

Notes:

  • Do not make your bucket publicly readable.
  • Do not set Object Lock, because the most recent data needs to be rewritten in some rare cases (typically after a timeout).
  • See AWS Pricing for inter-region data transfer fees and how cloud storage costs may be impacted. Consider creating your storage bucket in us-east-1 to manage your inter-region data transfer fees.

Go to your Azure Portal and create a storage account to send your archives to. Give your storage account a name, choose any account kind, and select the hot or cool access tier.

Create a container service in that storage account. Take note of the container name, because you need to add it on the Datadog Archives page.

Note: Do not set immutability policies, because the most recent data needs to be rewritten in some rare cases (typically after a timeout).

Go to your GCP account and create a GCS bucket to send your archives to. Under Choose how to control access to objects, select Set object-level and bucket-level permissions.

Note: Do not add a retention policy, because the most recent data needs to be rewritten in some rare cases (typically after a timeout).

Set permissions

  1. Create a policy with the following two permission statements:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DatadogUploadAndRehydrateLogArchives",
          "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:GetObject"],
          "Resource": [
            "arn:aws:s3:::<MY_BUCKET_NAME_1_/_MY_OPTIONAL_BUCKET_PATH_1>/*",
            "arn:aws:s3:::<MY_BUCKET_NAME_2_/_MY_OPTIONAL_BUCKET_PATH_2>/*"
          ]
        },
        {
          "Sid": "DatadogRehydrateLogArchivesListBucket",
          "Effect": "Allow",
          "Action": "s3:ListBucket",
          "Resource": [
            "arn:aws:s3:::<MY_BUCKET_NAME_1>",
            "arn:aws:s3:::<MY_BUCKET_NAME_2>"
          ]
        }
      ]
    }
    
    • The GetObject and ListBucket permissions allow for rehydrating from archives.
    • The PutObject permission is sufficient for uploading archives.
  2. Edit the bucket names.

  3. Optionally, specify the paths that contain your log archives.

  4. Attach the new policy to the Datadog integration role.
    a. Navigate to Roles in the AWS IAM console.
    b. Locate the role used by the Datadog integration. By default it is named DatadogIntegrationRole, but the name may vary if your organization has renamed it. Click the role name to open the role summary page.
    c. Click Add permissions, and then Attach policies.
    d. Enter the name of the policy created above.
    e. Click Attach policies.

Note: Ensure that the resource value under the s3:PutObject and s3:GetObject actions ends with /* because these permissions are applied to objects within the buckets.
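If you manage several archives, the policy above can be generated instead of edited by hand. The following is a minimal Python sketch, not an official Datadog tool; the function name `build_archive_policy` and the bucket names used with it are hypothetical. It keeps the `/*` suffix on the object-level resources and uses the bare bucket ARN for `s3:ListBucket`, as the note above requires.

```python
import json


def build_archive_policy(destinations):
    """Build the IAM policy document for Datadog log archives.

    `destinations` is a list of (bucket, optional_path) pairs; pass None or ""
    as the path when the archive writes to the bucket root.
    """
    object_resources = []
    bucket_resources = []
    for bucket, path in destinations:
        stripped = (path or "").strip("/")
        prefix = f"{bucket}/{stripped}" if stripped else bucket
        # s3:PutObject and s3:GetObject act on objects, so the ARN must end with /*.
        object_resources.append(f"arn:aws:s3:::{prefix}/*")
        # s3:ListBucket acts on the bucket itself: no path, no wildcard.
        bucket_arn = f"arn:aws:s3:::{bucket}"
        if bucket_arn not in bucket_resources:
            bucket_resources.append(bucket_arn)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatadogUploadAndRehydrateLogArchives",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": object_resources,
            },
            {
                "Sid": "DatadogRehydrateLogArchivesListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_resources,
            },
        ],
    }
```

For example, `json.dumps(build_archive_policy([("my-log-archives", "datadog/prod"), ("my-other-bucket", None)]), indent=2)` produces a document you can paste into the IAM policy editor.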

  • Grant the Datadog app permission to write to and rehydrate from your storage account.
  • Select your storage account from the Storage Accounts page, go to Access Control (IAM), and select Add -> Add Role Assignment.
  • Input the role called Storage Blob Data Contributor, select the Datadog app that you created to integrate with Azure, and save.
Add the Storage Blob Data Contributor role to your Datadog App.

Grant your Datadog GCP service account permissions to write your archives to your bucket.

Add the Storage Object Admin role, listed under Storage.

Add the Storage Object Admin role to your Datadog GCP Service Account.
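If you prefer to script this permission, the following minimal Python sketch grants Storage Object Admin (role ID `roles/storage.objectAdmin`) on a single bucket with the `google-cloud-storage` client library. It assumes that library is installed and that your credentials can edit the bucket's IAM policy; the function names are hypothetical. Granting the role on the bucket rather than on the whole project is a design choice of this sketch, which keeps the service account's access scoped to the archive bucket.

```python
STORAGE_OBJECT_ADMIN = "roles/storage.objectAdmin"


def object_admin_binding(service_account_email):
    """Return an IAM binding that grants Storage Object Admin to one service account."""
    return {
        "role": STORAGE_OBJECT_ADMIN,
        "members": {f"serviceAccount:{service_account_email}"},
    }


def grant_object_admin(bucket_name, service_account_email):
    """Append the binding to the bucket-level IAM policy (requires google-cloud-storage)."""
    from google.cloud import storage  # imported lazily so the helper above has no dependency

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(object_admin_binding(service_account_email))
    bucket.set_iam_policy(policy)
```

For example, `grant_object_admin("my-log-archives", "datadog-archives@my-project.iam.gserviceaccount.com")`, where both names are placeholders for your own bucket and Datadog service account.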

Route your logs to a bucket

Go to the Archives page in the Datadog app and select the Add a new archive option at the bottom.

Notes:

  • Only Datadog users with the logs_write_archive permission can complete this step and the next one.
  • Archiving logs to Azure Blob Storage requires an App Registration. See instructions on the Azure integration page, and set the “site” on the right-hand side of the documentation page to “US.” App Registration(s) created for archiving purposes only need the “Storage Blob Data Contributor” role. If your storage bucket is in a subscription being monitored through a Datadog Resource, a warning is displayed about the App Registration being redundant. You can ignore this warning.

Select the appropriate AWS account and role combination for your S3 bucket.

Input your bucket name. Optional: Input a prefix directory for all the content of your log archives.

Set your S3 bucket info in Datadog
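The same S3 archive can also be created through the Datadog API. The following Python sketch is based on the v2 Logs Archives endpoint (`POST /api/v2/logs/config/archives`); treat the request body as an assumption and confirm the field names in the Datadog API reference before relying on it. The function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.request


def build_s3_archive_payload(name, query, bucket, account_id, role_name,
                             path=None, include_tags=False):
    """Build the JSON body for creating an S3 log archive (assumed v2 schema)."""
    destination = {
        "type": "s3",
        "bucket": bucket,
        # AWS account and role used by the Datadog AWS integration.
        "integration": {"account_id": account_id, "role_name": role_name},
    }
    if path:
        destination["path"] = path  # optional prefix directory inside the bucket
    return {
        "data": {
            "type": "archives",
            "attributes": {
                "name": name,
                "query": query,  # inclusion filter, for example "env:prod" or "*"
                "destination": destination,
                "include_tags": include_tags,
            },
        }
    }


def create_archive(payload, api_key, app_key, site="datadoghq.com"):
    """POST the payload; `site` is your Datadog site domain."""
    request = urllib.request.Request(
        f"https://api.{site}/api/v2/logs/config/archives",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": api_key,
            "DD-APPLICATION-KEY": app_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Creating archives this way still requires the logs_write_archive permission described above, now held by the user or service account that owns the application key.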

Select the Azure Storage archive type, and the Azure tenant and client for the Datadog App that has the Storage Blob Data Contributor role on your storage account.

Input your storage account name and the container name for your archive. Optional: Input a prefix directory for all the content of your log archives.

Set your Azure storage account info in Datadog

Select the GCS archive type, and the GCS Service Account that has permissions to write on your storage bucket.

Input your bucket name. Optional: Input a prefix directory for all the content of your log archives.

Set your GCS bucket info in Datadog

Advanced settings

Datadog permissions

By default:

  • All Datadog Admin users can create, edit, and reorder archives (see Configure Multiple Archives)
  • All Datadog Admin and Standard users can rehydrate from archives
  • All users, including Datadog Read Only users, can access rehydrated logs

Use this optional configuration step to assign roles on that archive and restrict who can edit the archive configuration, rehydrate from the archive, and access logs rehydrated from it.

Restrict access to Archives and Rehydrated logs

Datadog tags

Use this optional configuration step to:

  • Include all log tags in your archives (activated by default on all new archives). Note: This increases the size of resulting archives.
  • Add tags on rehydrated logs according to your Restriction Queries policy. See the logs_read_data permission.
Configure Archive Tags

Define maximum scan size

Use this optional configuration step to define the maximum volume of log data (in GB) that can be scanned for Rehydration on your Log Archives.

For Archives with a maximum scan size defined, all users must estimate the scan size before they can start a Rehydration. If the estimated scan size is greater than what is permitted for that Archive, users must reduce the time range over which they are requesting the Rehydration. A shorter time range reduces the scan size; once the estimate is within the limit, the user can start the Rehydration.

Define maximum scan size on Archive
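The rule above can be modeled in a few lines. This Python sketch is illustrative only and does not reproduce how Datadog estimates scan size; the function names are hypothetical, and the narrowing helper assumes log volume is spread evenly over the time range, which real traffic rarely is.

```python
def can_start_rehydration(estimated_gb, max_scan_gb):
    """No maximum defined (None), or an estimate within the maximum."""
    return max_scan_gb is None or estimated_gb <= max_scan_gb


def narrowed_time_range(start, end, estimated_gb, max_scan_gb):
    """Shrink [start, end) so the estimate fits, ASSUMING uniform log volume.

    Real volume is rarely uniform, so re-estimate the scan size after narrowing.
    """
    if can_start_rehydration(estimated_gb, max_scan_gb):
        return start, end
    fraction = max_scan_gb / estimated_gb
    return start, start + (end - start) * fraction
```

For example, a 10-day window estimated at 200 GB against a 50 GB maximum is narrowed to its first 2.5 days.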

Storage class

You can set a lifecycle configuration on your S3 bucket to automatically transition your log archives to optimal storage classes.

Rehydration only supports the following storage classes:

  • S3 Standard
  • S3 Standard-IA
  • S3 One Zone-IA
  • S3 Glacier Instant Retrieval

If you wish to rehydrate from archives in another storage class, you must first move them to one of the supported storage classes above.
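As an example of such a lifecycle configuration, the following Python sketch builds a rule that transitions archive objects only to storage classes that Rehydration can read, and applies it with boto3's `put_bucket_lifecycle_configuration`. The rule ID, prefix, and day counts are assumptions to adapt. S3 Standard is not a valid lifecycle target because lifecycle rules cannot transition objects into it, and S3 requires objects to be at least 30 days old before they transition to Standard-IA or One Zone-IA.

```python
# Lifecycle targets that Rehydration can still read (S3 API storage-class names).
# STANDARD is omitted: lifecycle rules cannot transition objects into it.
REHYDRATABLE_TARGETS = {"STANDARD_IA", "ONEZONE_IA", "GLACIER_IR"}


def build_lifecycle_config(prefix, transitions):
    """Build a lifecycle configuration from (days, storage_class) pairs."""
    for _, storage_class in transitions:
        if storage_class not in REHYDRATABLE_TARGETS:
            raise ValueError(f"{storage_class} is not supported for Rehydration")
    return {
        "Rules": [
            {
                "ID": "datadog-log-archives-tiering",
                "Filter": {"Prefix": prefix},  # limit the rule to the archive prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in transitions
                ],
            }
        ]
    }


def apply_lifecycle_config(bucket, config):
    """Write the configuration. This REPLACES any existing lifecycle rules on the bucket."""
    import boto3  # imported lazily; requires AWS credentials

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

For example, `build_lifecycle_config("datadog/", [(30, "STANDARD_IA"), (90, "GLACIER_IR")])` moves archives to Standard-IA after 30 days and to Glacier Instant Retrieval after 90 days, while a target such as `"GLACIER"` raises an error.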

For Azure Storage, Archiving and Rehydration only support the following access tiers:

  • Hot access tier
  • Cool access tier

If you wish to rehydrate from archives in another access tier, you must first move them to one of the supported tiers above.

Server side encryption (SSE)

SSE-S3

The simplest way to add server-side encryption to your S3 log archives is S3’s native server-side encryption, SSE-S3.

To enable it, go to the Properties tab in your S3 bucket and select Default Encryption. Select the AES-256 option and Save.
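The same default can be set programmatically. This minimal Python sketch uses boto3's `put_bucket_encryption` and `get_bucket_encryption` calls, assuming boto3 is installed and your credentials can manage the bucket; the helper names are hypothetical.

```python
# Default-encryption rule for SSE-S3 (the "AES-256" option in the console).
SSE_S3_CONFIG = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}


def default_sse_algorithm(get_bucket_encryption_response):
    """Extract the default SSE algorithm from a get_bucket_encryption response."""
    rules = get_bucket_encryption_response["ServerSideEncryptionConfiguration"]["Rules"]
    return rules[0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"]


def enable_sse_s3(bucket):
    """Set SSE-S3 as the bucket default and return the algorithm S3 reports back."""
    import boto3  # imported lazily; requires AWS credentials

    s3 = boto3.client("s3")
    s3.put_bucket_encryption(Bucket=bucket, ServerSideEncryptionConfiguration=SSE_S3_CONFIG)
    return default_sse_algorithm(s3.get_bucket_encryption(Bucket=bucket))
```

Reading the setting back after writing it confirms that the bucket default is `AES256`.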

SSE-KMS

Alternatively, Datadog supports server-side encryption with a customer managed key (CMK) from AWS KMS. To enable it, take the following steps:

  1. Create your CMK
  2. Attach a CMK policy to your CMK with the following content, replacing the AWS account number and Datadog IAM role name appropriately:
{
    "Id": "key-consolepolicy-3",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:role/<MY_DATADOG_IAM_ROLE_NAME>"
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<MY_AWS_ACCOUNT_NUMBER>:role/<MY_DATADOG_IAM_ROLE_NAME>"
            },
            "Action": [
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:RevokeGrant"
            ],
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "kms:GrantIsForAWSResource": "true"
                }
            }
        }
    ]
}
  3. Go to the Properties tab in your S3 bucket and select Default Encryption. Choose the “AWS-KMS” option, select your CMK ARN, and save.
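The key-policy substitution and the default-encryption setting above can also be scripted. This Python sketch assumes the key policy is kept as a text template with the `<MY_AWS_ACCOUNT_NUMBER>` and `<MY_DATADOG_IAM_ROLE_NAME>` placeholders intact, and uses boto3's KMS `put_key_policy` and S3 `put_bucket_encryption` calls; the function names are hypothetical.

```python
import json


def fill_key_policy(template, account_number, role_name):
    """Substitute the placeholders used in the key policy above and parse the result."""
    text = template.replace("<MY_AWS_ACCOUNT_NUMBER>", account_number)
    text = text.replace("<MY_DATADOG_IAM_ROLE_NAME>", role_name)
    return json.loads(text)


def sse_kms_config(key_arn):
    """Default-encryption rule equivalent to choosing AWS-KMS with your CMK ARN."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": key_arn,
                }
            }
        ]
    }


def attach_key_policy(key_id, policy):
    """Replace the key's policy (KMS keys have a single policy named "default")."""
    import boto3  # imported lazily; requires AWS credentials

    boto3.client("kms").put_key_policy(
        KeyId=key_id, PolicyName="default", Policy=json.dumps(policy)
    )


def enable_sse_kms(bucket, key_arn):
    """Make SSE-KMS with the given CMK the bucket's default encryption."""
    import boto3  # imported lazily; requires AWS credentials

    boto3.client("s3").put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=sse_kms_config(key_arn)
    )
```

Attaching the key policy before enabling default encryption ensures the Datadog role can use the key as soon as uploads start using it.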

Validation

Once your archive settings are successfully configured in your Datadog account, your processing pipelines begin to enrich all logs ingested into Datadog. These logs are subsequently forwarded to your archive.

However, after creating or updating your archive configurations, it can take several minutes before the next archive upload is attempted. The frequency at which archives are uploaded can vary. Check back on your storage bucket in 15 minutes to make sure the archives are successfully being uploaded from your Datadog account. After that, if the archive is still in a pending state, check your inclusion filters to make sure the query is valid and matches log events in live tail.
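To check the bucket from a script instead of the console, the following Python sketch lists objects under the current and previous hourly partitions with boto3's `list_objects_v2`. It assumes the `dt=` and `hour=` partitions described under Format of the archives are in UTC, and that your credentials can list the bucket; the function names are hypothetical. An empty result shortly after setup is not necessarily an error, since the upload frequency varies.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefix(base_prefix, when):
    """S3 key prefix for one hourly partition, e.g. 'my/prefix/dt=20180515/hour=14/'."""
    base = base_prefix.strip("/")
    partition = f"dt={when:%Y%m%d}/hour={when:%H}/"
    return f"{base}/{partition}" if base else partition


def recent_archive_keys(bucket, base_prefix, now=None):
    """Return archive keys from the previous and current hour (UTC assumed).

    Each list_objects_v2 call returns at most 1,000 keys, enough for a sanity check.
    """
    import boto3  # imported lazily; requires AWS credentials

    now = now or datetime.now(timezone.utc)
    s3 = boto3.client("s3")
    keys = []
    for when in (now - timedelta(hours=1), now):
        response = s3.list_objects_v2(Bucket=bucket, Prefix=hourly_prefix(base_prefix, when))
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
    return keys
```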

When Datadog fails to upload logs to an external archive because of unintentional changes in settings or permissions, the corresponding Log Archive is highlighted on the configuration page. Hover over the archive to view the error details and the actions to take to resolve the issue.

In addition, an event is generated, visible in the Events Explorer. Build a monitor on such events to detect and remediate failures quickly.

Check that your archives are properly set up.

Multiple archives

If multiple archives are defined, logs enter the first archive whose filter they match. Therefore, it is important to order your archives carefully.

For example, if you create a first archive filtered to the env:prod tag and a second archive without any filter (the equivalent of *), all your production logs would go to one storage bucket/path, and the rest would go to the other.

Logs enter the first archive whose filter they match on.
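The first-match rule can be illustrated with a short Python sketch. It is a simplification, not Datadog's filter engine: each archive filter is modeled as a set of required tags (an empty set plays the role of `*`), and the archive names are hypothetical.

```python
def route_log(log_tags, archives):
    """Return the name of the first archive whose filter matches, or None.

    `archives` is an ordered list of (name, required_tags) pairs; a log matches
    when it carries every required tag, so an empty set matches everything.
    """
    for name, required_tags in archives:
        if required_tags <= log_tags:
            return name
    return None


archives = [
    ("prod-archive", {"env:prod"}),
    ("catch-all-archive", set()),  # equivalent of a `*` filter
]
```

With this order, a log tagged `env:prod` goes to `prod-archive` and everything else to `catch-all-archive`. Reversing the list sends every log, production included, to the catch-all archive, which is why order matters.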

Format of the archives

The log archives that Datadog forwards to your storage bucket are in compressed JSON format (.json.gz). Using the prefix you indicate (or / if there is none), the archives are stored in a directory structure that indicates on what date and at what time the archive files were generated, like so:

/my/bucket/prefix/dt=20180515/hour=14/archive_143201.1234.7dq1a9mnSya3bFotoErfxl.json.gz
/my/bucket/prefix/dt=<YYYYMMDD>/hour=<HH>/archive_<HHmmss.SSSS>.<DATADOG_ID>.json.gz

This directory structure simplifies the process of querying your historical log archives based on their date.
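For example, this minimal Python sketch extracts the generation time and Datadog ID from an archive path that follows the pattern above. The regular expression is written against the documented examples only, and the function name is hypothetical.

```python
import re
from datetime import datetime

ARCHIVE_PATH = re.compile(
    r"dt=(?P<date>\d{8})/hour=(?P<hour>\d{2})/"
    r"archive_(?P<time>\d{6})\.(?P<fraction>\d+)\.(?P<datadog_id>[^./]+)\.json\.gz$"
)


def parse_archive_path(path):
    """Return (generated_at, datadog_id) for an archive path, or None if it does not match."""
    match = ARCHIVE_PATH.search(path)
    if match is None:
        return None
    # Combine the dt= date with the HHmmss part of the file name.
    generated_at = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S")
    return generated_at, match["datadog_id"]
```

Applied to the first example path above, it returns 2018-05-15 14:32:01 and the ID `7dq1a9mnSya3bFotoErfxl`.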

Within the gzipped JSON file, each event’s content is formatted as follows:

{
    "_id": "123456789abcdefg",
    "date": "2018-05-15T14:31:16.003Z",
    "host": "i-12345abced6789efg",
    "source": "source_name",
    "service": "service_name",
    "status": "status_level",
    "message": "2018-05-15T14:31:16.003Z INFO rid='abc-123' status=403 method=PUT",
    "attributes": { "rid": "abc-123", "http": { "status_code": 403, "method": "PUT" } },
    "tags": [ "env:prod", "team:acme" ]
}
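To inspect an archive file locally, the following Python sketch streams its events. It assumes each event is stored as one JSON object per line (newline-delimited JSON) inside the gzip file; the function names are hypothetical.

```python
import gzip
import json


def iter_archive_events(path):
    """Yield one dict per archived event from a .json.gz archive file."""
    with gzip.open(path, "rt", encoding="utf-8") as archive:
        for line in archive:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def events_with_tag(path, tag):
    """Return the events whose `tags` list contains `tag`, for example "env:prod"."""
    return [event for event in iter_archive_events(path) if tag in event.get("tags", [])]
```

Streaming line by line keeps memory use flat even for large archive files.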

*Logging without Limits is a trademark of Datadog, Inc.