The Observability Pipelines Datadog Archives destination is in beta.

Overview

The Observability Pipelines datadog_archives destination formats logs into a Datadog-rehydratable format and then routes them to Log Archives. These logs are not ingested into Datadog, but are routed directly to the archive. You can then rehydrate the archive in Datadog when you need to analyze and investigate them.

The Observability Pipelines Datadog Archives destination is useful when:

  • You have a high volume of noisy logs, but you may need to index them in Log Management ad hoc.
  • You have a retention policy.

For example, in the first diagram, some logs are sent to cloud storage for archiving and others are sent to Datadog for analysis and investigation. However, the logs sent directly to cloud storage cannot be rehydrated in Datadog when you need to investigate them.

A diagram showing logs going to cloud storage and Datadog.

In the second diagram, all logs go to the Datadog Agent, including the logs that were sent to cloud storage in the first diagram. However, in this scenario, before the logs are ingested into Datadog, the datadog_archives destination formats the logs that would have gone directly to cloud storage and routes them to Datadog Log Archives instead. The logs in the Log Archive can then be rehydrated in Datadog when needed.

A diagram showing all logs going to Datadog.

This guide walks you through how to:

  • Configure a Log Archive.
  • Configure the datadog_archives destination.
  • Rehydrate your archive.

datadog_archives is available for Observability Pipelines Worker version 1.5 and later.

Configure a Log Archive

Create an Amazon S3 bucket

See AWS Pricing for inter-region data transfer fees and how cloud storage costs may be impacted.

  1. Navigate to Amazon S3 buckets.
  2. Click Create bucket.
  3. Enter a descriptive name for your bucket.
  4. Do not make your bucket publicly readable.
  5. Optionally, add tags.
  6. Click Create bucket.
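
If you prefer to manage the bucket as infrastructure-as-code, the equivalent resource can be declared in a template instead. The following is a minimal, illustrative CloudFormation sketch (the resource and bucket names are placeholders) that creates a bucket with public access blocked, matching the steps above:

    # Illustrative CloudFormation sketch of a private archive bucket.
    # The resource and bucket names are placeholders.
    AWSTemplateFormatVersion: "2010-09-09"
    Resources:
      ObservabilityPipelinesArchiveBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: <MY_BUCKET_NAME>
          PublicAccessBlockConfiguration:
            BlockPublicAcls: true
            BlockPublicPolicy: true
            IgnorePublicAcls: true
            RestrictPublicBuckets: true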

Set up an IAM policy that allows Workers to write to the S3 bucket

  1. Navigate to the IAM console.
  2. Select Policies in the left side menu.
  3. Click Create policy.
  4. Click JSON in the Specify permissions section.
  5. Copy the policy below and paste it into the Policy editor. Replace <MY_BUCKET_NAME> and <MY_BUCKET_NAME_1_/_MY_OPTIONAL_BUCKET_PATH_1> with the name and optional path of the S3 bucket you created earlier.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatadogUploadAndRehydrateLogArchives",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": "arn:aws:s3:::<MY_BUCKET_NAME_1_/_MY_OPTIONAL_BUCKET_PATH_1>/*"
            },
            {
                "Sid": "DatadogRehydrateLogArchivesListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": "arn:aws:s3:::<MY_BUCKET_NAME>"
            }
        ]
    }
  6. Click Next.
  7. Enter a descriptive policy name.
  8. Optionally, add tags.
  9. Click Create policy.

Create an IAM user

Create an IAM user and attach the IAM policy you created earlier to it.

  1. Navigate to the IAM console.
  2. Select Users in the left side menu.
  3. Click Create user.
  4. Enter a user name.
  5. Click Next.
  6. Select Attach policies directly.
  7. Choose the IAM policy you created earlier to attach to the new IAM user.
  8. Click Next.
  9. Optionally, add tags.
  10. Click Create user.

Create access credentials for the new IAM user. Save these credentials as AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY.
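
How you provide these credentials to the Worker depends on your deployment method. As an illustrative example only, a Docker Compose deployment might pass them as environment variables alongside the archive bucket settings (the variable wiring shown here is an assumption; follow the installation instructions for your platform):

    # Illustrative Docker Compose excerpt: passing archive credentials and
    # bucket settings to the Worker as environment variables.
    services:
      observability-pipelines-worker:
        image: datadog/observability-pipelines-worker:latest
        environment:
          - AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
          - AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
          - DD_ARCHIVES_BUCKET=<MY_BUCKET_NAME>
          - DD_ARCHIVES_REGION=<BUCKET_REGION>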

Create a service account

Create a service account to use the policy you created above. In the Helm configuration, replace ${DD_ARCHIVES_SERVICE_ACCOUNT} with the name of the service account.
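
For example, if the Workers run on Amazon EKS and bucket access is granted through IAM Roles for Service Accounts (IRSA), the relevant Helm values might look like the sketch below. The role annotation and value paths are illustrative assumptions; refer to the sample Helm configuration for the chart you are using.

    # Illustrative Helm values excerpt: a service account annotated with the
    # IAM role that carries the policy created above. Names are placeholders;
    # wherever the sample configuration references ${DD_ARCHIVES_SERVICE_ACCOUNT},
    # it would be replaced with "opw-archives" in this example.
    serviceAccount:
      create: true
      name: opw-archives
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ARCHIVES_ROLE>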

Attach the policy to the IAM instance profile

Attach the policy to the IAM Instance Profile that is created with Terraform, which you can find under the iam-role-name output.

Connect the S3 bucket to Datadog Log Archives

  1. Navigate to Datadog Log Forwarding.
  2. Click Add a new archive.
  3. Enter a descriptive archive name.
  4. Add a query that filters out all logs going through log pipelines so that none of those logs go into this archive. For example, add the query observability_pipelines_read_only_archive, assuming no logs going through the pipeline have that tag added.
  5. Select AWS S3.
  6. Select the AWS Account that your bucket is in.
  7. Enter the name of the S3 bucket.
  8. Optionally, enter a path.
  9. Check the confirmation statement.
  10. Optionally, add tags and define the maximum scan size for rehydration. See Advanced settings for more information.
  11. Click Save.

See the Log Archives documentation for additional information.

Configure the datadog_archives destination

You can configure the datadog_archives destination using the configuration file or the pipeline builder UI.

If the Worker ingests logs that do not come from the Datadog Agent and routes them to the Datadog Archives destination, those logs are not tagged with reserved attributes. This means that you lose Datadog telemetry and the benefits of unified service tagging. For example, say your syslogs are sent to datadog_archives, and those logs have their status tagged as severity instead of the reserved attribute status, and their host tagged as hostname instead of the reserved attribute host. When these logs are rehydrated in Datadog, the status of all the logs is set to info and none of the logs have a host tag.
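
One way to limit this is to remap those attributes upstream of the destination. The following is a minimal sketch of such a remap transform in the Worker's configuration, assuming a hypothetical syslog source named syslog_source; the VRL renames severity to status and hostname to host:

    # Illustrative remap transform: rename non-reserved attributes to
    # Datadog reserved attributes before logs reach datadog_archives.
    # "normalize_syslog" and "syslog_source" are hypothetical names.
    transforms:
      normalize_syslog:
        type: remap
        inputs:
          - syslog_source
        source: |
          .status = del(.severity)
          .host = del(.hostname)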

Configuration file

For manual deployments, the sample pipelines configuration file for Datadog includes a sink for sending logs to Amazon S3 in a Datadog-rehydratable format.

In the sample pipelines configuration file, replace AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with the AWS credentials you created earlier.

Replace the ${DD_ARCHIVES_BUCKET} and ${DD_ARCHIVES_REGION} parameters based on your S3 configuration.
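
For orientation, the datadog_archives sink in that file is shaped roughly like the sketch below. Treat the field names as illustrative and defer to the sample configuration file itself; the upstream component name tag_logs is a placeholder.

    # Rough sketch of a datadog_archives sink, modeled on the sample
    # configuration file. Field names are illustrative; the sample file
    # is authoritative. "tag_logs" is a hypothetical upstream component.
    sinks:
      datadog_archives:
        type: datadog_archives
        inputs:
          - tag_logs
        bucket: ${DD_ARCHIVES_BUCKET}
        region: ${DD_ARCHIVES_REGION}
        service: aws_s3
        aws_s3:
          storage_class: STANDARD
        auth:
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}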

Pipeline builder UI

  1. Navigate to your Pipeline.
  2. (Optional) Add a remap transform to tag all logs going to datadog_archives.
    a. Click Edit and then Add More in the Add Transforms tile.
    b. Click the Remap tile.
    c. Enter a descriptive name for the component.
    d. In the Inputs field, select the source to connect this transform to.
    e. Add .sender = "observability_pipelines_worker" in the Source section. A configuration file equivalent of this transform is sketched after these steps.
    f. Click Save.
    g. Navigate back to your pipeline.
  3. Click Edit.
  4. Click Add More in the Add Destination tile.
  5. Click the Datadog Archives tile.
  6. Enter a descriptive name for the component.
  7. Select the sources or transforms to connect this destination to.
For AWS S3:

  1. In the Bucket field, enter the name of the S3 bucket you created earlier.
  2. Enter aws_s3 in the Service field.
  3. Toggle AWS S3 to enable those specific configuration options.
  4. In the Storage Class field, select the storage class in the dropdown menu.
  5. Set the other configuration options based on your use case.
  6. Click Save.

For Azure Blob Storage:

  1. In the Bucket field, enter the name of the Azure Blob Storage container to route logs to.
  2. Enter azure_blob in the Service field.
  3. Toggle Azure Blob to enable those specific configuration options.
  4. Enter the Azure Blob Storage Account connection string.
  5. Set the other configuration options based on your use case.
  6. Click Save.

For Google Cloud Storage:

  1. In the Bucket field, enter the name of the Google Cloud Storage bucket to route logs to.
  2. Enter gcp_cloud_storage in the Service field.
  3. Toggle GCP Cloud Storage to enable those specific configuration options.
  4. Set the configuration options based on your use case.
  5. Click Save.
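
For reference, the optional tagging transform from step 2 corresponds to a remap transform like the sketch below in a Worker configuration file. The component and input names are placeholders.

    # Illustrative configuration-file equivalent of the optional tagging
    # transform from step 2. Component and input names are placeholders.
    transforms:
      tag_archives:
        type: remap
        inputs:
          - datadog_agent
        source: |
          .sender = "observability_pipelines_worker"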

If you are using Remote Configuration, deploy the change to your pipeline in the UI. For manual configuration, download the updated configuration and restart the worker.

See Datadog Archives reference for details on all configuration options.

Rehydrate your archive

See Rehydrating from Archives for instructions on how to rehydrate your archive in Datadog so that you can start analyzing and investigating those logs.

Further reading