Data Streams Monitoring (DSM) provides visibility into your non-empty dead letter queues (DLQs), enabling you to monitor and inspect message processing failures. DSM also enables you to remediate these message processing failures directly within Datadog.
Monitoring dead letter queues is available for Amazon SQS queues.
Monitor DLQs
Setup
Usage
Create a monitor for a dead letter queue
To track whether your queue is rerouting messages to its DLQ, you can create a metric monitor that alerts on the data_streams.sqs.dead_letter_queue.messages metric.
To create a monitor for a queue’s DLQ:
- In Datadog, navigate to Data Streams Monitoring.
- Select the Explore tab (default).
- Click on a supported queue to open its side panel.
- Select the Dead Letter Queue tab.
- Click Create Monitor to open a monitor setup page. The default inputs are sufficient to create a monitor that alerts when your DLQ is non-empty, but you can also adjust other settings on this page.
- Click Create at the bottom of the page.
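The monitor created by these steps can also be described as data, for example to keep its definition under version control. The following is a minimal sketch, not the definition that the Create Monitor page generates: it builds a payload in the general shape of a Datadog metric monitor (name, type, query, message, options) around the metric named above. The helper name build_dlq_monitor, the queuename tag key, and the five-minute evaluation window are assumptions made for illustration; compare them against the query shown on the monitor setup page before relying on them.

```python
import json


def build_dlq_monitor(queue_name: str, threshold: int = 0) -> dict:
    """Build a metric-monitor definition that alerts when a DLQ is non-empty.

    Assumptions (not taken from the documentation): the tag key
    `queuename` and the 5-minute evaluation window.
    """
    query = (
        "max(last_5m):max:data_streams.sqs.dead_letter_queue.messages"
        f"{{queuename:{queue_name}}} > {threshold}"
    )
    return {
        "name": f"DLQ for {queue_name} is non-empty",
        "type": "metric alert",
        "query": query,
        "message": f"Messages are being rerouted to the DLQ of {queue_name}.",
        "options": {"thresholds": {"critical": threshold}},
    }


if __name__ == "__main__":
    # Print the definition so it can be reviewed or stored alongside other config.
    print(json.dumps(build_dlq_monitor("orders"), indent=2))
```

The function only builds a dictionary and makes no network calls, so it can be reviewed or unit-tested before anything is submitted to Datadog.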
Detect message processing issues
Data Streams Monitoring helps you detect where messages couldn’t be processed and what downstream services could be affected:
- The DSM Service Map highlights queues with messages in their DLQs, helping you to visually identify where failures occur
- The DSM Issues page lists all queues that are experiencing message processing issues
You can inspect and resolve non-empty DLQs directly in Datadog by using Datadog Actions.
Setup
In Datadog, create a Connection. You need an IAM entity to perform the actions. This IAM entity can be an IAM user (with a secret access key) or an IAM role (assumed by using sts:AssumeRole), and it must have the following permissions:
- sqs:ReceiveMessage (for peek)
- sqs:StartMessageMoveTask (for redrive)
- sqs:PurgeQueue (for purge)
These permissions can be applied globally to all SQS queues, or restricted to specific queues.
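As an illustration of the two scoping options, the sketch below builds an IAM policy document that grants the three actions listed above, either on every queue or only on the queue ARNs you pass in. The helper name build_dlq_actions_policy and the example ARN are hypothetical; the policy layout follows the standard IAM JSON policy format.

```python
import json


def build_dlq_actions_policy(queue_arns=None) -> dict:
    """Return an IAM policy allowing peek, redrive, and purge.

    With no ARNs the policy applies to all SQS queues ("*");
    otherwise it is restricted to the given queue ARNs.
    """
    resources = list(queue_arns) if queue_arns else ["*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",        # peek
                    "sqs:StartMessageMoveTask",  # redrive
                    "sqs:PurgeQueue",            # purge
                ],
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"
    print(json.dumps(build_dlq_actions_policy([arn]), indent=2))
```

Restricting the Resource list to specific DLQ ARNs limits what the Connection can purge, which is usually the safer default.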
Usage
After you set up the connection, you can click on a supported queue to open its side panel, where you can use the following actions:
- Peek to inspect failed message content and identify the root cause
- Redrive to requeue messages for another processing attempt
- Purge to clear messages that no longer need processing
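To make the three actions concrete, the sketch below shows the SQS API operations that the permissions in the Setup section correspond to, written against a boto3-style SQS client. This is an assumption about a reasonable mapping, not a description of how Datadog implements the actions; the function names peek, redrive, and purge are chosen here only to mirror the action names.

```python
def peek(sqs, dlq_url, max_messages=10):
    """Read up to `max_messages` message bodies from the DLQ.

    VisibilityTimeout=0 makes the messages visible again immediately,
    so they are not hidden from other consumers. Each receive still
    increments a message's receive count.
    """
    resp = sqs.receive_message(
        QueueUrl=dlq_url,
        MaxNumberOfMessages=max_messages,  # SQS allows 1 to 10 per call
        VisibilityTimeout=0,
    )
    return [m["Body"] for m in resp.get("Messages", [])]


def redrive(sqs, dlq_arn):
    """Start moving messages from the DLQ back to their source queue.

    Without a DestinationArn, SQS redrives to the original source queue.
    Returns the task handle that identifies the move task.
    """
    return sqs.start_message_move_task(SourceArn=dlq_arn)["TaskHandle"]


def purge(sqs, dlq_url):
    """Delete all messages in the DLQ. This cannot be undone."""
    sqs.purge_queue(QueueUrl=dlq_url)
```

With boto3 installed, a client created with boto3.client("sqs") can be passed as the sqs argument. Peeking before a redrive or purge helps confirm whether the failed messages are worth reprocessing.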
Troubleshooting
If you are unable to see dead letter queue information:
- Confirm that you have installed the Datadog-AWS integration
- Confirm that your AWS role uses the AWS-managed AmazonSQSReadOnlyAccess policy
- Confirm that your role has sqs:ListQueues and sqs:GetQueueAttributes permissions
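If these checks pass and information is still missing, it can help to confirm outside Datadog that the queue has a DLQ configured and that the role can read its attributes. The sketch below is one possible check, assuming a boto3-style SQS client: it reads the queue's RedrivePolicy attribute, which SQS stores as a JSON string, and returns the DLQ ARN. The function name dead_letter_target_arn is hypothetical.

```python
import json


def dead_letter_target_arn(sqs, queue_url):
    """Return the ARN of the queue's DLQ, or None if no DLQ is configured.

    Requires the sqs:GetQueueAttributes permission; an access error raised
    here points to a permissions problem rather than a missing DLQ.
    """
    resp = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["RedrivePolicy"],
    )
    policy = resp.get("Attributes", {}).get("RedrivePolicy")
    if policy is None:
        return None  # The queue has no redrive policy, so no DLQ.
    # RedrivePolicy is a JSON string, e.g. with deadLetterTargetArn and maxReceiveCount.
    return json.loads(policy).get("deadLetterTargetArn")
```

A return value of None means the queue has no redrive policy, in which case there is no DLQ for DSM to display.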