Datadog automatically generates suggestions to resolve errors and performance problems and optimizes cost for your serverless applications.
Datadog uses AWS CloudWatch metrics, Datadog enhanced AWS Lambda metrics, and Lambda REPORT
log lines to suggest insights to you. To set these up,
REPORT
logs are indexed in Datadog.Note: Datadog generates High Errors, High Duration, Throttled, and High Iterator Age insights out of the box after setting up the AWS integration. All other insights, including those generated on individual invocations, require the Datadog Forwarder and Enhanced Lambda Metrics.
More than 1% of the function’s invocations were errors in the selected time range.
Resolution: Examine the function’s logs, check for recent code or configuration changes with Deployment Tracking, or look for failures across microservices with distributed tracing.
More than 10% of the function’s invocations were errors in the selected time range.
Resolution: Examine the function’s logs, check for recent code or configuration changes with Deployment Tracking, or look for failures across microservices with distributed tracing.
At least one invocation in the selected time range used over 95% of the allocated memory.
Distributed tracing can help you pinpoint Lambda functions with low memory limits and parts of your application using excessive amounts of memory.
Resolution: Lambda functions using close to their maximum configured memory are at risk of being killed by the Lambda runtime, resulting in user-facing errors. Consider increasing the amount of configured memory on your function. Note that this could affect your AWS bill.
At least one invocation in the selected time range exceeded 95% of the configured timeout.
Distributed tracing can help you pinpoint slow API calls in your application.
Resolution: Lambda functions running for close to their configured timeout are at risk of being killed by the Lambda runtime. This could lead to slow or failed responses to incoming requests. Consider increasing the configured timeout if you expect your function to need more execution time. Note that this could affect your AWS bill.
More than 1% of the function’s invocations were cold starts in the selected time range.
Datadog’s enhanced metrics and distributed tracing can help you understand the impact of cold starts on your applications today.
Resolution: Cold starts occur when your serverless applications receive sudden increases in traffic, and can occur when the function was previously inactive or when it was receiving a relatively constant number of requests. Users may perceive cold starts as slow response times or lag. To get ahead of cold starts, consider enabling provisioned concurrency on your impacted Lambda functions. Note that this could affect your AWS bill.
At least one invocation in the selected time range ran out of memory.
Resolution: Lambda functions that use more than their allotted amount of memory can be killed by the Lambda runtime. To users, this may look like failed requests to your application. Distributed tracing can help you pinpoint parts of your application using excessive amounts of memory. Consider increasing the amount of memory your Lambda function is allowed to use.
At least one invocation in the selected time range timed out. This occurs when your function runs for longer than the configured timeout or the global Lambda timeout.
Resolution: Distributed tracing can help you pinpoint slow requests to APIs and other microservices. You can also consider increasing the timeout of your function. Note that this could affect your AWS bill.
More than 10% of invocations in the selected time range were throttled. Throttling occurs when your serverless Lambda applications receive high levels of traffic without adequate concurrency.
Resolution: Check your Lambda concurrency metrics and confirm if aws.lambda.concurrent_executions.maximum
is approaching your AWS account concurrency level. If so, consider configuring reserved concurrency, or request a service quota increase from AWS. Note that this may affect your AWS bill.
The function’s iterator was older than two hours. Iterator age measures the age of the last record for each batch of records processed from a stream. When this value increases, it means your function cannot process data fast enough.
Resolution: Enable distributed tracing to isolate why your function has so much data being streamed to it. You can also consider increasing the shard count and batch size of the stream your function reads from.
No invocation in the selected time range used more than 10% of the allocated memory. This means your function has more billable resources allocated to it than it may need.
Resolution: Consider decreasing the amount of allocated memory on your Lambda function. Note that this may affect your AWS bill.
Additional helpful documentation, links, and articles:
On this Page