AWS Lambda

Overview

AWS Lambda is a compute service that runs code in response to events and automatically manages the compute resources required by that code.

Enable this integration to begin collecting CloudWatch and custom metrics from your Lambda functions.

Setup

Installation

If you haven’t already, set up the Amazon Web Services integration first.

Metric collection

  1. In the AWS integration tile, ensure that Lambda is checked under metric collection.

  2. Add the following permissions to your Datadog IAM policy to collect AWS Lambda metrics:

    • lambda:List*: List Lambda functions, metadata, and tags.
    • logs:DescribeLogGroups: List available log groups.
    • logs:DescribeLogStreams: List available log streams for a group.
    • logs:FilterLogEvents: Fetch specific log events for a stream to generate metrics.
    • tag:GetResources: Get custom tags applied to Lambda functions.

    For more information on Lambda policies, review the documentation on the AWS website.

  3. Install the Datadog - AWS Lambda integration.
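
The permissions listed in step 2 could be written as an IAM policy document along the following lines. This is a minimal sketch, not an official policy: the helper name `build_policy_document` is hypothetical, and `"Resource": "*"` is an assumption that you may want to narrow for your account.

```python
import json

# Actions copied from the permission list in step 2.
LAMBDA_METRIC_ACTIONS = [
    "lambda:List*",
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:FilterLogEvents",
    "tag:GetResources",
]


def build_policy_document(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: applies to all resources; restrict as needed.
                "Resource": "*",
            }
        ],
    }


policy = build_policy_document(LAMBDA_METRIC_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be merged into the statement list of the policy attached to your Datadog role.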

Log collection

  1. If you haven’t already, set up the Datadog log collection AWS Lambda function.
  2. Once the Lambda function is installed, manually add a trigger in the AWS console on the CloudWatch Log group that contains your Lambda logs.
    Select the corresponding CloudWatch Log group, add a filter name (you can leave the filter empty), and add the trigger.
    Ensure that the log group you select is not the log group of another Lambda function that pushes logs to Datadog. Selecting such a group creates a cyclical trigger that generates a very large volume of logs, which in turn affects your Datadog bill.

Once done, go to the Logs section in Datadog to start exploring your logs.

Lambda metrics

To send custom metrics to Datadog from your Lambda logs, print a log line from your Lambda function using the following format:

MONITORING|<unix_epoch_timestamp>|<value>|<metric_type>|<metric_name>|#<tag_list>

Where:

  • MONITORING signals to the Datadog integration that it should collect this log entry

  • <unix_epoch_timestamp> is in seconds, not milliseconds

  • <value> MUST be a number (i.e. integer or float)

  • <metric_type> is count, gauge, histogram, or check

  • <metric_name> uniquely identifies your metric and adheres to the metric naming policy

  • <tag_list> is optional, comma separated, and must be preceded by #.
    The tag function_name:<name_of_the_function> is automatically applied to custom metrics.
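
The field rules above can be gathered into a small helper that builds the log line and rejects invalid input. This is a sketch only: `format_metric_line` and `VALID_METRIC_TYPES` are hypothetical names, not part of any Datadog library.

```python
import time

# Metric types accepted by the log format described above.
VALID_METRIC_TYPES = {"count", "gauge", "histogram", "check"}


def format_metric_line(name, value, metric_type, tags=None, timestamp=None):
    """Build one MONITORING log line; the tag list is optional."""
    if metric_type not in VALID_METRIC_TYPES:
        raise ValueError("unsupported metric type: {0!r}".format(metric_type))
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("value must be an integer or float")
    if timestamp is None:
        timestamp = time.time()
    line = "MONITORING|{0}|{1}|{2}|{3}".format(
        int(timestamp), value, metric_type, name  # timestamp in whole seconds
    )
    if tags:
        line += "|#" + ",".join(tags)
    return line


print(format_metric_line(
    "my.metric.name", 42, "count",
    tags=["tag1:value", "tag2"], timestamp=1500000000,
))
# prints: MONITORING|1500000000|42|count|my.metric.name|#tag1:value,tag2
```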

We highly recommend this method for sending custom metrics from your Lambda functions. Datadog stores only one point per time series per second, so points submitted by separate invocations of the same function can collide unless each invocation adds a unique tag such as a UUID. Adding a UUID tag for every Lambda execution would cause the cardinality of the metric to explode.

Sending metrics via the CloudWatch log avoids this issue because we aggregate over all calls of the function to generate the value for each time series: for counts we take the sum for each timestamp; for gauges we take the last value.
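
To illustrate the aggregation rule just described, the sketch below parses a few log lines and combines them per time series and timestamp. It is a local simulation under stated assumptions, not Datadog's implementation: `parse_metric_line` and `aggregate` are hypothetical helpers, a time series is assumed to be identified by metric name plus tag set, and tag order is assumed not to matter.

```python
def parse_metric_line(line):
    """Split one MONITORING log line into (timestamp, value, type, name, tags)."""
    parts = line.strip().split("|")
    if len(parts) < 5 or parts[0] != "MONITORING":
        raise ValueError("not a MONITORING line: {0!r}".format(line))
    timestamp, value, metric_type, name = parts[1:5]
    tags = ()
    if len(parts) > 5 and parts[5].startswith("#"):
        # Assumption: tag order is irrelevant, so sort for a stable key.
        tags = tuple(sorted(t for t in parts[5][1:].split(",") if t))
    return int(timestamp), float(value), metric_type, name, tags


def aggregate(lines):
    """Sum counts and keep the last gauge value per (name, tags, timestamp)."""
    series = {}
    for line in lines:
        timestamp, value, metric_type, name, tags = parse_metric_line(line)
        key = (name, tags, timestamp)
        if metric_type == "count":
            series[key] = series.get(key, 0.0) + value
        elif metric_type == "gauge":
            series[key] = value  # a later line replaces an earlier one
    return series


lines = [
    "MONITORING|1500000000|1|count|my.requests|#env:test",
    "MONITORING|1500000000|2|count|my.requests|#env:test",
    "MONITORING|1500000000|5|gauge|my.queue_depth|#env:test",
    "MONITORING|1500000000|7|gauge|my.queue_depth|#env:test",
]
result = aggregate(lines)
print(result[("my.requests", ("env:test",), 1500000000)])     # prints: 3.0
print(result[("my.queue_depth", ("env:test",), 1500000000)])  # prints: 7.0
```

Two invocations that log the same count in the same second contribute to a single summed point, so no per-invocation tag is needed.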

Sample snippets (in Python):

Count/Gauge
import time

unix_epoch_timestamp = int(time.time())  # seconds, not milliseconds
value = 42
metric_type = 'count'  # use 'gauge' for a gauge metric
metric_name = 'my.metric.name'
tags = ['tag1:value', 'tag2']

print('MONITORING|{0}|{1}|{2}|{3}|#{4}'.format(
    unix_epoch_timestamp, value, metric_type, metric_name, ','.join(tags)
))

Histogram
import time

unix_epoch_timestamp = int(time.time())  # seconds, not milliseconds
metric_type = 'histogram'
metric_name = 'my.metric.name.hist'
tags = ['tag1:value', 'tag2']

# Emit ten values that share one timestamp.
for i in range(10):
    print('MONITORING|{0}|{1}|{2}|{3}|#{4}'.format(
        unix_epoch_timestamp, i, metric_type, metric_name, ','.join(tags)
    ))

Using the histogram metric type provides avg, count, max, min, 95p, and median values. These values are calculated at one-second granularity.
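
As a rough illustration of what those histogram aggregates mean for the values emitted within one second, the sketch below computes them locally. The helper `summarize_histogram` is hypothetical, and both the nearest-rank rule used for 95p and the midpoint median are assumptions; Datadog's exact calculation may differ.

```python
import math
import statistics


def summarize_histogram(values):
    """Compute avg, count, max, min, 95p, and median for one second of values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    # Assumption: nearest-rank percentile (smallest value with >= 95% at or below it).
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "avg": sum(ordered) / len(ordered),
        "count": len(ordered),
        "max": ordered[-1],
        "min": ordered[0],
        "95p": ordered[rank - 1],
        "median": statistics.median(ordered),
    }


# The same ten values (0 through 9) as the histogram snippet above.
summary = summarize_histogram(range(10))
print(summary)
```
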
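Service Check
import time

unix_epoch_timestamp = int(time.time())  # seconds, not milliseconds
value = 1  # WARNING
metric_type = 'check'
metric_name = 'my.metric.name.check'

# Tags are optional, so this line omits the trailing tag list.
print('MONITORING|{0}|{1}|{2}|{3}'.format(
    unix_epoch_timestamp, value, metric_type, metric_name
))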

Data Collected

Metrics

aws.lambda.duration
(gauge)
Measures the average elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.maximum
(gauge)
Measures the maximum elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.minimum
(gauge)
Measures the minimum elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.sum
(gauge)
Measures the total execution time of the Lambda function.
shown as millisecond
aws.lambda.duration.p80
(gauge)
Measures the p80 elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.p95
(gauge)
Measures the p95 elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.p99
(gauge)
Measures the p99 elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.duration.p99.9
(gauge)
Measures the p99.9 elapsed wall clock time from when the function code starts executing as a result of an invocation to when it stops executing.
shown as millisecond
aws.lambda.errors
(count)
Measures the number of invocations that failed due to errors in the function (response code 4XX).
shown as error
aws.lambda.invocations
(count)
Measures the number of times a function is invoked in response to an event or invocation API call.
shown as invocation
aws.lambda.throttles
(count)
Measures the number of Lambda function invocation attempts that were throttled due to invocation rates exceeding the customer's concurrency limits (error code 429). Failed invocations may trigger a retry attempt that succeeds.
shown as throttle
aws.lambda.iterator_age
(gauge)
Measures the age of the last record for each batch of records processed.
shown as millisecond
aws.lambda.iterator_age.minimum
(gauge)
Measures the minimum age of the last record for each batch of records processed.
shown as millisecond
aws.lambda.iterator_age.maximum
(gauge)
Measures the maximum age of the last record for each batch of records processed.
shown as millisecond
aws.lambda.iterator_age.sum
(gauge)
Measures the sum of the ages of the last record for each batch of records processed.
shown as millisecond
aws.lambda.dead_letter_errors
(count)
Measures the number of times Lambda is unable to write the failed event payload to your configured dead letter queue.
shown as error
aws.lambda.concurrent_executions
(gauge)
Measures the average of concurrent executions for a given function at a given point in time.
shown as execution
aws.lambda.concurrent_executions.minimum
(gauge)
Measures the minimum of concurrent executions for a given function at a given point in time.
shown as execution
aws.lambda.concurrent_executions.maximum
(gauge)
Measures the maximum of concurrent executions for a given function at a given point in time.
shown as execution
aws.lambda.concurrent_executions.sum
(gauge)
Measures the sum of concurrent executions for a given function at a given point in time.
shown as execution
aws.lambda.unreserved_concurrent_executions
(gauge)
Measures the sum of the concurrency of the functions that don't have a custom concurrency limit specified.
shown as execution

The metrics above are tagged in Datadog with any tags from AWS, including (but not limited to) function name and security groups.

Custom metrics are tagged only with the function name.

Events

The AWS Lambda integration does not include any events at this time.

Service Checks

The AWS Lambda integration does not include any service checks at this time.

Troubleshooting

Need help? Contact Datadog Support.
