Distributed Tracing with AWS Lambda Serverless Applications

Trace Serverless Functions

By connecting your serverless traces to metrics, Datadog provides a context-rich picture of your application’s performance, allowing you to better troubleshoot performance issues given the distributed nature of serverless applications.

The Datadog Python, Node.js, Ruby, Go, Java, and .NET tracing libraries support distributed tracing for AWS Lambda.

Send traces from your serverless application

Architecture diagram for tracing AWS Lambda with Datadog

You can install the tracer using the installation instructions. If you already have the extension installed, ensure that the environment variable DD_TRACE_ENABLED is set to true.
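
For example, if you deploy with the Serverless Framework, the variable can be set at the provider or function level. A minimal sketch (service and runtime details are placeholders):

# serverless.yml (sketch)
provider:
  environment:
    DD_TRACE_ENABLED: true   # enable tracing in the Datadog Lambda Library / extension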

Runtime recommendations


Python and Node.js

The Datadog Lambda Library and tracing libraries for Python and Node.js support:

  • Automatic correlation of Lambda logs and traces with trace ID and tag injection.
  • Installation without any code changes using Serverless Framework, AWS SAM, and AWS CDK integrations.
  • Tracing HTTP requests invoking downstream Lambda functions or containers.
  • Tracing consecutive Lambda invocations made via the AWS SDK.
  • Cold start tracing.
  • Tracing asynchronous Lambda invocations through AWS Managed Services:
    • API Gateway
    • SQS
    • SNS
    • SNS and SQS direct integration
    • Kinesis
    • EventBridge
    • DynamoDB
    • S3
  • Tracing dozens of additional out-of-the-box Python and Node.js libraries.

For Python and Node.js serverless applications, Datadog recommends you install Datadog’s tracing libraries.
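
For instance, the Serverless Framework integration can add the library and enable tracing without code changes. A minimal serverless.yml sketch, assuming the serverless-plugin-datadog plugin and an API key stored in AWS Secrets Manager (option names can vary by plugin version, so check the plugin documentation):

# serverless.yml (sketch)
plugins:
  - serverless-plugin-datadog

custom:
  datadog:
    site: datadoghq.com                         # your Datadog site
    apiKeySecretArn: <YOUR_API_KEY_SECRET_ARN>  # placeholder ARN for the stored API key
    enableDDTracing: true                       # attach the Datadog tracing layer/extension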

Looking to trace through serverless resources not listed above? Open a feature request.

Ruby

The Datadog Lambda Library and tracing libraries for Ruby support:

  • Automatic correlation of Lambda logs and traces with trace ID and tag injection.
  • Tracing HTTP requests invoking downstream Lambda functions or containers.
  • Tracing dozens of additional out-of-the-box Ruby libraries.

You can trace your serverless functions in Datadog with Datadog’s tracing libraries.

Looking to trace through serverless resources not listed above? Open a feature request.

Go

The Datadog Lambda Library and tracing libraries for Go support:

  • Manual correlation of Lambda logs and traces with trace ID and tag injection.
  • Tracing HTTP requests invoking downstream Lambda functions or containers.
  • Tracing dozens of additional out-of-the-box Go libraries.

For Go serverless applications, Datadog recommends installing Datadog’s tracing libraries.

Looking to trace through serverless resources not listed above? Open a feature request.

Java

The Datadog Lambda Library and tracing libraries for Java support:

  • Correlation of Lambda logs and traces with trace ID and tag injection. See Connecting Java logs and traces for more details.
  • Tracing HTTP requests invoking downstream Lambda functions or containers.
  • Tracing dozens of additional out-of-the-box Java libraries.

For Java serverless applications, Datadog recommends installing Datadog’s tracing libraries.

Have feedback on Datadog’s tracing libraries for Java Lambda functions? Check out the discussions in the #serverless channel of the Datadog Slack community.

.NET

The tracing library for .NET supports:

  • Tracing HTTP requests invoking downstream Lambda functions or containers.
  • Tracing dozens of additional out-of-the-box .NET libraries.

For .NET serverless applications, Datadog recommends installing Datadog’s tracing libraries.

Learn more about tracing through .NET Azure serverless applications.

Span Auto-linking

Datadog automatically detects linked spans when segments of your asynchronous requests cannot propagate trace context. For example, this may occur when a request triggers an S3 Change Notification or a DynamoDB Stream. Span Auto-links appear in the Span Links tab in the Datadog Trace UI.

This feature is available for Python on Lambda layer version 101 and higher.

Note: Span Auto-linking may not link traces if you are only ingesting a sample of your traces because the linked traces might be dropped before ingestion. To improve your chances of seeing auto-linked spans, increase your sample rate.
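
As an illustration, the sample rate can typically be raised through the tracer’s sampling configuration, for example with the DD_TRACE_SAMPLE_RATE environment variable. A sketch of a function’s environment block (the right value depends on your traffic volume and cost constraints):

environment:
  DD_TRACE_SAMPLE_RATE: "1.0"   # keep 100% of traces (illustrative value)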

If you are viewing the request that originated after the change event and the linked trace is ingested, you can see the linked span as a Backward link.

If you are viewing the request that originated before the change event and the linked trace is ingested, you can see the linked span as a Forward link.

This functionality is available for Python-instrumented AWS Lambda functions on layer version 101 and above, and for Python applications instrumented with dd-trace-py version 2.16 and above.

DynamoDB Change Stream Auto-linking

For DynamoDB Change Streams, Span Auto-linking supports the following operations:

  • PutItem
  • UpdateItem
  • DeleteItem
  • BatchWriteItem
  • TransactWriteItems

S3 Change Notification Auto-linking

For S3 Change Notifications, Span Auto-linking supports the following operations:

  • PutObject
  • CompleteMultipartUpload
  • CopyObject

Hybrid environments

If you have installed Datadog’s tracing libraries (dd-trace) on both your Lambda functions and hosts, your traces automatically show you the complete picture of requests that cross infrastructure boundaries, whether it be AWS Lambda, containers, on-prem hosts, or managed services.

If dd-trace is installed on your hosts with the Datadog Agent, and your serverless functions are traced with AWS X-Ray, trace merging is required to see a single, connected trace across your infrastructure. See the Serverless Trace Merging documentation to learn more about merging traces from dd-trace and AWS X-Ray.

Datadog’s AWS X-Ray integration only provides traces for Lambda functions. See the Datadog APM documentation to learn more about tracing in container or host-based environments.

Profiling your Lambda Functions

Datadog’s Continuous Profiler is available in Preview for Python, in Datadog Lambda Library version 4.62.0 and layer version 62 and above. This optional feature is enabled by setting the DD_PROFILING_ENABLED environment variable to true.

The Continuous Profiler works by spawning a thread that periodically wakes up and takes a snapshot of the CPU and heap of all running Python code. This can include the profiler itself. If you want the profiler to ignore itself, set DD_PROFILING_IGNORE_PROFILER to true.
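
A minimal sketch of those variables in Serverless Framework syntax, assuming the Datadog Lambda Library is already installed (the function name and handler are placeholders):

functions:
  my-function:
    handler: handler.main
    environment:
      DD_PROFILING_ENABLED: true           # enable the Continuous Profiler (Preview)
      DD_PROFILING_IGNORE_PROFILER: true   # optional: exclude the profiler's own samples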

Trace Merging

Use cases

Datadog recommends using only the Datadog APM trace library (dd-trace), but in some advanced situations users can combine Datadog tracing and AWS X-Ray using trace merging. Trace merging is available for Node.js and Python AWS Lambda functions. If you aren’t sure which tracing library to use, read about choosing your tracing library.

There are two primary reasons for instrumenting both dd-trace and AWS X-Ray tracing libraries:

  • In an AWS serverless environment, you are already tracing your Lambda functions with dd-trace, you require AWS X-Ray active tracing for AWS managed services such as AppSync and Step Functions, and you want to visualize the dd-trace and AWS X-Ray spans in one single trace.
  • In a hybrid environment with both Lambda functions and hosts, dd-trace instruments your hosts, AWS X-Ray instruments your Lambda functions, and you want to visualize connected traces for transactions across Lambda functions and hosts.

Note: This may result in higher usage bills. X-Ray spans may take 2-5 minutes to appear in your merged traces. In many cases, Datadog recommends only using a single tracing library. Learn more about choosing your tracing library.

You can find setup instructions for each of the above use cases below:

Trace merging in an AWS serverless environment

AWS X-Ray provides both a backend AWS service (AWS X-Ray active tracing) and a set of client libraries. Enabling the backend AWS service alone in the Lambda console gives you Initialization and Invocation spans for your AWS Lambda functions. You can also enable AWS X-Ray active tracing from the API Gateway and Step Functions consoles.

Both the AWS X-Ray SDK and Datadog APM client libraries (dd-trace) add metadata and spans for downstream calls by accessing the function directly. Assuming you are using dd-trace to trace at the handler level, your setup should be similar to the following:

  1. You have enabled AWS X-Ray active tracing on your Lambda functions from the AWS Lambda console and our AWS X-Ray integration within Datadog.
  2. You have instrumented your Lambda functions with Datadog APM (dd-trace) by following the installation instructions for your Lambda runtime.
  3. Third-party libraries are automatically patched by dd-trace, so the AWS X-Ray client libraries do not need to be installed.
  4. Set the DD_MERGE_XRAY_TRACES environment variable to true on your Lambda functions to merge the X-Ray and dd-trace traces (DD_MERGE_DATADOG_XRAY_TRACES in Ruby).
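
As a sketch, steps 1 and 4 could look like the following in Serverless Framework syntax (names and values are illustrative):

provider:
  tracing:
    lambda: true                  # step 1: enable AWS X-Ray active tracing on the functions
  environment:
    DD_MERGE_XRAY_TRACES: true    # step 4: merge X-Ray traces with dd-trace traces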

Tracing across AWS Lambda and hosts

Context propagation with the Datadog tracing libraries

If you have installed Datadog’s tracing libraries (dd-trace) on both your Lambda functions and hosts, your traces will automatically show you the complete picture of requests that cross infrastructure boundaries, whether it be AWS Lambda, containers, on-prem hosts, or managed services.

Context propagation with the X-Ray integration

If dd-trace is installed on your hosts with the Datadog Agent, and your Node.js or Python serverless functions are traced with AWS X-Ray, your setup should be similar to the following:

  1. You have installed the AWS X-Ray integration for tracing your Lambda functions, enabling both AWS X-Ray active tracing and installing the X-Ray client libraries.
  2. You have installed the Datadog Lambda Library for your Lambda runtime, and the DD_TRACE_ENABLED environment variable is set to true.
  3. Datadog APM is configured on your hosts and container-based infrastructure.

Then, for X-Ray and Datadog APM traces to appear in the same flame graph, all services must have the same env tag.

Note: Distributed Tracing is supported for any runtime for your host or container-based applications. Your hosts and Lambda functions do not need to be in the same runtime.
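
For example, the env tag might be kept consistent by setting DD_ENV on the Lambda functions and the matching env value in the host Agent configuration. A sketch with placeholder values:

# Lambda functions (serverless.yml sketch)
provider:
  environment:
    DD_ENV: prod        # must match the env tag used by the host Agent

# Host Agent (datadog.yaml sketch)
# env: prod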

trace of a request from a host to a Lambda function

Trace Propagation

Serverless Distributed Non-HTTP Trace

Required setup

Additional instrumentation is sometimes required to see a single, connected trace in Node and Python serverless applications asynchronously triggering Lambda functions. If you are just getting started with monitoring serverless applications in Datadog, follow our main installation steps and read this page on choosing your tracing library. Once you are sending traces from your Lambda functions to Datadog using the Datadog Lambda Library, you may want to follow these steps to connect traces between two Lambda functions in cases such as:

  • Triggering Lambda functions via Step Functions
  • Invoking Lambda functions via non-HTTP protocols such as MQTT

Tracing many AWS managed services (listed here) is supported out of the box and does not require following the steps outlined on this page.

To successfully connect trace context between resources sending traces, you need to:

  • Include Datadog trace context in outgoing events. The outgoing event can originate from a host or Lambda function with dd-trace installed.
  • Extract the trace context in the consumer Lambda function.

Passing trace context

The following code samples outline how to pass trace context in outgoing payloads, in Python and Node.js, to services that do not support HTTP headers or to managed services not natively supported by Datadog:

In Python, you can use the get_dd_trace_context helper function to pass tracing context to outgoing events in a Lambda function:

import json
import boto3
import os

from datadog_lambda.tracing import get_dd_trace_context  # Datadog tracing helper function

def handler(event, context):
    my_custom_client.sendRequest(
        {
          'myCustom': 'data',
          '_datadog': {
              'DataType': 'String',
              'StringValue': json.dumps(get_dd_trace_context()) # Includes trace context in outgoing payload.
          },
        },
    )

In Node, you can use the getTraceHeaders helper function to pass tracing context to outgoing events in a Lambda function:

const { getTraceHeaders } = require("datadog-lambda-js"); // Datadog tracing helper function

module.exports.handler = async event => {
  const _datadog = getTraceHeaders(); // Captures current Datadog trace context.

  var payload = JSON.stringify({ data: 'sns', _datadog });
  await myCustomClient.sendRequest(payload);
};

From hosts

If you aren’t passing trace context from your Lambda functions, you can use the following code template in place of the getTraceHeaders and get_dd_trace_context helper functions to get the current span context. Instructions on how to do this in every runtime are outlined here.

const tracer = require("dd-trace");

exports.handler = async event => {
  const span = tracer.scope().active();
  const _datadog = {}
  tracer.inject(span, 'text_map', _datadog)

  // ...
};
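
For Python on a host, a comparable sketch uses dd-trace-py’s propagator to capture the active span context; the function name, client, and payload below are placeholders mirroring the examples above:

from ddtrace import tracer
from ddtrace.propagation.http import HTTPPropagator

def send_with_trace_context(my_custom_client, data):
    # Capture the currently active span's context into a carrier dict.
    _datadog = {}
    span = tracer.current_span()
    if span:
        HTTPPropagator.inject(span.context, _datadog)

    # Include the carrier in the outgoing payload, as in the examples above.
    my_custom_client.sendRequest({"myCustom": data, "_datadog": _datadog})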

Extracting trace context

To extract the above trace context in the consumer Lambda function, you need to define an extractor function that captures trace context before your Lambda function handler executes. To do this, set the DD_TRACE_EXTRACTOR environment variable to the location of your extractor function, in the format <FILE NAME>.<FUNCTION NAME>. For example, extractors.json if the json extractor function is defined in the extractors.js file. Datadog recommends placing all of your extractor methods in one file, as extractors can be reused across multiple Lambda functions. These extractors are completely customizable to fit any use case.
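
For instance, with a json extractor defined in extractors.js, the variable could be set like this (sketch of a function’s environment block):

environment:
  DD_TRACE_EXTRACTOR: extractors.json   # <FILE NAME>.<FUNCTION NAME>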

Notes:

  • If you are using TypeScript or a bundler like webpack, you must import or require your Node.js module where the extractors are defined. This ensures the module gets compiled and bundled into your Lambda deployment package.
  • If your Node.js Lambda function runs on arm64, you must define the extractor in your function code instead of using the DD_TRACE_EXTRACTOR environment variable.

Sample extractors

The following code samples outline sample extractors you might use for propagating trace context across a third-party system or an API that does not support standard HTTP headers.

In Python:

import json

def extractor(payload):
    trace_headers = json.loads(payload["_datadog"])
    trace_id = trace_headers["x-datadog-trace-id"]
    parent_id = trace_headers["x-datadog-parent-id"]
    sampling_priority = trace_headers["x-datadog-sampling-priority"]
    return trace_id, parent_id, sampling_priority

In Node.js:

exports.json = (payload) => {
    const traceData = payload._datadog;
    const traceID = traceData["x-datadog-trace-id"];
    const parentID = traceData["x-datadog-parent-id"];
    const sampledHeader = traceData["x-datadog-sampling-priority"];
    const sampleMode = parseInt(sampledHeader, 10);

    return {
      parentID,
      sampleMode,
      source: 'event',
      traceID,
    };
};

In Go:

var exampleSQSExtractor = func(ctx context.Context, ev json.RawMessage) map[string]string {
	eh := events.SQSEvent{}

	headers := map[string]string{}

	if err := json.Unmarshal(ev, &eh); err != nil {
		return headers
	}

	// This example assumes SQS is used as a trigger with batchSize=1, so a
	// single SQS message drives each execution of the handler.
	if len(eh.Records) != 1 {
		return headers
	}

	record := eh.Records[0]

	lowercaseHeaders := map[string]string{}
	for k, v := range record.MessageAttributes {
		if v.StringValue != nil {
			lowercaseHeaders[strings.ToLower(k)] = *v.StringValue
		}
	}

	return lowercaseHeaders
}

cfg := &ddlambda.Config{
    TraceContextExtractor: exampleSQSExtractor,
}
ddlambda.WrapFunction(handler, cfg)

Sending traces to Datadog with the X-Ray Integration

If you are already tracing your serverless application with X-Ray and want to continue using X-Ray, you can install the AWS X-Ray integration to send traces from X-Ray to Datadog.

Further Reading