APM Troubleshooting

When experiencing unexpected behavior with Datadog APM, there are a few common issues you can look for before reaching out to Datadog support:

  1. Make sure the Agent has APM enabled:

    Run the following command on the Agent host: netstat -van | grep 8126.

    If you don’t see an entry, then the Agent is not listening on port 8126, which usually means either that the Agent is not running or that APM is not enabled in your datadog.yaml file. See the APM Agent setup documentation for more information.

  2. Ensure that the Agent is functioning properly:

    In some cases the Agent may have issues sending traces to Datadog. Enable Agent debug mode and check the Trace Agent logs to see if there are any errors.

  3. Verify that your tracer is running correctly:

    After enabling tracer debug mode, check your Agent logs for more information about the issue.

If there are errors that you don’t understand, or traces are reported to be flushed to Datadog and you still cannot see them in the Datadog UI, contact Datadog support and provide the relevant log entries with a flare.
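The port check from step 1 can also be scripted. The following is a minimal sketch, assuming the Agent accepts TCP connections on 127.0.0.1:8126; the function name is_port_listening is illustrative and not part of any Datadog library. A successful connection only shows that something is listening on the port, not that it is the Trace Agent.

```python
import socket


def is_port_listening(host="127.0.0.1", port=8126, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_listening():
        print("Something is accepting connections on port 8126.")
    else:
        print("Nothing is listening on port 8126: check that the Agent "
              "is running and that APM is enabled.")
```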

Tracer debug mode

Datadog debug settings are used to diagnose issues or audit trace data. Enabling debug mode in production systems is not recommended, as it increases the number of events that are sent to your loggers. Use it sparingly, for debugging purposes only.

Debug mode is disabled by default. To enable it, follow the corresponding language tracer instructions:

To enable debug mode for the Datadog Java Tracer, set the flag -Ddd.trace.debug=true when starting the JVM, or add DD_TRACE_DEBUG=true as an environment variable.

Note: The Datadog Java Tracer implements SLF4J SimpleLogger, so all of its settings can be applied. For example, to log to a dedicated log file: -Ddatadog.slf4j.simpleLogger.logFile=<NEW_LOG_FILE_PATH>

To enable debug mode for the Datadog Python Tracer, set the environment variable DATADOG_TRACE_DEBUG=true when using ddtrace-run.
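If the application is started from a launcher script, the variable can be set programmatically. The sketch below assumes ddtrace-run is on the PATH; the helper names debug_environment and debug_command are hypothetical, and the application command in the usage note is a placeholder.

```python
import os


def debug_environment(base_env=None):
    """Return a copy of the environment with tracer debug mode switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["DATADOG_TRACE_DEBUG"] = "true"
    return env


def debug_command(app_command):
    """Prefix an application command (a list of arguments) with ddtrace-run."""
    return ["ddtrace-run", *app_command]
```

The results can be passed to subprocess.run, for example subprocess.run(debug_command(["python", "app.py"]), env=debug_environment()), where app.py stands in for your own entry point.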

To enable debug mode for the Datadog Ruby Tracer, set the debug option to true in the tracer initialization configuration:

Datadog.configure do |c|
  c.tracer debug: true
end

Application Logs:

By default, all logs are processed by the default Ruby logger. When using Rails, you should see the messages in your application log file.

Datadog client log messages are marked with [ddtrace], so you can isolate them from other messages.

Additionally, it is possible to override the default logger and replace it with a custom one. This is done using the log attribute of the tracer.

require 'logger'                               # Ruby standard library logger

f = File.new("<FILENAME>.log", "w+")           # Log messages are written to this file
Datadog.configure do |c|
  c.tracer log: Logger.new(f)                 # Override the default logger
end

Datadog::Tracer.log.info { "this is typically called by tracing code" }

See the API documentation for more details.

To enable debug mode for the Datadog Go Tracer, pass the WithDebugMode option when starting the tracer:

package main

import "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"

func main() {
    tracer.Start(tracer.WithDebugMode(true))
    defer tracer.Stop()
}

To enable debug mode for the Datadog Node.js Tracer, set the debug option when initializing the tracer:

const tracer = require('dd-trace').init({
  debug: true
})

Application Logs:

By default, logging from this library is disabled. To have debugging information and errors sent to logs, set the debug option to true in the init() method.

The tracer will then log debug information to console.log() and errors to console.error(). This behavior can be changed by passing a custom logger to the tracer. The logger should contain debug() and error() methods that can handle messages and errors, respectively.

For example:

const bunyan = require('bunyan')
const logger = bunyan.createLogger({
  name: 'dd-trace',
  level: 'trace'
})

const tracer = require('dd-trace').init({
  logger: {
    debug: message => logger.trace(message),
    error: err => logger.error(err)
  },
  debug: true
})

Then check the Agent logs to see if there is more info about your issue:

  • If the trace was sent to the Agent properly, you should see Response from the Agent: OK log entries. This indicates that the tracer is working properly, so the problem may be with the Agent itself. Refer to the Agent troubleshooting guide for more information.

  • If an error was reported by the Agent (or the Agent could not be reached), you will see Error from the Agent log entries. In this case, validate your network configuration to ensure the Agent can be reached. If you are confident the network is functional and that the error is coming from the Agent, refer to the Agent troubleshooting guide.

If neither of these log entries is present, then no request was sent to the Agent, which means that the tracer is not instrumenting your application. In this case, contact Datadog support and provide the relevant log entries with a flare.
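The three cases above can be checked mechanically when the log output is long. The following sketch searches for the two marker strings quoted above; the function name, the return labels, and the sample log lines are hypothetical, and the rest of each log line's format is not assumed.

```python
def classify_tracer_logs(lines):
    """Classify tracer debug output by the Agent-related entries it contains.

    Returns "agent_error" if any error entry is present, "agent_ok" if only
    OK responses are present, and "no_request" if neither marker appears.
    """
    saw_ok = False
    for line in lines:
        if "Error from the Agent" in line:
            return "agent_error"
        if "Response from the Agent: OK" in line:
            saw_ok = True
    return "agent_ok" if saw_ok else "no_request"


if __name__ == "__main__":
    # Made-up sample lines; only the marker strings come from the text above.
    sample = [
        "debug: flushing traces",
        "debug: Response from the Agent: OK",
    ]
    print(classify_tracer_logs(sample))
```

A result of "no_request" corresponds to the last case: no request reached the Agent, so the tracer is probably not instrumenting the application.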

For more tracer settings, check out the API documentation.

To enable debug mode for the Datadog .NET Tracer, set the isDebugEnabled argument to true when creating a new tracer instance:

using Datadog.Trace;

var tracer = Tracer.Create(isDebugEnabled: true);

// optional: set the new tracer as the new default/global tracer
Tracer.Instance = tracer;

The environment variable DD_TRACE_DEBUG can also be set to true.

Log files are saved in the following directories:

Platform   Path
Linux      /var/log/datadog/
Windows    %ProgramData%\Datadog .NET Tracer\logs\

To enable debug mode for the Datadog PHP Tracer, set the environment variable DD_TRACE_DEBUG=true. See the PHP configuration docs for details about how and when this environment variable value should be set in order to be properly handled by the tracer.

In order to tell PHP where it should put error_log messages, you can either set it at the server level, or as a PHP ini parameter, which is the standard way to configure PHP behavior.

If you are using an Apache server, use the ErrorLog directive. If you are using an NGINX server, use the error_log directive. If you are configuring instead at the PHP level, use PHP’s error_log ini parameter.

The release binary libraries are all compiled with debug symbols added to the optimized release. It is possible to use gdb or lldb to debug the library and to read core dumps. If you are building the library from source, pass the argument -DCMAKE_BUILD_TYPE=RelWithDebInfo to cmake to compile an optimized build with debug symbols.

cd .build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
make
make install

APM metrics sent by the Datadog Agent

The Datadog Agent sends the following out-of-the-box metrics when APM is enabled:

Metric Name                                           Type       Description
datadog.trace_agent.obfuscations                      Count      Incremented by one every time an SQL obfuscation happens.
datadog.trace_agent.started                           Count      Incremented by one every time the Agent starts.
datadog.trace_agent.panic                             Gauge      Incremented by one on every code panic.
datadog.trace_agent.heartbeat                         Gauge      Incremented by one every 10 seconds.
datadog.trace_agent.heap_alloc                        Gauge      Heap allocations as reported by the Go runtime.
datadog.trace_agent.cpu_percent                       Gauge      CPU usage (in cores), e.g. 50 (half a core), 200 (two cores), 250 (2.5 cores).
datadog.trace_agent.ratelimit                         Gauge      If lower than 1, payloads are being refused due to high resource usage (CPU or memory).
datadog.trace_agent.normalizer.spans_malformed        Count      Number of spans with malformed fields that had to be altered for the system to accept them.
datadog.trace_agent.receiver.trace                    Count      Number of traces received and accepted.
datadog.trace_agent.receiver.traces_received          Count      Same as datadog.trace_agent.receiver.trace.
datadog.trace_agent.receiver.traces_dropped           Count      Traces dropped due to normalization errors.
datadog.trace_agent.receiver.traces_filtered          Count      Traces filtered by ignored resources (as defined in the datadog.yaml file).
datadog.trace_agent.receiver.traces_priority          Count      Traces processed by the priority sampler that have the priority tag.
datadog.trace_agent.receiver.traces_bytes             Count      Total bytes of payloads accepted by the Agent.
datadog.trace_agent.receiver.spans_received           Count      Total bytes of payloads received by the Agent.
datadog.trace_agent.receiver.spans_dropped            Count      Total bytes of payloads dropped by the Agent.
datadog.trace_agent.receiver.spans_filtered           Count      Total bytes of payloads filtered by the Agent.
datadog.trace_agent.receiver.events_extracted         Count      Total APM events sampled.
datadog.trace_agent.receiver.events_sampled           Count      Total APM events sampled by the max_events_per_second parameter sampler.
datadog.trace_agent.receiver.payload_accepted         Count      Number of payloads accepted by the Agent.
datadog.trace_agent.receiver.payload_refused          Count      Number of payloads rejected by the receiver because of sampling.
datadog.trace_agent.receiver.error                    Count      Number of times the API rejected a payload due to a decoding, formatting, or other error.
datadog.trace_agent.receiver.oom_kill                 Count      Number of times the Agent killed itself due to excessive memory use (150% of max_memory).
datadog.trace_agent.receiver.tcp_connections          Count      Number of TCP connections coming in to the Agent.
datadog.trace_agent.receiver.out_chan_fill            Gauge      Internal metric. Percentage of fill on the receiver’s output channel.
datadog.trace_agent.trace_writer.flush_duration       Gauge      Time it took to flush a payload to the Datadog API.
datadog.trace_agent.trace_writer.encode_ms            Gauge      Number of milliseconds it took to encode a trace payload.
datadog.trace_agent.trace_writer.compress_ms          Gauge      Number of milliseconds it took to compress an encoded trace payload.
datadog.trace_agent.trace_writer.payloads             Count      Number of payloads processed.
datadog.trace_agent.trace_writer.connection_fill      Histogram  Percentage of outgoing connections used by the trace writer.
datadog.trace_agent.trace_writer.queue_fill           Histogram  Percentage of outgoing payload queue fill.
datadog.trace_agent.trace_writer.dropped              Count      Number of payloads dropped due to non-retriable HTTP errors.
datadog.trace_agent.trace_writer.dropped_bytes        Count      Number of bytes dropped due to non-retriable HTTP errors.
datadog.trace_agent.trace_writer.payloads             Count      Number of payloads sent.
datadog.trace_agent.trace_writer.traces               Count      Number of traces processed.
datadog.trace_agent.trace_writer.events               Count      Number of events processed.
datadog.trace_agent.trace_writer.spans                Count      Number of spans processed.
datadog.trace_agent.trace_writer.bytes                Count      Number of bytes sent (calculated after Gzip).
datadog.trace_agent.trace_writer.bytes_uncompressed   Count      Number of bytes sent (calculated before Gzip).
datadog.trace_agent.trace_writer.bytes_estimated      Count      Number of bytes estimated by the Agent's internal algorithm.
datadog.trace_agent.trace_writer.retries              Count      Number of retries on failures to the Datadog API.
datadog.trace_agent.trace_writer.errors               Count      Errors that could not be retried.
datadog.trace_agent.stats_writer.stats_buckets        Count      Number of stats buckets flushed.
datadog.trace_agent.stats_writer.bytes                Count      Number of bytes sent (calculated after Gzip).
datadog.trace_agent.stats_writer.retries              Count      Number of retries on failures to the Datadog API.
datadog.trace_agent.stats_writer.splits               Count      Number of times a payload was split into multiple ones.
datadog.trace_agent.stats_writer.errors               Count      Errors that could not be retried.
datadog.trace_agent.stats_writer.encode_ms            Histogram  Time it took to encode a stats payload.
datadog.trace_agent.stats_writer.connection_fill      Histogram  Percentage of outgoing connections used.
datadog.trace_agent.stats_writer.queue_fill           Histogram  Percentage of queue filled.
datadog.trace_agent.stats_writer.dropped              Count      Number of payloads dropped due to non-retriable HTTP errors.
datadog.trace_agent.stats_writer.dropped_bytes        Count      Number of bytes dropped due to non-retriable HTTP errors.
datadog.trace_agent.service_writer.services           Count      Number of services flushed.
datadog.trace_agent.events.max_eps.max_rate           Gauge      Same as the Agent config’s max_events_per_second parameter.
datadog.trace_agent.events.max_eps.reached_max        Gauge      Set to 1 every time max_events_per_second is reached, otherwise 0.
datadog.trace_agent.events.max_eps.current_rate       Gauge      Count of APM events per second received by the Agent.
datadog.trace_agent.events.max_eps.sample_rate        Gauge      Sample rate applied by the Agent to the events it received.
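
Several of these metrics signal a problem directly through their value. As a rough illustration, the sketch below turns a snapshot of metric values into human-readable warnings using only the descriptions in the table. The function name and the input dictionary are hypothetical; how the values are retrieved from Datadog is out of scope here.

```python
def trace_agent_warnings(metrics):
    """Return warnings for a dict mapping metric names to their latest values.

    Metrics missing from the dict are treated as healthy.
    """
    prefix = "datadog.trace_agent."
    warnings = []
    # A ratelimit below 1 means payloads are refused (high CPU or memory use).
    if metrics.get(prefix + "ratelimit", 1) < 1:
        warnings.append("payloads are being refused due to high resource usage")
    # reached_max is 1 whenever max_events_per_second is reached.
    if metrics.get(prefix + "events.max_eps.reached_max", 0) == 1:
        warnings.append("max_events_per_second was reached")
    # traces_dropped counts traces dropped due to normalization errors.
    if metrics.get(prefix + "receiver.traces_dropped", 0) > 0:
        warnings.append("traces were dropped due to normalization errors")
    # oom_kill counts Agent self-terminations for excessive memory use.
    if metrics.get(prefix + "receiver.oom_kill", 0) > 0:
        warnings.append("the Agent killed itself due to excessive memory use")
    return warnings
```

For example, a snapshot with datadog.trace_agent.ratelimit at 0.5 and every other metric absent yields a single warning about refused payloads.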

Further Reading