Active Directory

Supported OS: Windows

Integration version: 4.3.0

Overview

Get metrics from Microsoft Active Directory to visualize and monitor its performance.

Setup

Installation

The Agent’s Active Directory check is included in the Datadog Agent package, so you don’t need to install anything else on your servers.

If you are installing the Datadog Agent in a domain environment, see the installation requirements for the Agent.

Configuration

Metric collection

  1. Edit the active_directory.d/conf.yaml file in the conf.d/ folder at the root of your Agent's configuration directory to start collecting your Active Directory performance data. The default setup should already collect metrics for localhost. See the sample active_directory.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Note: Versions 1.13.0 or later of this check use a new implementation for metric collection, which requires Python 3. For hosts that cannot use Python 3, or if you want to use a legacy version of this check, refer to the legacy example config.

Validation

Run the Agent’s status subcommand and look for active_directory under the Checks section.

Data Collected

Metrics

active_directory.dfsr.conflict_space_in_use
(gauge)
The total size of conflict files generated during DFS replication. Tagged by replication_group.
Shown as byte
active_directory.dfsr.deleted_space_in_use
(gauge)
The total size of files that have been deleted but are still in the staging folder. Tagged by replication_group.
Shown as byte
active_directory.dfsr.file_installs_retried
(count)
The total number of file installations that required retry during DFS replication. Tagged by replication_group.
active_directory.dfsr.staging_space_in_use
(gauge)
The current size of the DFS replication staging folder. Tagged by replication_group.
Shown as byte
active_directory.dhcp.failover.binding_updates_dropped
(count)
The total number of DHCP binding updates that were dropped due to queue overflow.
active_directory.dhcp.failover.binding_updates_pending
(gauge)
The current number of DHCP failover update messages waiting to be processed.
Shown as message
active_directory.dhcp.failover.binding_updates_received
(rate)
The rate of DHCP failover update messages received from the failover partner.
active_directory.dhcp.failover.binding_updates_sent
(rate)
The rate of DHCP failover update messages sent to the failover partner.
active_directory.dra.inbound.bytes.after_compression
(gauge)
The compressed size (in bytes) of compressed replication data inbound from directory system agents (DSAs) in other sites (per second).
Shown as byte
active_directory.dra.inbound.bytes.before_compression
(gauge)
The uncompressed size (in bytes) of compressed replication data inbound from DSAs in other sites (per second).
Shown as byte
active_directory.dra.inbound.bytes.not_compressed
(gauge)
The uncompressed size (in bytes) of replication data that was not compressed at the source (that is, data inbound from other DSAs in the same site), per second.
Shown as byte
active_directory.dra.inbound.bytes.total
(gauge)
The total number of bytes (per second) received through replication. It is the sum of the number of bytes of uncompressed data (never compressed) and compressed data (after compression).
Shown as byte
active_directory.dra.inbound.objects.applied_persec
(gauge)
The number of objects received (per second) from replication partners and applied by the local directory service. This counter excludes changes that are received but not applied (for example, when the update has already been made). This counter indicates how many replication updates are occurring on the server as a result of changes generated on other servers.
Shown as object
active_directory.dra.inbound.objects.filtered_persec
(gauge)
The number of objects received (per second) from replication partners that contained no updates that needed to be applied.
Shown as object
active_directory.dra.inbound.objects.persec
(gauge)
The number of objects received (per second) through inbound replication from replication partners.
Shown as object
active_directory.dra.inbound.objects.remaining
(gauge)
The number of objects remaining until the full synchronization process is completed.
Shown as object
active_directory.dra.inbound.objects.remaining_in_packet
(gauge)
The number of object updates received in the current directory replication update packet that have not yet been applied to the local server. This counter tells you whether the monitored server is receiving changes but taking a long time to apply them to the database.
Shown as object
active_directory.dra.inbound.properties.applied_persec
(gauge)
The number of changes (per second) to object properties that are applied through inbound replication as a result of reconciliation logic.
active_directory.dra.inbound.properties.filtered_persec
(gauge)
The number of changes (per second) to object properties received during replication that have already been made.
active_directory.dra.inbound.properties.total_persec
(gauge)
The total number of changes (per second) to object properties received from replication partners.
active_directory.dra.inbound.values.dns_persec
(gauge)
The number of values of object properties received (per second) from replication partners in which the values are for object properties that belong to distinguished names. This number includes objects that reference other objects. A high number from this counter might explain why inbound changes are slow to be applied to the database.
active_directory.dra.inbound.values.total_persec
(gauge)
The total number of values of object properties received (per second) from replication partners. Each inbound object has one or more properties, and each property has zero or more values. A value of zero indicates that the property is to be removed.
active_directory.dra.outbound.bytes.after_compression
(gauge)
The compressed size (in bytes) of compressed replication data that is outbound to DSAs in other sites (per second).
Shown as byte
active_directory.dra.outbound.bytes.before_compression
(gauge)
The uncompressed size (in bytes) of compressed replication data outbound to DSAs in other sites (per second).
Shown as byte
active_directory.dra.outbound.bytes.not_compressed
(gauge)
The uncompressed size (in bytes) of outbound replication data that was not compressed (that is, data outbound to DSAs in the same site), per second.
Shown as byte
active_directory.dra.outbound.bytes.total
(gauge)
The total number of bytes sent per second. It is the sum of the number of bytes of uncompressed data (never compressed) and compressed data (after compression).
Shown as byte
active_directory.dra.outbound.objects.filtered_persec
(gauge)
The number of objects (per second) acknowledged by outbound replication partners that required no updates. This counter includes objects that the outbound partner did not already have.
Shown as object
active_directory.dra.outbound.objects.persec
(gauge)
The number of objects sent (per second) through outbound replication to replication partners.
Shown as object
active_directory.dra.outbound.properties.persec
(gauge)
The number of properties sent per second. This counter tells you whether a source server is returning objects or not. Sometimes, the server might stop working correctly and not return objects quickly or at all.
active_directory.dra.outbound.values.dns_persec
(gauge)
The number of values of object properties sent (per second) to replication partners in which the values are for object properties that belong to distinguished names.
active_directory.dra.outbound.values.total_persec
(gauge)
The total number of values of object properties sent (per second) to replication partners.
active_directory.dra.replication.pending_synchronizations
(gauge)
The number of directory synchronizations that are queued for this server but not yet processed. This counter helps determine the replication backlog: the larger the number, the larger the backlog.
active_directory.dra.sync_requests_made
(gauge)
The number of synchronization requests made to replication partners since the computer was last restarted.
Shown as request
active_directory.ds.client_binds_persec
(rate)
The rate of LDAP client bind operations per second, including both successful and failed attempts.
active_directory.ds.threads_in_use
(gauge)
The current number of threads in use by the directory service (different from the number of threads in the directory service process). This counter represents the number of threads currently servicing API calls by clients, and you can use it to determine whether additional CPUs would be beneficial.
Shown as thread
active_directory.ldap.active_threads
(gauge)
The current number of threads in use by the LDAP subsystem for processing requests.
Shown as thread
active_directory.ldap.bind_time
(gauge)
The time (in milliseconds) required for the completion of the last successful LDAP binding.
Shown as millisecond
active_directory.ldap.client_sessions
(gauge)
The number of sessions of connected LDAP clients.
Shown as session
active_directory.ldap.searches_persec
(gauge)
The number of search operations per second performed by LDAP clients.
active_directory.ldap.successful_binds_persec
(gauge)
The number of LDAP bindings (per second) that occurred successfully.
active_directory.ldap.writes_persec
(rate)
The rate of LDAP write operations performed per second.
active_directory.netlogon.semaphore_acquires
(count)
The total number of times the Netlogon semaphore has been obtained since system startup.
active_directory.netlogon.semaphore_hold_time
(gauge)
The average time (in seconds) that the Netlogon semaphore is held. High values indicate slow authentication processing.
Shown as second
active_directory.netlogon.semaphore_holders
(gauge)
The number of threads currently holding the Netlogon semaphore.
Shown as thread
active_directory.netlogon.semaphore_timeouts
(count)
The total number of times a thread has timed out while waiting for the Netlogon semaphore.
Shown as timeout
active_directory.netlogon.semaphore_waiters
(gauge)
The number of threads waiting to obtain the Netlogon semaphore. A high value indicates authentication bottlenecks.
Shown as thread
active_directory.security.kerberos_authentications
(rate)
The rate of Kerberos authentications processed per second.
active_directory.security.ntlm_authentications
(rate)
The rate of NTLM authentications processed per second.
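The descriptions of the dra.inbound.bytes.* and dra.outbound.bytes.* gauges imply two derived values: bytes.total should equal not_compressed plus after_compression, and before_compression divided by after_compression gives the compression ratio of intersite replication traffic. The following is a minimal Python sketch, assuming you already have one sample of each gauge for a single direction (inbound or outbound). ReplicationBytes, expected_total, and compression_ratio are hypothetical helpers, not part of the check, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationBytes:
    """One sample of the dra.<direction>.bytes.* gauges, in bytes per second (hypothetical type)."""

    after_compression: float   # intersite data, size after compression
    before_compression: float  # the same intersite data, size before compression
    not_compressed: float      # intrasite data, which is never compressed


def expected_total(sample: ReplicationBytes) -> float:
    """bytes.total is described as never-compressed bytes plus compressed bytes."""
    return sample.not_compressed + sample.after_compression


def compression_ratio(sample: ReplicationBytes) -> Optional[float]:
    """How many times smaller intersite replication data is after compression.

    Returns None when the sample contains no compressed (intersite) traffic.
    """
    if sample.after_compression == 0:
        return None
    return sample.before_compression / sample.after_compression


if __name__ == "__main__":
    # Invented values for illustration only.
    inbound = ReplicationBytes(
        after_compression=2000.0, before_compression=9000.0, not_compressed=5000.0
    )
    print(expected_total(inbound))     # 7000.0
    print(compression_ratio(inbound))  # 4.5
```

The same helpers apply to the outbound gauges because the outbound descriptions use the same split between compressed intersite data and uncompressed intrasite data.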

The integration collects metrics from the following Windows performance objects:

  • NTDS: Core Active Directory metrics including replication, LDAP operations, and directory service threads
  • Netlogon: Authentication performance metrics including semaphore statistics for monitoring authentication bottlenecks
  • Security System-Wide Statistics: Authentication protocol usage metrics (NTLM vs Kerberos)
  • DHCP Server: DHCP failover and binding update metrics (when the DHCP Server role is installed)
  • DFS Replicated Folders: DFS replication health, conflicts, and staging metrics (when the DFSR role is installed)
    • Note: These metrics are tagged with instance, which contains the DFS replication group name.
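When a host reports fewer metrics than expected, it can help to know which performance object each metric comes from, and which objects are only present when an optional role is installed. The following Python sketch encodes the list above as a prefix lookup. The prefix-to-object mapping is an assumption inferred from the metric names (for example, that active_directory.dra.*, ds.*, and ldap.* all come from NTDS), not something read from the check's source, and the helper names are hypothetical.

```python
from typing import Iterable, List, Optional

# Assumed mapping from metric-name prefix to the Windows performance object
# listed above. It is inferred from the metric names, not read from the check.
PREFIX_TO_OBJECT = {
    "active_directory.dra.": "NTDS",
    "active_directory.ds.": "NTDS",
    "active_directory.ldap.": "NTDS",
    "active_directory.netlogon.": "Netlogon",
    "active_directory.security.": "Security System-Wide Statistics",
    "active_directory.dhcp.": "DHCP Server",
    "active_directory.dfsr.": "DFS Replicated Folders",
}

# Objects that are only expected when the corresponding server role is installed.
ROLE_DEPENDENT = {"DHCP Server", "DFS Replicated Folders"}


def source_object(metric: str) -> Optional[str]:
    """Return the performance object a metric is assumed to come from, if known."""
    for prefix, perf_object in PREFIX_TO_OBJECT.items():
        if metric.startswith(prefix):
            return perf_object
    return None


def unexpected_gaps(reported: Iterable[str]) -> List[str]:
    """List always-expected performance objects that produced no metrics."""
    seen = {source_object(name) for name in reported}
    expected = set(PREFIX_TO_OBJECT.values()) - ROLE_DEPENDENT
    return sorted(expected - seen)


if __name__ == "__main__":
    reported = [
        "active_directory.dra.inbound.objects.persec",
        "active_directory.ldap.bind_time",
        "active_directory.netlogon.semaphore_waiters",
    ]
    print(unexpected_gaps(reported))  # ['Security System-Wide Statistics']
```

Role-dependent objects are excluded from the gap check so that a domain controller without DHCP Server or DFSR is not flagged.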

Netlogon Metrics

The Netlogon metrics help monitor authentication performance and identify bottlenecks in domain controller authentication processing:

  • active_directory.netlogon.semaphore_waiters: Number of threads waiting for the authentication semaphore
  • active_directory.netlogon.semaphore_holders: Number of threads currently holding the semaphore
  • active_directory.netlogon.semaphore_acquires: Total number of semaphore acquisitions
  • active_directory.netlogon.semaphore_timeouts: Number of timeouts while waiting for the semaphore
  • active_directory.netlogon.semaphore_hold_time: Average time (in seconds) the semaphore is held

These metrics are particularly useful for monitoring authentication load from network access control (NAC) devices, Wi-Fi authentication, and other authentication-heavy scenarios.
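The semaphore metrics can be combined into an estimate of how busy the Netlogon semaphore is. By Little's law, the average number of concurrent holders is roughly the acquisition rate multiplied by the average hold time. The following Python sketch assumes two raw readings of the cumulative Netlogon counters taken some seconds apart (for example, from Performance Monitor), and treats MaxConcurrentApi as the number of semaphore slots. NetlogonReading, NetlogonSummary, summarize, and the value 10 used for max_concurrent_api are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NetlogonReading:
    """Raw Netlogon counter values at one moment (field names are illustrative)."""

    timestamp_s: float          # when the values were read, in seconds
    semaphore_acquires: int     # cumulative acquisitions since system startup
    semaphore_timeouts: int     # cumulative timeouts since system startup
    semaphore_waiters: int      # threads waiting at this moment
    semaphore_hold_time: float  # average hold time, in seconds


@dataclass
class NetlogonSummary:
    acquire_rate: float          # acquisitions per second over the interval
    estimated_busy_slots: float  # average concurrent holders (Little's law)
    utilization: float           # busy slots as a fraction of MaxConcurrentApi
    new_timeouts: int            # timeouts that occurred during the interval
    queuing: bool                # True if any thread was waiting at the later reading


def summarize(prev: NetlogonReading, curr: NetlogonReading,
              max_concurrent_api: int) -> NetlogonSummary:
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("readings must be in chronological order")
    acquired = curr.semaphore_acquires - prev.semaphore_acquires
    new_timeouts = curr.semaphore_timeouts - prev.semaphore_timeouts
    if acquired < 0 or new_timeouts < 0:
        # Cumulative counters go backwards after a restart; the interval is unusable.
        raise ValueError("counter reset between readings")
    acquire_rate = acquired / elapsed
    # Little's law: mean number of holders = arrival rate * mean time held.
    busy = acquire_rate * curr.semaphore_hold_time
    return NetlogonSummary(
        acquire_rate=acquire_rate,
        estimated_busy_slots=busy,
        utilization=busy / max_concurrent_api,
        new_timeouts=new_timeouts,
        queuing=curr.semaphore_waiters > 0,
    )


if __name__ == "__main__":
    # Invented readings taken 60 seconds apart; 10 is a hypothetical MaxConcurrentApi value.
    before = NetlogonReading(0.0, 1_000, 2, 0, 0.02)
    after = NetlogonReading(60.0, 7_000, 5, 3, 0.02)
    print(summarize(before, after, max_concurrent_api=10))
```

semaphore_holders is an instantaneous count, while this estimate averages over the interval, so comparing the two can show whether a single reading is representative. Utilization that approaches 1.0 together with nonzero waiters is the pattern the use cases below associate with MaxConcurrentApi tuning.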

Use Cases

The Netlogon and Security metrics help address several monitoring scenarios:

  • Monitor authentication bottlenecks: Identify when authentication requests are queuing up, particularly from Cisco ISE NAC devices or high-volume Wi-Fi authentication
  • Track authentication processing times: Use semaphore_hold_time to determine if authentication is taking too long
  • Identify MaxConcurrentApi tuning needs: Sustained high semaphore_waiters values can indicate that the MaxConcurrentApi registry setting needs to be adjusted
  • Monitor authentication protocol usage: Track the ratio of NTLM to Kerberos authentications to confirm that clients use Kerberos where expected instead of falling back to NTLM
  • Detect authentication timeouts and failures: Rising semaphore_timeouts indicate authentication infrastructure issues
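To track protocol usage over time, the two security metrics can be reduced to a single NTLM share. The following is a minimal Python sketch, assuming you have paired per-interval samples of active_directory.security.kerberos_authentications and active_directory.security.ntlm_authentications. The function name and sample shape are hypothetical, and what share counts as too high depends on your environment.

```python
from typing import Optional, Sequence, Tuple


def ntlm_share(samples: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Fraction of authentications that used NTLM across a window of samples.

    Each sample is a (kerberos_rate, ntlm_rate) pair for the same interval.
    Summing rates is only equivalent to summing counts when every interval
    has the same length, which this sketch assumes.
    Returns None if no authentications were seen in the window.
    """
    kerberos = sum(k for k, _ in samples)
    ntlm = sum(n for _, n in samples)
    total = kerberos + ntlm
    if total == 0:
        return None
    return ntlm / total


if __name__ == "__main__":
    # Invented per-interval rates for illustration.
    window = [(90.0, 10.0), (80.0, 20.0)]
    print(ntlm_share(window))  # 0.15
```

Computing the share over a window rather than per sample smooths out short bursts, such as a single NAC device re-authenticating many clients at once.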

Events

The Active Directory check does not include any events.

Service Checks

The Active Directory check does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.