Active Directory

Supported OS: Windows

Integration version: 4.4.1

Overview

Get metrics from Microsoft Active Directory to visualize and monitor its performance.

Minimum Agent version: 6.0.0

Setup

Installation

The Agent’s Active Directory check is included in the Datadog Agent package, so you don’t need to install anything else on your servers.

If you are installing the Datadog Agent in a domain environment, see the installation requirements for the Agent.

Configuration

Metric collection

  1. To start collecting your Active Directory performance data, edit the active_directory.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory. The default setup should already collect metrics for localhost. See the sample active_directory.d/conf.yaml for all available configuration options.

  2. Restart the Agent.

Note: Use version 4.4.0 or later of this check to collect the latest metrics.
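
As a rough illustration of step 1, a minimal active_directory.d/conf.yaml might look like the sketch below. It is not the shipped sample file. The host value of "." is assumed to mean the local machine, as in other Windows performance counter checks, and the tag value is hypothetical. Confirm every option against the sample active_directory.d/conf.yaml before using it.

```yaml
# active_directory.d/conf.yaml (illustrative sketch, not the shipped sample file)
init_config:

instances:
    # Assumption: "." refers to the local host. Verify against the sample file.
  - host: .
    # Optional: tags attached to every metric from this instance.
    # The tag value below is hypothetical.
    tags:
      - role:domain-controller
    # Optional: collection interval in seconds (the Agent default is 15).
    min_collection_interval: 15
```

Because the default setup should already collect metrics for localhost, an edit like this is mainly useful for adding tags or changing the collection interval.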

Service checks

Datadog recommends enabling the Windows Services integration to also monitor the state of the Active Directory services.

Example configuration (in the Windows Services integration's windows_service.d/conf.yaml):

instances:
  - services:
    - ntds
    - netlogon
    - dhcp
    - dfsr
    - adws
    - kdc

Note: The Datadog Agent might not have access to all the services (for example, NTDS). See Service permissions for information on granting access.

Validation

Run the Agent’s status subcommand and look for active_directory under the Checks section.

Data Collected

Metrics

active_directory.dfsr.conflict_space_in_use
(gauge)
The total size (in bytes) of conflict files generated during DFS replication.
Shown as byte
active_directory.dfsr.deleted_space_in_use
(gauge)
The total size (in bytes) of files that have been deleted but are still in the staging folder.
Shown as byte
active_directory.dfsr.file_installs_retried
(gauge)
The total number of file installations that required retry during DFS replication.
Shown as event
active_directory.dfsr.staging_space_in_use
(gauge)
The current size (in bytes) of the DFS replication staging folder.
Shown as byte
active_directory.dhcp.failover.binding_updates_dropped
(gauge)
The number of DHCP binding update messages dropped.
Shown as message
active_directory.dhcp.failover.binding_updates_pending
(gauge)
The number of pending DHCP failover update messages.
Shown as message
active_directory.dhcp.failover.binding_updates_received
(gauge)
The number of DHCP failover update messages received (per second) from the failover partner.
Shown as message
active_directory.dhcp.failover.binding_updates_sent
(gauge)
The number of DHCP failover update messages sent (per second) to the failover partner.
Shown as message
active_directory.dra.inbound.bytes.after_compression
(gauge)
The compressed size (in bytes) of compressed replication data inbound from directory system agents (DSAs) in other sites (per second).
Shown as byte
active_directory.dra.inbound.bytes.before_compression
(gauge)
The uncompressed size (in bytes) of compressed replication data inbound from DSAs in other sites (per second).
Shown as byte
active_directory.dra.inbound.bytes.not_compressed
(gauge)
The uncompressed size (in bytes) of replication data that was not compressed at the source - that is, inbound from other DSAs in the same site (per second).
Shown as byte
active_directory.dra.inbound.bytes.total
(gauge)
The total number of bytes (per second) received through replication. It is the sum of the number of bytes of uncompressed data (never compressed) and compressed data (after compression).
Shown as byte
active_directory.dra.inbound.objects.applied_persec
(gauge)
The number of objects received (per second) from replication partners and applied by the local directory service. This counter excludes changes that are received but not applied (for example, when the update has already been made). This counter indicates how many replication updates are occurring on the server as a result of changes generated on other servers.
Shown as object
active_directory.dra.inbound.objects.filtered_persec
(gauge)
The number of objects received (per second) from replication partners that contained no updates that needed to be applied.
Shown as object
active_directory.dra.inbound.objects.persec
(gauge)
The number of objects received (per second) through inbound replication from replication partners.
Shown as object
active_directory.dra.inbound.objects.remaining
(gauge)
The number of objects remaining until the full synchronization process is completed.
Shown as object
active_directory.dra.inbound.objects.remaining_in_packet
(gauge)
The number of object updates received in the current directory replication update packet that have not yet been applied to the local server. This counter shows whether the monitored server is receiving changes but taking a long time to apply them to the database.
Shown as object
active_directory.dra.inbound.properties.applied_persec
(gauge)
The number of changes (per second) to object properties that are applied through inbound replication as a result of reconciliation logic.
Shown as operation
active_directory.dra.inbound.properties.filtered_persec
(gauge)
The number of changes (per second) to object properties received during replication that have already been made.
Shown as operation
active_directory.dra.inbound.properties.total_persec
(gauge)
The total number of changes (per second) to object properties received from replication partners.
Shown as operation
active_directory.dra.inbound.values.dns_persec
(gauge)
The number of values of object properties received (per second) from replication partners in which the values are for object properties that belong to distinguished names. This number includes objects that reference other objects. A high number from this counter might explain why inbound changes are slow to be applied to the database.
Shown as operation
active_directory.dra.inbound.values.total_persec
(gauge)
The total number of values of object properties received (per second) from replication partners. Each inbound object has one or more properties, and each property has zero or more values. A value of zero indicates that the property is to be removed.
Shown as operation
active_directory.dra.outbound.bytes.after_compression
(gauge)
The compressed size (in bytes) of compressed replication data that is outbound to DSAs in other sites (per second).
Shown as byte
active_directory.dra.outbound.bytes.before_compression
(gauge)
The uncompressed size (in bytes) of compressed replication data outbound to DSAs in other sites (per second).
Shown as byte
active_directory.dra.outbound.bytes.not_compressed
(gauge)
The uncompressed size (in bytes) of outbound replication data that was not compressed - that is, outbound to DSAs in the same site (per second).
Shown as byte
active_directory.dra.outbound.bytes.total
(gauge)
The total number of bytes sent (per second). It is the sum of the number of bytes of uncompressed data (never compressed) and compressed data (after compression).
Shown as byte
active_directory.dra.outbound.objects.filtered_persec
(gauge)
The number of objects (per second) acknowledged by outbound replication partners that required no updates. This counter includes objects that the outbound partner did not already have.
Shown as object
active_directory.dra.outbound.objects.persec
(gauge)
The number of objects sent (per second) through outbound replication to replication partners.
Shown as object
active_directory.dra.outbound.properties.persec
(gauge)
The number of properties sent (per second). This counter tells you whether a source server is returning objects or not. Sometimes, the server might stop working correctly and not return objects quickly or at all.
Shown as operation
active_directory.dra.outbound.values.dns_persec
(gauge)
The number of values of object properties sent (per second) to replication partners in which the values are for object properties that belong to distinguished names.
Shown as operation
active_directory.dra.outbound.values.total_persec
(gauge)
The total number of values of object properties sent (per second) to replication partners.
Shown as operation
active_directory.dra.replication.pending_synchronizations
(gauge)
The number of directory synchronizations that are queued for this server that are not yet processed. This counter helps in determining replication backlog - the larger the number, the larger the backlog.
Shown as event
active_directory.dra.sync_requests_made
(gauge)
The number of synchronization requests made to replication partners since the computer was last restarted.
Shown as request
active_directory.ds.client_binds_persec
(gauge)
The number of LDAP client bind operations (per second) including both successful and failed attempts.
Shown as operation
active_directory.ds.threads_in_use
(gauge)
The current number of threads in use by the directory service (different from the number of threads in the directory service process). This counter represents the number of threads currently servicing API calls by clients, and you can use it to determine whether additional CPUs would be beneficial.
Shown as thread
active_directory.ldap.active_threads
(gauge)
The current number of threads in use by the LDAP subsystem for processing requests.
Shown as thread
active_directory.ldap.bind_time
(gauge)
The time (in milliseconds) required for the completion of the last successful LDAP binding.
Shown as millisecond
active_directory.ldap.client_sessions
(gauge)
The number of sessions of connected LDAP clients.
Shown as session
active_directory.ldap.searches_persec
(gauge)
The number of search operations (per second) performed by LDAP clients.
Shown as operation
active_directory.ldap.successful_binds_persec
(gauge)
The number of LDAP bindings (per second) that occurred successfully.
Shown as operation
active_directory.ldap.writes_persec
(gauge)
The number of LDAP write operations performed (per second).
Shown as operation
active_directory.netlogon.semaphore_acquires
(count)
The total number of times the Netlogon semaphore has been obtained since system startup.
Shown as event
active_directory.netlogon.semaphore_hold_time
(gauge)
The average time (in seconds) that the Netlogon semaphore is held. High values indicate slow authentication processing.
Shown as second
active_directory.netlogon.semaphore_holders
(gauge)
The number of threads currently holding the Netlogon semaphore.
Shown as thread
active_directory.netlogon.semaphore_timeouts
(count)
The total number of times a thread has timed out while waiting for the Netlogon semaphore.
Shown as timeout
active_directory.netlogon.semaphore_waiters
(gauge)
The number of threads waiting to obtain the Netlogon semaphore. A high value indicates authentication bottlenecks.
Shown as thread
active_directory.security.kerberos_authentications
(gauge)
The number of Kerberos authentications processed (per second).
Shown as operation
active_directory.security.ntlm_authentications
(gauge)
The number of NTLM authentications processed (per second).
Shown as operation

The integration collects metrics from the following Windows performance objects:

  • NTDS: Core Active Directory metrics including replication, LDAP operations, and directory service threads
  • Netlogon: Authentication performance metrics including semaphore statistics for monitoring authentication bottlenecks
  • Security System-Wide Statistics: Authentication protocol usage metrics (NTLM vs Kerberos)
  • DHCP Server: DHCP failover and binding update metrics (when DHCP Server role is installed)
  • DFS Replicated Folders: DFS replication health, conflicts, and staging metrics (when DFSR role is installed)
    • Note: These metrics carry an instance tag containing the DFS replication group name

Netlogon Metrics

The Netlogon metrics help monitor authentication performance and identify bottlenecks in domain controller authentication processing:

  • active_directory.netlogon.semaphore_waiters: Number of threads waiting for the authentication semaphore
  • active_directory.netlogon.semaphore_holders: Number of threads currently holding the semaphore
  • active_directory.netlogon.semaphore_acquires: Total number of semaphore acquisitions
  • active_directory.netlogon.semaphore_timeouts: Number of timeouts waiting for the semaphore
  • active_directory.netlogon.semaphore_hold_time: Average time (in seconds) the semaphore is held

These metrics are particularly useful for monitoring authentication load from network access control (NAC) devices, Wi-Fi authentication, and other authentication-heavy scenarios.

Use Cases

The Netlogon and Security metrics help address several monitoring scenarios:

  • Monitor authentication bottlenecks: Identify when authentication requests are queuing up, particularly from Cisco ISE NAC devices or high-volume Wi-Fi authentication
  • Track authentication processing times: Use semaphore_hold_time to determine if authentication is taking too long
  • Identify MaxConcurrentApi tuning needs: Sustained high semaphore_waiters values can indicate the need to adjust the MaxConcurrentApi registry setting
  • Monitor authentication protocol usage: Track the ratio of NTLM vs Kerberos authentications to ensure proper protocol usage
  • Detect authentication timeouts and failures: A rising semaphore_timeouts count indicates authentication infrastructure issues

Events

The Active Directory check does not include any events.

Troubleshooting

Need help? Contact Datadog support.