Users are trusted entities in your systems, with access to sensitive information and the ability to perform sensitive actions. Malicious actors target user accounts as an entry point to attack websites and steal valuable data and resources.
Datadog App and API Protection (AAP) provides built-in detection and protection capabilities to help you manage this threat.
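This guide describes how to use AAP to prepare for and respond to account takeover (ATO) campaigns. It is divided into three phases: collecting login information (Phase 1), preparing for account takeover campaigns (Phase 2), and responding to account takeover campaigns (Phase 3).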
To detect malicious patterns, AAP requires visibility into your users’ login activity. This phase describes how to enable and validate this visibility.
This step describes how to set up your service to use AAP.
Go to Software Catalog, click the Security lens, and search for your login service name.
Click the service to open its details. If the Threat management pill is green, AAP is enabled and you can move to Step 1.3: Validating whether login information is automatically collected.
If AAP isn’t enabled, the panel displays the Discover AAP button.
To set up AAP, move to Step 1.2: Enabling AAP on login service.
To enable AAP on your login service, ensure you meet the following requirements:
To enable AAP, either set the DD_APPSEC_ENABLED environment variable (or the equivalent library configuration) in a new deployment, or use Remote Configuration. Either method works, but Remote Configuration can be set up from the Datadog UI.
To enable AAP with Remote Configuration, without restarting your services, do the following:
When you see traces from your service in AAP Traces, move to Step 1.3: Validating whether login information is automatically collected.
For more detailed instructions on using a new deployment, see Enabling AAP Threat Detection using Datadog Tracing Libraries.
After you have enabled AAP, you can validate that login information is collected by Datadog.
Note: After AAP is enabled on a service, wait a few minutes for users to log into the service or log into the service yourself.
To validate that login information is collected, do the following:
Query AAP Traces for @appsec.security_activity:business_logic.users.login.* to list login activity by service.
If you don’t see login activity from a service, go to Step 1.5: Manually instrumenting your services.
To validate that login metadata is collected, do the following:
Filter login traces on business_logic.users.login.success or business_logic.users.login.failure.
Review a few traces, covering both login successes and login failures. For login failures, look for traces where usr.exists is true (a failed login attempt by an existing user) and traces where it is false.
Run these checks whether or not the user exists.
For a nonexistent user (usr.exists:false), look for the following issues:
Check that the trace contains usr.login and usr.exists in the case of a login failure, and usr.login and usr.id in the case of a login success. If some metadata is missing, go to Step 1.5: Manually instrumenting your services. If the instrumentation is correct, go to Phase 2: Preparing for Account Takeover campaigns.
AAP collects login information and metadata using an SDK embedded in the Datadog libraries. You instrument a service by calling the SDK when a user login succeeds or fails and by passing it the login metadata. The SDK attaches the login and the metadata to the trace and sends them to Datadog, where they are retained.
To manually instrument your services, do the following:
usr.login: Mandatory for login success and failure. This field contains the name used to log into the account. The name might be an email address, a phone number, a username, or something else. This field identifies targeted accounts, even ones that don’t exist in your systems. It also provides hints about the source of the credential database used by the attacker. Don’t confuse this value with usr.id, because users might be able to change their login name.
usr.exists: Mandatory for login failures. Some default detections require this field. It helps lower the priority of attempts that target accounts that don’t exist in your systems.
After deploying the code, validate that the instrumentation is correct by following the steps in Step 1.4: Validating login metadata is automatically collected.
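The following sketch shows where the SDK calls go in a login handler and which fields each outcome carries. The track_login function is a hypothetical stand-in that only records tags. It is not the Datadog SDK, whose function names and signatures vary by language, so check your tracing library's documentation for the real calls.

```python
# Hypothetical stand-in for the tracer SDK: it only records the tags the real
# SDK would attach to the current trace. Names here are illustrative only.
recorded_events = []


def track_login(success: bool, login: str, user_id=None, exists=None) -> None:
    tags = {"usr.login": login}  # mandatory for both outcomes
    if success:
        tags["usr.id"] = user_id  # identifies the account that logged in
    else:
        tags["usr.exists"] = exists  # mandatory for failures
    recorded_events.append({"success": success, **tags})


# Toy user store; plaintext passwords are for illustration only.
USERS = {"alice@example.com": {"id": "u-1", "password": "s3cret"}}


def login_handler(login: str, password: str) -> bool:
    user = USERS.get(login)
    if user is None:
        track_login(False, login, exists=False)  # unknown account
        return False
    if user["password"] != password:
        track_login(False, login, exists=True)  # existing account, wrong password
        return False
    track_login(True, login, user_id=user["id"])
    return True
```

Calling login_handler for a valid login, a wrong password, and an unknown account records one event per outcome. Each event carries exactly the fields listed above.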
AAP can use custom In-App WAF rules to flag login attempts and extract the metadata from the request needed by detection rules.
This approach requires that Remote Configuration is enabled and working. On the Remote Configuration page, verify that Remote Configuration is running for this service.
To use custom In-App WAF rules, do the following:
Use the business logic type users.login.failure for login failures and users.login.success for login successes.
Add conditions on the HTTP method (POST), the URI with a regex (^/login), and the status code (403 for failures, 302 or 200 for success).
Capture the login into usr.login. Assuming the login was provided in the request, add a condition and set store value as tag as the operator.
Set the Tag field to usr.login, the name of the tag where the captured value is saved.
To validate that the instrumentation is correct, see Step 1.4: Validating login metadata is automatically collected.
For more details, see Tracking business logic information without modifying the code.
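The rule conditions above combine roughly as follows. This Python sketch only models the matching logic: method, URI regex, status code, and capturing the login. It is not the In-App WAF rule format, and the username form field is an assumption about your login request.

```python
import re

# Models the custom rule's conditions; not the actual In-App WAF rule format.
LOGIN_URI = re.compile(r"^/login")


def classify_login(method: str, path: str, status: int):
    """Return the business logic event a request would be tagged with, if any."""
    if method != "POST" or not LOGIN_URI.match(path):
        return None
    if status == 403:
        return "users.login.failure"
    if status in (200, 302):
        return "users.login.success"
    return None


def extract_tags(request: dict, status: int) -> dict:
    """Emulate 'store value as tag': copy the submitted login into usr.login."""
    event = classify_login(request["method"], request["path"], status)
    if event is None:
        return {}
    tags = {"event": event}
    login = request.get("form", {}).get("username")  # hypothetical form field
    if login is not None:
        tags["usr.login"] = login
    return tags
```

^/login also matches paths such as /login-help. Tighten the regex (for example ^/login$) if other endpoints share the prefix.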
After setting up instrumentation for your services, AAP monitors for attack campaigns. You can review the traffic in the Attacks overview Business logic section.
AAP detects multiple attacker strategies. Upon detecting an attack with a high level of confidence, the built-in detection rules generate a signal.
The severity of the signal is set based on the urgency of the threat: from Low in case of unsuccessful attacks to Critical in case of successful account compromises.
The actions covered in the next sections help you to identify and leverage detections faster.
Notifications provide a warning on your preferred channel when a signal is triggered. To create a notification rule, do the following:
Use the query category:account_takeover, and expand the severities to include Medium.
In microservice environments, services are generally reached by internal hosts running other services. This internal environment makes it challenging to identify the unique traits of the original attacker’s request, such as IP, user agent, or fingerprint.
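One symptom of this problem is that login traces record the address of an internal proxy or service instead of the user's client IP. The following minimal sketch checks client IPs sampled from login traces, for example values copied from @http.client_ip. The report fields are illustrative choices, not Datadog metrics.

```python
import ipaddress
from collections import Counter


def client_ip_report(ips: list) -> dict:
    """Summarize client IPs sampled from login traces (@http.client_ip)."""
    parsed = [ipaddress.ip_address(ip) for ip in ips]
    if not parsed:
        return {"distinct": 0, "non_public_share": 0.0, "top_share": 0.0}
    counts = Counter(parsed)
    non_public = sum(1 for ip in parsed if not ip.is_global)  # private, loopback, etc.
    return {
        "distinct": len(counts),
        "non_public_share": non_public / len(parsed),
        # Share of the single most frequent IP; near 1.0 hints at one proxy hop.
        "top_share": counts.most_common(1)[0][1] / len(parsed),
    }
```

A high non_public_share, or a top_share close to 1.0, suggests the tracer is reading the IP of an intermediate hop. In that case, review the client IP header configuration.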
AAP Traces can help you validate that the login event is properly tagged with the source IPs, user agent, etc. To validate, review login traces in Traces and check for the following:
The client IPs (@http.client_ip) are varied and public. If they aren’t, the original client IP is probably not reaching the service. Proxies usually forward it in a header such as X-Forwarded-For. You can use a custom header for better security and configure the tracer to read it using the DD_TRACE_CLIENT_IP_HEADER environment variable.
The user agent (@http.user_agent) is consistent with the expected traffic (web browser, mobile app, etc.).
Request headers (such as accept-encoding) are forwarded to the instrumented service. Missing headers impair the generation of fingerprints (@appsec.fingerprint.*) and degrade the signal’s ability to isolate an attacker’s activity.
AAP automatic blocking can be used to block attacks at any time of the day. Automatic blocking can help block attacks before your team members are online, providing security during off hours. During an ATO campaign, automatic blocking can help mitigate the load issues caused by the increase in failed login attempts, or prevent the attacker from using compromised accounts.
You can configure automatic blocking to block IPs identified as part of an attack. This is only a partial remediation because attackers can change IPs; however, it can give you more time to implement comprehensive remediation.
To configure automatic blocking, do the following:
Use the query tag:"category:account_takeover".
Datadog does not recommend permanently blocking IP addresses. Attackers are unlikely to reuse IPs, and permanent blocking could end up blocking legitimate users. Moreover, AAP has a limit on how many IPs it can block (~10000), and permanent blocks could fill this list with unnecessary IPs.
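The reasoning above (temporary blocks in a bounded list) can be illustrated with a toy model. The TemporaryBlocklist class below is hypothetical and does not reflect how AAP stores or enforces its blocklist. The 12-hour default and the ~10000 capacity are borrowed from the figures in this guide.

```python
import time


class TemporaryBlocklist:
    """Toy model of time-limited IP blocking with a capacity cap.

    Illustrative only: expiring entries keep a bounded list from filling up
    with stale IPs. This is not how AAP implements its blocklist.
    """

    def __init__(self, capacity: int = 10_000, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self._expiry = {}  # ip -> expiry time in clock seconds

    def _purge(self) -> None:
        now = self.clock()
        self._expiry = {ip: t for ip, t in self._expiry.items() if t > now}

    def block(self, ip: str, duration_s: float = 12 * 3600) -> bool:
        """Block ip for duration_s seconds; return False if the list is full."""
        self._purge()
        if ip not in self._expiry and len(self._expiry) >= self.capacity:
            return False
        self._expiry[ip] = self.clock() + duration_s
        return True

    def is_blocked(self, ip: str) -> bool:
        expiry = self._expiry.get(ip)
        return expiry is not None and expiry > self.clock()
```

Because entries expire, room freed by old attacks becomes available again. Permanent blocks lose this property.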
This section describes common account takeover hacker behavior and how to triage, investigate, and monitor detections.
Eventually, your systems come under attack. The wave of malicious login attempts can often eclipse the volume of normal login activity the service expects. The added load might cause availability problems, and the attacker could successfully log into an account at any time.
The actions the attackers take depend on their strategy and the configurations of your systems. Some attackers might decide to immediately abuse their access to extract value before you’ve had time to freeze their compromised accounts. Others might keep the accounts dormant until a later time.
Many strategies are available, but it’s important to understand that the value chain of attacks is often carefully divided:
When an attack begins against your systems, the system generates signals labeled Credential Stuffing, Distributed Credential Stuffing, or Bruteforce, depending on the attacker’s strategy.
The first step is to confirm that the detection is correct. Certain behaviors, such as a security scan on a login endpoint or frequent token rotation, might look like an attack to the detection rules. The analysis depends on the signal. The following examples provide general guidance that you should adapt to your situation.
The Bruteforce signal looks for an attempt to steal a user account by trying many different passwords for that account. These campaigns generally target a small number of accounts.
Review the accounts flagged as compromised. Click a user to open a summary of recent activity.
Questions for triage:
If the answer to those questions is yes, the signal is likely legitimate.
You can adapt your response based on the sensitivity of the account. For example, a free account with limited access warrants a different response than an admin account.
The Credential Stuffing signal looks for a large number of accounts with failed logins coming from a small number of IPs. This is often caused by unsophisticated attackers.
Review the accounts flagged as targeted.
If the targeted accounts share attributes, such as all belonging to one institution, check whether the IP might be a proxy for that institution. Hover over the IP and open the side panel to review its past activity.
Questions for triage:
If the answer to those questions is yes, the signal is likely legitimate.
You can adapt your response based on the scale of the attack and whether accounts are being compromised.
The Distributed Credential Stuffing signal looks for a large increase in the overall number of login failures on a service. This is typically caused by sophisticated attackers leveraging a botnet.
Datadog tries to identify common attributes between the login failures in your service. This can surface defects in the attacker’s script that can be used to isolate the malicious activity. When such attributes are found, a section called Attacker Attributes is shown. If it is present, review whether this is legitimate activity by selecting the cluster and clicking Explore clusters.
If the cluster is accurate, its activity should closely match the increase in login failures and be low or nonexistent before the attack.
If no cluster is available, click Investigate in full screen and review the targeted users/IPs for outliers.
If the list is truncated, click View in AAP Protection Trace Explorer and run the investigation with the Traces explorer. For additional tools, see Step 3.3: Investigation.
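The three signals differ mainly in how failed logins are distributed across accounts and IPs. The toy heuristic below illustrates those shapes over (client IP, usr.login) pairs exported from failed login traces. It is not how Datadog's detection rules work, and the threshold of 50 is an arbitrary placeholder.

```python
from collections import defaultdict


def summarize_failures(failures):
    """failures: iterable of (client_ip, usr_login) pairs for failed logins."""
    accounts_per_ip = defaultdict(set)
    attempts_per_account = defaultdict(int)
    for ip, login in failures:
        accounts_per_ip[ip].add(login)
        attempts_per_account[login] += 1
    return accounts_per_ip, attempts_per_account


def rough_pattern(failures, many=50):
    """Label the dominant pattern. Thresholds are arbitrary placeholders."""
    accounts_per_ip, attempts_per_account = summarize_failures(list(failures))
    if max(attempts_per_account.values(), default=0) >= many:
        return "bruteforce-like: many attempts on few accounts"
    if max((len(a) for a in accounts_per_ip.values()), default=0) >= many:
        return "credential-stuffing-like: many accounts from few IPs"
    if len(accounts_per_ip) >= many:
        return "distributed-like: failures spread over many IPs"
    return "no clear pattern"
```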
If the conclusion of the triage is that the signal is a false positive, you can flag it as a false positive and close it.
If the false positive was caused by a unique setting in your service, you can add suppression filters to silence false positives.
If the signal is legitimate, move to Step 3.2: Preliminary response.
If the attack is ongoing, you might want to disrupt the attacker while you investigate further. Disrupting the attacker slows down the attack and reduces the number of compromised accounts.
Enforcing this preliminary response requires that Remote Configuration is enabled for your services.
If you want to initiate a partial response, do the following:
In Bruteforce and Credential Stuffing attacks, the attackers are likely using a small number of IPs. To block them, open the signal and use Next Steps. You can set the duration of the blocking.
Datadog recommends 12 hours. This is long enough for the attack to stop, but short enough to avoid blocking legitimate users once those IPs are recycled to them after the attack. Datadog does not recommend permanent blocking.
You can also block compromised users, although a better approach would be to extract them and reset their credentials using your own systems.
Finally, you can enable automated IP blocking from the Next Steps section so that new IPs are automatically blocked while you run your investigation.
Distributed Credential Stuffing attacks often use a large number of disposable IPs. Because of Datadog’s latency, it’s impractical to stop these login attempts by blocking each IP before the attacker drops it from their pool.
Instead, block traits of the request that are unique to the malicious attempt (a user agent, a specific header, a fingerprint, etc.).
In a Distributed Credential Stuffing campaign signal, Datadog automatically identifies clear traits and presents them as Attacker Attributes.
Before blocking, Datadog recommends that you review the activity from the cluster to confirm that the activity is indeed malicious.
The questions you’re trying to answer are:
To do so, select your cluster and click on Explore clusters.
The Investigate explorer appears and provides cluster traffic indicators: a large share of the traffic from the attack and a high proportion of IPs flagged by Threat Intelligence.
Those are two important indicators:
Click an indicator to see further information about the cluster traffic.
In Cluster Activity, a visualization shows the volume of overall APM traffic matching this cluster. When comparing it to the AAP data, mind the scale: APM data may be sampled, while AAP data isn’t.
In the following example, a lot of the cluster’s traffic predates the attack. This means legitimate activity matches this cluster in normal traffic, and it would get blocked if you took action. You don’t need to escalate or click Block All Attacking IPs in the signal.
In a different example, the activity from the cluster started with the attack. This means there shouldn’t be collateral damage and you can proceed to block.
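The before-and-after comparison in these examples can be expressed as a ratio. This sketch assumes you have per-bucket request counts for the cluster (for example, read off the Cluster Activity graph) and know the bucket where the attack started. The function names and the 1% tolerance are hypothetical.

```python
def pre_attack_share(cluster_counts, attack_start):
    """Fraction of a cluster's traffic seen before the attack began.

    cluster_counts: list of (bucket_index, request_count) pairs, for example
    hourly counts read from the Cluster Activity graph (hypothetical export).
    """
    total = sum(count for _, count in cluster_counts)
    if total == 0:
        return 0.0
    before = sum(count for bucket, count in cluster_counts if bucket < attack_start)
    return before / total


def safe_to_block(cluster_counts, attack_start, tolerance=0.01):
    """True when almost none of the cluster's traffic predates the attack."""
    return pre_attack_share(cluster_counts, attack_start) <= tolerance
```

The check uses a ratio rather than absolute counts, so it is less sensitive to APM sampling, assuming the sampling rate stays roughly constant over the time range.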
After confirming that the traits match the attackers, you can push an In-App WAF rule to block requests matching those traits. This is supported for user agent-based traits only.
To create the rule, do the following:
For the user agent condition, use the operator matches value in list. If you want more flexibility, you can also use a regex.
If no unexpected traces are shown, select a blocking mode and save the rule. The response is automatically pushed to tracers. Blocked traces appear in the Trace Explorer.
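Before saving the rule, you can reason about which traces it would match. This sketch models the two user agent condition styles, an exact list and a regex, plus a dry-run preview over sample traces. It assumes traces are dictionaries with an http.user_agent key and that the regex can match anywhere in the string. The In-App WAF's own regex semantics may differ.

```python
import re


def ua_matcher(values=None, pattern=None):
    """Build a predicate for a 'matches value in list' or a regex condition."""
    if values is not None:
        allowed = set(values)
        return lambda ua: ua in allowed
    if pattern is not None:
        compiled = re.compile(pattern)
        return lambda ua: compiled.search(ua) is not None  # assumed search semantics
    raise ValueError("provide values or pattern")


def preview(matcher, traces):
    """Dry run: split sample traces into would-block and would-pass lists."""
    hits = [t for t in traces if matcher(t["http.user_agent"])]
    misses = [t for t in traces if not matcher(t["http.user_agent"])]
    return hits, misses
```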
Multiple blocking actions are available. Depending on the sophistication of the attackers, you might want a stealthier response so that they don’t immediately realize they were blocked.
When you have disrupted the attacker as a preliminary response, you can identify the following:
The first step is to isolate the attacker activity from the overall traffic of the application.
While isolating attacker activity, ensure that your current filters are exhaustive through two tests:
Next, isolate the attack’s activity using the method that matches the signal.
For a Bruteforce signal, extract the list of targeted users from the signal in Signals.
From this list of users, you can craft a Traces query to review all the activity from targeted users. Follow this template:
@appsec.security_activity:business_logic.users.login.* @appsec.events_data.usr.login:(<users>)
Successful logins should be considered suspicious.
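Filling the <users> placeholder by hand is error-prone for long lists. This sketch assumes Datadog search accepts OR-separated values inside parentheses, and double-quoted values for strings containing special characters such as @. The function names are hypothetical.

```python
def quote(value: str) -> str:
    """Double-quote a value, escaping embedded backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def targeted_users_query(users) -> str:
    """Fill the <users> placeholder of the template above with OR-joined logins."""
    unique = sorted(set(users))  # deduplicate for a stable, shorter query
    if not unique:
        raise ValueError("no users")
    joined = " OR ".join(quote(u) for u in unique)
    return (
        "@appsec.security_activity:business_logic.users.login.* "
        f"@appsec.events_data.usr.login:({joined})"
    )
```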
The Credential Stuffing signal flags a lot of activity coming from a few IPs and is closely related to its distributed variant. You might need to use the Distributed Credential Stuffing method.
Start by extracting a list of suspicious IPs from the signal side panel.
From the list of IPs, you can craft a Traces query to review all the activity from suspected IPs. Follow this template:
@appsec.security_activity:business_logic.users.login.* @http.client_ip:(<IPs>)
Successful logins should be considered suspicious.
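The same approach works for the IP template. This sketch validates and deduplicates the IPs with Python's ipaddress module before building the query, so that typos raise an error instead of silently producing a broken query. The function name is hypothetical. IPv6 addresses may need quoting in the query, which this sketch doesn't handle.

```python
import ipaddress


def suspicious_ips_query(ips) -> str:
    """Validate and deduplicate IPs, then fill the <IPs> placeholder above."""
    # ip_address raises ValueError on malformed entries.
    unique = sorted({str(ipaddress.ip_address(ip.strip())) for ip in ips})
    if not unique:
        raise ValueError("no IPs")
    return (
        "@appsec.security_activity:business_logic.users.login.* "
        f"@http.client_ip:({' OR '.join(unique)})"
    )
```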
The Distributed Credential Stuffing signal flags a large increase in login failures in one service. If the attack is large enough, the Bruteforce or Credential Stuffing signals might also trigger. This signal can also detect diffuse attacks more comprehensively.
For diffuse attacks, attacker attributes are available in the signal.
This gets you to the Traces explorer with filters set to the flagged attributes. You can start the investigation with the current query, but you should expand it to match login successes as well as failures. To do so, replace @appsec.security_activity:business_logic.users.login.failure with @appsec.security_activity:business_logic.users.login.*. Then review the exhaustiveness and accuracy of the filter using the tests described above.
If those attributes are inaccurate or incomplete, try to identify further traits to isolate the attacker activity. The most useful traits are:
@http.user_agent
@http.client_ip_details.as.domain
@threat_intel.results.category
@http.url
@appsec.fingerprint.*
You may use Top List or Timeseries to identify the traits whose distribution most closely matches the attack.
You may need multiple sets of filters, each possibly including multiple traits. Behind the scenes, the attacker may be using multiple randomized templates. This work identifies the constants in those templates.
Reviewing login successes and failures helps to identify the following:
When attacker activity is isolated, review login successes and consider the following questions:
For the login failures, consider the following questions:
As your investigation progresses, you can go back and forth between this step and the next as you’re ready to enforce a response based on your findings.
Datadog’s investigation capabilities are enriched by data from its backend, which isn’t available to the library running the response. Because of that, not all fields are compatible with enforcing a response.
Motivated attackers try to circumvent your response as soon as they become aware of it. In anticipation of this approach, do the following:
You can either use Datadog’s built-in blocking capabilities to deny any request that matches some criteria, or export the data automatically to one of your systems to perform a response (credentials reset, mimic login failures upon blocking, etc.).
Users that are part of the traffic blocked by Datadog see a You’re blocked page, or receive a custom status code, such as a redirection. Blocking can be applied through two mechanisms, each with different performance characteristics: the Denylist and custom WAF rules.
The Denylist is an efficient way to block a large number of entries, but it is limited to IPs and users. If your investigation uncovered a small set of IPs responsible for the attack (fewer than 1000), blocking these IPs is the best course of action.
The Denylist can be managed and automated using the Datadog platform by clicking Automate Attacker Blocking in the signal.
Use the Automate Attacker Blocking or Block All Attacking IPs signal options to block all attacking IPs for a few hours, a week, or permanently. Similarly, you can block compromised users. As a reminder, Datadog doesn’t recommend blocking IPs permanently due to risks of blocking legitimate traffic after IPs get recycled into public pools.
The blocking can be rescinded or extended from the Denylist.
If the signal wasn’t accurate, you can extract the list of users or IPs and add it to the Denylist manually.
If the Denylist isn’t sufficient, you can create a WAF rule. A WAF rule is slower to evaluate than the Denylist, but it is more flexible. To create the rule, go to AAP > Protection > In-App WAF > Custom Rules.
To create a new rule, do the following:
The response is pushed to tracers automatically and blocked traces appear in the Traces explorer.
Multiple blocking actions are available. Depending on the sophistication of the attackers, you might want a stealthier response so that attackers don’t immediately realize they were blocked.
For more information, see In-App WAF Rules.
After you introduce the response, the attacker might suspend or adapt their attack. Keep monitoring the rate of login attempts, especially failures. Attacks might drop off only to resume after a few minutes, hours, or days.
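Monitoring for a resumed attack can be automated with a simple spike detector over login failure counts. This sketch compares each time bucket against a rolling baseline of the previous buckets. The window, factor, and min_count values are arbitrary starting points. The function is hypothetical, not a Datadog feature.

```python
from collections import deque


def resumption_alerts(failure_counts, window=6, factor=3.0, min_count=20):
    """Yield bucket indexes where login failures spike above a rolling baseline.

    failure_counts: login failures per time bucket (for example, per 10 minutes).
    """
    recent = deque(maxlen=window)  # the previous `window` buckets
    for i, count in enumerate(failure_counts):
        if len(recent) == window:
            baseline = sum(recent) / window
            if count >= min_count and count > factor * baseline:
                yield i
        recent.append(count)
```

Spikes enter the baseline, so the detector flags the onset of a wave rather than every bucket of a sustained attack.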
If a large-scale attack resumes, the Distributed Credential Stuffing signal should trigger again. In this case, review the following considerations:
After a few days with no significant attacker activity, you might consider the attack over and move to a cleanup phase.
The goals of the cleanup phase are the following:
User blocking expires based on the timer you set when you selected Block All Attacking IPs in the signal, so this configuration doesn’t require any further action.
If you configured permanent blocking, unblock users and IPs from the Denylist by doing the following:
To disable or delete In-App WAF rule(s), go to the custom In-App WAF rules page and disable the rules by clicking on Monitoring or Blocking, and selecting Disable Rule.
If the rule is no longer relevant, you can delete it by clicking more options (…) and selecting Delete.
To validate that no legitimate traffic is blocked, check that the volume of blocked traffic closely matches that of the attack, with virtually no blocked traces outside the main waves.
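This check can be sketched as follows. It assumes you've exported the timestamps of blocked traces and noted the start and end of each attack wave. Both inputs and the function names are hypothetical.

```python
def blocked_outside_windows(blocked_timestamps, attack_windows):
    """Return blocked-trace timestamps that fall outside every attack window.

    blocked_timestamps: epoch seconds of traces matching @appsec.blocked:true.
    attack_windows: list of (start, end) epoch-second pairs, inclusive.
    """
    return [
        ts for ts in blocked_timestamps
        if not any(start <= ts <= end for start, end in attack_windows)
    ]


def outside_share(blocked_timestamps, attack_windows):
    """Fraction of blocked traces outside the attack waves (ideally near 0)."""
    outside = blocked_outside_windows(blocked_timestamps, attack_windows)
    return len(outside) / len(blocked_timestamps) if blocked_timestamps else 0.0
```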
To validate that no legitimate traffic is blocked, do the following:
Query Traces for @appsec.blocked:true to list blocked requests.
Large ATO campaigns are rarely an isolated occurrence. You might want to use the time between attacks to harden your services and establish configurations you can rely on during subsequent attacks.
Here are some common hardening examples:
Attackers acquire lists of compromised accounts in bulk. By identifying the source of their database, you can proactively identify users at risk.
To identify the source of their database, export users impacted by the attack using one of these options:
Create a Top List grouped by @appsec.events_data.usr.login. Set the limit to 10000 and use smaller time ranges to avoid the backend cap.
When you have a list, review it for common attributes:
When the source of the database is identified, proactively force a password reset of those customers or flag them as higher risk. This increases confidence that future suspicious logins were indeed compromised.
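Email domains are one example of a common attribute to look for. This sketch counts domains among exported usr.login values. It assumes logins are mostly email addresses; other attributes need their own checks.

```python
from collections import Counter


def domain_breakdown(logins, top=5):
    """Count email domains among exported usr.login values.

    Logins that aren't email addresses are counted under "<non-email>".
    """
    domains = Counter()
    for login in logins:
        login = login.strip().lower()
        if "@" in login:
            domains[login.rsplit("@", 1)[1]] += 1
        else:
            domains["<non-email>"] += 1
    return domains.most_common(top)
```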
Using the attacker’s signature, expand your filters to see what non-login activity they performed.
This filter can be less accurate. For example, it might match the signature of a mobile application that has legitimate traffic but was cloned by the attacker. Even so, the filter might reveal research done by the attacker ahead of time and hint at what the attacker may be looking to do next.
You can also pivot on the infrastructure used by the attacker. Did those malicious IPs do anything but logins? Are they accessing other sensitive APIs?
Account theft is a common threat, but it is much more complex than traditional injection exploits. Catching it requires tight integration with your systems and involves enough uncertainty that automated responses aren’t possible for the most advanced attacks.
In this guide, you did the following:
This is general guidance. Depending on your applications and environments, there might be a need for additional response strategies.