Overview

Sensitive Data Scanner in the Cloud scans telemetry data, such as your application logs, APM events, RUM events, and events from Event Management. The data that can be scanned and redacted are:

  • Logs: All structured and unstructured log content, including log message and attribute values
  • APM: Span attribute values only
  • RUM: Event attribute values only
  • Events: Event attribute values only

You submit logs and events to the Datadog backend, so the data leaves your environment before it gets redacted. The logs and events are scanned and redacted in the Datadog backend during processing, so sensitive data is redacted before events are indexed and shown in the Datadog UI.

If you don’t want data to leave your environment before it gets redacted, use Observability Pipelines and the Sensitive Data Scanner processor to scan and redact sensitive data. See Set Up Pipelines for information on how to set up a pipeline and its components.

To use Sensitive Data Scanner in the Cloud, set up a scanning group to define what data to scan and then add scanning rules to determine what sensitive information to match within the data.

Setup

Permissions

By default, users with the Datadog Admin role have access to view and set up scanning rules. To allow other users access, grant the data_scanner_read or data_scanner_write permissions under Compliance to a custom role. See Access Control for details on how to set up roles and permissions.

The Compliance permissions section showing the data scanner read and write permissions

Add a scanning group

A scanning group determines what data to scan. It consists of a query filter and a set of toggles to enable scanning for logs, APM, RUM, and events. See the Log Search Syntax documentation to learn more about query filters.

For Terraform, see the Datadog Sensitive Data Scanner group resource.

To set up a scanning group, perform the following steps:

  1. Navigate to the Sensitive Data Scanner settings page.
  2. Click Add scanning group. Alternatively, click the Add dropdown menu on the top right corner of the page and select Add Scanning Group.
  3. Enter a query filter for the data you want to scan. At the top, click APM Spans to preview the filtered spans. Click Logs to see the filtered logs.
  4. Enter a name and description for the group.
  5. Click the toggle buttons to enable Sensitive Data Scanner for the products you want (for example, logs, APM spans, RUM events, and Datadog events).
  6. Click Create.

By default, a newly created scanning group is disabled. To enable a scanning group, click the corresponding toggle on the right side.
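As a conceptual aid, the pieces of a scanning group described above (a query filter, per-product toggles, and a group that starts out disabled) can be modeled in a few lines of Python. This is only a sketch: the `ScanningGroup` class, its fields, and the `should_scan` method are hypothetical names invented for illustration and are not part of any Datadog API, and the query filter is reduced to a plain Python predicate rather than real Log Search Syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ScanningGroup:
    """Hypothetical model of a scanning group (not a Datadog API object)."""
    name: str
    # Stand-in for the query filter: a predicate over an event dictionary.
    query_filter: Callable[[Dict], bool]
    # Products toggled on for this group, for example {"logs", "apm"}.
    products: Set[str] = field(default_factory=set)
    # A newly created group starts disabled.
    enabled: bool = False

    def should_scan(self, product: str, event: Dict) -> bool:
        """Scan only if the group is enabled, the product is on, and the filter matches."""
        return self.enabled and product in self.products and self.query_filter(event)
```

In this model, a group created with `ScanningGroup("payments", lambda e: e.get("service") == "payments", {"logs"})` scans nothing until `enabled` is set to `True`, which mirrors the toggle described above.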

Add scanning rules

A scanning rule determines what sensitive information to match within the data defined by a scanning group. You can add predefined scanning rules from Datadog’s Scanning Rule Library or create your own rules using regex patterns. The data is scanned at ingestion time during processing. For logs, this means the scan is done before indexing and other routing decisions.

For Terraform, see the Datadog Sensitive Data Scanner rule resource.

To add scanning rules, perform the following steps:

  1. Navigate to the Sensitive Data Scanner settings page.
  2. Click the scanning group where you want to add the scanning rules.
  3. Click Add Scanning Rule. Alternatively, click the Add dropdown menu on the top right corner of the page and select Add Scanning Rule.
  4. Select whether you want to add a library rule or create a custom scanning rule.

The Scanning Rule Library contains predefined rules for detecting common patterns such as email addresses, credit card numbers, API keys, authorization tokens, and more.

  1. Select a scanning group if you did not create this rule within a scanning group.
  2. In the Add library rules to the scanning group section, select the library rules you want to use.
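  3. In the Define rule target and action section, select whether to scan the Entire Event or Specific Attributes.
    • If you are scanning the entire event, you can optionally exclude specific attributes from the scan.
    • If you are scanning specific attributes, specify which attributes to scan.
  4. For Define actions on match, select the action you want to take on the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
    • Redact: Replaces all matching values with the text you specify in the Replacement text field.
    • Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters to redact and which part of the matched data to redact.
    • Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
  5. Optionally, add tags that you want to associate with events where the values match the specified regex pattern. Datadog recommends using sensitive_data and sensitive_data_category tags. These tags can be used in searches, dashboards, and monitors. See Control access to logs with sensitive data for information on how to use tags to determine who can access logs that contain sensitive information.
  6. For Set priority level, select the priority level for the rule based on your business needs.
  7. In the Name and describe the scanning rule section, enter a name for the rule. Optionally, add a description.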
  8. Click Add Rules.
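To make the three match actions (redact, partially redact, and hash) more concrete, here is a minimal Python sketch of what each one does to a matched value. Everything in it is an assumption made for illustration: the `PATTERN` regex, the function names, and the `[REDACTED]` replacement text are hypothetical, and the hash uses BLAKE2b truncated to 64 bits as a stand-in because FarmHash is not in the Python standard library. It does not reproduce Datadog's actual output.

```python
import hashlib
import re

# Hypothetical pattern for illustration: a 16-digit number starting with 4.
PATTERN = re.compile(r"\b4\d{15}\b")


def redact(text: str, replacement: str = "[REDACTED]") -> str:
    """Replace every match with fixed replacement text."""
    return PATTERN.sub(replacement, text)


def partially_redact(text: str, count: int = 12, from_start: bool = True) -> str:
    """Mask `count` characters at the start or the end of every match."""
    def mask(m: re.Match) -> str:
        value = m.group(0)
        n = min(count, len(value))
        if from_start:
            return "*" * n + value[n:]
        return value[: len(value) - n] + "*" * n

    return PATTERN.sub(mask, text)


def hash_match(text: str) -> str:
    """Replace every match with a 64-bit digest of its UTF-8 bytes.

    Stand-in only: BLAKE2b with an 8-byte digest, not FarmHash.
    """
    def digest(m: re.Match) -> str:
        return hashlib.blake2b(m.group(0).encode("utf-8"), digest_size=8).hexdigest()

    return PATTERN.sub(digest, text)
```

All three functions discard the original value, which is why these actions cannot be undone once applied.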

Add additional keywords

After adding out-of-the-box (library) scanning rules, you can edit each rule separately and add keywords to its keyword dictionary.

  1. Navigate to the Sensitive Data Scanner settings page.
  2. Click the scanning group with the rule you want to edit.
  3. Hover over the rule, and then click the pencil icon.
  4. The recommended keywords are used by default. To add additional keywords, toggle Use recommended keywords, then add your keywords to the list. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
  5. Click Update.

You can create custom scanning rules using regex patterns to scan for sensitive data.

  1. Select a scanning group if you did not create this rule within a scanning group.
  2. In the Define match conditions section, specify the regex pattern to use for matching against events in the Define the regex field. Enter sample data in the Add sample data field to verify that your regex pattern is valid.
    Sensitive Data Scanner supports Perl Compatible Regular Expressions (PCRE), but the following patterns are not supported:
    • Backreferences and capturing sub-expressions (lookarounds)
    • Arbitrary zero-width assertions
    • Subroutine references and recursive patterns
    • Conditional patterns
    • Backtracking control verbs
    • The \C “single-byte” directive (which breaks UTF-8 sequences)
    • The \R newline match
    • The \K start of match reset directive
    • Callouts and embedded code
    • Atomic grouping and possessive quantifiers
  3. For Create keyword dictionary, add keywords to refine detection accuracy when matching regex conditions. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like visa, credit, and card. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
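  4. In the Define rule target and action section, select whether to scan the Entire Event or Specific Attributes.
    • If you are scanning the entire event, you can optionally exclude specific attributes from the scan.
    • If you are scanning specific attributes, specify which attributes to scan.
  5. For Define actions on match, select the action you want to take on the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
    • Redact: Replaces all matching values with the text you specify in the Replacement text field.
    • Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters to redact and which part of the matched data to redact.
    • Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
  6. Optionally, add tags that you want to associate with events where the values match the specified regex pattern. Datadog recommends using sensitive_data and sensitive_data_category tags. These tags can be used in searches, dashboards, and monitors. See Control access to logs with sensitive data for information on how to use tags to determine who can access logs that contain sensitive information.
  7. For Set priority level, select the priority level for the rule based on your business needs.
  8. In the Name and describe the scanning rule section, enter a name for the rule. Optionally, add a description.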
  9. Click Add Rule.
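The keyword dictionary described in the steps above narrows a regex match by requiring a keyword within a set number of characters before it (30 by default). The following Python sketch illustrates that idea under stated assumptions: the `CARD` pattern, the `KEYWORDS` tuple, and the `matches_with_keywords` function are hypothetical, keyword comparison is assumed to be case-insensitive, and the keyword is assumed to sit entirely inside the window. Python's `re` module is used for convenience and is not the same engine as the scanner's regex support.

```python
import re
from typing import List

# Hypothetical rule for illustration: a 16-digit number starting with 4.
CARD = re.compile(r"\b4\d{15}\b")
KEYWORDS = ("visa", "credit", "card")


def matches_with_keywords(text: str, proximity: int = 30) -> List[str]:
    """Return matches that have a keyword within `proximity` characters before them."""
    hits = []
    for m in CARD.finditer(text):
        # Look only at the characters immediately preceding the match.
        window = text[max(0, m.start() - proximity): m.start()].lower()
        if any(keyword in window for keyword in KEYWORDS):
            hits.append(m.group(0))
    return hits
```

With this sketch, `"visa card: 4111111111111111"` yields a match, while `"order id 4111111111111111"` does not, because no keyword precedes the number. Widening `proximity` trades fewer missed matches for more false positives.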

Notes:

  • Any rules that you add or update affect only data coming into Datadog after the rule was defined.
  • Sensitive Data Scanner does not affect any rules you define on the Datadog Agent directly.
  • After rules are added, ensure that the toggles for your scanning groups are enabled to begin scanning.

See Investigate Sensitive Data Issues for details on how to use the Summary page to triage your sensitive data issues.

Excluded namespaces

The Datadog platform requires certain reserved keywords to function. If any of these words appear in a log that is being scanned, the 30 characters after the matched word are ignored and not redacted. For example, what comes after the word date in a log is usually the event timestamp. If the timestamp were accidentally redacted, the log could not be processed correctly or queried later. Excluded namespaces therefore exist to prevent unintentional redaction of information that the product needs to function.

The excluded namespaces are:

  • host
  • hostname
  • syslog.hostname
  • service
  • status
  • env
  • dd.trace_id
  • trace_id
  • trace id
  • dd.span_id
  • span_id
  • span id
  • @timestamp
  • timestamp
  • _timestamp
  • Timestamp
  • date
  • published_date
  • syslog.timestamp
  • error.fingerprint
  • x-datadog-parent-id
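The effect of excluded namespaces can be sketched in Python as follows. This is an illustration, not Datadog's implementation: the function names are hypothetical, only a subset of the list above is used, keywords are matched as case-sensitive substrings, and a match is assumed to be left alone if any part of it overlaps the 30 protected characters. How the real scanner treats partial overlaps is not stated in this document.

```python
import re
from typing import List, Tuple

# Subset of the excluded namespaces listed above.
EXCLUDED = ("date", "timestamp", "host", "service", "trace_id")
SKIP = 30  # number of characters after the keyword that are not redacted


def protected_ranges(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) ranges covering the 30 characters after each keyword."""
    ranges = []
    for word in EXCLUDED:
        for m in re.finditer(re.escape(word), text):
            ranges.append((m.end(), m.end() + SKIP))
    return ranges


def redact_outside_protected(text: str, pattern: re.Pattern,
                             replacement: str = "[REDACTED]") -> str:
    """Redact matches of `pattern` unless they overlap a protected range."""
    ranges = protected_ranges(text)

    def repl(m: re.Match) -> str:
        # Assumption: any overlap with a protected range keeps the value intact.
        overlaps = any(m.start() < end and m.end() > start for start, end in ranges)
        return m.group(0) if overlaps else replacement

    return pattern.sub(repl, text)
```

For a rule that matches ISO dates, a value directly after `date=` is kept, while the same kind of value more than 30 characters past the keyword is redacted.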

Edit scanning rules

  1. Navigate to the Sensitive Data Scanner settings page.
  2. Hover over the scanning rule you want to edit and click the Edit (pencil) icon. The Define match conditions section shows either the regular expression you wrote for your custom rule or an explanation of the library scanning rule you chose along with examples of matched sensitive information.
  3. To make sure that a rule matches your data, you can provide a sample in the Add sample data section. If the rule finds matches in the sample data, a green Match label appears next to the input field.
  4. Under Create keyword dictionary, you can add keywords to refine detection accuracy. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like visa, credit, and card.
  5. Choose the number of characters before a match that the keyword must appear in. By default, keywords must be within 30 characters before a match.
  6. Optionally, under Define rule target and action, edit the tags that you want to associate with events where the values match the rule. Datadog recommends using sensitive_data and sensitive_data_category tags, which can be used in searches, dashboards, and monitors. See Control access to logs with sensitive data for information on how to use tags to determine who can access logs that contain sensitive data.
  7. For Set priority level, choose a value based on your business needs.
  8. Click Update.

Control access to logs with sensitive data

To control who can access logs containing sensitive data, use tags added by the Sensitive Data Scanner to build queries with role-based access control (RBAC). You can restrict access to specific individuals or teams until the data ages out after the retention period. See How to Set Up RBAC for Logs for more information.

Redact sensitive data in tags

To redact sensitive data contained in tags, you must remap the tag to an attribute and then redact the attribute. Uncheck Preserve source attribute in the remapper processor so that the tag is not preserved during the remapping.

To remap the tag to an attribute:

  1. Navigate to your log pipeline.
  2. Click Add Processor.
  3. Select Remapper in the processor type dropdown menu.
  4. Name the processor.
  5. Select Tag key(s).
  6. Enter the tag key.
  7. Enter a name for the attribute the tag key is remapped to.
  8. Disable Preserve source attribute.
  9. Click Create.
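Conceptually, the remapper steps above move a value from a tag to an attribute and, with Preserve source attribute disabled, drop the original tag. The Python sketch below shows that transformation on a simplified event. The event layout (a dictionary with `tags` and `attributes` keys) and the `remap_tag_to_attribute` function are assumptions made for illustration and do not reflect Datadog's internal event format.

```python
from typing import Any, Dict


def remap_tag_to_attribute(event: Dict[str, Any], tag_key: str, attribute: str,
                           preserve_source: bool = False) -> Dict[str, Any]:
    """Copy a tag's value to an attribute; drop the tag unless preserve_source is set."""
    # Work on copies so the input event is left unchanged.
    tags = dict(event.get("tags", {}))
    attributes = dict(event.get("attributes", {}))
    if tag_key in tags:
        attributes[attribute] = tags[tag_key]
        if not preserve_source:
            del tags[tag_key]
    return {**event, "tags": tags, "attributes": attributes}
```

If `preserve_source` were left as `True`, the sensitive value would remain in the tag, which a scanning rule targeting the attribute would not redact.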

To redact the attribute:

  1. Navigate to your scanning group.
  2. Click Add Scanning Rule.
  3. Check the library rules you want to use.
  4. Select Specific Attributes for Scan entire event or portion of it.
  5. Enter the name of the attribute you created earlier to specify that you want it scanned.
  6. Select the action you want when there’s a match.
  7. Optionally, add tags.
  8. Click Add Rules.

Disable Sensitive Data Scanner

To turn off Sensitive Data Scanner entirely, set the toggle to off for every scanning group so that all groups are disabled.

Further reading