Sensitive Data Redaction for HTTP Client


Overview

Sensitive data, such as credit card numbers, bank routing numbers, and API keys, is often unintentionally exposed in logs, which can expose your organization to financial and privacy risks.

With Observability Pipelines, you can identify and tag sensitive information, and optionally redact or hash it, before routing your logs to different destinations and outside of your infrastructure. Use out-of-the-box scanning rules to detect common patterns such as email addresses, credit card numbers, API keys, and authorization tokens, or create custom scanning rules that use regex patterns to match sensitive information.

The log sources, processors, and destinations available for this use case

This document walks you through the following steps:

  1. The prerequisites you need before you can set up Observability Pipelines
  2. Setting up Observability Pipelines

Prerequisites

To use Observability Pipelines’ HTTP/S Client source, you need the following information available:

  1. The full path of the HTTP Server endpoint that the Observability Pipelines Worker collects log events from. For example, https://127.0.0.8/logs.
  2. The HTTP authentication token or password.

The HTTP/S Client source pulls data from your upstream HTTP server. Your HTTP server must support GET requests for the HTTP Client endpoint URL that you set as an environment variable when you install the Worker.
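
For example, you can check that the upstream server answers the Worker's pull requests before installing anything. A minimal sketch, using the example endpoint above and a placeholder bearer-style header (adjust the header to match whatever authorization strategy your server actually uses):

  # Confirm the endpoint returns log events over GET, in the format you plan
  # to select as the decoder (for example, JSON). URL and token are placeholders.
  curl -X GET "https://127.0.0.8/logs" \
       -H "Authorization: Bearer <HTTP_AUTH_TOKEN>"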

Set up Observability Pipelines

  1. Navigate to Observability Pipelines.
  2. Select the Sensitive Data Redactions template to create a new pipeline.
  3. Select HTTP Client as the source.

Set up the source

To configure your HTTP/S Client source:

  1. Select your authorization strategy.
  2. Select the decoder you want to use on the HTTP messages. Logs pulled from the HTTP source must be in this format.
  3. Optionally, toggle the switch to enable TLS. If you enable TLS, the following certificate and key files are required.
    Note: All file paths are made relative to the configuration data directory, which is /var/lib/observability-pipelines-worker/config/ by default. See Advanced Configurations for more information. The file must be owned by the observability-pipelines-worker group and observability-pipelines-worker user, or at least readable by the group or user.
    • Server Certificate Path: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509) format.
    • CA Certificate Path: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509) format.
    • Private Key Path: The path to the .key private key file that belongs to your Server Certificate Path in DER or PEM (PKCS#8) format.
  4. Enter the interval between scrapes.
    • Your HTTP Server must be able to handle GET requests at this interval.
    • Since requests run concurrently, if a scrape takes longer than the interval given, a new scrape is started, which can consume extra resources. Set the timeout to a value lower than the scrape interval to prevent this from happening.
  5. Enter the timeout for each scrape request.
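
If you enabled TLS in step 3, make sure the certificate and key files are readable by the Worker. A minimal sketch, assuming the default configuration data directory and example file names (server.crt, ca.crt, and server.key are placeholders for your own files):

  # Give ownership of the TLS files to the Worker's user and group,
  # and keep the private key readable only by them.
  sudo chown observability-pipelines-worker:observability-pipelines-worker \
       /var/lib/observability-pipelines-worker/config/server.crt \
       /var/lib/observability-pipelines-worker/config/ca.crt \
       /var/lib/observability-pipelines-worker/config/server.key
  sudo chmod 640 /var/lib/observability-pipelines-worker/config/server.key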

Set up destinations

Enter the following information based on your selected logs destination.

There are no setup steps for the Datadog destination.

  • Splunk HEC address:
    • The bind address that the Observability Pipelines Worker listens on to receive logs originally intended for the Splunk indexer. For example, 0.0.0.0:8088. Note: /services/collector/event is automatically appended to the endpoint.
    • Stored in the environment variable DD_OP_SOURCE_SPLUNK_HEC_ADDRESS.

The following fields are optional:

  1. In the Encoding dropdown menu, select whether you want to encode your pipeline's output in JSON, Logfmt, or Raw text. If no encoding is selected, the encoding defaults to JSON.
  2. Enter a source name to override the default name value configured for your Sumo Logic collector's source.
  3. Enter a host name to override the default host value configured for your Sumo Logic collector's source.
  4. Enter a category name to override the default category value configured for your Sumo Logic collector's source.
  5. Click Add Header to add any custom header fields and values.

The rsyslog and syslog-ng destinations support the RFC5424 format.

The rsyslog and syslog-ng destinations match these log fields to the following Syslog fields:

| Log event        | Syslog field | Default                          |
|------------------|--------------|----------------------------------|
| log["message"]   | MESSAGE      | NIL                              |
| log["procid"]    | PROCID       | The running Worker's process ID. |
| log["appname"]   | APP-NAME     | observability_pipelines          |
| log["facility"]  | FACILITY     | 8 (log_user)                     |
| log["msgid"]     | MSGID        | NIL                              |
| log["severity"]  | SEVERITY     | info                             |
| log["host"]      | HOSTNAME     | NIL                              |
| log["timestamp"] | TIMESTAMP    | Current UTC time.                |
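
For orientation only, an RFC5424-formatted line generally follows the layout PRI, version, timestamp, hostname, app-name, procid, msgid, structured data, and message, for example (illustrative values; not necessarily the Worker's exact output):

  <PRI>1 2024-01-15T10:30:00Z my-host observability_pipelines 1234 ID47 - disk usage above 90%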

The following destination settings are optional:

  1. Toggle the switch to enable TLS. If you enable TLS, the following certificate and key files are required:
    • Server Certificate Path: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509).
    • CA Certificate Path: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509).
    • Private Key Path: The path to the .key private key file that belongs to your Server Certificate Path in DER or PEM (PKCS#8) format.
  2. Enter the number of seconds to wait before sending TCP keepalive probes on an idle connection.
  3. Optionally, toggle the switch to enable Buffering Options.
    Note: Buffering options is in Preview. Contact your account manager to request access.
    • If left disabled, the maximum size for buffering is 500 events.
    • If enabled:
      1. Select the buffer type you want to set (Memory or Disk).
      2. Enter the buffer size and select the unit.

To set up the Worker’s Google Chronicle destination:

  1. Enter the customer ID for your Google Chronicle instance.
  2. If you have a credentials JSON file, enter the path to your credentials JSON file. The credentials file must be placed under DD_OP_DATA_DIR/config. Alternatively, you can use the GOOGLE_APPLICATION_CREDENTIALS environment variable to provide the credential path.
  3. Select JSON or Raw encoding in the dropdown menu.
  4. Enter the log type. See template syntax if you want to route logs to different log types based on specific fields in your logs.
  5. Optionally, toggle the switch to enable Buffering Options.
    Note: Buffering options is in Preview. Contact your account manager to request access.
    • If left disabled, the maximum size for buffering is 500 events.
    • If enabled:
      1. Select the buffer type you want to set (Memory or Disk).
      2. Enter the buffer size and select the unit.

Note: Logs sent to the Google Chronicle destination must have ingestion labels. For example, if the logs are from an A10 load balancer, they must have the ingestion label A10_LOAD_BALANCER. See Google Cloud's Support log types with a default parser for a list of available log types and their respective ingestion labels.

For the Elasticsearch destination, the following fields are optional:

  1. Enter the name for the Elasticsearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
  2. Enter the Elasticsearch version.
  3. Optionally, toggle the switch to enable Buffering Options.
    Note: Buffering options is in Preview. Contact your account manager to request access.
    • If left disabled, the maximum size for buffering is 500 events.
    • If enabled:
      1. Select the buffer type you want to set (Memory or Disk).
      2. Enter the buffer size and select the unit.

For the OpenSearch destination, the following fields are optional:

  1. Optionally, enter the name of the OpenSearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
  2. Optionally, toggle the switch to enable Buffering Options.
    Note: Buffering options is in Preview. Contact your account manager to request access.
    • If left disabled, the maximum size for buffering is 500 events.
    • If enabled:
      1. Select the buffer type you want to set (Memory or Disk).
      2. Enter the buffer size and select the unit.

For the Amazon OpenSearch destination:

  1. Optionally, enter the name of the Amazon OpenSearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
  2. Select an authentication strategy, Basic or AWS. For AWS, enter the AWS region.
  3. Optionally, toggle the switch to enable Buffering Options.
    Note: Buffering options is in Preview. Contact your account manager to request access.
    • If left disabled, the maximum size for buffering is 500 events.
    • If enabled:
      1. Select the buffer type you want to set (Memory or Disk).
      2. Enter the buffer size and select the unit.

Set up processors

There are pre-selected processors added to your processor group out of the box. You can add additional processors or delete any existing ones based on your processing needs.

Processor groups are executed from top to bottom. The order of the processors is important because logs are checked by each processor, but only logs that match the processor’s filters are processed. To modify the order of the processors, use the drag handle on the top left corner of the processor you want to move.

Filter query syntax

Each processor has a corresponding filter query in their fields. Processors only process logs that match their filter query. And for all processors except the filter processor, logs that do not match the query are sent to the next step of the pipeline. For the filter processor, logs that do not match the query are dropped.

For any attribute, tag, or key:value pair that is not a reserved attribute, your query must start with @. Conversely, to filter reserved attributes, you do not need to append @ in front of your filter query.

For example, to filter out and drop status:info logs, your filter can be set as NOT (status:info). To filter out and drop system-status:info, your filter must be set as NOT (@system-status:info).

Filter query examples:

  • NOT (status:debug): This filters for only logs that do not have the status DEBUG.
  • status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
    • This query can also be written as: status:ok AND service:flask-web-app.
  • host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
  • @user.status:inactive: This filters for logs with the status inactive nested under the user attribute.

Queries run in the Observability Pipelines Worker are case sensitive. Learn more about writing filter queries in Datadog’s Log Search Syntax.

Add processors

Enter the information for the processors you want to use. To add a processor, click the Add button. To delete a processor, click the kebab menu icon to the right of the processor and select Delete.

The log processors available

This processor filters for logs that match the specified filter query and drops all logs that do not match it. If a log is dropped by this processor, none of the processors below receive that log. This processor can filter out unnecessary logs, such as debug or warning logs.

To set up the filter processor:

  • Define a filter query. The filter query you specify passes on only the logs that match it; all other logs are dropped.

The remap processor can add, drop, or rename fields within your individual log data. Use this processor to enrich your logs with additional context, remove low-value fields to reduce volume, and standardize naming across important attributes. Select add field, drop field, or rename field in the dropdown menu to get started.

See the Remap Reserved Attributes guide on how to use the Edit Fields processor to remap attributes.

Add field

Use add field to append a new key-value field to your log.

To set up the add field processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Enter the field and value you want to add. To specify a nested field for your key, use the path notation: <OUTER_FIELD>.<INNER_FIELD>. All values are stored as strings. Note: If the field you want to add already exists, the Worker throws an error and the existing field remains unchanged.
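
For example (an illustrative before and after, assuming you add a nested field user.role with the value admin):

  Field: user.role    Value: admin

  Before: {"service": "checkout", "message": "login ok"}
  After:  {"service": "checkout", "message": "login ok", "user": {"role": "admin"}}
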
Drop field

Use drop field to drop a field from logging data that matches the filter you specify below. It can delete objects, so you can use the processor to drop nested keys.

To set up the drop field processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Enter the key of the field you want to drop. To specify a nested field for your specified key, use the path notation: <OUTER_FIELD>.<INNER_FIELD>. Note: If your specified key does not exist, your log will be unimpacted.
Rename field

Use rename field to rename a field within your log.

To set up the rename field processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Enter the name of the field you want to rename in the Source field. To specify a nested field for your key, use the path notation: <OUTER_FIELD>.<INNER_FIELD>. Once renamed, your original field is deleted unless you enable the Preserve source tag checkbox described below.
    Note: If the source key you specify doesn’t exist, a default null value is applied to your target.
  3. In the Target field, enter the name you want the source field to be renamed to. To specify a nested field for your specified key, use the path notation: <OUTER_FIELD>.<INNER_FIELD>.
    Note: If the target field you specify already exists, the Worker throws an error and does not overwrite the existing target field.
  4. Optionally, check the Preserve source tag box if you want to retain the original source field and duplicate the information from your source key to your specified target key. If this box is not checked, the source key is dropped after it is renamed.
Path notation example

For the following message structure:

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}
  • Use outer_key.inner_key to refer to the key with the value inner_value.
  • Use outer_key.a.double_inner_key to refer to the key with the value double_inner_value.

This processor samples your logging traffic for a representative subset at the rate that you define, dropping the remaining logs. As an example, you can use this processor to sample 20% of logs from a noisy non-critical service.

The sampling only applies to logs that match your filter query and does not impact other logs. If a log is dropped at this processor, none of the processors below receives that log.

To set up the sample processor:

  1. Define a filter query. Only logs that match the specified filter query are sampled at the specified retention rate below. The sampled logs and the logs that do not match the filter query are sent to the next step in the pipeline.
  2. Enter your desired sampling rate in the Retain field. For example, entering 2 means 2% of logs are retained out of all the logs that match the filter query.
  3. Optionally, enter a Group By field to create separate sampling groups for each unique value for that field. For example, status:error and status:info are two unique field values. Each bucket of events with the same field is sampled independently. Click Add Field if you want to add more fields to partition by. See the group-by example.
Group-by example

If you have the following setup for the sample processor:

  • Filter query: env:staging
  • Retain: 40% of matching logs
  • Group by: status and service
The sample processor with example values

Then, 40% of logs for each unique combination of status and service from env:staging is retained. For example:

  • 40% of logs with status:info and service:networks are retained.
  • 40% of logs with status:info and service:core-web are retained.
  • 40% of logs with status:error and service:networks are retained.
  • 40% of logs with status:error and service:core-web are retained.

This processor parses logs using the grok parsing rules that are available for a set of sources. The rules are automatically applied to logs based on the log source. Therefore, logs must have a source field with the source name. If this field is not added when the log is sent to the Observability Pipelines Worker, you can use the Add field processor to add it.

If the source field of a log matches one of the grok parsing rule sets, the log’s message field is checked against those rules. If a rule matches, the resulting parsed data is added in the message field as a JSON object, overwriting the original message.

If there isn’t a source field on the log, or no rule matches the log message, then no changes are made to the log and it is sent to the next step in the pipeline.

Datadog’s Grok implementation differs from the standard Grok implementation in that it provides:

  • Matchers that include options for how you define parsing rules
  • Filters for post-processing of extracted data
  • A set of built-in patterns tailored to common log formats

See Parsing for more information on Datadog’s Grok patterns.
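
For example, a custom parsing rule along the lines of the examples in the Parsing documentation extracts a user and a date from a log line such as "john connected on 11/08/2017":

  MyParsingRule %{word:user} connected on %{date("MM/dd/yyyy"):connect_date}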

To set up the grok parser, define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.

To test log samples for out-of-the-box rules:

  1. Click the Preview Library Rules button.
  2. Search or select a source in the dropdown menu.
  3. Enter a log sample to test the parsing rules for that source.

To add a custom parsing rule:

  1. Click Add Custom Rule.
  2. If you want to clone a library rule, select Clone library rule and then the library source from the dropdown menu.
  3. If you want to create a custom rule, select Custom and then enter the source. The parsing rules are applied to logs with that source.
  4. Enter log samples to test the parsing rules.
  5. Enter the rules for parsing the logs. See Parsing for more information on writing parsing rules with Datadog Grok patterns.
    Note: The url, useragent, and csv filters are not available.
  6. Click Advanced Settings if you want to add helper rules. See Using helper rules to factorize multiple parsing rules for more information.
  7. Click Add Rule.

The quota processor measures the logging traffic for logs that match the filter you specify. When the configured daily quota is met inside the 24-hour rolling window, the processor can either keep or drop additional logs, or send them to a storage bucket. For example, you can configure this processor to drop new logs or trigger an alert without dropping logs after the processor has received 10 million events from a certain service in the last 24 hours.

You can also use field-based partitioning, such as by service, env, or status. Each unique value of a partitioned field uses a separate quota bucket with its own daily quota limit. See the Partition example for more information.

Notes:

  • Each pipeline can have up to 1000 buckets. If you need to increase the bucket limit, contact your account manager.
  • The pipeline uses the name of the quota to identify the quota across multiple Remote Configuration deployments of the Worker.

To set up the quota processor:

  1. Enter a name for the quota processor.
  2. Define a filter query. Only logs that match the specified filter query are counted towards the daily limit.
    • Logs that match the quota filter and are within the daily quota are sent to the next step in the pipeline.
    • Logs that do not match the quota filter are sent to the next step of the pipeline.
  3. In the Unit for quota dropdown menu, select if you want to measure the quota by the number of Events or by the Volume in bytes.
  4. Set the daily quota limit and select the unit of magnitude for your desired quota.
  5. Optionally, click Add Field if you want to set a quota on a specific service or region field.
    a. Enter the field name you want to partition by. See the Partition example for more information.
    i. Select Ignore when missing if you want the quota applied only to events that match the partition. See the Ignore when missing example for more information.
    ii. Optional: Click Overrides if you want to set different quotas for the partitioned field.
    - Click Download as CSV for an example of how to structure the CSV.
    - Drag and drop your overrides CSV to upload it. You can also click Browse to select the file to upload it. See the Overrides example for more information.
    b. Click Add Field if you want to add another partition.
  6. In the When quota is met dropdown menu, select whether you want to drop events, keep events, or send events to overflow destination when the quota has been met.
    1. If you select send events to overflow destination, an overflow destination is added with the following cloud storage options: Amazon S3, Azure Blob, and Google Cloud.
    2. Select the cloud storage you want to send overflow logs to. See the setup instructions for your cloud storage: Amazon S3, Azure Blob Storage, or Google Cloud Storage.

Examples

Partition example

Use Partition by if you want to set a quota on a specific service or region. For example, if you want to set a quota for 10 events per day and group the events by the service field, enter service into the Partition by field.

Example for the “ignore when missing” option

Select Ignore when missing if you want the quota applied only to events that match the partition. For example, if the Worker receives the following set of events:

{"service":"a", "source":"foo", "message": "..."}
{"service":"b", "source":"bar", "message": "..."}
{"service":"b", "message": "..."}
{"source":"redis", "message": "..."}
{"message": "..."}

And Ignore when missing is selected, then the Worker:

  • creates a set for logs with service:a and source:foo
  • creates a set for logs with service:b and source:bar
  • ignores the last three events

The quota is applied to the two sets of logs and not to the last three events.

If Ignore when missing is not selected, the quota is applied to all five events.

Overrides example

If you are partitioning by service and have two services: a and b, you can use overrides to apply different quotas for them. For example, if you want service:a to have a quota limit of 5,000 bytes and service:b to have a limit of 50 events, the override rules look like this:

| Service | Type   | Limit |
|---------|--------|-------|
| a       | Bytes  | 5,000 |
| b       | Events | 50    |
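
A hypothetical CSV layout mirroring the table above is shown below; use Download as CSV in the UI for the authoritative column layout expected by the Worker:

  service,type,limit
  a,Bytes,5000
  b,Events,50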

The reduce processor groups multiple log events into a single log, based on the fields specified and the merge strategies selected. Logs are grouped at 10-second intervals. After the interval has elapsed for the group, the reduced log for that group is sent to the next step in the pipeline.

To set up the reduce processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. Reduced logs and logs that do not match the filter query are sent to the next step in the pipeline.
  2. In the Group By section, enter the field you want to group the logs by.
  3. Click Add Group by Field to add additional fields.
  4. In the Merge Strategy section:
    • In On Field, enter the name of the field you want to merge the logs on.
    • Select the merge strategy in the Apply dropdown menu. This is the strategy used to combine events. See the following Merge strategies section for descriptions of the available strategies.
    • Click Add Merge Strategy to add additional strategies.
Merge strategies

These are the available merge strategies for combining log events.

| Name           | Description                                                                                                       |
|----------------|-------------------------------------------------------------------------------------------------------------------|
| Array          | Appends each value to an array.                                                                                     |
| Concat         | Concatenates each string value, delimited with a space.                                                             |
| Concat newline | Concatenates each string value, delimited with a newline.                                                           |
| Concat raw     | Concatenates each string value, without a delimiter.                                                                |
| Discard        | Discards all values except the first value that was received.                                                       |
| Flat unique    | Creates a flattened array of all unique values that were received.                                                  |
| Longest array  | Keeps the longest array that was received.                                                                          |
| Max            | Keeps the maximum numeric value that was received.                                                                  |
| Min            | Keeps the minimum numeric value that was received.                                                                  |
| Retain         | Discards all values except the last value that was received. Works as a way to coalesce by not retaining `null`.    |
| Shortest array | Keeps the shortest array that was received.                                                                         |
| Sum            | Sums all numeric values that were received.                                                                         |
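
As an illustration with hypothetical field names, suppose two logs arriving in the same 10-second window are grouped by host, with Concat applied to message and Sum applied to bytes:

  Input:
  {"host": "web-1", "message": "chunk 1 uploaded", "bytes": 1024}
  {"host": "web-1", "message": "chunk 2 uploaded", "bytes": 2048}

  Reduced output:
  {"host": "web-1", "message": "chunk 1 uploaded chunk 2 uploaded", "bytes": 3072}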

The deduplicate processor removes copies of data to reduce volume and noise. It caches 5,000 messages at a time and compares your incoming logs traffic against the cached messages. For example, this processor can be used to keep only unique warning logs in the case where multiple identical warning logs are sent in succession.

To set up the deduplicate processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. Deduped logs and logs that do not match the filter query are sent to the next step in the pipeline.
  2. In the Type of deduplication dropdown menu, select whether you want to Match on or Ignore the fields specified below.
    • If Match is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.
    • If Ignore is selected, then after a log passes through, future logs that have the same values for all of their fields, except the ones you specify below, are removed.
  3. Enter the fields you want to match on, or ignore. At least one field is required, and you can specify a maximum of three fields.
    • Use the path notation <OUTER_FIELD>.<INNER_FIELD> to match subfields. See the Path notation example below.
  4. Click Add field to add additional fields you want to filter on.
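
For example, with Match selected on the single field message (an illustrative sketch), identical messages are removed after the first one passes through:

  Input:
  {"message": "disk usage above 90%", "host": "web-1"}
  {"message": "disk usage above 90%", "host": "web-1"}
  {"message": "disk usage above 90%", "host": "web-2"}

  Output (Match on message):
  {"message": "disk usage above 90%", "host": "web-1"}
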
Path notation example

For the following message structure:

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}
  • Use outer_key.inner_key to refer to the key with the value inner_value.
  • Use outer_key.a.double_inner_key to refer to the key with the value double_inner_value.

The sensitive data scanner processor scans logs to detect and redact or hash sensitive information such as PII, PCI, and custom sensitive data. You can select from a library of predefined rules, or enter custom regex rules to scan for sensitive data.

To set up the sensitive data scanner processor:

  1. Define a filter query. Only logs that match the specified filter query are scanned and processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Click Add Scanning Rule.
  3. Name the scanning rule.
  4. In the Select scanning rule type field, select whether to create a rule from the library or create a custom rule.
    • If you create a rule from the library, select the library pattern you want to use.
    • If you create a custom rule, enter the regex pattern to check against the data.
  5. In the Scan entire or part of event section, select Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
    • If you select Specific Attributes, click Add Field and enter the specific attributes you want to scan. You can add up to three fields. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all of the nested data is scanned.
    • If you select Exclude Attributes, click Add Field and enter the specific attributes you want to exclude from scanning. You can add up to three fields. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all of the nested data is excluded.
  6. In the Define action on match section, select the action to take on the matched information. Redaction, partial redaction, and hashing are all irreversible actions.
    • If you redact the information, specify the text to replace the matched data.
    • If you partially redact the information, specify the number of characters you want to redact and whether to redact the first or last part of the matched data.
    • Note: If you select hashing, the matched UTF-8 bytes are hashed with a 64-bit fingerprint of FarmHash.
  7. Optionally, add a tag to all events that match the regex, so that you can filter, analyze, and alert on those events.
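
For example, a custom scanning rule with the hypothetical regex below and a redact action whose replacement text is [REDACTED] would rewrite a matching log as follows:

  Regex pattern: \b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b

  Before: {"service": "checkout", "message": "payment declined for card 4111 1111 1111 1111"}
  After:  {"service": "checkout", "message": "payment declined for card [REDACTED]"}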

This processor adds a field with the name of the host that sent the log. For example, hostname: 613e197f3526. Note: If the hostname already exists, the Worker throws an error and does not overwrite the existing hostname.

To set up this processor:

  • Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.

This processor parses the specified JSON field into objects. For example, if you have a message field that contains stringified JSON:

{
    "foo": "bar",
    "team": "my-team",
    "message": "{\"level\":\"info\",\"timestamp\":\"2024-01-15T10:30:00Z\",\"service\":\"user-service\",\"user_id\":\"12345\",\"action\":\"login\",\"success\":true,\"ip_address\":\"192.168.1.100\"}"
    "app_id":"streaming-services",
    "ddtags": [
    "kube_service:my-service",
    "k8_deployment :your-host"
    ]
}

Use the Parse JSON processor to parse the message field so the message field has all the attributes within a nested object.

The parse json processor with message as the field to parse on

This output contains the message field with the parsed JSON:

{
    "foo": "bar",
    "team": "my-team",
    "message": {
        "action": "login",
        "ip_address": "192.168.1.100",
        "level": "info",
        "service": "user-service",
        "success": true,
        "timestamp": "2024-01-15T10:30:00Z",
        "user_id": "12345"
    },
    "app_id":"streaming-services",
    "ddtags": [
    "kube_service:my-service",
    "k8_deployment :your-host"
    ]
}

To set up this processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Enter the name of the field you want to parse JSON on.
    Note: The parsed JSON overwrites what was originally contained in the field.

Use this processor to enrich your logs with information from a reference table, which could be a local file or database.

To set up the enrichment table processor:

  1. Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
  2. Enter the source attribute of the log. The source attribute’s value is what you want to find in the reference table.
  3. Enter the target attribute. The target attribute’s value stores, as a JSON object, the information found in the reference table.
  4. Select the type of reference table you want to use, File or GeoIP.
    • For the File type:
      1. Enter the file path.
        Note: All file paths are made relative to the configuration data directory, which is /var/lib/observability-pipelines-worker/config/ by default. See Advanced Configurations for more information. The file must be owned by the observability-pipelines-worker group and observability-pipelines-worker user, or at least readable by the group or user.
      2. Enter the column name. The column name in the enrichment table is used for matching the source attribute value. See the Enrichment file example.
    • For the GeoIP type, enter the GeoIP path.
Enrichment file example

For this example, merchant_id is used as the source attribute and merchant_info as the target attribute.

This is the example reference table that the enrichment processor uses:

| merch_id | merchant_name   | city      | state    |
|----------|-----------------|-----------|----------|
| 803      | Andy's Ottomans | Boise     | Idaho    |
| 536      | Cindy's Couches | Boulder   | Colorado |
| 235      | Debra's Benches | Las Vegas | Nevada   |

merch_id is set as the column name the processor uses to find the source attribute’s value. Note: The source attribute’s value does not have to match the column name.

If the enrichment processor receives a log with "merchant_id":"536":

  • The processor looks for the value 536 in the reference table’s merch_id column.
  • After it finds the value, it adds the entire row of information from the reference table to the merchant_info attribute as a JSON object:
merchant_info {
    "merchant_name":"Cindy's Couches",
    "city":"Boulder",
    "state":"Colorado"
}
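
With the File reference table type, the table above could be stored as a CSV file along these lines (a sketch that assumes a header row matching the column names):

  merch_id,merchant_name,city,state
  803,Andy's Ottomans,Boise,Idaho
  536,Cindy's Couches,Boulder,Colorado
  235,Debra's Benches,Las Vegas,Nevada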

Install the Observability Pipelines Worker

  1. Select your platform in the Choose your installation platform dropdown menu.

  2. Enter the full path of the HTTP/S endpoint URL, for example, https://127.0.0.8/logs. The Observability Pipelines Worker collects log events from this endpoint.

  3. Enter the environment variables for each of your selected destinations. See the prerequisites for more information.

    There are no environment variables to configure for Datadog Log Management.

    Enter your Splunk HEC token and the base URL of the Splunk instance. See the prerequisites for more information.

    The Worker forwards the HEC token to the Splunk collection endpoint. After the Observability Pipelines Worker processes the logs, it sends them to the specified Splunk instance URL.

    Note: The Splunk HEC destination forwards all logs to the /services/collector/event endpoint, regardless of whether you configure the destination to encode the output as JSON or raw.

    Enter the Sumo Logic HTTP collector URL. See the prerequisites for more information.

    Enter the rsyslog or syslog-ng endpoint URL. For example, 127.0.0.1:9997. The Observability Pipelines Worker sends logs to this address and port.

    Enter the Google Chronicle endpoint URL. For example, https://chronicle.googleapis.com.

    1. Enter the Elasticsearch authentication username.
    2. Enter the Elasticsearch authentication password.
    3. Enter the Elasticsearch endpoint URL. For example, http://CLUSTER_ID.LOCAL_HOST_IP.ip.es.io:9200.
    1. Enter the OpenSearch authentication username.
    2. Enter the OpenSearch authentication password.
    3. Enter the OpenSearch endpoint URL. For example, http://<hostname.IP>:9200.
    1. Enter the Amazon OpenSearch authentication username.
    2. Enter the Amazon OpenSearch authentication password.
    3. Enter the Amazon OpenSearch endpoint URL. For example, http://<hostname.IP>:9200.

  4. Follow the instructions for your environment to install the Worker.

    1. Click Select API key to choose the Datadog API key you want to use.
    2. Run the command provided in the UI to install the Worker. The command is automatically populated with the environment variables you entered earlier.
      docker run -i -e DD_API_KEY=<DATADOG_API_KEY> \
          -e DD_OP_PIPELINE_ID=<PIPELINE_ID> \
          -e DD_SITE=<DATADOG_SITE> \
          -e <SOURCE_ENV_VARIABLE> \
          -e <DESTINATION_ENV_VARIABLE> \
          -p 8088:8088 \
          datadog/observability-pipelines-worker run
      
      Note: By default, the docker run command exposes the same port the Worker is listening on. If you want to map the Worker's container port to a different port on the Docker host, use the -p | --publish option in the command:
      -p 8282:8088 datadog/observability-pipelines-worker run
      
    3. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Download the Helm chart values file for Amazon EKS.
    2. Click Select API key to choose the Datadog API key you want to use.
    3. Add the Datadog chart repository to Helm:
      helm repo add datadog https://helm.datadoghq.com
      
      If you already have the Datadog chart repository, run the following command to make sure it is up to date:
      helm repo update
      
    4. Run the command provided in the UI to install the Worker. The command is automatically populated with the environment variables you entered earlier.
      helm upgrade --install opw \
      -f aws_eks.yaml \
      --set datadog.apiKey=<DATADOG_API_KEY> \
      --set datadog.pipelineId=<PIPELINE_ID> \
      --set <SOURCE_ENV_VARIABLES> \
      --set <DESTINATION_ENV_VARIABLES> \
      --set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
      datadog/observability-pipelines-worker
      
      Note: By default, the Kubernetes Service maps incoming port <SERVICE_PORT> to the port the Worker is listening on (<TARGET_PORT>). If you want to map the Worker's pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port and service.ports[0].targetPort values in the command:
      --set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
      
    5. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Download the Helm chart values file for Azure AKS.
    2. Click Select API key to choose the Datadog API key you want to use.
    3. Add the Datadog chart repository to Helm:
      helm repo add datadog https://helm.datadoghq.com
      
      If you already have the Datadog chart repository, run the following command to make sure it is up to date:
      helm repo update
      
    4. Run the command provided in the UI to install the Worker. The command is automatically populated with the environment variables you entered earlier.
      helm upgrade --install opw \
      -f azure_aks.yaml \
      --set datadog.apiKey=<DATADOG_API_KEY> \
      --set datadog.pipelineId=<PIPELINE_ID> \
      --set <SOURCE_ENV_VARIABLES> \
      --set <DESTINATION_ENV_VARIABLES> \
      --set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
      datadog/observability-pipelines-worker
      
      Note: By default, the Kubernetes Service maps incoming port <SERVICE_PORT> to the port the Worker is listening on (<TARGET_PORT>). If you want to map the Worker's pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port and service.ports[0].targetPort values in the command:
      --set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
      
    5. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Download the Helm chart values file for Google GKE.
    2. Click Select API key to choose the Datadog API key you want to use.
    3. Add the Datadog chart repository to Helm:
      helm repo add datadog https://helm.datadoghq.com
      
      If you already have the Datadog chart repository, run the following command to make sure it is up to date:
      helm repo update
      
    4. Run the command provided in the UI to install the Worker. The command is automatically populated with the environment variables you entered earlier.
      helm upgrade --install opw \
      -f google_gke.yaml \
      --set datadog.apiKey=<DATADOG_API_KEY> \
      --set datadog.pipelineId=<PIPELINE_ID> \
      --set <SOURCE_ENV_VARIABLES> \
      --set <DESTINATION_ENV_VARIABLES> \
      --set service.ports[0].protocol=TCP,service.ports[0].port=<SERVICE_PORT>,service.ports[0].targetPort=<TARGET_PORT> \
      datadog/observability-pipelines-worker
      
      Note: By default, the Kubernetes Service maps incoming port <SERVICE_PORT> to the port the Worker is listening on (<TARGET_PORT>). If you want to map the Worker's pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port and service.ports[0].targetPort values in the command:
      --set service.ports[0].protocol=TCP,service.ports[0].port=8088,service.ports[0].targetPort=8282
      
    5. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Click Select API key to choose the Datadog API key you want to use.

    2. Run the one-step command provided in the UI to install the Worker.

      Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.

    If you prefer not to use the one-line installation script, use the following step-by-step instructions:

    1. Set up APT so that packages can be downloaded over HTTPS:
      sudo apt-get update
      sudo apt-get install apt-transport-https curl gnupg
      
    2. Run the following commands to set up the Datadog deb repo on your system and create a Datadog archive keyring:
      sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable observability-pipelines-worker-2' > /etc/apt/sources.list.d/datadog-observability-pipelines-worker.list"
      sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
      sudo chmod a+r /usr/share/keyrings/datadog-archive-keyring.gpg
      curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
      curl https://keys.datadoghq.com/DATADOG_APT_KEY_06462314.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
      curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
      curl https://keys.datadoghq.com/DATADOG_APT_KEY_C0962C7D.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
      
    3. Run the following commands to update your local apt repo and install the Worker:
      sudo apt-get update
      sudo apt-get install observability-pipelines-worker datadog-signing-keys
      
    4. Add your keys, your site (for example, datadoghq.com for US1), and your source and destination environment variables to the Worker's environment file:
      sudo cat <<EOF > /etc/default/observability-pipelines-worker
      DD_API_KEY=<DATADOG_API_KEY>
      DD_OP_PIPELINE_ID=<PIPELINE_ID>
      DD_SITE=<DATADOG_SITE>
      <SOURCE_ENV_VARIABLES>
      <DESTINATION_ENV_VARIABLES>
      EOF
      
    5. Start the Worker:
      sudo systemctl restart observability-pipelines-worker
      

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Click Select API key to choose the Datadog API key you want to use.

    2. Run the one-step command provided in the UI to install the Worker.

      Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.

    If you prefer not to use the one-line installation script, use the following step-by-step instructions:

    1. Run the following command to set up the Datadog rpm repo on your system. Note: If you are running RHEL 8.1 or CentOS 8.1, use repo_gpgcheck=0 instead of repo_gpgcheck=1 in the configuration below.
      cat <<EOF > /etc/yum.repos.d/datadog-observability-pipelines-worker.repo
      [observability-pipelines-worker]
      name = Observability Pipelines Worker
      baseurl = https://yum.datadoghq.com/stable/observability-pipelines-worker-2/\$basearch/
      enabled=1
      gpgcheck=1
      repo_gpgcheck=1
      gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
          https://keys.datadoghq.com/DATADOG_RPM_KEY_B01082D3.public
      EOF
      
    2. Update your packages and install the Worker:
      sudo yum makecache
      sudo yum install observability-pipelines-worker
      
    3. Add your keys, your site (for example, datadoghq.com for US1), and your source and destination environment variables to the Worker's environment file:
      sudo cat <<-EOF > /etc/default/observability-pipelines-worker
      DD_API_KEY=<API_KEY>
      DD_OP_PIPELINE_ID=<PIPELINE_ID>
      DD_SITE=<SITE>
      <SOURCE_ENV_VARIABLES>
      <DESTINATION_ENV_VARIABLES>
      EOF
      
    4. Start the Worker:
      sudo systemctl restart observability-pipelines-worker
      
    5. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.

    1. Select one of the options in the dropdown menu to enter the expected log volume for the pipeline:

      | Option      | Description |
      |-------------|-------------|
      | Unsure      | Use this option if you cannot project the log volume or you want to test the Worker. This option provisions the EC2 Auto Scaling group with a maximum of two general purpose t4g.large instances. |
      | 1-5 TB/day  | This option provisions the EC2 Auto Scaling group with a maximum of two compute optimized c6g.large instances. |
      | 5-10 TB/day | This option provisions the EC2 Auto Scaling group with a minimum of two and a maximum of five compute optimized c6g.large instances. |
      | >10 TB/day  | Datadog recommends this option for large-scale production deployments. It provisions the EC2 Auto Scaling group with a minimum of two and a maximum of ten compute optimized c6g.xlarge instances. |

      Note: The other parameters are set to sensible defaults for a Worker deployment, but you can adjust them for your use case in the AWS Console before you create the stack.

    2. Select the AWS region you want to use to install the Worker.

    3. Click Select API key to choose the Datadog API key you want to use.

    4. Click Launch CloudFormation Template to navigate to the AWS Console, review the stack configuration, and then launch it. Double check that the CloudFormation parameters are as expected.

    5. Select the VPC and subnet you want to use to install the Worker.

    6. Review and check the necessary permission checkboxes for IAM, then click Submit to create the stack. CloudFormation handles the installation from this point: the Worker instances are launched, the necessary software is downloaded, and the Worker starts automatically.

    7. Navigate back to the Observability Pipelines installation page and click Deploy.

    To make changes to your pipeline's configuration, see Update Existing Pipelines.