To configure your HTTP/S Server source, enter the following:
Select your authorization strategy.
Select the decoder you want to use on the HTTP messages. Your HTTP client logs must be in this format. Note: If you select bytes decoding, the raw log is stored in the message field.
Optionally, toggle the switch to enable TLS. If you enable TLS, the following certificate and key files are required:
Server Certificate Path: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509).
CA Certificate Path: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509).
Private Key Path: The path to the .key private key file that belongs to your Server Certificate Path in DER or PEM (PKCS #8) format.
Optionally, enter the name of the Amazon OpenSearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
Select an authentication strategy, Basic or AWS. For AWS, enter the AWS region.
To authenticate the Observability Pipelines Worker for Google Chronicle, contact your Google Security Operations representative to get a Google Developer Service Account Credential. This credential is a JSON file and must be placed under DD_OP_DATA_DIR/config. See Getting API authentication credentials for more information.
Note: Logs sent to the Google Chronicle destination must have an ingestion label. For example, logs from an A10 load balancer must have the ingestion label A10_LOAD_BALANCER. See Google Cloud's Support log types with a default parser for the list of available log types and their corresponding ingestion labels.
To use the CrowdStrike NG-SIEM destination, you need to set up a CrowdStrike data connector using the HEC/HTTP Event Connector. See Step 1: Set up the HEC/HTTP event data connector for instructions. When you set up the data connector, you are given a HEC API key and URL, which you use when you configure the Observability Pipelines Worker later on.
Select JSON or Raw encoding in the dropdown menu.
Optionally, enable compression and select an algorithm (gzip or zlib) in the dropdown menu.
Optionally, toggle the switch to enable TLS. If you enable TLS, the following certificate and key files are required:
Server Certificate Path: The path to the certificate file that has been signed by your Certificate Authority (CA) Root File in DER or PEM (X.509).
CA Certificate Path: The path to the certificate file that is your Certificate Authority (CA) Root File in DER or PEM (X.509).
Private Key Path: The path to the .key private key file that belongs to your Server Certificate Path in DER or PEM (PKCS#8) format.
For example, suppose your syslogs are sent to Datadog Archives, and in those logs the status is tagged as severity instead of the reserved attribute status, and the host is tagged as host-name instead of the reserved attribute hostname. When these logs are rehydrated into Datadog, each log's status is set to info and none of the logs have a hostname tag.
If you do not have a Datadog Log Archive configured for Observability Pipelines, configure a Log Archive for your cloud provider (Amazon S3, Google Cloud Storage, or Azure Storage).
Note: You need to have the Datadog integration for your cloud provider installed to set up Datadog Log Archives. See the AWS integration, Google Cloud Platform, and Azure integration documentation for more information.
To set up the destination, follow the instructions for the cloud provider you are using to archive your logs.
Amazon S3
Enter the name of the S3 bucket you created earlier.
Enter the AWS region the S3 bucket is in.
Enter the key prefix.
Prefixes are useful for partitioning objects. For example, you can use a prefix as an object key to store objects under a particular directory. If using a prefix for this purpose, it must end in / to act as a directory path; a trailing / is not automatically added.
See template syntax if you want to route logs to different object keys based on specific fields in your logs.
Select the storage class for your S3 bucket in the Storage Class dropdown menu.
Your AWS access key ID and AWS secret access key are set as environment variables when you install the Worker later.
Enter the name of the Google Cloud storage bucket you created earlier.
Enter the path to the credentials JSON file you downloaded earlier.
Select the storage class for the created objects.
Select the access level of the created objects.
Optionally, enter a prefix.
Prefixes are useful for partitioning objects. For example, you can use a prefix as an object key to store objects under a particular directory. If using a prefix for this purpose, it must end in / to act as a directory path; a trailing / is not automatically added.
See template syntax if you want to route logs to different object keys based on specific fields in your logs.
Optionally, click Add Header to add metadata.
Azure Storage
Enter the name of the Azure container you created earlier.
Optionally, enter a prefix.
Prefixes are useful for partitioning objects. For example, you can use a prefix as an object key to store objects under a particular directory. If using a prefix for this purpose, it must end in / to act as a directory path; a trailing / is not automatically added.
See template syntax if you want to route logs to different object keys based on specific fields in your logs.
The following fields are optional:
Enter the name for the Elasticsearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
To set up the Microsoft Sentinel destination, you need the following information:
| Name | Description |
|------|-------------|
| Application (client) ID | The Azure Active Directory (AD) application's client ID. See Register an application in Microsoft Entra ID for information on creating a new application. Example: 550e8400-e29b-41d4-a716-446655440000 |
| Table name | The name of the stream that matches the table chosen when configuring the Data Collection Rule (DCR). Example: Custom-MyLogs_CL |
| Data Collection Rule (DCR) immutable ID | The immutable ID of the DCR where logging routes are defined. It is shown as Immutable ID on the DCR Overview page. Note: Ensure the Monitoring Metrics Publisher role is assigned in the DCR IAM settings. Example: dcr-000a00a000a00000a000000aa000a0aa. See Data collection rules (DCRs) in Azure Monitor to learn more about creating or viewing DCRs. |
Do the following to get that information:
Create or identify a Data Collection Rule (DCR).
In the Azure Portal, navigate to Azure Monitor → Data Collection Rules.
Take note of the DCR Immutable ID and, if you are using private links, the DCR’s Data Collection Endpoint (DCE). You need this information when you set up the Microsoft Sentinel destination.
Define a custom table (for example, Custom-MyLogs_CL) in the DCR; this is the table that Observability Pipelines sends logs to.
Get the ingestion URL.
In the DCR, locate the Logs Ingestion API endpoint. The endpoint has the format: https://<DCE-ID>.ingest.monitor.azure.com/dataCollectionRules/<DCR-Immutable-ID>/streams/<Stream-Name>?api-version=2023-01-01, where the <Stream-Name> typically matches your custom table (for example, Custom-MyLogs_CL).
The ingestion URL is needed when you set up your Microsoft Sentinel destination’s environment variable.
To authenticate the Observability Pipelines Worker with Microsoft Sentinel:
In the Azure Portal, navigate to Azure AD > App Registrations and register an Azure Active Directory (AD) application. See Register an application in Microsoft Entra ID for information on creating a new application.
Generate a Client Secret.
Assign it the Monitoring Metrics Publisher role on the Log Analytics workspace.
Take note of the Tenant ID, Client ID, and Client Secret. You need this information when you set up the Microsoft Sentinel destination.
To set up the Microsoft Sentinel destination in Observability Pipelines:
Enter the client ID for your application, such as 550e8400-e29b-41d4-a716-446655440000.
Enter the directory ID for your tenant, such as 72f988bf-86f1-41af-91ab-2d7cd011db47. This is the Azure AD tenant ID.
Enter the name of the table, such as Custom-MyLogs, to which you are sending logs.
Enter the Data Collection Rule (DCR) immutable ID, such as dcr-000a00a000a00000a000000aa000a0aa.
Select the data center region (US or EU) of your New Relic account.
Optionally, enter the name of the OpenSearch index. See template syntax if you want to route logs to different indexes based on specific fields in your logs.
Select your SentinelOne logs environment in the dropdown menu.
Click the plus sign (+) to the left of the destinations to add additional destinations to the same set of processors.
To delete a destination, click on the pencil icon to the top right of the destination, and select Delete destination. If you delete a destination from a processor group that has multiple destinations, only the deleted destination is removed. If you delete a destination from a processor group that only has one destination, both the destination and the processor group are removed.
Notes:
A pipeline must have at least one destination. If a processor group only has one destination, that destination cannot be deleted.
You can add a total of three destinations for a pipeline.
A specific destination can only be added once. For example, you cannot add multiple Splunk HEC destinations.
There are pre-selected processors added to your processor group out of the box. You can add additional processors or delete any existing ones based on your processing needs.
Processor groups are executed from top to bottom. The order of the processors is important because logs are checked by each processor, but only logs that match the processor’s filters are processed. To modify the order of the processors, use the drag handle on the top left corner of the processor you want to move.
Each processor has a corresponding filter query in its fields. Processors only process logs that match their filter query. For all processors except the filter processor, logs that do not match the query are sent to the next step of the pipeline. For the filter processor, logs that do not match the query are dropped.
For any attribute, tag, or key:value pair that is not a reserved attribute, your query must start with @. Conversely, to filter on reserved attributes, you do not need to prepend @ to your filter query.
For example, to filter out and drop status:info logs, your filter can be set as NOT (status:info). To filter out and drop system-status:info, your filter must be set as NOT (@system-status:info).
Filter query examples:
NOT (status:debug): This filters for only logs that do not have the status DEBUG.
status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
This query can also be written as: status:ok AND service:flask-web-app.
host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
@user.status:inactive: This filters for logs with the status inactive nested under the user attribute.
Queries run in the Observability Pipelines Worker are case sensitive. Learn more about writing filter queries in Datadog’s Log Search Syntax.
Use this processor to add a field name and value of an environment variable to the log message.
To set up this processor:
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Enter the field name for the environment variable.
Enter the environment variable name.
Click Add Environment Variable if you want to add another environment variable.
Environment variables that match any of the following patterns are blocked from being added to log messages because the environment variable could contain sensitive data.
Environment variables are matched against the pattern, not just the literal word. For example, the pattern PASSWORD blocks environment variables like USER_PASSWORD and PASSWORD_SECRET from being added to the log messages.
After you have added processors to your pipeline and clicked Next: Install, in the Add environment variable processor(s) allowlist field, enter a comma-separated list of environment variables you want to pull values from and use with this processor.
The allowlist is stored in the environment variable DD_OP_PROCESSOR_ADD_ENV_VARS_ALLOWLIST.
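For example, a minimal sketch of setting the allowlist when starting the Worker (REGION and DEPLOYMENT_ENV are hypothetical variable names):

```shell
# Only the environment variables listed here can be referenced by the processor
export DD_OP_PROCESSOR_ADD_ENV_VARS_ALLOWLIST="REGION,DEPLOYMENT_ENV"
```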
This processor adds a field with the name of the host that sent the log. For example, hostname: 613e197f3526. Note: If the hostname already exists, the Worker throws an error and does not overwrite the existing hostname.
To set up this processor:
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Use this processor to enrich your logs with information from a reference table, which could be a local file or database.
To set up the enrichment table processor:
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Enter the source attribute of the log. The source attribute’s value is what you want to find in the reference table.
Enter the target attribute. The target attribute’s value stores, as a JSON object, the information found in the reference table.
Select the type of reference table you want to use, File or GeoIP.
For the File type:
Enter the file path.
Enter the column name. The column name in the enrichment table is used for matching the source attribute value. See the Enrichment file example.
For this example, merchant_id is used as the source attribute and merchant_info as the target attribute.
This is the example reference table that the enrichment processor uses:
| merch_id | merchant_name | city | state |
|----------|---------------|------|-------|
| 803 | Andy’s Ottomans | Boise | Idaho |
| 536 | Cindy’s Couches | Boulder | Colorado |
| 235 | Debra’s Benches | Las Vegas | Nevada |
merch_id is set as the column name the processor uses to find the source attribute’s value. Note: The source attribute’s value does not have to match the column name.
If the enrichment processor receives a log with "merchant_id":"536":
The processor looks for the value 536 in the reference table’s merch_id column.
After it finds the value, it adds the entire row of information from the reference table to the merchant_info attribute as a JSON object:
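Based on the reference table above, the enriched log would look approximately like this (the exact field layout is illustrative):

```json
{
  "merchant_id": "536",
  "merchant_info": {
    "merch_id": "536",
    "merchant_name": "Cindy's Couches",
    "city": "Boulder",
    "state": "Colorado"
  }
}
```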
Many types of logs are meant to be used for telemetry to track trends, such as KPIs, over long periods of time. Generating metrics from your logs is a cost-effective way to summarize log data from high-volume logs, such as CDN logs, VPC flow logs, firewall logs, and network logs. Use the generate metrics processor to generate either a count metric of logs that match a query or a distribution metric of a numeric value contained in the logs, such as a request duration.
Click Manage Metrics to create new metrics or edit existing metrics. This opens a side panel.
If you have not created any metrics yet, enter the metric parameters as described in the Add a metric section to create a metric.
If you have already created metrics, click on the metric’s row in the overview table to edit or delete it. Use the search bar to find a specific metric by its name, and then select the metric to edit or delete it. Click Add Metric to add another metric.
Enter a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline. Note: Since a single processor can generate multiple metrics, you can define a different filter query for each metric.
For gauge and distribution metric types, select a log field which has a numeric (or parseable numeric string) value that is used for the value of the generated metric.
For the distribution metric type, the log field’s value can be an array of (parseable) numerics, which is used for the generated metric’s sample set.
The Group by field determines how the metric values are grouped together. For example, if you have hundreds of hosts spread across four regions, grouping by region allows you to graph one line for every region. The fields listed in the Group by setting are set as tags on the configured metric.
To create a count metric that counts the number of logs that contain "status":"error" and groups them by env and host, enter the following information:
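For example (the metric name here is illustrative; use any name that fits your naming convention):

| Input parameters | Value |
|------------------|-------|
| Filter query | status:error |
| Metric name | errors.count |
| Metric type | Count |
| Group by | env, host |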
To create a distribution metric that measures the average time it takes for an API call to be made, enter the following information:
| Input parameters | Value |
|------------------|-------|
| Filter query | @method |
| Metric name | status_200_response |
| Metric type | Distribution |
| Select a log attribute | response_time_seconds |
| Group by | method |
This processor parses logs using the grok parsing rules that are available for a set of sources. The rules are automatically applied to logs based on the log source. Therefore, logs must have a source field with the source name. If this field is not added when the log is sent to the Observability Pipelines Worker, you can use the Add field processor to add it.
If the source field of a log matches one of the grok parsing rule sets, the log’s message field is checked against those rules. If a rule matches, the resulting parsed data is added in the message field as a JSON object, overwriting the original message.
If there isn’t a source field on the log, or no rule matches the log message, then no changes are made to the log and it is sent to the next step in the pipeline.
To set up the grok parser, define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
To test log samples for out-of-the-box rules:
Click the Preview Library Rules button.
Search or select a source in the dropdown menu.
Enter a log sample to test the parsing rules for that source.
To add a custom parsing rule:
Click Add Custom Rule.
If you want to clone a library rule, select Clone library rule and then the library source from the dropdown menu.
If you want to create a custom rule, select Custom and then enter the source. The parsing rules are applied to logs with that source.
Enter log samples to test the parsing rules.
Enter the rules for parsing the logs. See Parsing for more information on writing parsing rules. Note: The url, useragent, and csv filters are not available.
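For example, a minimal custom rule sketch (the rule name, matchers, and attribute names are illustrative; see Parsing for the full matcher reference):

```text
loginRule %{word:user.name} connected on %{date("MM/dd/yyyy"):connect_date}
```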
This processor converts the specified field into JSON objects.
To set up this processor:
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Enter the name of the field you want to parse JSON on. Note: The parsed JSON overwrites what was originally contained in the field.
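For example, a hypothetical log whose message field contains a JSON string:

```json
{"message": "{\"status\":\"error\",\"code\":503}"}
```

After the processor parses the message field, the parsed object overwrites the original string:

```json
{"message": {"status": "error", "code": 503}}
```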
This processor parses Extensible Markup Language (XML) so the data can be processed and sent to different destinations. XML is a log format used to store and transport structured data. It is organized in a tree-like structure to represent nested information and uses tags and attributes to define the data. For example, this is XML data using only tags (<recipe>,<type>, and <name>) and no attributes:
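A minimal illustration (the element values are hypothetical):

```xml
<recipe>
  <type>dessert</type>
  <name>Chocolate cake</name>
</recipe>
```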
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Enter the path to the log field on which you want to parse XML. Use the path notation <OUTER_FIELD>.<INNER_FIELD> to match subfields. See the Path notation example below.
Optionally, in the Enter text key field, input the key name to use for the text node when XML attributes are appended. See the text key example. If the field is left empty, value is used as the key name.
Optionally, select Always use text key if you want to store text inside an object using the text key even when no attributes exist.
Optionally, toggle Include XML attributes on if you want to include XML attributes. You can then choose to add the attribute prefix you want to use. See attribute prefix example. If the field is left empty, the original attribute key is used.
Optionally, select if you want to convert data types into numbers, Booleans, or nulls.
If Numbers is selected, numbers are parsed as integers and floats.
If Booleans is selected, true and false are parsed as Booleans.
If Nulls is selected, the string null is parsed as null.
If you enable Include XML attributes, the attribute is added as a prefix to each XML attribute. For example, if the attribute prefix is @ and you have the following XML:
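A minimal illustration (the element and attribute values are hypothetical):

```xml
<book category="fiction">
  <title lang="en">A Sample Title</title>
</book>
```

With the @ prefix enabled, the parsed output looks approximately like this (value is the default text key, as described above):

```json
{
  "book": {
    "@category": "fiction",
    "title": {
      "@lang": "en",
      "value": "A Sample Title"
    }
  }
}
```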
If the Drop events option is not selected, logs that match the quota filter and logs that do not match it are both sent to the next step in the pipeline, even after the daily quota limit has been reached.
Optional: Click Add Field if you want to set the quota on a specific service or region field.
a. Enter the name of the field you want to partition by. See the Partition example for more information.
i. Select Ignore when missing if you want the quota to apply only to events that match the partition. See the example for the "Ignore when missing" option for more information.
ii. Optional: Click Overrides if you want to set different quotas for the partitioned field.
Click Download as CSV for an example of the CSV structure.
Drag and drop the overrides CSV to upload it. Alternatively, click Browse to select the file and upload it. See the Overrides example for more information.
b. Click Add Field if you want to add another partition.
For example, if you are partitioning by service and have two services, a and b, you can use overrides to apply a different quota to each of them. If you want service:a to have a quota limit of 5,000 bytes and service:b to have a limit of 50 events, the override rules look like this:
| Service | Type | Limit |
|---------|------|-------|
| a | Bytes | 5,000 |
| b | Events | 50 |
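A sketch of what the overrides CSV might look like; the exact column headers come from the Download as CSV template, so treat the headers below as assumptions:

```csv
service,type,limit
a,bytes,5000
b,events,50
```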
The reduce processor groups multiple log events into a single log, based on the fields specified and the merge strategies selected. Logs are grouped at 10-second intervals. After the interval has elapsed for the group, the reduced log for that group is sent to the next step in the pipeline.
To set up the reduce processor:
Define a filter query. Only logs that match the specified filter query are processed. Reduced logs and logs that do not match the filter query are sent to the next step in the pipeline.
In the Group By section, enter the field you want to group the logs by.
Click Add Group by Field to add additional fields.
In the Merge Strategy section:
In On Field, enter the name of the field you want to merge the logs on.
Select the merge strategy in the Apply dropdown menu. This is the strategy used to combine events. See the following Merge strategies section for descriptions of the available strategies.
Click Add Merge Strategy to add additional strategies.
These are the available merge strategies for combining log events.
| Name | Description |
|------|-------------|
| Array | Appends each value to an array. |
| Concat | Concatenates each string value, delimited with a space. |
| Concat newline | Concatenates each string value, delimited with a newline. |
| Concat raw | Concatenates each string value, without a delimiter. |
| Discard | Discards all values except the first value that was received. |
| Flat unique | Creates a flattened array of all unique values that were received. |
| Longest array | Keeps the longest array that was received. |
| Max | Keeps the maximum numeric value that was received. |
| Min | Keeps the minimum numeric value that was received. |
| Retain | Discards all values except the last value that was received. Works as a way to coalesce by not retaining `null`. |
| Shortest array | Keeps the shortest array that was received. |
| Sum | Sums all numeric values that were received. |
Use this processor to remap logs to Open Cybersecurity Schema Framework (OCSF) events. OCSF schema event classes are set for a specific log source and type. You can add multiple mappings to one processor. Note: Datadog recommends that the OCSF processor be the last processor in your pipeline, so that remapping is done after the logs have been processed by all the other processors.
To set up this processor:
Click Manage mappings. This opens a modal:
If you have already added mappings, click on a mapping in the list to edit or delete it. You can use the search bar to find a mapping by its name. Click Add Mapping if you want to add another mapping. Select Library Mapping or Custom Mapping and click Continue.
If you have not added any mappings yet, select Library Mapping or Custom Mapping. Click Continue.
Define a filter query. Only logs that match the specified filter query are remapped. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Review the sample source log and the resulting OCSF output.
When you set up a custom mapping, if you try to close or exit the modal, you are prompted to export your mapping. Datadog recommends that you export your mapping to save what you have set up so far. The exported mapping is saved as a JSON file.
To set up a custom mapping:
Optionally, add a name for the mapping. The default name is Custom Authentication.
Define a filter query. Only logs that match the specified filter query are remapped. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Select the OCSF event category from the dropdown menu.
Select the OCSF event class from the dropdown menu.
Enter a log sample so that you can reference it when you add fields.
Click Continue.
Select any OCSF profiles that you want to add. See OCSF Schema Browser for more information.
All required fields are shown. Enter the required Source Logs Fields and Fallback Values for them. If you want to manually add additional fields, click + Field. Click the trash can icon to delete a field. Note: Required fields cannot be deleted.
The fallback value is used for the OCSF field if the log doesn’t have the source log field.
You can add multiple fields for Source Log Fields. For example, Okta’s user.system.start logs have either the eventType or legacyEventType field. You can map both fields to the same OCSF field.
If you have your own OCSF mappings in JSON or saved a previous mapping that you want to use, click Import Configuration File.
Click Continue.
Some log source values must be mapped to OCSF values. For example, if a source log's severity field is mapped to OCSF's severity_id field, the source severity values must be mapped to the OCSF severity_id values. See severity_id in Authentication [3002] for a list of OCSF values. An example of mapping severity values:
| Log source value | OCSF value |
|------------------|------------|
| INFO | Informational |
| WARN | Medium |
| ERROR | High |
All values that are required to be mapped to an OCSF value are listed. Click + Add Row if you want to map additional values.
Each processor has a corresponding filter query in its fields. Processors only process logs that match their filter query. For all processors except the filter processor, logs that do not match the query are sent to the next step of the pipeline. For the filter processor, logs that do not match the query are dropped.
For any attribute, tag, or key:value pair that is not a reserved attribute, your query must start with @. Conversely, to filter on reserved attributes, you do not need to prepend @ to your filter query.
For example, to filter out and drop status:info logs, your filter can be set as NOT (status:info). To filter out and drop system-status:info, your filter must be set as NOT (@system-status:info).
Filter query examples:
NOT (status:debug): This filters for only logs that do not have the status DEBUG.
status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
This query can also be written as: status:ok AND service:flask-web-app.
host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
@user.status:inactive: This filters for logs with the status inactive nested under the user attribute.
Queries run in the Observability Pipelines Worker are case sensitive. Learn more about writing filter queries in Datadog’s Log Search Syntax.
This processor samples your logging traffic for a representative subset at the rate that you define, dropping the remaining logs. As an example, you can use this processor to sample 20% of logs from a noisy non-critical service.
The sampling only applies to logs that match your filter query and does not impact other logs. If a log is dropped at this processor, none of the processors below receives that log.
To set up the sample processor:
Define a filter query. Only logs that match the specified filter query are sampled at the specified retention rate below. The sampled logs and the logs that do not match the filter query are sent to the next step in the pipeline.
Enter your desired sampling rate in the Retain field. For example, entering 2 means 2% of logs are retained out of all the logs that match the filter query.
The Sensitive Data Scanner processor scans logs to detect and redact or hash sensitive information such as PII, PCI, and custom sensitive data. To scan for sensitive data, you can select predefined rules from our library or enter custom regex rules.
In the Select scanning rule type field, select whether you want to create a rule from the library or create a custom rule.
If you are creating a rule from the library, select the library pattern you want to use.
If you are creating a custom rule, enter the regex pattern to check against the data.
In the Scan entire or part of event section, select Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
If you selected Specific Attributes, click Add Field and enter the specific attributes you want to scan. You can add up to three fields. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all of the nested data is scanned.
If you selected Exclude Attributes, click Add Field and enter the specific attributes you want to exclude from scanning. You can add up to three fields. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all of the nested data is excluded.
In the Define action on match section, select the action you want to take on the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
In the Define rule target and action section, select if you want to scan the Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
If you are scanning the entire event, you can optionally exclude specific attributes from getting scanned. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
For Define actions on match, select the action you want to take for the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
Redact: Replaces all matching values with the text you specify in the Replacement text field.
Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
Optionally, click Add Field to add tags you want to associate with the matched events.
In the Sensitive Data Scanner processor with the rule you want to edit, click Manage Scanning Rules.
Toggle Use recommended keywords if you want the rule to use them. Otherwise, add your own keywords to the Create keyword dictionary field. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
Click Update.
Add a custom rule
In the Define match conditions section, specify the regex pattern to use for matching against events in the Define the regex field. Enter sample data in the Add sample data field to verify that your regex pattern is valid.
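For example, a hedged sketch of a custom pattern for a sixteen-digit Visa card number (the pattern and sample data are illustrative only):

```text
Regex:       4[0-9]{15}
Sample data: Payment declined for card 4111111111111111
```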
Sensitive Data Scanner supports Perl Compatible Regular Expressions (PCRE), but the following patterns are not supported:
Backreferences and capturing sub-expressions (lookarounds)
Arbitrary zero-width assertions
Subroutine references and recursive patterns
Conditional patterns
Backtracking control verbs
The \C “single-byte” directive (which breaks UTF-8 sequences)
The \R newline match
The \K start of match reset directive
Callouts and embedded code
Atomic grouping and possessive quantifiers
For Create keyword dictionary, add keywords to refine detection accuracy when matching regex conditions. For example, if you are scanning for a sixteen-digit Visa credit card number, you can add keywords like visa, credit, and card. You can also require that these keywords be within a specified number of characters of a match. By default, keywords must be within 30 characters before a matched value.
In the Define rule target and action section, select if you want to scan the Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
If you are scanning the entire event, you can optionally exclude specific attributes from getting scanned. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is excluded.
If you are scanning specific attributes, specify which attributes you want to scan. Use path notation (outer_key.inner_key) to access nested keys. For specified attributes with nested data, all nested data is scanned.
For Define actions on match, select the action you want to take for the matched information. Note: Redaction, partial redaction, and hashing are all irreversible actions.
Redact: Replaces all matching values with the text you specify in the Replacement text field.
Partially Redact: Replaces a specified portion of all matched data. In the Redact section, specify the number of characters you want to redact and which part of the matched data to redact.
Hash: Replaces all matched data with a unique identifier. The UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
Optionally, click Add Field to add tags you want to associate with the matched events.
This processor splits nested arrays into distinct events so that you can query, filter, alert, and visualize data within an array. The arrays need to already be parsed. For example, the processor can process [item_1, item_2], but cannot process the string "[item_1, item_2]". The items in the array can be JSON objects, strings, integers, floats, or Booleans. All unmodified fields are added to the child events. For example, if you are sending the following items to the Observability Pipelines Worker:
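A hypothetical event that contains the two parsed arrays referenced in this section (the field values and the host field are illustrative):

```json
{
  "host": "host-a",
  "message": {
    "myfield": {
      "firstarray": ["item_1", "item_2"]
    }
  },
  "secondarray": ["item_a", "item_b"]
}
```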
Click Manage arrays to split to add a new array to split or to edit an existing one. This opens a side panel.
If you have not created any arrays yet, enter the array parameters as described in the Add a new array section below.
If you have already created arrays, click on the array’s row in the table to edit or delete it. Use the search bar to find a specific array, and then select the array to edit or delete it. Click Add Array to Split to add a new array.
Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they match the filter query, are sent to the next step in the pipeline.
Enter the path to the array field. Use the path notation <OUTER_FIELD>.<INNER_FIELD> to match subfields. See the Path notation example below.
If the processor is splitting the arrays "message.myfield.firstarray" and "secondarray", it outputs child events that are identical to the parent event, except that the values of "message.myfield.firstarray" and "secondarray" each become a single item from their respective original array. Each child event is a unique combination of items from the two arrays, so four child events (2 items * 2 items = 4 combinations) are created in this example.
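Continuing the hypothetical event above, two of the four child events would look approximately like this:

```json
{"host": "host-a", "message": {"myfield": {"firstarray": "item_1"}}, "secondarray": "item_a"}
{"host": "host-a", "message": {"myfield": {"firstarray": "item_1"}}, "secondarray": "item_b"}
```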
Use this processor to set a limit on the number of logs sent within a specific time window. For example, you can set a limit so that only 100 logs are sent per second. Setting a rate limit can help you catch any spikes in log ingestion and prevent unexpected billing costs.
To set up the processor:
Define a filter query. Only logs that match the specified filter query are processed. All matched logs get throttled. Logs that are sent within the throttle limit and logs that do not match the filter are sent to the next step. Logs sent after the throttle limit has been reached are dropped.
Set the window threshold. This is the number of events allowed for a given bucket during the set time window.
Set the time window.
Optionally, click Add Field if you want to group by a field.
Click the plus sign (+) to the left of the processors to add another set of processors and destinations to the source. See Add additional destinations for information on adding destinations to the processor group.
To delete a processor group, you need to delete all destinations linked to that processor group. When the last destination is deleted, the processor group is removed with it.
Click Access keys under Security and networking in the left navigation menu.
Copy the connection string for the storage account and paste it into the Azure connection string field on the Observability Pipelines Worker installation page.
Enter the Elasticsearch authentication username.
Enter the Elasticsearch authentication password.
Enter the Elasticsearch endpoint URL. For example, http://CLUSTER_ID.LOCAL_HOST_IP.ip.es.io:9200.
Enter the data collection endpoint (DCE).
Enter the client secret.
Enter your New Relic account ID.
Enter your New Relic license key.
Enter the OpenSearch authentication username.
Enter the OpenSearch authentication password.
Enter the OpenSearch endpoint URL. For example, http://<hostname.IP>:9200.
Enter your SentinelOne write access token. To find your write access token:
Navigate to the Singularity Data Lake (SDL) API Keys page. To access it from the console, click Visibility on the left menu to go to SDL. Click on your username and then API Keys.
Copy the Logs Access write key and paste it into the SentinelOne Write Access Token field on the Install Observability Pipelines Worker page.
After you’ve installed the Observability Pipelines Worker and finished setting up the pipeline, see View logs in a SentinelOne cluster for instructions on how to see the logs you sent from Observability Pipelines to the SentinelOne destination.
Note: By default, the Kubernetes Service maps incoming port <SERVICE_PORT> to the port the Worker is listening on (<TARGET_PORT>). If you want to map the Worker’s pod port to a different incoming port of the Kubernetes Service, use the following service.ports[0].port and service.ports[0].targetPort values in the command:
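For example, a hedged sketch of those Helm flags (the port numbers below are placeholders; substitute the values shown in the UI for your pipeline):

```shell
# Hypothetical ports: expose the Service on 8088 and forward to the Worker's listener port
--set "service.ports[0].port=8088" \
--set "service.ports[0].targetPort=8282"
```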
If you are running a self-hosted and self-managed Kubernetes cluster, and defined zones with node labels using topology.kubernetes.io/zone, then you can use the Helm chart values file as is. However, if you are not using the label topology.kubernetes.io/zone, you need to update the topologyKey in the values.yaml file to match the key you are using. If your Kubernetes installation does not use zones, remove the entire topology.kubernetes.io/zone section.
Click Select API key to choose the Datadog API key you want to use.
Run the one-step command provided in the UI to install the Worker.
Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.
If you prefer not to use the one-line installation script, follow these step-by-step instructions:
For RHEL and CentOS, the Observability Pipelines Worker supports versions 8.0 or later.
Click Select API key to choose the Datadog API key you want to use.
Run the one-step command provided in the UI to install the Worker.
Note: The environment variables used by the Worker in /etc/default/observability-pipelines-worker are not updated on subsequent runs of the install script. If changes are needed, update the file manually and restart the Worker.
If you prefer not to use the one-line installation script, follow these step-by-step instructions:
Set up the Datadog rpm repo on your system with the below command. Note: If you are running RHEL 8.1 or CentOS 8.1, use repo_gpgcheck=0 instead of repo_gpgcheck=1 in the configuration below.
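For reference, a sketch of what such a repo definition typically looks like; the repository path and key URL below are assumptions, so use the exact configuration provided in the UI:

```shell
cat <<EOF > /etc/yum.repos.d/datadog-observability-pipelines-worker.repo
[observability-pipelines-worker]
name=Observability Pipelines Worker
baseurl=https://yum.datadoghq.com/<OP_WORKER_REPO_PATH>/
enabled=1
gpgcheck=1
# Use repo_gpgcheck=0 instead if you are running RHEL 8.1 or CentOS 8.1
repo_gpgcheck=1
gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
EOF
```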