Deduplicate Processor

Docs > Observability Pipelines（観測データの制御） > プロセッサー > Deduplicate Processor

次で利用可能:

ログ

The deduplicate processor removes copies of data to reduce volume and noise. It caches 5,000 messages at a time and compares your incoming logs traffic against the cached messages. For example, this processor can be used to keep only unique warning logs in the case where multiple identical warning logs are sent in succession.

To set up the deduplicate processor:

Define a filter query. Only logs that match the specified filter query are processed. Deduped logs and logs that do not match the filter query are sent to the next step in the pipeline.
In the Type of deduplication dropdown menu, select whether you want to Match on or Ignore the fields specified below.
- If Match is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.
- If Ignore is selected, then after a log passes through, future logs that have the same values for all of their fields, except the ones you specify below, are removed.
Enter the fields you want to match on, or ignore. At least one field is required, and you can specify a maximum of three fields.
- Use the path notation <OUTER_FIELD>.<INNER_FIELD> to match subfields. See the Path notation example below.
Click Add field to add additional fields you want to filter on.

Path notation example

For the following message structure:

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}

Use outer_key.inner_key to refer to the key with the value inner_value.
Use outer_key.inner_key.double_inner_key to refer to the key with the value double_inner_value.

Filter query syntax

Each processor has a corresponding filter query in their fields. Processors only process logs that match their filter query. And for all processors except the Filter processor, logs that do not match the query are sent to the next step of the pipeline. For the Filter processor, logs that do not match the query are dropped.

The following are logs filter query examples:

NOT (status:debug): This filters for logs that do not have the status DEBUG.
status:ok service:flask-web-app: This filters for all logs with the status OK from your flask-web-app service.
- This query can also be written as: status:ok AND service:flask-web-app.
host:COMP-A9JNGYK OR host:COMP-J58KAS: This filter query only matches logs from the labeled hosts.
user.status:inactive: This filters for logs with the status inactive nested under the user attribute.
http.status:[200 TO 299] or http.status:{300 TO 399}: These two filters represent the syntax to query a range for http.status. Ranges can be used across any attribute.

Learn more about writing filter queries in Observability Pipelines Search Syntax.

Deduplicate Processor

Path notation example

Filter query syntax

How can I help you today?