Deduplicate Processor

Available for:

Logs

Overview

The Deduplicate processor removes copies of data to reduce volume and noise. It caches messages and compares incoming log traffic against the cached messages. For example, when multiple identical warning logs are sent in succession, this processor can keep only the unique ones.

Setup

To set up the Deduplicate processor:

1. Define a filter query. Only logs that match the specified filter query are processed. Logs that remain after deduplication and logs that do not match the filter query are sent to the next step in the pipeline. See Search Syntax for more information.
2. In the Type of deduplication dropdown menu, select whether you want to Match on or Ignore the fields specified below.
   - If Match is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.
   - If Ignore is selected, then after a log passes through, future logs that have the same values for all of their fields, except the ones you specify below, are removed.
3. Enter the fields you want to match on or ignore. At least one field is required, and you can specify a maximum of three fields.
   - Use the path notation <OUTER_FIELD>.<INNER_FIELD> to match subfields. See the Path notation example below.
4. Click Add field to add more fields to match on or ignore.
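
The difference between Match and Ignore can be illustrated with a minimal Python sketch. This is not the Worker's implementation. The helper names `dedupe_key` and `deduplicate` are hypothetical, only top-level fields are handled, and treating a missing field as `None` in Match mode is an assumption made for illustration.

```python
import json


def dedupe_key(log, fields, mode):
    """Build a hashable key that decides whether two logs count as duplicates.

    mode="match":  only the listed top-level fields are compared.
    mode="ignore": every top-level field except the listed ones is compared.
    """
    if mode == "match":
        # Assumption: a field missing from the log is compared as None.
        selected = {name: log.get(name) for name in fields}
    elif mode == "ignore":
        selected = {name: value for name, value in log.items() if name not in fields}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Serialize with sorted keys so field order does not affect the key.
    return json.dumps(selected, sort_keys=True, default=str)


def deduplicate(logs, fields, mode):
    """Keep the first log for each key and drop later logs with the same key."""
    seen = set()
    kept = []
    for log in logs:
        key = dedupe_key(log, fields, mode)
        if key not in seen:
            seen.add(key)
            kept.append(log)
    return kept


logs = [
    {"status": "warn", "message": "disk almost full", "timestamp": 1},
    {"status": "warn", "message": "disk almost full", "timestamp": 2},
    {"status": "warn", "message": "high latency", "timestamp": 3},
]

# Match on status and message: the second log repeats the first and is dropped.
matched = deduplicate(logs, ["status", "message"], "match")

# Ignore timestamp: all remaining fields are compared, with the same outcome here.
ignored = deduplicate(logs, ["timestamp"], "ignore")
```

In this sample, both calls keep the logs with timestamps 1 and 3. Matching on `status` alone would keep only the first log, because all three share the same status.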

Optional settings

Cache size

The default cache size is 5,000 messages (recommended). The cached messages are kept in memory to determine if the incoming messages are duplicates. You can increase the cache size to fit your needs.

Notes:

Increasing the cache size increases memory usage.
The cache is an LRU (least recently used) cache whose capacity equals the configured cache size.
Since the cache is not shared between Workers, only duplicated events processed by the same Worker are dropped.
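
To show why cache size matters, here is a rough sketch of an LRU-backed duplicate check using Python's `collections.OrderedDict`. The class `LruDedupeCache` is hypothetical, and whether a cache hit refreshes an entry's recency in the actual Worker is an assumption.

```python
from collections import OrderedDict


class LruDedupeCache:
    """Remembers up to `capacity` keys, evicting the least recently used one."""

    def __init__(self, capacity=5000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._keys = OrderedDict()

    def is_duplicate(self, key):
        """Return True if `key` is cached; otherwise cache it and return False."""
        if key in self._keys:
            # Assumption: a hit marks the key as most recently used.
            self._keys.move_to_end(key)
            return True
        self._keys[key] = None
        if len(self._keys) > self.capacity:
            # Evict the least recently used key to stay within capacity.
            self._keys.popitem(last=False)
        return False


# A deliberately tiny cache makes the eviction visible.
cache = LruDedupeCache(capacity=2)
results = [cache.is_duplicate(k) for k in ["a", "b", "a", "c", "b"]]
```

With a capacity of 2, the second `"a"` is flagged as a duplicate, but the second `"b"` is not: `"b"` was evicted when `"c"` arrived. A cache that is too small for the traffic lets repeated messages through in the same way.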

Path notation example

For the following message structure:

{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}

Use outer_key.inner_key to refer to the key with the value inner_value.
Use outer_key.a.double_inner_key to refer to the key with the value double_inner_value.
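
As a rough illustration of how a dotted path walks the structure above, the following Python sketch resolves a path one segment at a time. The function `resolve_path` is hypothetical and simply splits on `.`; it makes no attempt to handle keys that themselves contain dots or any escaping rules.

```python
def resolve_path(log, path):
    """Follow a dotted path such as "outer_key.inner_key" through nested dicts.

    Returns None when a segment is missing or a non-dict value is reached early.
    """
    current = log
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


message = {
    "outer_key": {
        "inner_key": "inner_value",
        "a": {"double_inner_key": "double_inner_value", "b": "b value"},
        "c": "c value",
    },
    "d": "d value",
}

inner = resolve_path(message, "outer_key.inner_key")
double_inner = resolve_path(message, "outer_key.a.double_inner_key")
```

Here `inner` is `"inner_value"` and `double_inner` is `"double_inner_value"`. A path that continues past a plain string value, such as `outer_key.inner_key.double_inner_key`, resolves to `None` in this sketch because `inner_key` holds a string, not an object.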