---
title: Deduplicate Processor
description: Remove duplicate logs with the Observability Pipelines Deduplicate processor.
breadcrumbs: Docs > Observability Pipelines > Processors > Deduplicate Processor
---

# Deduplicate Processor

{% callout %}
# Important note for users on the following Datadog sites: app.ddog-gov.com

{% alert level="danger" %}
This product is not supported for your selected [Datadog site](https://docs.datadoghq.com/getting_started/site).
{% /alert %}

{% /callout %}
Available for: {% icon name="icon-logs" /%} Logs

## Overview{% #overview %}

The Deduplicate processor removes copies of data to reduce volume and noise. It caches messages and compares your incoming log traffic against those cached messages. For example, you can use this processor to keep only one copy of a warning log when multiple identical warning logs are sent in succession.

## Setup{% #setup %}

To set up the Deduplicate processor:

1. Define a **filter query**. Only logs that match the specified filter query are processed. Deduped logs and logs that do not match the filter query are sent to the next step in the pipeline. See [Search Syntax](https://docs.datadoghq.com/observability_pipelines/search_syntax/logs/) for more information.
1. In the **Type of deduplication** dropdown menu, select whether you want to `Match` on or `Ignore` the fields specified below.
   - If `Match` is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.
   - If `Ignore` is selected, then after a log passes through, future logs that have the same values for all of their fields, *except* the ones you specify below, are removed.
1. Enter the fields you want to match on or ignore. At least one field is required, and you can specify a maximum of three fields.
   - Use the path notation `<OUTER_FIELD>.<INNER_FIELD>` to match subfields. See the Path notation example below.
1. Click **Add field** to add additional fields you want to filter on.
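The difference between `Match` and `Ignore` can be sketched in Python. This is an illustrative model only, not the processor's internals; the helper names and the set-based cache are hypothetical:

```python
# Illustrative sketch of Match vs. Ignore deduplication keys.
# Helper names and the unbounded set cache are hypothetical, not product internals.

def dedup_key(log: dict, fields: list, mode: str) -> tuple:
    """Build the tuple of values used to detect duplicates."""
    if mode == "Match":
        # Duplicates share the same values for the listed fields.
        keys = fields
    else:  # "Ignore"
        # Duplicates share the same values for every field *except* those listed.
        keys = sorted(k for k in log if k not in fields)
    return tuple((k, log.get(k)) for k in keys)

def deduplicate(logs, fields, mode):
    seen, unique = set(), []
    for log in logs:
        key = dedup_key(log, fields, mode)
        if key not in seen:
            seen.add(key)
            unique.append(log)
    return unique

logs = [
    {"level": "warn", "msg": "disk full", "host": "a"},
    {"level": "warn", "msg": "disk full", "host": "b"},
]
# Match on level + msg: the second log has the same values, so it is dropped.
unique_logs = deduplicate(logs, ["level", "msg"], "Match")
```

With `Ignore` on `host`, the same two logs are also deduplicated, because they agree on every field other than `host`; with `Ignore` on `msg`, both are kept, since they differ on `host`.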

### Optional settings{% #optional-settings %}

#### Cache size{% #cache-size %}

The default cache size is 5,000 messages (recommended). Cached messages are kept in memory and used to determine whether incoming messages are duplicates. You can increase the cache size to fit your needs.

**Notes**:

- Increasing the cache size increases memory usage.
- The cache is backed by an LRU (least recently used) cache whose capacity equals the configured cache size, so once the cache is full, the least recently seen messages are evicted first.
- Since the cache is not shared between Workers, only duplicated events processed by the same Worker are dropped.
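One way to picture the LRU behavior above is a bounded ordered map: once it holds cache-size entries, the least recently seen key is evicted, so a duplicate that reappears after eviction passes through again. This sketch uses Python's `OrderedDict`; the class is hypothetical and the Worker's actual implementation may differ:

```python
from collections import OrderedDict

class LRUDedupCache:
    """Bounded cache of recently seen dedup keys (illustrative sketch)."""

    def __init__(self, capacity: int = 5000):
        self.capacity = capacity
        self._cache = OrderedDict()

    def is_duplicate(self, key) -> bool:
        if key in self._cache:
            self._cache.move_to_end(key)  # refresh recency on a hit
            return True
        self._cache[key] = True
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used entry
        return False

cache = LRUDedupCache(capacity=2)
cache.is_duplicate("a")   # False: first sighting, cached
cache.is_duplicate("a")   # True: still in cache, dropped as a duplicate
cache.is_duplicate("b")
cache.is_duplicate("c")   # cache full, "a" is evicted
cache.is_duplicate("a")   # False again: "a" was evicted, so it passes through
```

This also illustrates the last note above: each Worker holds its own cache, so identical events routed to different Workers are not deduplicated against each other.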

### Path notation example{% #path-notation-example %}

For the following message structure:

```json
{
    "outer_key": {
        "inner_key": "inner_value",
        "a": {
            "double_inner_key": "double_inner_value",
            "b": "b value"
        },
        "c": "c value"
    },
    "d": "d value"
}
```

- Use `outer_key.inner_key` to refer to the key with the value `inner_value`.
- Use `outer_key.a.double_inner_key` to refer to the key with the value `double_inner_value`.
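Resolving a dotted path amounts to walking the nested keys one segment at a time. A minimal sketch, assuming plain nested objects (the `resolve_path` helper is hypothetical, not part of the product):

```python
def resolve_path(log: dict, path: str):
    """Walk a dotted path like 'outer_key.inner_key' through nested dicts."""
    current = log
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the path does not exist in this log
        current = current[part]
    return current

message = {
    "outer_key": {
        "inner_key": "inner_value",
        "a": {"double_inner_key": "double_inner_value", "b": "b value"},
        "c": "c value",
    },
    "d": "d value",
}
resolve_path(message, "outer_key.inner_key")          # "inner_value"
resolve_path(message, "outer_key.a.double_inner_key")  # "double_inner_value"
```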
