This processor filters for logs that match the specified filter query and drops all non-matching logs. If a log is dropped at this processor, then none of the processors below this one receives that log. This processor can filter out unnecessary logs, such as debug or warning logs.
To set up the filter processor:
- Define a filter query. The query you specify filters for and passes on only logs that match it, dropping all other logs.
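For illustration, here is a minimal Python sketch of the keep-or-drop semantics, assuming a hypothetical filter query that matches on a `status` attribute; it is not the Worker's implementation:

```python
# Minimal sketch of the filter processor's keep-or-drop semantics.
# Illustrative only; the predicate stands in for a filter query such as status:error.
logs = [
    {"status": "error", "message": "payment failed"},
    {"status": "debug", "message": "cache warmed"},
    {"status": "warn", "message": "slow query"},
]

def matches_filter(log: dict) -> bool:
    # Hypothetical filter query: status:error
    return log.get("status") == "error"

# Only matching logs continue to the next processor; all other logs are dropped.
passed_on = [log for log in logs if matches_filter(log)]
print(passed_on)  # [{'status': 'error', 'message': 'payment failed'}]
```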
The remap processor can add, drop, or rename fields within your individual log data. Use this processor to enrich your logs with additional context, remove low-value fields to reduce volume, and standardize naming across important attributes. Select add field, drop field, or rename field in the dropdown menu to get started.
Add field
Use add field to append a new key-value field to your log.
To set up the add field processor:
- Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Enter the field and value you want to add. To specify a nested field for your key, use the path notation `<OUTER_FIELD>.<INNER_FIELD>`. All values are stored as strings.
Note: If the field you want to add already exists, the Worker throws an error and the existing field remains unchanged.
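As a rough sketch of this behavior (not the Worker's code), the following Python snippet adds a hypothetical nested `team.owner` field using path notation, stores the value as a string, and raises an error if the field already exists:

```python
def add_field(log: dict, path: str, value) -> None:
    # Illustrative sketch only; simplified compared to the Worker's behavior.
    *outer_keys, leaf = path.split(".")
    node = log
    for key in outer_keys:
        node = node.setdefault(key, {})  # walk or create nested objects
    if leaf in node:
        # The Worker throws an error and leaves the existing field unchanged.
        raise ValueError(f"field {path!r} already exists")
    node[leaf] = str(value)  # all values are stored as strings

log = {"service": "web"}
add_field(log, "team.owner", "checkout")
print(log)  # {'service': 'web', 'team': {'owner': 'checkout'}}
```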
Drop field
Use drop field to remove a field from logs that match the filter you specify below. The processor can delete objects, so you can use it to drop nested keys.
To set up the drop field processor:
- Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Enter the key of the field you want to drop. To specify a nested field for your specified key, use the path notation `<OUTER_FIELD>.<INNER_FIELD>`.
Note: If the specified key does not exist, the log is unaffected.
Rename field
Use rename field to rename a field within your log.
To set up the rename field processor:
- Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Enter the name of the field you want to rename in the Source field. To specify a nested field for your key, use the path notation `<OUTER_FIELD>.<INNER_FIELD>`. Once renamed, your original field is deleted unless you enable the Preserve source tag checkbox described below.
Note: If the source key you specify doesn't exist, a default `null` value is applied to your target.
- In the Target field, enter the name you want the source field to be renamed to. To specify a nested field for your specified key, use the path notation `<OUTER_FIELD>.<INNER_FIELD>`.
Note: If the target field you specify already exists, the Worker throws an error and does not overwrite the existing target field.
- Optionally, check the Preserve source tag box if you want to retain the original source field and duplicate the information from your source key to your specified target key. If this box is not checked, the source key is dropped after it is renamed.
Path notation example
For the following message structure, use `outer_key.a.double_inner_key` to refer to the key with the value `double_inner_value`.
```json
{
  "outer_key": {
    "inner_key": "inner_value",
    "a": {
      "double_inner_key": "double_inner_value",
      "b": "b value"
    },
    "c": "c value"
  },
  "d": "d value"
}
```
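If it helps to see the lookup spelled out, this small Python sketch (illustrative only) resolves the same dotted path against the message above:

```python
import json

message = json.loads("""
{
  "outer_key": {
    "inner_key": "inner_value",
    "a": {"double_inner_key": "double_inner_value", "b": "b value"},
    "c": "c value"
  },
  "d": "d value"
}
""")

def resolve(obj, path: str):
    """Follow <OUTER_FIELD>.<INNER_FIELD> path notation into nested objects."""
    for key in path.split("."):
        obj = obj[key]
    return obj

print(resolve(message, "outer_key.a.double_inner_key"))  # double_inner_value
```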
This processor samples your logging traffic for a representative subset at the rate that you define, dropping the remaining logs. As an example, you can use this processor to sample 20% of logs from a noisy non-critical service.
The sampling only applies to logs that match your filter query and does not impact other logs. If a log is dropped at this processor, none of the processors below receives that log.
To set up the sample processor:
- Define a filter query. Only logs that match the specified filter query are sampled at the specified retention rate below. The sampled logs and the logs that do not match the filter query are sent to the next step in the pipeline.
- Set the retain field with your desired sampling rate expressed as a percentage. For example, entering `2` means 2% of logs are retained out of all the logs that match the filter query.
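As a rough illustration of the retention rate (not the Worker's sampling algorithm, which may be deterministic rather than random), this sketch keeps about 2% of the logs that match the filter query:

```python
import random

RETAIN_PERCENT = 2  # value entered in the retain field

# Hypothetical stream of logs that already matched the filter query.
matching_logs = [{"service": "noisy-service", "message": f"event {i}"} for i in range(1000)]

def keep(_log: dict) -> bool:
    # Illustrative only; the Worker's actual sampling strategy may differ.
    return random.random() * 100 < RETAIN_PERCENT

sampled = [log for log in matching_logs if keep(log)]
print(len(sampled))  # roughly 20 of the 1,000 matching logs are retained
```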
This processor parses logs using the grok parsing rules that are available for a set of sources. The rules are automatically applied to logs based on the log source, so logs must have a `source` field with the name of the source. If this field is not added when the log is sent to the Observability Pipelines Worker, you can use the Add field processor to add it.
If a log's `source` field matches one of the grok parsing rule sets, the log's `message` field is checked against those rules. If a rule matches, the resulting parsed data is added to the `message` field as a JSON object, overwriting the original `message`.
If there is no `source` field on the log, or no rule matches the log `message`, then no changes are made to the log and it is sent to the next step in the pipeline.
To set up the grok parser, define a filter query. Only logs that match the specified [filter query](#filter-query-syntax) are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
To test log samples against the predefined parsing rules:
- Click the Preview Library Rules button.
- Search or select a source in the dropdown menu.
- Enter a log sample to test the parsing rules for that source.
To add a custom parsing rule:
- Click Add Custom Rule.
- If you want to clone a library rule, select Clone library rule and then select the library source from the dropdown menu.
- If you want to create a custom rule, select Custom and then enter the `source`. The parsing rules are applied to logs with that `source`.
- Enter log samples to test the parsing rules.
- Enter the rules for parsing the logs. See Parsing for more information on writing parsing rules.
Note: The `url`, `useragent`, and `csv` filters are not available.
- Click Advanced Settings if you want to add helper rules. See Using helper rules to factorize multiple parsing rules for more information.
- Click Add Rule.
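To make the flow concrete, here is a Python sketch that approximates what a grok rule does using a named-group regular expression against a hypothetical nginx-style message; real rules use the grok syntax described in the Parsing documentation, not Python regexes:

```python
import re

# A grok rule maps a known message layout to named fields. This named-group
# regex only approximates that idea for a hypothetical nginx-style access log.
access_rule = re.compile(
    r'(?P<ip>\S+) - - \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)

log = {
    "source": "nginx",
    "message": '10.0.0.1 - - [12/Jan/2024:10:00:00 +0000] "GET /health HTTP/1.1" 200 2',
}

match = access_rule.match(log["message"])
if match:
    # The parsed data replaces the original message as a JSON object.
    log["message"] = match.groupdict()
print(log["message"])
# {'ip': '10.0.0.1', 'timestamp': '12/Jan/2024:10:00:00 +0000', 'method': 'GET', 'path': '/health'}
```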
The quota processor measures the logging traffic for logs that match the filter you specify. When the configured daily quota is met inside the 24-hour rolling window, the processor can either drop additional logs or send an alert using a Datadog monitor. You can configure the processor to track the total volume or the total number of events. The pipeline uses the name of the quota to identify the quota across multiple Remote Configuration deployments of the Worker.
As an example, you can configure this processor to drop new logs or trigger an alert without dropping logs after the processor has received 10 million events from a certain service in the last 24 hours.
To set up the quota processor:
- Enter a name for the quota processor.
- Define a filter query. Only logs that match the specified filter query are counted towards the daily limit.
- Logs that match the quota filter and are within the daily quota are sent to the next step in the pipeline.
- Logs that do not match the quota filter are sent to the next step of the pipeline.
- In the Unit for quota dropdown menu, select if you want to measure the quota by the number of Events or by the Volume in bytes.
- Set the daily quota limit and select the unit of magnitude for your desired quota.
- Check the Drop events checkbox if you want to drop all events when your quota is met. Leave it unchecked if you plan to set up a monitor that sends an alert when the quota is met.
- If logs that match the quota filter are received after the daily quota has been met and the Drop events option is selected, then those logs are dropped. In this case, only logs that did not match the filter query are sent to the next step in the pipeline.
- If logs that match the quota filter are received after the daily quota has been met and the Drop events option is not selected, then those logs and the logs that did not match the filter query are sent to the next step in the pipeline.
- Optional: Click Add Field if you want to set a quota on a specific service or region field.
a. Enter the field name you want to partition by. See the Partition example for more information.
i. Select Ignore when missing if you want the quota applied only to events that match the partition. See the Ignore when missing example for more information.
ii. Optional: Click Overrides if you want to set different quotas for the partitioned field.
- Click Download as CSV for an example of how to structure the CSV.
- Drag and drop your overrides CSV to upload it. You can also click Browse to select the file to upload it. See the Overrides example for more information.
b. Click Add Field if you want to add another partition.
Examples
Partition example
Use Partition by if you want to set a quota on a specific service or region. For example, if you want to set a quota for 10 events per day and group the events by the `service` field, enter `service` into the Partition by field.
Example for the “ignore when missing” option
Select Ignore when missing if you want the quota applied only to events that match the partition. For example, if the Worker receives the following set of events:
{"service":"a", "source":"foo", "message": "..."}
{"service":"b", "source":"bar", "message": "..."}
{"service":"b", "message": "..."}
{"source":"redis", "message": "..."}
{"message": "..."}
And Ignore when missing is selected, then the Worker:
- Creates a set for logs with `service:a` and `source:foo`
- Creates a set for logs with `service:b` and `source:bar`
- Ignores the last three events
The quota is applied to the two sets of logs and not to the last three events.
If Ignore when missing is not selected, the quota is applied to all five events.
Overrides example
If you are partitioning by `service` and have two services, `a` and `b`, you can use overrides to apply different quotas to them. For example, if you want `service:a` to have a quota limit of 5,000 bytes and `service:b` to have a limit of 50 events, the override rules look like this:

| Service | Type | Limit |
|---------|------|-------|
| a | Bytes | 5,000 |
| b | Events | 50 |
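The following Python sketch shows one way quota accounting per partition with overrides could look; it is illustrative only and simplifies the 24-hour rolling window to a plain counter, with a hypothetical default limit:

```python
from collections import defaultdict

DEFAULT_LIMIT = ("events", 1000)                          # hypothetical default quota
OVERRIDES = {"a": ("bytes", 5_000), "b": ("events", 50)}  # mirrors the table above

usage = defaultdict(int)

def within_quota(log: dict) -> bool:
    service = log.get("service", "unknown")
    unit, limit = OVERRIDES.get(service, DEFAULT_LIMIT)
    cost = len(str(log).encode()) if unit == "bytes" else 1
    if usage[service] + cost > limit:
        return False  # over quota: drop the log, or alert through a monitor
    usage[service] += cost
    return True

print(within_quota({"service": "b", "message": "checkout started"}))  # True
```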
The reduce processor groups multiple log events into a single log, based on the fields specified and the merge strategies selected. Logs are grouped at 10-second intervals. After the interval has elapsed for the group, the reduced log for that group is sent to the next step in the pipeline.
To set up the reduce processor:
- Define a filter query. Only logs that match the specified filter query are processed. Reduced logs and logs that do not match the filter query are sent to the next step in the pipeline.
- In the Group By section, enter the field you want to group the logs by.
- Click Add Group by Field to add additional fields.
- In the Merge Strategy section:
- In On Field, enter the name of the field you want to merge the logs on.
- Select the merge strategy in the Apply dropdown menu. This is the strategy used to combine events. See the following Merge strategies section for descriptions of the available strategies.
- Click Add Merge Strategy to add additional strategies.
Merge strategies
These are the available merge strategies for combining log events.
| Name | Description |
|------|-------------|
| Array | Appends each value to an array. |
| Concat | Concatenates each string value, delimited with a space. |
| Concat newline | Concatenates each string value, delimited with a newline. |
| Concat raw | Concatenates each string value, without a delimiter. |
| Discard | Discards all values except the first value that was received. |
| Flat unique | Creates a flattened array of all unique values that were received. |
| Longest array | Keeps the longest array that was received. |
| Max | Keeps the maximum numeric value that was received. |
| Min | Keeps the minimum numeric value that was received. |
| Retain | Discards all values except the last value that was received. Works as a way to coalesce by not retaining `null`. |
| Shortest array | Keeps the shortest array that was received. |
| Sum | Sums all numeric values that were received. |
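As an illustration of how a group of logs collapses into one reduced log, this sketch applies two of the strategies above to a hypothetical group that shares the same `host` value:

```python
# Illustrative sketch only, not the Worker's implementation.
group = [
    {"host": "web-1", "message": "retrying", "duration_ms": 12},
    {"host": "web-1", "message": "retrying", "duration_ms": 7},
    {"host": "web-1", "message": "gave up", "duration_ms": 31},
]

reduced = {
    "host": group[0]["host"],
    # "Array" strategy on message: append each value to an array.
    "message": [log["message"] for log in group],
    # "Sum" strategy on duration_ms: add the numeric values together.
    "duration_ms": sum(log["duration_ms"] for log in group),
}
print(reduced)
# {'host': 'web-1', 'message': ['retrying', 'retrying', 'gave up'], 'duration_ms': 50}
```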
The deduplicate processor removes copies of data to reduce volume and noise. It caches 5,000 messages at a time and compares incoming log traffic against the cached messages. For example, this processor can be used to keep only unique warning logs if multiple identical warning logs are sent in a row.
To set up the deduplicate processor:
- Define a filter query. Only logs that match the specified [filter query](#filter-query-syntax) are processed. Deduped logs and logs that do not match the filter query are sent to the next step in the pipeline.
- In the Type of deduplication dropdown menu, select whether you want to Match on or Ignore the fields specified below.
  - If Match is selected, then after a log passes through, future logs that have the same values for all of the fields you specify below are removed.
  - If Ignore is selected, then after a log passes through, future logs that have the same values for all of their fields, except the ones you specify below, are removed.
- Enter the fields you want to match on or ignore. At least one field is required, and you can specify a maximum of three fields.
  - Use the path notation `<OUTER_FIELD>.<INNER_FIELD>` to match subfields. See the Path notation example below.
- Click Add field to add any additional fields you want to filter on.
Path notation example
For the following message structure, use `outer_key.a.double_inner_key` to refer to the key with the value `double_inner_value`.
```json
{
  "outer_key": {
    "inner_key": "inner_value",
    "a": {
      "double_inner_key": "double_inner_value",
      "b": "b value"
    },
    "c": "c value"
  },
  "d": "d value"
}
```
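For a concrete picture of Match-mode deduplication, here is a sketch that caches a key built from hypothetical `service` and `message` fields and drops later logs with the same key; the Worker's actual cache and eviction behavior may differ:

```python
from collections import deque

CACHE_SIZE = 5000                      # mirrors the documented cache of 5,000 messages
MATCH_FIELDS = ["service", "message"]  # hypothetical fields chosen in the processor

seen = deque(maxlen=CACHE_SIZE)

def passes(log: dict) -> bool:
    key = tuple(log.get(field) for field in MATCH_FIELDS)
    if key in seen:
        return False   # duplicate of a cached message: drop it
    seen.append(key)
    return True

logs = [
    {"service": "api", "message": "disk almost full"},
    {"service": "api", "message": "disk almost full"},  # dropped as a duplicate
    {"service": "api", "message": "disk recovered"},
]
print([log for log in logs if passes(log)])
```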
The Sensitive Data Scanner processor scans logs to detect and redact or hash sensitive information such as PII, PCI, and custom sensitive data. You can pick from our library of predefined rules, or input custom Regex rules to scan for sensitive data.
To set up the sensitive data scanner processor:
- Define a filter query. Only logs that match the specified filter query are scanned and processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Click Add Scanning Rule.
- Name your scanning rule.
- In the Select scanning rule type field, select whether you want to create a rule from the library or create a custom rule.
- If you are creating a rule from the library, select the library pattern you want to use.
- If you are creating a custom rule, enter the regex pattern to check against the data.
- In the Scan entire or part of event section, select if you want to scan the Entire Event, Specific Attributes, or Exclude Attributes in the dropdown menu.
- If you selected Specific Attributes, click Add Field and enter the specific attributes you want to scan. You can add up to three fields. Use path notation (`outer_key.inner_key`) to access nested keys. For specified attributes with nested data, all nested data is scanned.
- If you selected Exclude Attributes, click Add Field and enter the specific attributes you want to exclude from scanning. You can add up to three fields. Use path notation (`outer_key.inner_key`) to access nested keys. For specified attributes with nested data, all nested data is excluded.
- In the Define action on match section, select the action you want to take for the matched information. Redaction, partial redaction, and hashing are all irreversible actions.
- If you are redacting the information, specify the text to replace the matched data.
- If you are partially redacting the information, specify the number of characters you want to redact and whether to apply the partial redaction to the start or the end of your matched data.
- Note: If you select hashing, the UTF-8 bytes of the match are hashed with the 64-bit fingerprint of FarmHash.
- Optionally, add tags to all events that match the regex, so that you can filter, analyze, and alert on the events.
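As a sketch of what a custom rule with a redaction action does (illustrative only; the pattern below is a simple stand-in for a card-like number, not a library rule), consider:

```python
import re

CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")  # hypothetical custom regex
REPLACEMENT = "[REDACTED]"

log = {"message": "customer paid with card 4242 4242 4242 4242"}
if CARD_PATTERN.search(log["message"]):
    log["message"] = CARD_PATTERN.sub(REPLACEMENT, log["message"])
    log.setdefault("tags", []).append("sensitive_data:redacted")  # optional tag on match
print(log)
# {'message': 'customer paid with card [REDACTED]', 'tags': ['sensitive_data:redacted']}
```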
This processor adds a field with the name of the host that sent the log, for example, `hostname: 613e197f3526`. Note: If the `hostname` already exists, the Worker throws an error and does not overwrite the existing `hostname`.
To set up this processor:
- Define a filter query. Only logs that match the specified [filter query](#filter-query-syntax) are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
This processor converts the specified field into JSON objects.
To set up this processor:
- Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Enter the name of the field you want to parse JSON on.
Note: The parsed JSON overwrites what was originally contained in the field.
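For example, this sketch (not the Worker's code) parses a JSON string stored in a hypothetical `message` field and overwrites the field with the resulting object:

```python
import json

FIELD = "message"  # hypothetical field to parse JSON on

log = {"service": "api", "message": '{"status": 500, "path": "/checkout"}'}
log[FIELD] = json.loads(log[FIELD])  # the parsed object overwrites the original string
print(log)  # {'service': 'api', 'message': {'status': 500, 'path': '/checkout'}}
```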
Use this processor to enrich your logs with information from a reference table, which could be a local file or database.
To set up the enrichment table processor:
- Define a filter query. Only logs that match the specified filter query are processed. All logs, regardless of whether they do or do not match the filter query, are sent to the next step in the pipeline.
- Enter the source attribute of the log. The source attribute’s value is what you want to find in the reference table.
- Enter the target attribute. The target attribute’s value stores, as a JSON object, the information found in the reference table.
- Select the type of reference table you want to use, File or GeoIP.
- For the File type:
- Enter the file path.
- Enter the column name. The column name in the enrichment table is used for matching the source attribute value. See the Enrichment file example.
- For the GeoIP type, enter the GeoIP path.
Enrichment file example
For this example, `merchant_id` is used as the source attribute and `merchant_info` as the target attribute.
This is the example reference table that the enrichment processor uses:
| merch_id | merchant_name | city | state |
|----------|---------------|------|-------|
| 803 | Andy’s Ottomans | Boise | Idaho |
| 536 | Cindy’s Couches | Boulder | Colorado |
| 235 | Debra’s Benches | Las Vegas | Nevada |
`merch_id` is set as the column name the processor uses to find the source attribute’s value. Note: The source attribute’s value does not have to match the column name.
If the enrichment processor receives a log with `"merchant_id":"536"`:
- The processor looks for the value `536` in the reference table’s `merch_id` column.
- After it finds the value, it adds the entire row of information from the reference table to the `merchant_info` attribute as a JSON object:
```
merchant_info {
    "merchant_name":"Cindy's Couches",
    "city":"Boulder",
    "state":"Colorado"
}
```
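A minimal Python sketch of this lookup, assuming the reference data above is available as a CSV (illustrative only, not the Worker's implementation):

```python
import csv, io, json

REFERENCE_CSV = """merch_id,merchant_name,city,state
803,Andy's Ottomans,Boise,Idaho
536,Cindy's Couches,Boulder,Colorado
235,Debra's Benches,Las Vegas,Nevada
"""
# Index the reference rows by the configured column name (merch_id).
rows = {row["merch_id"]: row for row in csv.DictReader(io.StringIO(REFERENCE_CSV))}

log = {"merchant_id": "536", "message": "order placed"}
match = rows.get(log["merchant_id"])  # source attribute value looked up in the column
if match:
    # The target attribute stores the matched row as a JSON object.
    log["merchant_info"] = {k: v for k, v in match.items() if k != "merch_id"}
print(json.dumps(log, indent=2))
```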