Set Up Observability Pipelines in your Splunk Environment
Observability Pipelines only supports Splunk's HTTP Event Collector (HEC) protocol.
Overview
The Observability Pipelines Worker can collect, process, and route logs and metrics from any source to any destination. Using Datadog, you can build and manage all of your Observability Pipelines Worker deployments at scale.
This guide walks you through deploying the Worker in your common tools cluster and configuring your Splunk collectors to send logs through the Worker, which dual-writes them to Splunk and Datadog.
Assumptions
- You are using a log collector that is compatible with the Splunk HTTP Event Collector (HEC) protocol.
- You have administrative access to the collectors and to the Splunk index where logs are sent.
- You have administrative access to the clusters where the Observability Pipelines Worker will be deployed.
- You have a common tools or security cluster for your environment to which all other clusters are connected.
Prerequisites
Before installing, make sure you have:
- A valid Datadog API key.
- An Observability Pipelines configuration ID (pipeline ID).
You can generate both of these in Observability Pipelines.
Provider-specific requirements
Ensure that your machine is configured to run Docker.
To run the Worker on your Kubernetes nodes, you need a minimum of two nodes with one CPU and 512MB RAM available. Datadog recommends creating a separate node pool for the Workers, which is also the recommended configuration for production deployments.
The AWS Load Balancer controller is required. To see if it is installed, run the following command and look for aws-load-balancer-controller in the list (the example below assumes the controller was installed with Helm):
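# Lists Helm releases in all namespaces; look for aws-load-balancer-controller.
helm list -A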
Datadog recommends using Amazon EKS >= 1.16.
To run the Worker on your Kubernetes nodes, you need a minimum of two nodes with one CPU and 512MB RAM available. Datadog recommends creating a separate node pool for the Workers, which is also the recommended configuration for production deployments.
To run the Worker on your Kubernetes nodes, you need a minimum of two nodes with one CPU and 512MB RAM available. Datadog recommends creating a separate node pool for the Workers, which is also the recommended configuration for production deployments.
There are no provider-specific requirements for APT-based Linux.
There are no provider-specific requirements for RPM-based Linux.
To run the Worker in your AWS account, you need administrative access to that account. Collect the following pieces of information to run the Worker instances (example lookup commands are shown after this list):
- The VPC ID your instances will run in.
- The subnet IDs your instances will run in.
- The AWS region your VPC is located in.
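The following AWS CLI lookups are not part of the setup steps themselves, but are one way to collect these values; <VPC_ID> is a placeholder for your own VPC:
# List VPC IDs and their CIDR ranges in the current account and region.
aws ec2 describe-vpcs --query "Vpcs[].{ID:VpcId,CIDR:CidrBlock}" --output table
# List the subnets (and their availability zones) inside the chosen VPC.
aws ec2 describe-subnets --filters Name=vpc-id,Values=<VPC_ID> \
  --query "Subnets[].{ID:SubnetId,AZ:AvailabilityZone}" --output table
# Show the region currently configured for the AWS CLI.
aws configure get region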
Setting up the Splunk index
Observability Pipelines supports acknowledgements when you enable the Enable Indexer Acknowledgements setting on the input.
To receive logs from the Observability Pipelines Worker, you must provision a HEC input and HEC token on the index.
- In Splunk, navigate to Settings > Data Inputs.
- Add a new HTTP Event Collector input and assign it a name.
- Select the indexes where you want the logs to be sent.
After you add the input, Splunk creates a token for you (typically a UUID). Add this token to the sample configurations provided later in this guide so that the Observability Pipelines Worker can authenticate to your Splunk index.
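As an optional sanity check that is not part of the steps above, you can send a test event directly to the new Splunk HEC input with curl; <SPLUNK_URL>, <HEC_TOKEN>, and <INDEX> are placeholders for your own values:
# Send one test event to the Splunk HEC endpoint using the generated token.
curl -k "https://<SPLUNK_URL>:8088/services/collector/event" \
  -H "Authorization: Splunk <HEC_TOKEN>" \
  -d '{"event": "hec smoke test", "index": "<INDEX>"}'
A response of {"text":"Success","code":0} indicates the input and token are working.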
Installing the Observability Pipelines Worker
The Observability Pipelines Worker Docker image is published on Docker Hub.
Download the sample pipeline configuration file.
Run the following command to start the Observability Pipelines Worker with Docker:
docker run -i -e DD_API_KEY=<API_KEY> \
-e DD_OP_PIPELINE_ID=<PIPELINE_ID> \
-e DD_SITE=<SITE> \
-e SPLUNK_HEC_ENDPOINT=<SPLUNK_URL> \
-e SPLUNK_TOKEN=<SPLUNK_TOKEN> \
-p 8088:8088 \
-v ./pipeline.yaml:/etc/observability-pipelines-worker/pipeline.yaml:ro \
datadog/observability-pipelines-worker run
Replace <API_KEY> with your Datadog API key, <PIPELINE_ID> with your Observability Pipelines configuration ID, and <SITE> with your Datadog site. Be sure to also update SPLUNK_HEC_ENDPOINT and SPLUNK_TOKEN with values that match the Splunk deployment you created in Setting up the Splunk index. ./pipeline.yaml must be the relative or absolute path to the configuration you downloaded in Step 1.
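To confirm the Worker is listening before pointing real collectors at it, you can send a single test event to its Splunk HEC source with curl; this assumes the source accepts the standard /services/collector/event path and the token you configured:
# Send one test event to the locally running Worker on port 8088.
curl http://localhost:8088/services/collector/event \
  -H "Authorization: Splunk <SPLUNK_TOKEN>" \
  -d '{"event": "observability pipelines worker smoke test"}'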
Download the Helm chart for AWS EKS.
In the Helm chart, replace datadog.apiKey and datadog.pipelineId with their respective values, and replace <site> with your Datadog site:
datadog:
  apiKey: "<datadog_api_key>"
  pipelineId: "<observability_pipelines_configuration_id>"
  site: "<site>"
Replace the values for SPLUNK_HEC_ENDPOINT and SPLUNK_TOKEN to match your Splunk deployment, including the token you created in Setting up the Splunk index:
env:
  - name: SPLUNK_HEC_ENDPOINT
    value: <https://your.splunk.index:8088/>
  - name: SPLUNK_TOKEN
    value: <a_random_token_usually_a_uuid>
Install the Helm chart in your cluster with the following commands:
helm repo add datadog https://helm.datadoghq.com
helm upgrade --install \
opw datadog/observability-pipelines-worker \
-f aws_eks.yaml
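To find the internal load balancer address to give your collectors, one option is to list the services created for the release; the label selector below assumes the release name opw used above and the standard Helm instance label:
# Show the services for the opw release; the load balancer hostname
# typically appears in the EXTERNAL-IP column.
kubectl get service -l app.kubernetes.io/instance=opw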
Download the Helm chart for Azure AKS.
In the Helm chart, replace datadog.apiKey and datadog.pipelineId with their respective values, and replace <site> with your Datadog site:
datadog:
  apiKey: "<datadog_api_key>"
  pipelineId: "<observability_pipelines_configuration_id>"
  site: "<site>"
Replace the values for SPLUNK_HEC_ENDPOINT and SPLUNK_TOKEN to match your Splunk deployment, including the token you created in Setting up the Splunk index:
env:
  - name: SPLUNK_HEC_ENDPOINT
    value: <https://your.splunk.index:8088/>
  - name: SPLUNK_TOKEN
    value: <a_random_token_usually_a_uuid>
Install the Helm chart in your cluster with the following commands:
helm repo add datadog https://helm.datadoghq.com
helm upgrade --install \
opw datadog/observability-pipelines-worker \
-f azure_aks.yaml
Download the Helm chart for Google GKE.
In the Helm chart, replace datadog.apiKey and datadog.pipelineId with their respective values, and replace <site> with your Datadog site:
datadog:
  apiKey: "<datadog_api_key>"
  pipelineId: "<observability_pipelines_configuration_id>"
  site: "<site>"
Replace the values for SPLUNK_HEC_ENDPOINT and SPLUNK_TOKEN to match your Splunk deployment, including the token you created in Setting up the Splunk index:
env:
  - name: SPLUNK_HEC_ENDPOINT
    value: <https://your.splunk.index:8088/>
  - name: SPLUNK_TOKEN
    value: <a_random_token_usually_a_uuid>
Install the Helm chart in your cluster with the following commands:
helm repo add datadog https://helm.datadoghq.com
helm upgrade --install \
opw datadog/observability-pipelines-worker \
-f google_gke.yaml
Run the following commands to set up APT to download through HTTPS:
sudo apt-get update
sudo apt-get install apt-transport-https curl gnupg
Run the following commands to set up the Datadog deb repo on your system and create a Datadog archive keyring:
sudo sh -c "echo 'deb [signed-by=/usr/share/keyrings/datadog-archive-keyring.gpg] https://apt.datadoghq.com/ stable observability-pipelines-worker-1' > /etc/apt/sources.list.d/datadog-observability-pipelines-worker.list"
sudo touch /usr/share/keyrings/datadog-archive-keyring.gpg
sudo chmod a+r /usr/share/keyrings/datadog-archive-keyring.gpg
curl https://keys.datadoghq.com/DATADOG_APT_KEY_CURRENT.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
curl https://keys.datadoghq.com/DATADOG_APT_KEY_F14F620E.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
curl https://keys.datadoghq.com/DATADOG_APT_KEY_C0962C7D.public | sudo gpg --no-default-keyring --keyring /usr/share/keyrings/datadog-archive-keyring.gpg --import --batch
Run the following commands to update your local apt repo and install the Worker:
sudo apt-get update
sudo apt-get install observability-pipelines-worker datadog-signing-keys
Add your API key, pipeline ID, Datadog site, and Splunk information to the Worker’s environment variables:
sudo cat <<-EOF > /etc/default/observability-pipelines-worker
DD_API_KEY=<API_KEY>
DD_OP_PIPELINE_ID=<PIPELINE_ID>
DD_SITE=<SITE>
SPLUNK_HEC_ENDPOINT=<SPLUNK_URL>
SPLUNK_TOKEN=<SPLUNK_TOKEN>
EOF
Download the sample configuration file to /etc/observability-pipelines-worker/pipeline.yaml on the host.
Start the worker:
sudo systemctl restart observability-pipelines-worker
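As an optional verification step, confirm that the service is running and check its recent logs for configuration errors:
# Check the service state and review recent Worker logs.
sudo systemctl status observability-pipelines-worker
sudo journalctl -u observability-pipelines-worker --since "10 minutes ago"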
Run the following commands to set up the Datadog rpm repo on your system:
cat <<EOF > /etc/yum.repos.d/datadog-observability-pipelines-worker.repo
[observability-pipelines-worker]
name = Observability Pipelines Worker
baseurl = https://yum.datadoghq.com/stable/observability-pipelines-worker-1/\$basearch/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://keys.datadoghq.com/DATADOG_RPM_KEY_CURRENT.public
       https://keys.datadoghq.com/DATADOG_RPM_KEY_FD4BF915.public
EOF
Note: If you are running RHEL 8.1 or CentOS 8.1, use repo_gpgcheck=0 instead of repo_gpgcheck=1 in the configuration above.
Update your packages and install the Worker:
sudo yum makecache
sudo yum install observability-pipelines-worker
Add your API key, pipeline ID, Datadog site, and Splunk information to the Worker’s environment variables:
sudo cat <<-EOF > /etc/default/observability-pipelines-worker
DD_API_KEY=<API_KEY>
DD_OP_PIPELINE_ID=<PIPELINE_ID>
DD_SITE=<SITE>
SPLUNK_HEC_ENDPOINT=<SPLUNK_URL>
SPLUNK_TOKEN=<SPLUNK_TOKEN>
EOF
Download the sample configuration file to /etc/observability-pipelines-worker/pipeline.yaml on the host.
Start the worker:
sudo systemctl restart observability-pipelines-worker
Set up the Worker module in your existing Terraform using this sample configuration. Update the values in vpc-id, subnet-ids, and region to match your AWS deployment. Update the values in datadog-api-key and pipeline-id to match your pipeline.
module "opw" {
source = "git::https://github.com/DataDog/opw-terraform//aws"
vpc-id = "{VPC ID}"
subnet-ids = ["{SUBNET ID 1}", "{SUBNET ID 2}"]
region = "{REGION}"
datadog-api-key = "{DATADOG API KEY}"
pipeline-id = "{OP PIPELINE ID}"
environment = {
"SPLUNK_TOKEN": "<SPLUNK TOKEN>",
}
pipeline-config = <<EOT
sources:
splunk_receiver:
type: splunk_hec
address: 0.0.0.0:8088
valid_tokens:
- $${SPLUNK_TOKEN}
transforms:
## This is a placeholder for your own remap (or other transform)
## steps with tags set up. Datadog recommends these tag assignments.
## They show which data has been moved over to OP and what still needs
## to be moved.
LOGS_YOUR_STEPS:
type: remap
inputs:
- splunk_receiver
source: |
.sender = "observability_pipelines_worker"
.opw_aggregator = get_hostname!()
## This buffer configuration is split into 144GB buffers for both of the Datadog and Splunk sinks.
##
## This should work for the vast majority of OP Worker deployments and should rarely
## need to be adjusted. If you do change it, be sure to update the size the `ebs-drive-size-gb` parameter.
sinks:
datadog_logs:
type: datadog_logs
inputs:
- LOGS_YOUR_STEPS
default_api_key: "$${DD_API_KEY}"
compression: gzip
buffer:
type: disk
max_size: 154618822656
splunk_logs:
type: splunk_hec_logs
inputs:
- LOGS_YOUR_STEPS
endpoint: <SPLUNK HEC ENDPOINT>
default_token: $${SPLUNK_TOKEN}
encoding:
codec: json
buffer:
type: disk
max_size: 154618822656
EOT
}
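After adding the module, the usual Terraform workflow applies; the module’s lb-dns output (referenced in the load balancing section below) provides the address to point your forwarders at:
# Initialize providers/modules, review the plan, then create the resources.
terraform init
terraform plan
terraform apply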
Load balancing
Production-oriented setup is not included in the Docker instructions. Instead, refer to your company’s standards for load balancing in containerized environments. If you are testing on your local machine, configuring a load balancer is unnecessary.
Use the load balancers provided by your cloud provider.
They adjust based on autoscaling events that the default Helm setup is configured for. The load balancers are internal-facing,
so they are only accessible inside your network.
Use the load balancer URL given to you by Helm when you configure your existing collectors.
NLBs provisioned by the AWS Load Balancer Controller are used.
Cross-availability-zone load balancing
The provided Helm configuration tries to simplify load balancing, but you must consider the potential cost implications of cross-AZ traffic. Wherever possible, the samples try to avoid creating situations where multiple cross-AZ hops can happen.
The sample configurations do not enable the cross-zone load balancing feature available in this controller. To enable it, add the following annotation to the service block:
service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true
See AWS Load Balancer Controller for more details.
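For example, in your Helm values file the annotation would sit under the service annotations; the exact values key is an assumption here, so confirm it against the chart’s values.yaml before using it:
# Hypothetical placement of the cross-zone annotation in Helm values.
service:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=true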
Use the load balancers provided by your cloud provider.
They adjust based on autoscaling events that the default Helm setup is configured for. The load balancers are internal-facing,
so they are only accessible inside your network.
Use the load balancer URL given to you by Helm when you configure your existing collectors.
Cross-availability-zone load balancing
The provided Helm configuration tries to simplify load balancing, but you must consider the potential cost implications of cross-AZ traffic. Wherever possible, the samples try to avoid creating situations where multiple cross-AZ hops can happen.
Use the load balancers provided by your cloud provider.
They adjust based on autoscaling events that the default Helm setup is configured for. The load balancers are internal-facing,
so they are only accessible inside your network.
Use the load balancer URL given to you by Helm when you configure your existing collectors.
Cross-availability-zone load balancing
The provided Helm configuration tries to simplify load balancing, but you must consider the potential cost implications of cross-AZ traffic. Wherever possible, the samples try to avoid creating situations where multiple cross-AZ hops can happen.
Global Access is enabled by default since that is likely required for use in a shared tools cluster.
No built-in support for load-balancing is provided, given the single-machine nature of the installation. You will need to provision your own load balancers using whatever your company’s standard is.
No built-in support for load-balancing is provided, given the single-machine nature of the installation. You will need to provision your own load balancers using whatever your company’s standard is.
The Terraform module provisions an NLB that points at the Worker instances. Its DNS address is returned in the lb-dns output in Terraform.
Buffering
Observability Pipelines includes multiple buffering strategies that allow you to increase the resilience of your cluster to downstream faults. The provided sample configurations use disk buffers, the capacities of which are rated for approximately 10 minutes of data at 10Mbps/core for Observability Pipelines deployments. That is often enough time for transient issues to resolve themselves, or for incident responders to decide what needs to be done with the observability data.
By default, the Observability Pipelines Worker’s data directory is set to /var/lib/observability-pipelines-worker. Make sure that your host machine has sufficient storage capacity allocated to the container’s mountpoint.
For AWS, Datadog recommends using the io2 EBS drive family. Alternatively, gp3 drives can also be used.
For Azure AKS, Datadog recommends using the default (also known as managed-csi) disks.
For Google GKE, Datadog recommends using the premium-rwo drive class because it is backed by SSDs. The HDD-backed class, standard-rwo, might not provide enough write performance for the buffers to be useful.
By default, the Observability Pipelines Worker’s data directory is set to /var/lib/observability-pipelines-worker. If you are using the sample configuration, ensure that this location has at least 288GB of space available for buffering. Where possible, it is recommended to have a separate SSD mounted at that location.
By default, the Observability Pipelines Worker’s data directory is set to /var/lib/observability-pipelines-worker. If you are using the sample configuration, ensure that this location has at least 288GB of space available for buffering. Where possible, it is recommended to have a separate SSD mounted at that location.
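A quick way to check the available space at that location (not part of the original steps):
# Show free disk space on the filesystem backing the Worker data directory.
df -h /var/lib/observability-pipelines-worker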
By default, a 288GB EBS drive is allocated to each instance, and the sample configuration above is set to use that for buffering.
Connect Splunk forwarders to the Observability Pipelines Worker
After you install and configure the Observability Pipelines Worker to send logs to your Splunk index, you must update your existing collectors to point to the Worker.
You can update most Splunk collectors with the IP/URL of the host (or load balancer) associated with the Observability Pipelines Worker.
For Terraform installs, the lb-dns output provides the necessary value. Additionally, you must update the Splunk collector with the HEC token you wish to use for authentication, so it matches the one specified in the Observability Pipelines Worker’s list of valid_tokens in pipeline.yaml.
# Example pipeline.yaml splunk_receiver source
sources:
  splunk_receiver:
    type: splunk_hec
    address: 0.0.0.0:8088
    valid_tokens:
      - ${SPLUNK_TOKEN}
In the sample configuration provided, the same HEC token is used for both the Splunk source and destination.
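As an illustration only: on Splunk forwarders that support HEC output through an outputs.conf [httpout] stanza (available in recent Splunk versions), the change looks roughly like the following. The setting names are an assumption; verify them against the outputs.conf reference for your Splunk version.
# Illustrative sketch only; confirm setting names for your Splunk version.
[httpout]
uri = https://<WORKER_OR_LOAD_BALANCER_URL>:8088
httpEventCollectorToken = <SPLUNK_TOKEN>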
At this point, your logs should be flowing to the Worker and available for processing. The next section describes the processing included by default and the additional options available.
Working with data
The sample Observability Pipelines configuration does the following:
- Collects logs being sent from the Splunk forwarder to the Observability Pipelines Worker.
- Transforms logs by adding tags to data that has come through the Observability Pipelines Worker. This helps determine what traffic still needs to be shifted over to the Worker as you update your clusters. These tags also show you how logs are being routed through the load balancer, in case there are imbalances.
- Routes the logs by dual-shipping the data to both Splunk and Datadog.
Further reading
Additional helpful documentation, links, and articles: