CloudPrem on Google Kubernetes Engine (GKE)

CloudPrem is in Preview

Join the CloudPrem Preview to access new self-hosted log management features.


Overview

This guide walks you through deploying Datadog CloudPrem on Google Kubernetes Engine (GKE).

CloudPrem on GKE uses the following Google Cloud services:

  • Google Kubernetes Engine (GKE): Container orchestration platform for running CloudPrem components
  • Cloud Storage (GCS): Object storage for telemetry data and indexes
  • Cloud SQL for PostgreSQL: Managed PostgreSQL database for metadata storage
  • Workload Identity: Secure authentication between GKE workloads and Google Cloud services

Prerequisites

Before you begin, confirm you have:

  1. Google Cloud CLI installed and configured

    gcloud version
    
  2. kubectl installed

    kubectl version --client
    
  3. Helm 3.x installed

    helm version
    
  4. GCP Project with billing enabled

    gcloud config set project YOUR_PROJECT_ID
    
  5. Required IAM Permissions:

    • roles/container.admin (Kubernetes Engine Admin)
    • roles/iam.serviceAccountAdmin (Service Account Admin)
    • roles/compute.admin (Compute Admin)
  6. Create or retrieve your Datadog API key and export it for use in later steps:

    export DD_API_KEY="your-datadog-api-key"

  7. APIs Enabled:

    gcloud services enable container.googleapis.com \
      compute.googleapis.com \
      sqladmin.googleapis.com \
      storage.googleapis.com
    

Installation steps

Step 1: Set environment variables

Set the following environment variables to simplify subsequent commands and reduce copy-paste errors.

export PROJECT_ID="your-gcp-project-id"
export REGION="us-central1"
export CLUSTER_NAME="cloudprem-cluster"
export DATADOG_SITE="datadoghq.com"  # or datadoghq.eu, us3.datadoghq.com, us5.datadoghq.com
export BUCKET_NAME="${PROJECT_ID}-cloudprem"

Step 2: Create GKE cluster

Create a GKE cluster with Workload Identity enabled:

gcloud container clusters create ${CLUSTER_NAME} \
  --region ${REGION} \
  --num-nodes 1 \
  --workload-pool=${PROJECT_ID}.svc.id.goog \
  --machine-type n1-standard-4

Because --region creates a regional cluster, --num-nodes applies per zone, so one node per zone yields three nodes total across the default three zones. Size the cluster for your expected ingestion volume:

  • Small (Dev/Test): 3 nodes, n1-standard-4 (~100GB/day)
  • Medium (Production): 5 nodes, n1-standard-8 (~500GB/day)
  • Large (Enterprise): 7+ nodes, n1-standard-16 (~1TB+/day)

Get cluster credentials:

gcloud container clusters get-credentials ${CLUSTER_NAME} --region ${REGION}

Install GKE auth plugin:

gcloud components install gke-gcloud-auth-plugin
export USE_GKE_GCLOUD_AUTH_PLUGIN=True

Verify cluster access:

kubectl cluster-info
kubectl get nodes

Step 3: Create Cloud Storage bucket

Create a GCS bucket for CloudPrem data storage:

gcloud storage buckets create gs://${BUCKET_NAME} \
  --project=${PROJECT_ID} \
  --location=${REGION} \
  --uniform-bucket-level-access

Step 4: Create Cloud SQL PostgreSQL instance

Create a Cloud SQL PostgreSQL instance for metadata storage:

# Generate a secure password
export DB_PASSWORD=$(openssl rand -base64 32)
echo "Database password: ${DB_PASSWORD}"
# Save this password securely - you'll need it later

# Create Cloud SQL instance
gcloud sql instances create cloudprem-postgres \
  --database-version=POSTGRES_15 \
  --region=${REGION} \
  --root-password="${DB_PASSWORD}"

This can take a few minutes. Wait for the instance to be ready:

gcloud sql instances describe cloudprem-postgres \
  --format="value(state)"
# Should output: RUNNABLE
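
Instead of re-running the describe command by hand, you can block until the instance is ready with a small polling helper. This is a sketch: `wait_runnable` is a hypothetical name, and the 15-second interval and 600-second default timeout are arbitrary choices.

```shell
# Sketch: poll the Cloud SQL instance state until it reports RUNNABLE,
# or give up after a timeout (in seconds, default 600).
wait_runnable() {
  local timeout="${1:-600}" waited=0 state
  while true; do
    state=$(gcloud sql instances describe cloudprem-postgres --format='value(state)')
    [ "$state" = "RUNNABLE" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for cloudprem-postgres to become RUNNABLE" >&2
      return 1
    fi
    sleep 15
    waited=$((waited + 15))
  done
}
```

Call `wait_runnable 600` after creating the instance; it returns non-zero if the instance is still not ready when the timeout expires.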

Create the CloudPrem database:

gcloud sql databases create cloudprem \
  --instance=cloudprem-postgres

Get the connection details:

export DB_CONNECTION_NAME=$(gcloud sql instances describe cloudprem-postgres \
  --format="value(connectionName)")
export DB_PUBLIC_IP=$(gcloud sql instances describe cloudprem-postgres \
  --format="value(ipAddresses[0].ipAddress)")

echo "Connection Name: ${DB_CONNECTION_NAME}"
echo "Public IP: ${DB_PUBLIC_IP}"

Authorize GKE nodes to connect to Cloud SQL:

# Get GKE node external IPs
export NODE_IPS=$(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' | tr ' ' ',')

# Authorize the IPs
gcloud sql instances patch cloudprem-postgres \
  --authorized-networks=${NODE_IPS} \
  --quiet
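
Node external IPs change when nodes are recreated (for example, during upgrades), so this authorization must be refreshed. As a sketch, the two commands above can be combined into a re-runnable helper; `authorize_node_ips` is a hypothetical name:

```shell
# Sketch: re-authorize the current GKE node external IPs on the Cloud SQL instance.
authorize_node_ips() {
  local ips
  # Collect the ExternalIP address of every node, comma-separated.
  ips=$(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}' | tr ' ' ',')
  gcloud sql instances patch cloudprem-postgres \
    --authorized-networks="${ips}" \
    --quiet
}
```

Note that patching replaces the authorized-networks list rather than appending to it.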

Step 5: Configure IAM and Workload Identity

Create a GCP service account for CloudPrem:

export SERVICE_ACCOUNT_NAME="cloudprem-sa"

gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} \
  --display-name="CloudPrem Service Account" \
  --project=${PROJECT_ID}

Grant necessary IAM roles:

# Cloud SQL Client role
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"

# Storage Object Admin role for GCS bucket
gsutil iam ch \
  serviceAccount:${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com:objectAdmin \
  gs://${BUCKET_NAME}

Create Kubernetes namespace and service account:

kubectl create namespace datadog-cloudprem

kubectl create serviceaccount cloudprem-ksa \
  --namespace datadog-cloudprem

kubectl annotate serviceaccount cloudprem-ksa \
  --namespace datadog-cloudprem \
  iam.gke.io/gcp-service-account=${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com

Bind GCP service account to Kubernetes service account:

gcloud iam service-accounts add-iam-policy-binding \
  ${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:${PROJECT_ID}.svc.id.goog[datadog-cloudprem/cloudprem-ksa]"
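
To sanity-check the Workload Identity wiring, you can read the annotation back from the Kubernetes service account. This is a sketch; `ksa_gsa_annotation` is a hypothetical helper name:

```shell
# Sketch: print the GCP service account bound to the KSA via Workload Identity.
# Dots inside the annotation key must be escaped with \. in the jsonpath expression.
ksa_gsa_annotation() {
  kubectl get serviceaccount cloudprem-ksa \
    --namespace datadog-cloudprem \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}
```

Calling it should print the full service account email set in the annotate command above.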

Step 6: Create Kubernetes secrets

Create secret for Datadog API key:

kubectl create secret generic datadog-secret \
  --from-literal=api-key=${DD_API_KEY} \
  --namespace=datadog-cloudprem

Create secret for PostgreSQL connection:

The password must be URL-encoded before it is embedded in the connection string. For example, / becomes %2F, + becomes %2B, and = becomes %3D.

# URL-encode the password first and substitute the encoded value below
# Example: if the password is "abc/def+ghi=" it becomes "abc%2Fdef%2Bghi%3D"

kubectl create secret generic cloudprem-metastore-config \
  --from-literal=QW_METASTORE_URI="postgresql://postgres:${DB_PASSWORD}@${DB_PUBLIC_IP}:5432/cloudprem" \
  --namespace=datadog-cloudprem
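
The encoding can be scripted so the password never needs hand-editing. This is a pure-bash sketch; `urlencode` is a hypothetical helper that percent-encodes everything outside the RFC 3986 unreserved set (it assumes an ASCII password, which `openssl rand -base64` produces):

```shell
# Sketch: percent-encode a string for safe use inside a connection URI.
urlencode() {
  local s="$1" out="" c i hex
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                    # unreserved characters pass through
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # everything else becomes %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode "abc/def+ghi="   # prints: abc%2Fdef%2Bghi%3D
```

With this helper, the secret can be created as --from-literal=QW_METASTORE_URI="postgresql://postgres:$(urlencode "${DB_PASSWORD}")@${DB_PUBLIC_IP}:5432/cloudprem".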

Step 7: Install CloudPrem with Helm

Add the Datadog Helm repository:

helm repo add datadog https://helm.datadoghq.com
helm repo update

Create a values.yaml file:

Set datadog.site in values.yaml to the same site you exported as DATADOG_SITE.

# Datadog configuration
datadog:
   # The Datadog site to connect to. Defaults to `datadoghq.com`.
   # site: datadoghq.com
   # The name of the existing Secret containing the Datadog API key. The secret key name must be `api-key`.
   apiKeyExistingSecret: datadog-secret

# Service Account with Workload Identity
serviceAccount:
  create: false
  name: cloudprem-ksa
  extraAnnotations:
    iam.gke.io/gcp-service-account: cloudprem-sa@<YOUR_PROJECT_ID>.iam.gserviceaccount.com

# CloudPrem node configuration
config:
  # The root URI where index data is stored. This should be a gs:// path.
  # All indexes created in CloudPrem are stored under this location.
  default_index_root_uri: gs://<BUCKET_NAME>/indexes

# Internal ingress configuration for access within the VPC.
# The Helm chart does not yet support GKE ingress, so it is disabled here
# and created manually in a later step.
# Additional annotations can be added to customize the load balancer behavior.
ingress:
  internal:
    enabled: false

# Metastore configuration
# The metastore is responsible for storing and managing index metadata.
# It requires a PostgreSQL database connection string to be provided by a Kubernetes secret.
# The secret should contain a key named `QW_METASTORE_URI` with a value in the format:
# postgresql://<username>:<password>@<host>:<port>/<database>
#
# The metastore connection string is mounted into the pods using extraEnvFrom to reference the secret.
metastore:
  extraEnvFrom:
    - secretRef:
        name: cloudprem-metastore-config

# Indexer configuration
# The indexer is responsible for processing and indexing incoming data. It receives data from various sources (for example, Datadog Agents and log collectors)
# and transforms it into searchable files called "splits" stored in the GCS bucket.
#
# The indexer is horizontally scalable - you can increase `replicaCount` to handle higher indexing throughput. Resource requests and limits should be tuned based on your indexing workload.
#
# The `podSize` parameter sets vCPU, memory, and component-specific settings automatically. The default values are suitable for moderate indexing loads of up to 20 MB/s per indexer pod.
# See the sizing guide for available tiers and their configurations.
indexer:
  replicaCount: 2
  podSize: xlarge

# Searcher configuration
# The `podSize` parameter sets vCPU, memory, and component-specific settings automatically.
# Choose a tier based on your query complexity, concurrency, and data access patterns.
searcher:
  replicaCount: 2
  podSize: xlarge

Install CloudPrem:

helm install cloudprem datadog/cloudprem \
  --namespace datadog-cloudprem \
  --values values.yaml

Step 8: Add internal GCE ingress

Create a file named ingress-values.yaml with the following internal ingress definition:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: cloudprem-internal
  namespace: datadog-cloudprem
  annotations:
    kubernetes.io/ingress.class: "gce-internal"
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: cloudprem-indexer
            port:
              number: 7280

Apply the manifest:

kubectl apply -f ingress-values.yaml
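
The internal load balancer can take a few minutes to provision. Once it has, the assigned IP can be read from the ingress status; `ingress_ip` is a hypothetical helper name in this sketch:

```shell
# Sketch: print the internal IP assigned to the CloudPrem ingress.
# Prints nothing until the load balancer has finished provisioning.
ingress_ip() {
  kubectl get ingress cloudprem-internal \
    --namespace datadog-cloudprem \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```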

Step 9: Install the Datadog Agent

Install the Datadog Agent to collect metrics from CloudPrem components and send them to Datadog.

Create a separate namespace for the Datadog Agent:

kubectl create namespace datadog

# Copy the API key secret to the datadog namespace
kubectl get secret datadog-secret -n datadog-cloudprem -o yaml | \
  sed 's/namespace: datadog-cloudprem/namespace: datadog/' | \
  kubectl apply -f -

Install the Datadog Operator (for example, with helm install datadog-operator datadog/datadog-operator --namespace datadog), then create a DatadogAgent manifest named datadog-operator-values.yaml:

# DatadogAgent manifest for the Datadog Operator
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: datadog
spec:
  global:
    clusterName: <CLUSTER_NAME>
    site: datadoghq.com # Set to your Datadog site (for example, datadoghq.eu)
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
    env:
      - name: DD_LOGS_CONFIG_LOGS_DD_URL
        value: http://<RELEASE_NAME>-indexer.<NAMESPACE_NAME>.svc.cluster.local:7280
      - name: DD_LOGS_CONFIG_EXPECTED_TAGS_DURATION
        value: "1000000h"

  features:
    logCollection:
      enabled: true
      containerCollectAll: true

    dogstatsd:
        port: 8125
        useHostPort: false  # Must be false in Autopilot
        nonLocalTraffic: true

Install the Datadog Agent:

kubectl apply -f datadog-operator-values.yaml

Verify the Datadog Agent is running:

kubectl get pods -n datadog

Update CloudPrem to send metrics to the Datadog Agent’s DogStatsD service. Add this to your values.yaml:

# DogStatsD configuration - send metrics to Datadog Agent
dogstatsdServer:
  host:
    value: "datadog-agent.datadog.svc.cluster.local"
  port: 8125

Upgrade CloudPrem with the new configuration:

helm upgrade cloudprem datadog/cloudprem \
  --namespace datadog-cloudprem \
  --values values.yaml \
  --timeout 10m

Verify the DogStatsD configuration:

kubectl get pod -n datadog-cloudprem -l app.kubernetes.io/component=metastore -o jsonpath='{.items[0].spec.containers[0].env[?(@.name=="CP_DOGSTATSD_SERVER_HOST")].value}'
# Should output: datadog-agent.datadog.svc.cluster.local

Step 10: Verify deployment

Check pod status:

kubectl get pods -n datadog-cloudprem

All pods should be in Running state with READY status:

NAME                                   READY   STATUS    RESTARTS   AGE
cloudprem-control-plane-xxx            1/1     Running   0          5m
cloudprem-indexer-0                    1/1     Running   0          5m
cloudprem-indexer-1                    1/1     Running   0          5m
cloudprem-janitor-xxx                  1/1     Running   0          5m
cloudprem-metastore-xxx                1/1     Running   0          5m
cloudprem-metastore-yyy                1/1     Running   0          5m
cloudprem-searcher-0                   1/1     Running   0          5m
cloudprem-searcher-1                   1/1     Running   0          5m
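
Rather than eyeballing the table, kubectl wait can block until every pod is Ready. This is a sketch; `wait_cloudprem_ready` is a hypothetical wrapper and the 300-second default timeout is arbitrary:

```shell
# Sketch: wait until every pod in the CloudPrem namespace reports the Ready condition.
wait_cloudprem_ready() {
  kubectl wait pod --all \
    --namespace datadog-cloudprem \
    --for=condition=Ready \
    --timeout="${1:-300s}"
}
```

It exits non-zero if any pod is still not Ready when the timeout expires, which makes it convenient in deployment scripts.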

Check metastore logs for successful database connection:

kubectl logs -n datadog-cloudprem -l app.kubernetes.io/component=metastore --tail=50

Uninstall

To completely remove CloudPrem:

# Uninstall Helm release
helm uninstall cloudprem --namespace datadog-cloudprem

# Delete namespace
kubectl delete namespace datadog-cloudprem

# Delete Cloud SQL instance
gcloud sql instances delete cloudprem-postgres --quiet

# Delete GCS bucket
gsutil -m rm -r gs://${BUCKET_NAME}

# Delete GCP service account
gcloud iam service-accounts delete \
  ${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --quiet

# Delete GKE cluster
gcloud container clusters delete ${CLUSTER_NAME} \
  --region ${REGION} \
  --quiet

Best practices

  • Use Workload Identity instead of service account keys for better security.
  • Enable Cloud SQL backups for disaster recovery.
  • Use regional GKE clusters for high availability.
  • Monitor disk usage on indexer nodes and enable auto-scaling.
  • Set up alerts in Datadog for CloudPrem component health.
  • Use private GKE clusters for enhanced security in production.
  • Regularly update CloudPrem to the latest version for bug fixes and features.
  • Test scaling in a staging environment before production changes.
  • Store the database password in Secret Manager and use External Secrets Operator (ESO) or the Secrets Store CSI Driver to provide the password to metastore pods.

Next steps

  • Configure your applications to send telemetry to CloudPrem
  • Set up dashboards in Datadog to monitor CloudPrem performance
  • Review CloudPrem logs and metrics in your Datadog account
  • Plan capacity based on your data volume

Further reading

Additional helpful documentation, links, and articles: