Manage and monitor CloudPrem

Datadog CloudPrem is in Preview.

Retention policy

The retention policy specifies how long data is stored before it is deleted. The default retention period is 30 days. Once a day, the janitor automatically deletes splits (index files) older than the configured retention threshold.

To change the retention period, update the cloudprem.index.retention parameter in the Helm chart values file, upgrade the Helm release, and optionally restart the janitor pod so the change takes effect immediately:

  1. Update the retention period in the Helm chart values file with a human-readable string (for example, 15 days, 6 months, or 3 years):

    datadog-values.yaml

    cloudprem:
      index:
        retention: 6 months

  2. Upgrade the Helm chart release:

    helm upgrade <RELEASE_NAME> datadog/cloudprem \
      -n <NAMESPACE_NAME> \
      -f datadog-values.yaml
    
  3. Restart the janitor pod (optional but recommended for immediate effect):

    kubectl delete pod -l app.kubernetes.io/component=janitor -n <NAMESPACE_NAME>
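
To confirm the steps above took effect, you can inspect the values stored on the release and check that a fresh janitor pod is running. The following is a minimal sketch, not part of CloudPrem: show_retention and janitor_pods are hypothetical helper names, and the sketch assumes helm and kubectl already point at the right cluster.

```shell
#!/usr/bin/env bash
# Sketch: verify a retention change after `helm upgrade`.
# show_retention and janitor_pods are hypothetical helper names.

# Print the retention line from the user-supplied values of a release.
# Usage: show_retention <RELEASE_NAME> <NAMESPACE_NAME>
show_retention() {
  local release="$1" namespace="$2"
  helm get values "$release" -n "$namespace" | grep 'retention:'
}

# List janitor pods; a recent AGE indicates the pod restarted.
# Usage: janitor_pods <NAMESPACE_NAME>
janitor_pods() {
  local namespace="$1"
  kubectl get pods -l app.kubernetes.io/component=janitor -n "$namespace"
}
```

After sourcing the functions, run show_retention <RELEASE_NAME> <NAMESPACE_NAME> and check that the printed value matches the one in datadog-values.yaml.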
    

Dashboards

CloudPrem provides an out-of-the-box dashboard that monitors CloudPrem’s key metrics.

Setup

The dashboard's metrics are exported by DogStatsD. You can either:

  • Run DogStatsD as a standalone service, or
  • Run the Datadog Agent (which includes DogStatsD by default)

Configure either option with your organization’s API key to export these metrics. As soon as your CloudPrem cluster is connected to Datadog, the out-of-the-box dashboard is automatically created, and you can access it from your Dashboards list.
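
Before relying on the dashboard, it can help to check that a DogStatsD listener is reachable by sending it a test datagram. The sketch below is an illustration under stated assumptions: it requires bash (for the /dev/udp pseudo-device), assumes the default DogStatsD UDP port 8125, and uses a hypothetical metric name, cloudprem_setup_check. The function names are also illustrative. Sending the datagram submits a real custom metric to your organization.

```shell
#!/usr/bin/env bash
# Sketch: send one test counter to a DogStatsD listener.
# cloudprem_setup_check is a hypothetical metric name used only for this check.

# Build a DogStatsD datagram in the form <name>:<value>|<type>.
dogstatsd_datagram() {
  local name="$1" value="$2" type="$3"
  printf '%s:%s|%s' "$name" "$value" "$type"
}

# Send the datagram over UDP (defaults: 127.0.0.1, port 8125).
# UDP is fire-and-forget, so success here does not prove the metric was ingested.
send_test_metric() {
  local host="${1:-127.0.0.1}" port="${2:-8125}"
  dogstatsd_datagram "cloudprem_setup_check" 1 c > "/dev/udp/${host}/${port}"
}
```

After calling send_test_metric, look for the test metric in Datadog to confirm that the listener and API key are working.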

Data Collected

Metric                                   Type       Description
indexed_events.count                     Counter    Number of indexed events
indexed_events_bytes.count               Counter    Number of indexed bytes
ingest_requests.count                    Counter    Number of ingest requests
object_storage_delete_requests.count     Counter    Number of DELETE requests on object storage
object_storage_get_requests.count        Counter    Number of GET requests on object storage
object_storage_get_requests_bytes.count  Counter    Total bytes read from object storage using GET requests
object_storage_put_requests.count        Counter    Number of PUT requests on object storage
object_storage_put_requests_bytes.count  Counter    Total bytes written to object storage using PUT requests
pending_merge_ops.gauge                  Gauge      Number of pending merge operations
search_requests.count                    Counter    Number of search requests
search_requests.duration_seconds         Histogram  Search request latency
metastore_requests.count                 Counter    Number of metastore requests
metastore_requests.duration_seconds      Histogram  Metastore request latency
cpu.usage.gauge                          Gauge      CPU usage percentage
uptime.gauge                             Gauge      Service uptime in seconds
memory.allocated_bytes.gauge             Gauge      Allocated memory in bytes
disk.bytes_read.counter                  Counter    Total bytes read from disk
disk.bytes_written.counter               Counter    Total bytes written to disk
disk.available_space.gauge               Gauge      Available disk space in bytes
disk.total_space.gauge                   Gauge      Total disk capacity in bytes
network.bytes_recv.counter               Counter    Total bytes received over network
network.bytes_sent.counter               Counter    Total bytes sent over network
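
Some of these metrics are most useful in combination. For example, disk.available_space.gauge and disk.total_space.gauge together give the share of disk in use. The sketch below shows the arithmetic only; disk_used_percent is a hypothetical helper, and the input values are made up for illustration rather than read from CloudPrem.

```shell
#!/usr/bin/env bash
# Sketch: derive disk usage from two gauge values (both in bytes).
# disk_used_percent is a hypothetical helper name.

# Usage: disk_used_percent <available_bytes> <total_bytes>
# Prints the used share as an integer percentage (truncated, not rounded).
disk_used_percent() {
  local available="$1" total="$2"
  if [ "$total" -le 0 ]; then
    echo "total must be greater than zero" >&2
    return 1
  fi
  echo $(( (total - available) * 100 / total ))
}

# Illustrative values: 250 GB available out of 1 TB.
disk_used_percent 250000000000 1000000000000
```

The same calculation can be expressed as a formula in a dashboard widget or monitor, which avoids computing it outside Datadog.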