Overview
This check monitors Boundary through the Datadog Agent. The minimum supported version of Boundary is 0.8.0.
Minimum Agent version: 7.38.0
Setup
Follow the instructions below to install and configure this check for an Agent running on a host. For containerized environments, see the Autodiscovery Integration Templates for guidance on applying these instructions.
Installation
The Boundary check is included in the Datadog Agent package.
No additional installation is needed on your server.
Configuration
Listener
A listener whose purpose is set to "ops" must be defined in the config.hcl file to enable metrics collection. The following example shows an ops listener alongside the controller and API listener stanzas:
controller {
  name = "boundary-controller"
  database {
    url = "postgresql://<username>:<password>@10.0.0.1:5432/<database_name>"
  }
}

listener "tcp" {
  purpose = "api"
  tls_disable = true
}

listener "tcp" {
  purpose = "ops"
  tls_disable = true
}
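Once the ops listener is running, a quick way to confirm that it exposes metrics is to fetch the endpoint and list the metric names it returns. The following is a minimal sketch, not part of the integration. It assumes the ops listener is reachable at localhost on port 9203 with a /metrics path, which should be verified against your own config.hcl. The names METRICS_URL, metric_names, and fetch_metric_names are hypothetical helpers introduced for illustration.

```python
import urllib.request

# Assumption: the ops listener is reachable here. Adjust host and port to
# match the listener address in your config.hcl.
METRICS_URL = "http://localhost:9203/metrics"


def metric_names(exposition_text):
    """Return the distinct metric names found in Prometheus-style text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metric_names(url=METRICS_URL, timeout=2.0):
    """Fetch the metrics endpoint and return the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

Calling fetch_metric_names() on the Boundary host should return a non-empty set if the ops listener is serving metrics. A connection error suggests the listener is missing or bound to a different address.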
The boundary.controller.health service check submits as WARNING when the controller is shutting down. To enable this shutdown grace period, update the controller block with a defined wait duration:
controller {
  name = "boundary-controller"
  database {
    url = "env://BOUNDARY_PG_URL"
  }
  graceful_shutdown_wait_duration = "10s"
}
Datadog Agent
Edit the boundary.d/conf.yaml file in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your Boundary performance data. See the sample boundary.d/conf.yaml for all available configuration options.
Restart the Agent.
Validation
Run the Agent’s status subcommand and look for boundary under the Checks section.
Data Collected
Metrics
| Metric (type) | Description |
|---|---|
| boundary.cluster.client.grpc.request_duration_seconds.bucket (count) | Histogram of latencies for gRPC requests between the cluster and any of its clients. Shown as second |
| boundary.cluster.client.grpc.request_duration_seconds.count (count) | Histogram of latencies for gRPC requests between the cluster and any of its clients. Shown as second |
| boundary.cluster.client.grpc.request_duration_seconds.sum (count) | Histogram of latencies for gRPC requests between the cluster and any of its clients. Shown as second |
| boundary.controller.api.http.request_duration_seconds.bucket (count) | Histogram of latencies for HTTP requests. Shown as second |
| boundary.controller.api.http.request_duration_seconds.count (count) | Histogram of latencies for HTTP requests. Shown as second |
| boundary.controller.api.http.request_duration_seconds.sum (count) | Histogram of latencies for HTTP requests. Shown as second |
| boundary.controller.api.http.request_size_bytes.bucket (count) | Histogram of request sizes for HTTP requests. Shown as byte |
| boundary.controller.api.http.request_size_bytes.count (count) | Histogram of request sizes for HTTP requests. Shown as byte |
| boundary.controller.api.http.request_size_bytes.sum (count) | Histogram of request sizes for HTTP requests. Shown as byte |
| boundary.controller.api.http.response_size_bytes.bucket (count) | Histogram of response sizes for HTTP responses. Shown as byte |
| boundary.controller.api.http.response_size_bytes.count (count) | Histogram of response sizes for HTTP responses. Shown as byte |
| boundary.controller.api.http.response_size_bytes.sum (count) | Histogram of response sizes for HTTP responses. Shown as byte |
| boundary.controller.cluster.grpc.request_duration_seconds.bucket (count) | Histogram of latencies for gRPC requests. Shown as second |
| boundary.controller.cluster.grpc.request_duration_seconds.count (count) | Histogram of latencies for gRPC requests. Shown as second |
| boundary.controller.cluster.grpc.request_duration_seconds.sum (count) | Histogram of latencies for gRPC requests. Shown as second |
| boundary.worker.proxy.http.write_header_duration_seconds.bucket (count) | Histogram of time elapsed after the TLS connection is established to when the first http header is written back from the server. Shown as second |
| boundary.worker.proxy.http.write_header_duration_seconds.count (count) | Histogram of time elapsed after the TLS connection is established to when the first http header is written back from the server. Shown as second |
| boundary.worker.proxy.http.write_header_duration_seconds.sum (count) | Histogram of time elapsed after the TLS connection is established to when the first http header is written back from the server. Shown as second |
| boundary.worker.proxy.websocket.active_connections (gauge) | Count of open websocket proxy connections (to Boundary workers). Shown as connection |
| boundary.worker.proxy.websocket.received_bytes.count (count) | Count of received bytes for Worker proxy websocket connections. Shown as byte |
| boundary.worker.proxy.websocket.sent_bytes.count (count) | Count of sent bytes for Worker proxy websocket connections. Shown as byte |
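Each histogram above is split into .bucket, .count, and .sum series, so an average over an interval can be derived by dividing the .sum increment by the .count increment. The sketch below illustrates that arithmetic with made-up numbers. It assumes both values are increments over the same interval, as the count type suggests, and the function name average_duration_seconds is hypothetical.

```python
def average_duration_seconds(sum_delta, count_delta):
    """Average latency over an interval from the .sum and .count increments.

    sum_delta:   increase of a *_duration_seconds.sum series over the interval
    count_delta: increase of the matching *.count series over the same interval
    """
    if count_delta <= 0:
        return None  # no requests were observed, so the average is undefined
    return sum_delta / count_delta


# Made-up example: 6 HTTP requests that took 1.5 seconds in total.
average = average_duration_seconds(1.5, 6)  # 0.25 seconds per request
```

The same ratio applies to the size histograms, where it gives the average request or response size in bytes.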
Events
The Boundary integration does not include any events.
Service Checks
boundary.openmetrics.health
Returns CRITICAL if the Agent is unable to connect to the OpenMetrics endpoint; otherwise, returns OK.
Statuses: ok, critical
boundary.controller.health
Returns CRITICAL if the Agent is unable to connect to the controller’s health endpoint, WARNING if the controller has received a shutdown signal, and OK otherwise.
Statuses: ok, warning, critical
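The status logic described for boundary.controller.health can be approximated with a small probe. This is an illustrative sketch, not the check's actual implementation. It assumes the health endpoint lives at /health on the ops listener (port 9203 here), that the controller answers 200 when healthy and 503 after receiving a shutdown signal, and that any other response is best treated as critical. HEALTH_URL, classify_health, and probe_health are hypothetical names.

```python
import urllib.error
import urllib.request

# Assumption: the controller's health endpoint is served by the ops listener.
HEALTH_URL = "http://localhost:9203/health"


def classify_health(http_status):
    """Map an HTTP status (None when unreachable) to a service check status."""
    if http_status is None:
        return "critical"  # could not connect to the health endpoint
    if http_status == 200:
        return "ok"
    if http_status == 503:
        return "warning"  # assumed to mean the controller is shutting down
    return "critical"  # assumption: unexpected responses count as failures


def probe_health(url=HEALTH_URL, timeout=2.0):
    """Return the HTTP status of the health endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on
```

Running classify_health(probe_health()) on the controller host gives a rough preview of the status the Agent is likely to report, which can help when tuning graceful_shutdown_wait_duration.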
Log collection
Collecting logs is disabled by default in the Datadog Agent. Enable it in your datadog.yaml file by setting logs_enabled to true.
To start collecting your Boundary logs, add this configuration block to your boundary.d/conf.yaml file:
logs:
  - type: file
    source: boundary
    path: /var/log/boundary/events.ndjson
Change the path parameter value based on your environment. See the sample boundary.d/conf.yaml file for all available configuration options.
Troubleshooting
Need help? Contact Datadog support.