Vespa

Agent Check

Supported OS: Linux

Overview

Gather metrics from your Vespa system in real time to:

  • Visualize and monitor Vespa state and performance
  • Alert on health and availability

Setup

The Vespa check is not included in the Datadog Agent package, so it must be built and installed separately.

Installation

To install the check on your host:

  1. Install the developer toolkit on any machine.
  2. Run ddev release build vespa to build the package.
  3. Download the Datadog Agent.
  4. Upload the build artifact to any host with an Agent and run datadog-agent integration install -w path/to/vespa/dist/<ARTIFACT_NAME>.whl.
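
Step 4 can be scripted on the Agent host. The following is a minimal Python sketch, not part of the integration: it looks for a single .whl file in the build output directory and assembles the install command quoted in step 4. The helper names find_wheel and install_command are hypothetical, and the dist directory is supplied by the caller.

```python
from pathlib import Path
from typing import List


def find_wheel(dist_dir: str) -> Path:
    """Return the single .whl artifact in dist_dir (hypothetical helper)."""
    wheels = sorted(Path(dist_dir).glob("*.whl"))
    if len(wheels) != 1:
        raise RuntimeError(
            f"expected exactly one .whl in {dist_dir}, found {len(wheels)}"
        )
    return wheels[0]


def install_command(wheel: Path) -> List[str]:
    """Build the argument list for the install command from step 4."""
    return ["datadog-agent", "integration", "install", "-w", str(wheel)]
```

The resulting list can be passed to subprocess.run(..., check=True) so that a failed install raises an error instead of passing silently.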

Configuration

To configure the Vespa check:

  1. Create a vespa.d/ folder in the conf.d/ folder at the root of your Agent’s configuration directory.
  2. Create a conf.yaml file in the vespa.d/ folder you just created.
  3. Copy the contents of the sample vespa.d/conf.yaml file into your conf.yaml file.
  4. Edit the conf.yaml file to set the consumer, which determines the set of metrics forwarded by the check:
    • consumer: The consumer to collect metrics for, either default or a custom consumer defined in your Vespa application’s services.xml.
  5. Restart the Agent.
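
Steps 1 to 4 can be sketched in Python as follows. This is an illustration under assumptions rather than the shipped sample: the init_config/instances layout is the usual Datadog conf.yaml structure and is assumed here, the function names are hypothetical, and conf_root stands for your Agent's configuration directory. The sample vespa.d/conf.yaml file remains the authoritative reference for the available options.

```python
from pathlib import Path


def render_vespa_conf(consumer: str = "default") -> str:
    """Return minimal conf.yaml text (assumed layout; see the sample file)."""
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - consumer: {consumer}\n"
    )


def write_vespa_conf(conf_root: str, consumer: str = "default") -> Path:
    """Create conf.d/vespa.d/conf.yaml under conf_root and return its path."""
    conf_dir = Path(conf_root) / "conf.d" / "vespa.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf_file = conf_dir / "conf.yaml"
    conf_file.write_text(render_vespa_conf(consumer), encoding="utf-8")
    return conf_file
```

Writing the file does not apply it; the Agent still needs to be restarted afterwards, as in step 5.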

Validation

Run the Agent’s status subcommand and look for vespa under the Checks section.
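
This validation can also be automated. The sketch below assumes that the status output lists each running check on its own line, beginning with the check name; that format is an assumption and may differ between Agent versions. The function names are hypothetical.

```python
import subprocess


def vespa_check_listed(status_output: str) -> bool:
    """Return True if any line of the status output starts with 'vespa'."""
    return any(
        line.strip().startswith("vespa") for line in status_output.splitlines()
    )


def agent_status_output() -> str:
    """Run the Agent's status subcommand and return its standard output."""
    result = subprocess.run(
        ["datadog-agent", "status"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

On a host with the Agent installed, vespa_check_listed(agent_status_output()) gives a quick yes/no answer; reading the full status output is still the better way to spot configuration errors.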

Data Collected

Metrics

vespa.http.status.1xx.rate
(gauge)
Number of responses with a 1xx status
Shown as response
vespa.http.status.2xx.rate
(gauge)
Number of responses with a 2xx status
Shown as response
vespa.http.status.3xx.rate
(gauge)
Number of responses with a 3xx status
Shown as response
vespa.http.status.4xx.rate
(gauge)
Number of responses with a 4xx status
Shown as response
vespa.http.status.5xx.rate
(gauge)
Number of responses with a 5xx status
Shown as response
vespa.jdisc.gc.ms.average
(gauge)
Time spent in GC
Shown as millisecond
vespa.mem.heap.free.average
(gauge)
Free heap size
Shown as byte
vespa.queries.rate
(gauge)
Number of search queries
Shown as query
vespa.feed.operations.rate
(gauge)
Number of feed operations
Shown as operation
vespa.query_latency.average
(gauge)
Total query processing time
Shown as millisecond
vespa.query_latency.95percentile
(gauge)
95th percentile of total query processing time
Shown as millisecond
vespa.query_latency.99percentile
(gauge)
99th percentile of total query processing time
Shown as millisecond
vespa.hits_per_query.average
(gauge)
Hits in the returned result, per query
Shown as hit
vespa.totalhits_per_query.average
(gauge)
Estimated total number of hits per query
Shown as hit
vespa.degraded_queries.rate
(gauge)
Queries with degraded results due to timeout
Shown as query
vespa.failed_queries.rate
(gauge)
Failed queries
Shown as query
vespa.serverActiveThreads.average
(gauge)
Threads that are active processing requests
Shown as thread
vespa.content.proton.search_protocol.docsum.requested_documents.rate
(gauge)
Requested document summaries
Shown as document
vespa.content.proton.search_protocol.docsum.latency.average
(gauge)
Docsum request latency on content node
Shown as second
vespa.content.proton.search_protocol.query.latency.average
(gauge)
Query request latency on content node
Shown as second
vespa.content.proton.documentdb.documents.total.last
(gauge)
Total documents in this document db (ready + not-ready)
Shown as document
vespa.content.proton.documentdb.documents.ready.last
(gauge)
Ready documents in this document db
Shown as document
vespa.content.proton.documentdb.documents.active.last
(gauge)
Active/searchable documents in this document db
Shown as document
vespa.content.proton.documentdb.disk_usage.last
(gauge)
Total disk usage for this document db
Shown as byte
vespa.content.proton.documentdb.memory_usage.allocated_bytes.last
(gauge)
Total memory usage for this document db
Shown as byte
vespa.content.proton.resource_usage.disk.average
(gauge)
Relative amount of disk space used by this process
Shown as fraction
vespa.content.proton.resource_usage.memory.average
(gauge)
Relative amount of memory used by this process
Shown as fraction
vespa.content.proton.resource_usage.feeding_blocked.last
(gauge)
Whether feeding is blocked due to resource limitations (value is 0 or 1)
vespa.content.proton.documentdb.matching.docs_matched.rate
(gauge)
Number of documents matched
Shown as document
vespa.content.proton.documentdb.matching.docs_reranked.rate
(gauge)
Number of documents re-ranked (second phase)
Shown as document
vespa.content.proton.documentdb.matching.rank_profile.query_latency.average
(gauge)
Total latency when matching and ranking a query
Shown as second
vespa.content.proton.documentdb.matching.rank_profile.rerank_time.average
(gauge)
Time spent on second-phase ranking
Shown as second
vespa.content.proton.transactionlog.disk_usage.last
(gauge)
Disk usage of the transaction log
Shown as byte
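
The latency metrics listed above use two different units: vespa.query_latency.* is reported in milliseconds, while the vespa.content.proton.* latency metrics are reported in seconds. When comparing them, convert to a common unit first. A minimal sketch, using a hypothetical helper to_milliseconds and a unit table copied from a few entries of the list:

```python
# Units copied from the metric list above (subset, for illustration only).
LATENCY_UNITS = {
    "vespa.query_latency.average": "millisecond",
    "vespa.content.proton.search_protocol.query.latency.average": "second",
    "vespa.content.proton.search_protocol.docsum.latency.average": "second",
}


def to_milliseconds(metric: str, value: float) -> float:
    """Convert a latency value to milliseconds based on the metric's unit."""
    unit = LATENCY_UNITS[metric]
    if unit == "second":
        return value * 1000.0
    return value
```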

Service Checks

vespa.metrics_health:
Returns CRITICAL if there is no response from the Vespa Node metrics API, WARNING if there is a response but an error occurred while processing it, and OK otherwise.

vespa.process_health:
For each Vespa process, returns CRITICAL if the process appears to be down (the Vespa Node metrics API fails to connect to the process), WARNING if the process status is unknown (the Vespa Node metrics API can connect to the process but gets an error in the response), and OK otherwise.
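
The decision rules for the two service checks can be summarized in a short Python sketch. It restates the descriptions above and is not the check's actual source; the function and parameter names are hypothetical.

```python
CRITICAL, WARNING, OK = "CRITICAL", "WARNING", "OK"


def metrics_health(got_response: bool, processing_error: bool) -> str:
    """Status of vespa.metrics_health per the description above."""
    if not got_response:
        return CRITICAL  # no response from the Vespa Node metrics API
    if processing_error:
        return WARNING  # response received, but processing it failed
    return OK


def process_health(api_connected: bool, error_in_response: bool) -> str:
    """Status of vespa.process_health for a single Vespa process."""
    if not api_connected:
        return CRITICAL  # metrics API could not connect to the process
    if error_in_response:
        return WARNING  # connected, but the response contained an error
    return OK
```

In both functions the connectivity test comes first, so a missing response always yields CRITICAL regardless of the second argument.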

Events

The Vespa integration does not include any events.

Troubleshooting

Need help? Contact Datadog support.

