Ocient

Supported OS Linux

Integration version 1.0.0

Overview

Ocient is a data analytics software solutions company. Its Hyperscale Data Warehouse enables always-on, compute-intensive analysis of large, complex datasets while optimizing for performance, cost, and energy efficiency.

With industry-standard interfaces like SQL and JDBC, Ocient makes it easy for organizations to interact with data within its platform. This integration enables your Ocient Hyperscale Data Warehouse to send metrics to Datadog, including metrics related to query performance, disk usage, database tables, and more.

Setup

Installation

  1. Run the following command to install the Agent integration:

    datadog-agent integration install -t datadog-ocient==1.0.0
    
  2. Configure the integration by setting openmetrics_endpoint to your cluster’s master node. See Getting Started with Integrations for more information.

  3. Restart the Agent.

Configuration

To configure this check for an Agent running on a host:

  1. Edit the ocient.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory. For all available configuration options, see the sample ocient.d/conf.yaml.

    instances:
      - use_openmetrics: true  # Enables OpenMetrics V2

        ## @param openmetrics_endpoint - string - required
        ## The URL exposing metrics in the OpenMetrics format.
        #
        openmetrics_endpoint: http://localhost:<PORT>/metrics

  2. Restart the Agent.
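
Before restarting the Agent, it can help to confirm that the endpoint really serves OpenMetrics text. The following is a minimal Python sketch, not part of the integration: the helper names `fetch_metrics_text` and `parse_samples` and the sample payload are hypothetical, and the raw metric names exposed by your cluster may differ from the `ocient.*` names listed under Metrics.

```python
from urllib.request import urlopen


def fetch_metrics_text(url, timeout=5.0):
    """Download the raw exposition text from an OpenMetrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text):
    """Return (metric_name, value) pairs, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP", "# TYPE", and "# EOF" lines carry no samples
        if "}" in line:
            # Labeled sample: name{label="x"} value [timestamp]
            head, _, rest = line.rpartition("}")
            name = head.split("{", 1)[0]
        else:
            # Unlabeled sample: name value [timestamp]
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples.append((name, float(fields[0])))
    return samples


# Hypothetical payload for illustration only; these are not real Ocient names.
SAMPLE = """\
# TYPE example_paused gauge
example_paused 0
example_free_bytes{drive="nvme0"} 4096 1700000000
# EOF
"""

for name, value in parse_samples(SAMPLE):
    print(name, value)

# Against a live cluster, replace <PORT> with the real port and run:
# print(parse_samples(fetch_metrics_text("http://localhost:<PORT>/metrics")))
```

An empty result, or an exception from `fetch_metrics_text`, suggests that the value of openmetrics_endpoint needs to be corrected before the Agent can collect anything.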

Validation

Run the Agent’s status subcommand and look for ocient under the Checks section.

Data Collected

Metrics

ocient.lat.paused
(gauge)
Returns 1 if LAT loading is paused
ocient.lat.bytes_buffered
(gauge)
The number of bytes currently buffered by the loading service
ocient.lat.complete
(gauge)
Returns 1 if LAT loading is complete
ocient.lat.workers
(gauge)
The number of workers currently in use by the loading service
ocient.lat.pipeline.bytes_pushed
(gauge)
The number of bytes pushed by the LAT service to the storage layer
ocient.lat.pipeline.push_errors
(gauge)
The number of errors from the pipeline
ocient.lat.pipeline.push_attempts
(gauge)
The number of times the pipeline has attempted to push rows to the storage layer
ocient.lat.pipeline.rows_pushed
(gauge)
The number of rows pushed to the storage layer since last startup
ocient.lat.pipeline.record_errors
(gauge)
The number of records which produced an error by the loading service
ocient.allocation.pending_timeouts
(gauge)
Number of uncompleted storage objects of a given type that will expire and be deallocated in the future
ocient.blobstore.num_blobs
(gauge)
Total number of tracked blobs within the specified blobstore (per silo). A blob represents a unit of spill issued by the VM
ocient.blobstore.pending_dispatches
(gauge)
Number of outstanding callbacks to be processed by the blobstore thread (per silo)
ocient.blobstore.total_bytes
(gauge)
Tracks cumulative bytes from different operations in the blobstore, along with the number of bytes actively used and bytes available for spill
ocient.blobstore.verify_hash_operations
(gauge)
Number of verify hash operations {completed, failed} - these operations validate spill requests from the VM as an extra check for correctness
ocient.blockDeviceContext.bytesRead
(gauge)
Total number of bytes read from the drives
ocient.cmdcomp.metadata.cache_operations
(gauge)
Number of times table or array column PDFs were found in the cache
ocient.cmdcomp.metadata.eviction_cycles
(gauge)
Number of times table or array column PDFs were evicted from the cache
ocient.cmdcomp.metadata.tableToRequestStatsCounter
(gauge)
Number of entries in the structure that maps tables to their statistics
ocient.cmdcomp.queries
(gauge)
Number of active queries in any state
ocient.cmdcomp.sql_node_load
(gauge)
The load for each SQL node, as a value between 0.0 and 1.0, used to load balance between SQL nodes
ocient.cmdcomps.server_connections
(gauge)
Number of active connections to the SQL node
ocient.disk_based_operator_instance.bytes_spilled_compressed
(gauge)
Total bytes written to disk for spill
ocient.disk_based_operator_instance.bytes_spilled_uncompressed
(gauge)
Total bytes of memory to spill to disk
ocient.fs.num_file_instances
(gauge)
Number of opened file handles
ocient.fs.num_loaded_inode_descriptor_blocks
(gauge)
Number of cached inode descriptor blocks - inode descriptor blocks are accessed on files being created and opened
ocient.fs.num_memory_stalls
(gauge)
Number of requests for IO capable memory that were unable to be fulfilled
ocient.fs.operations
(gauge)
Tracks metrics on filesystem operations such as total operations of a given type and operations outstanding.
ocient.fs.volume.fragmentation.avg
(gauge)
Reports the average fragmentation score for the given file size bucket. The fragmentation score is the ratio of the number of block refs for the file against the ideal number of minimum block refs.
ocient.fs.volume.fragmentation.max
(gauge)
Reports the maximum fragmentation score for the given file size bucket. The fragmentation score is the ratio of the number of block refs for the file against the ideal number of minimum block refs.
ocient.fs.volume.fragmentation.min
(gauge)
Reports the minimum fragmentation score for the given file size bucket. The fragmentation score is the ratio of the number of block refs for the file against the ideal number of minimum block refs.
ocient.fs.volume.free_bytes
(gauge)
The number of free bytes in the drive
ocient.fs.volume.free_blocks
(gauge)
The number of free unused 4 KiB blocks in the drive
ocient.fs.volume.free_inodes
(gauge)
The number of unused inode descriptors in the drive
ocient.fs.volume.health.status
(gauge)
The result of the startup fsck. If normal, will be 0
ocient.fs.volume.inodeOperationsSize
(gauge)
The number of active operations which are manipulating an associated file
ocient.fs.volume.legacySpaceUsedPct
(gauge)
Total percentage of space used on a drive, which also takes into account any legacy data still present in the local storage service. This is the value that should be used to gauge overall utilization of a drive
ocient.fs.volume.outstanding_operations
(gauge)
Outstanding metadata mutations from the filesystem
ocient.fs.volume.space_used_pct
(gauge)
Percentage of space used within the file store volume on a drive. Does not account for legacy data outside of the file store
ocient.fs.volume.total_blocks
(gauge)
Total number of blocks managed by the file store on a given drive. Equals the number of free blocks plus used blocks
ocient.fs.volume.total_blocks_used
(gauge)
Total number of 4 KiB blocks in-use within the file store on a given drive
ocient.fs.volume.total_bytes
(gauge)
Total number of bytes managed by the file store on a given drive. Equals the number of free bytes plus used bytes
ocient.fs.volume.total_bytes_used
(gauge)
Total number of bytes in-use within the file store on a given drive
ocient.fs.volume.total_inodes
(gauge)
Total number of inode descriptors that are managed by a file store volume. Equals number of used inode descriptors plus unused inode descriptors
ocient.fs.volume.total_inodes_used
(gauge)
Total number of inode descriptors that are in-use by a file store volume.
ocient.fs.volume_health.corruptions
(gauge)
Number of detected corruptions from fsck on startup
ocient.gsd_buffer.batched_requests.avg
(gauge)
The average number of getSegmentData requests that are sent in a single batch. Higher values represent more efficient network utilization. Only applicable when nodes or drives are down and virtual segments are being served
ocient.healthProtocolInstance.numRunningTasks
(gauge)
The number of distributed tasks that are currently running
ocient.io.page_scheduler.page_count
(gauge)
Number of pages given out by the page scheduler in a given status
ocient.jemalloc.stats
(gauge)
Contains stats from jemalloc's internal counters, such as heap bytes allocated
ocient.loadTocLimit.pendingQueueCount
(gauge)
The number of loadToc actions being rate-limited
ocient.local_storage_service.available_spare_pct
(gauge)
The percentage of spare capacity remaining on a drive
ocient.local_storage_service.controller_busy_time
(gauge)
The total time (in minutes) a controller is 'busy' (an operation is outstanding in one of its IO queues)
ocient.local_storage_service.crc_errors
(gauge)
Number of PCIe interface CRC errors encountered. Preserved across power cycles
ocient.local_storage_service.data_units_write
(gauge)
The total number of data units written to a drive
ocient.local_storage_service.device_endurance
(gauge)
Contains a vendor-specific estimate of the percentage of life used based on actual usage and the manufacturer's prediction of life. A value of 100 indicates the estimated endurance has been fully consumed, but the drive may continue to function
ocient.local_storage_service.device_status
(gauge)
Status of a drive controlled by the local storage service. A status of 10 is active
ocient.local_storage_service.error_log_entries
(gauge)
Number of entries in the error log over the life of the controller
ocient.local_storage_service.free_space
(gauge)
The estimated amount of free space on a drive
ocient.local_storage_service.media_errors
(gauge)
Number of occurrences where the controller detected an unrecoverable data integrity error
ocient.local_storage_service.opal_enabled
(gauge)
OPAL-related metrics for a given drive. OPAL allows data to be encrypted at rest. A status of 0 indicates a drive is unlocked and accessible to the database. For OPAL-compliant drives, enabled and supported should both be 1
ocient.local_storage_service.opal_status
(gauge)
The OPAL status of a device
ocient.local_storage_service.opal_supported
(gauge)
Returns data about whether OPAL is supported on a device
ocient.local_storage_service.operations
(gauge)
Metrics for NVMe operations for a specific drive. If the number of inflight operations remains elevated and the overflow IO queue is populated, a drive is being saturated
ocient.local_storage_service.power_cycles
(gauge)
Number of power cycles experienced by the drive
ocient.local_storage_service.power_on_hours
(gauge)
Number of hours that a controller has been actively powered on
ocient.local_storage_service.read_commands
(gauge)
Total number of read commands issued
ocient.local_storage_service.segment_table_entries
(gauge)
Number of tracked partitions in the local storage service - for modern deployments this value should be 1
ocient.local_storage_service.space_free_pct
(gauge)
Total free space of a drive, reported as a percentage
ocient.local_storage_service.temp
(gauge)
The current temperature reading from a device
ocient.local_storage_service.total_space
(gauge)
Total amount of storage (in bytes) managed by the local storage service
ocient.local_storage_service.unsafe_shutdowns
(gauge)
Number of reported power losses to the controller
ocient.local_storage_service.warn_available_spare
(gauge)
Number of reported warnings for available spare falling below the controller's warning threshold
ocient.local_storage_service.warn_read_only
(gauge)
If 1, the controller has entered read-only mode in response to falling endurance
ocient.local_storage_service.warn_reliability
(gauge)
If 1, the controller has reported that reliability has fallen to a critical level due to media or internal errors
ocient.local_storage_service.warn_temp
(gauge)
If 1, the controller's current temperature exceeds a warning threshold
ocient.local_storage_service.write_block_complete
(gauge)
Total number of logical blocks written
ocient.local_storage_service.write_block_count
(gauge)
Total number of logical block write requests
ocient.local_storage_service.write_commands
(gauge)
Total number of write commands issued
ocient.local_storage_service.write_complete
(gauge)
Total number of write operations completed
ocient.local_storage_service.write_submit_count
(gauge)
Total number of write operations started
ocient.memory.heap
(gauge)
Amount of memory currently allocated from the heap, in bytes
ocient.memory.huge
(gauge)
Amount of huge page memory currently allocated, in bytes
ocient.metadata.storage_protocol.task_count
(gauge)
The number of tasks belonging to the Metadata Storage Protocol with the given status or task type
ocient.metadata.storage_protocol.task_time.avg
(gauge)
The average time taken for a given task in the system, in nanoseconds
ocient.metadataStorageProtocolInstance.lastDeserializationDuration
(gauge)
Time to deserialize the database metadata, in microseconds
ocient.metadataStorageProtocolInstance.lastSerializationConfigSize
(gauge)
Size of serialized database metadata, in bytes
ocient.metadataStorageProtocolInstance.lastSerializationDuration
(gauge)
Time to serialize the database metadata, in microseconds
ocient.network.tcpChannel.uvBufferPool.bytesInUse
(gauge)
Number of bytes allocated for reading inbound network data
ocient.db.can_connect
(gauge)
Returns 1 if the database service is reachable and healthy
ocient.db.gdc_current_count
(gauge)
Number of active entries in the global dictionary compression table
ocient.db.gdc_max_count
(gauge)
Maximum number of entries allowed in the global dictionary compression table
ocient.db.maxpagetime
(gauge)
The max timestamp of a page
ocient.db.maxsegtime
(gauge)
The max timestamp of a segment
ocient.db.minpagetime
(gauge)
The min timestamp of a page
ocient.db.minsegtime
(gauge)
The min timestamp of a segment
ocient.db.page_rows
(gauge)
The ratio of current row count to pages
ocient.db.pagecount
(gauge)
The total number of active pages in the cluster
ocient.db.query_count
(gauge)
The number of active queries across the entire system
ocient.db.rows_per_page
(gauge)
The ratio of rows to pages
ocient.db.rows_per_seg
(gauge)
The ratio of rows to segments
ocient.db.seg_rows
(gauge)
The number of rows allocated to a segment for a given table
ocient.db.segcount
(gauge)
The number of segments in use by a database
ocient.db.segments
(gauge)
Reports the status detail of a segment
ocient.db.size
(gauge)
The consumed disk space of a database
ocient.db.tot_rows
(gauge)
The total number of rows in a table
ocient.operator_summary.num_operators
(gauge)
The number of query operators actively running on a system
ocient.operator_summary.num_queries
(gauge)
The number of queries actively running on a system
ocient.operator_summary.oldest_age
(gauge)
The largest elapsed time for a query operator actively running on the system
ocient.partitionProvider.lockSlotsTotal
(gauge)
Total number of cache/lock slots
ocient.partitionProvider.lockSlotsUsed
(gauge)
Total number of cache/lock slots in use
ocient.partition_provider.bytes_read
(gauge)
Total number of bytes read from segment partitions
ocient.partition_provider.cache_bytes
(gauge)
Total number of bytes allocated to active cache slots (partition size * number of cache slots in use)
ocient.partition_provider.cache_operations
(gauge)
Number of partitions requested that were already held in a lock slot
ocient.partition_provider.cache_raw_bytes
(gauge)
Total number of bytes used by cache slots (less than or equal to cache_bytes, because a slot might not use all the bytes allocated to it)
ocient.partition_provider.cache_size
(gauge)
Total number of bytes for all cache slots active or not
ocient.protocol.actions
(gauge)
Number of currently active administration protocol actions
ocient.protocol.actions.count
(gauge)
The number of a given action with given status on the given protocol.
ocient.protocol.actions.time
(gauge)
The amount of time, in milliseconds, taken by a given event on a given protocol.
ocient.protocol.cached_rows
(gauge)
Number of rows cached by the GlobalDataStorage cache.
ocient.protocol.cached_tables
(gauge)
Number of tables cached by the GlobalDataStorage cache.
ocient.protocol.raft.participant_state
(gauge)
Global Data Storage raft participant state. SUSPENDED=1, FOLLOWER=2, CANDIDATE=3, LEADER_ESTABLISHING=4, LEADER=5
ocient.protocol.raft.snapshot_size
(gauge)
Global Data Storage raft serialized snapshot size
ocient.protocol.time
(gauge)
The amount of time, in milliseconds, taken by a given event on a given protocol.
ocient.raftEngine.metadataStorageProtocol.nodeCameOnline.time
(gauge)
Time, in milliseconds, it took to execute the nodeCameOnline leader method for each metadataNode (reported only if the time exceeds 1 second)
ocient.resource_manager.pending_hp_memory
(gauge)
Number of HugePage Fragments reserved for use by the resource manager
ocient.result_cache.queries
(gauge)
The number of queries for which the result set is currently cached.
ocient.rolehostd.initialization_status
(gauge)
Status code of the phases of the Ocient application initialization process
ocient.segment_activation_limit.pending_queue_count
(gauge)
The number of segment activations being rate-limited
ocient.segment_store.bytes
(gauge)
Number of bytes in-use for storage objects of a given type
ocient.segment_store.count
(gauge)
Number of tracked storage objects of a given type
ocient.segment_store.operations
(gauge)
Number of operations of a given kind started for a given storage type
ocient.segment_transfer.operations.count
(gauge)
Number of requests of a given operation responded to with a given status (success or failure)
ocient.segment_transfer.operations.duration
(gauge)
Sum of response time minus request time for all requests of a given operation
ocient.storage_cluster.activating_segments
(gauge)
The number of segments that are currently activating on the node (either all segments or a certain type)
ocient.storage_cluster.data_stats.avg_row_count
(gauge)
Average number of rows in tracked storage objects of a given type
ocient.storage_cluster.data_stats.avg_size
(gauge)
Average size (in bytes) of tracked storage objects of a given type
ocient.storage_cluster.data_stats.num_objects
(gauge)
Total number of tracked storage objects of a given type
ocient.storage_cluster.data_stats.total_row_count
(gauge)
Total row count of tracked storage objects of a given type
ocient.storage_cluster.data_stats.total_size
(gauge)
Total size (in bytes) of tracked storage objects of a given type
ocient.storage_cluster.node_table_segment_size_lookup.size
(gauge)
Total number of bytes tracked in the node table segment size lookup cache
ocient.storage_cluster.on_put_segment_data.throughput.avg
(gauge)
Average throughput of actions for loading data onto a foundation node in MiB/s
ocient.storage_cluster.osn_reaps.batched
(gauge)
Number of reap requests that have been given to the OSN reap batcher (a mechanism to reduce the number of RPCs performed)
ocient.storage_cluster.osn_reaps.duration
(gauge)
Length of time (in nanoseconds) of the last OSN reap leader method that was executed on a given node. Values will seem to freeze if leadership in the cluster is passed to a new node
ocient.storage_cluster.osns.created
(gauge)
Number of new OSNs created by the storage cluster leader through advancing the legal OSN range
ocient.storage_cluster.osns.reaped
(gauge)
Number of actual OSN reap leader method requests sent (will be less than number of raw batched requests)
ocient.storage_cluster.pending_segment_deletions
(gauge)
The number of segments pending deletion, blocked by ongoing, running queries
ocient.storage_cluster.probes.complete
(gauge)
The number of query probes that this node believes have completed
ocient.storage_cluster.probes.incomplete
(gauge)
The number of incomplete query probes that this node believes are ongoing
ocient.storage_storage.elapsed_time_for_first_osn_to_activate
(gauge)
The time it takes from when a node starts activating its first OSN to when it completes
ocient.storage_table_action_cooldown_buffer.queued_actions
(gauge)
The number of storage vtable actions being rate-limited
ocient.stream_loader.api.push_rows_requests
(gauge)
Total number of pushRows requests received
ocient.stream_loader.data.page_max_size
(gauge)
Size of largest page from last page set (per-table)
ocient.stream_loader.data.page_median_count
(gauge)
Size of median page from last page set (per-table)
ocient.stream_loader.data.page_min_size
(gauge)
Size of smallest page from last page set (per-table)
ocient.stream_loader.data.tracked_page_bytes
(gauge)
Size of pages awaiting segment generation by this loader (per-bucket)
ocient.stream_loader.data.tracked_page_count
(gauge)
Count of pages awaiting segment generation by this loader (per-bucket)
ocient.tkt.pipeline.operations
(gauge)
Number of completed IO requests
ocient.tkt.segment_service.cache_ops
(gauge)
Number of segment retrieval requests that interact with the segment service cache with a given outcome (e.g. number of requests that make use of the cache). The cache is used to speed up the filtering of target segments for query execution
ocient.tkt.segment_service.cache_size
(gauge)
The number of OSNs whose segment lists are cached in the segment service
ocient.tkt.segment_service.required_tables
(gauge)
The number of tables tracked within the segment service that have segments which can be served to queries
ocient.virtual_read_cache.back_writes
(gauge)
Number of blocks evicted from the virtual read heap cache to the disk cache
ocient.virtual_read_cache.disk.operations
(gauge)
Number of total operations in the disk cache with the given type and outcome
ocient.virtual_read_cache.heap.operations
(gauge)
Number of total operations in the heap cache with the given type and outcome
ocient.virtual_read_cache.reads.heap
(gauge)
Number of blocks read from the virtual read cache of the given type (heap or disk)
ocient.virtual_read_cache.total_blocks.disk
(gauge)
Total number of cache blocks in the virtual segment cache of a given type (heap or disk)
ocient.vm.active_queries_count
(gauge)
The number of queries currently being processed by this node.
ocient.vm.datablock_router.network_rate
(gauge)
The current rate of receiving data, in Mbps
ocient.vm.datablock_router.block_count
(gauge)
The number of non-routed, direct datablocks received from connected peers
ocient.vm.datablock_router.budget
(gauge)
The number of additional fragments the router is currently allowed to queue
ocient.vm.datablock_router.byte_count
(gauge)
The number of bytes of the serialized messages received from connected peers
ocient.vm.fetchingCachedQueriesCount
(gauge)
The number of outstanding requests to the query info cache.
ocient.vm.huge_block.alloc_count
(gauge)
The number of huge blocks that have been allocated.
ocient.vm.huge_block.resizes
(gauge)
The number of times a huge block has been shrunk.
ocient.vm.huge_memory_pool.allocated
(gauge)
The number of bytes of huge memory in silos on this node.
ocient.vm.long_dispatch_events
(gauge)
The number of times an event has taken more than half a second to process.
ocient.vm.query_tree_probe.count
(gauge)
The number of times we've run a query tree probe action.
ocient.vm.queryTreeProbe.timeMs
(gauge)
The amount of time, in milliseconds, we've spent on query tree probes.
ocient.vm.scheduler.no_work_oom_cycles
(gauge)
The number of cycles (passes through the VM loop) which were used for handling OOM, but did not actually do any work.
ocient.vm.scheduler.oom_killed_queries
(gauge)
Number of queries which have been killed for OOM since startup.
ocient.vm.scheduler.opInstRuntimeRatio
(gauge)
Fraction of time being spent in executing operators.
ocient.vm.stats.pdf_cache_size
(gauge)
Number of bytes of RAM used in the PDF cache
ocient.vmprotocol.ping.countOver100ms
(gauge)
Number of pings to other nodes which have exceeded 100ms since startup.
ocient.vmprotocol.ping.current
(gauge)
The elapsed time for the current ping.
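
Several of the metrics above report coded or derived values. As a rough illustration of how the descriptions can be read, here is a small Python sketch. The names `RaftParticipantState`, `fragmentation_score`, `blocks_to_bytes`, and `drive_is_active` are hypothetical helpers, not part of Ocient or the Datadog Agent; the constants are taken from the metric descriptions on this page.

```python
from enum import IntEnum

# File store blocks are described above as 4 KiB.
BLOCK_SIZE_BYTES = 4 * 1024

# ocient.local_storage_service.device_status: "A status of 10 is active".
DEVICE_STATUS_ACTIVE = 10


class RaftParticipantState(IntEnum):
    """Values reported by ocient.protocol.raft.participant_state."""
    SUSPENDED = 1
    FOLLOWER = 2
    CANDIDATE = 3
    LEADER_ESTABLISHING = 4
    LEADER = 5


def fragmentation_score(block_refs, ideal_block_refs):
    """Ratio of a file's block refs to the ideal minimum number of block refs."""
    return block_refs / ideal_block_refs


def blocks_to_bytes(blocks):
    """Convert a count of 4 KiB file store blocks to bytes."""
    return blocks * BLOCK_SIZE_BYTES


def drive_is_active(device_status):
    """True when a drive reports the active status code."""
    return device_status == DEVICE_STATUS_ACTIVE


print(RaftParticipantState(5).name)
print(fragmentation_score(12, 8))
print(blocks_to_bytes(3))
```

A fragmentation score of 1.0 in this sketch means a file uses exactly the ideal number of block refs, and larger values mean more fragmentation.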

Service Checks

Ocient does not include any service checks.

Events

Ocient does not include any events.

Troubleshooting

Need help? Contact Ocient support.