InfiniBand

Supported OS Linux

Integration version 1.1.0

Overview

This check monitors InfiniBand through the Datadog Agent.

This integration monitors data transfers by collecting counters and RDMA hardware counters from the InfiniBand subsystem. It tracks performance metrics through the Linux kernel’s InfiniBand interface, which provides metric counters even when using alternative transports like RDMA over Converged Ethernet (RoCE).

Get visibility into your high-performance networking infrastructure to help identify bottlenecks and performance issues in data-intensive workloads. By monitoring both standard InfiniBand counters and RDMA hardware counters, you’ll get comprehensive insights into network throughput, errors, and packet statistics across your devices and ports.

Key metrics collected include port counters such as bytes and packets transmitted and received, error counts, and RDMA hardware-specific metrics, giving operators the data needed to ensure optimal performance of their high-speed networking infrastructure.

Setup

Follow the instructions below to install and configure this check for an Agent running on a host. By default, the check reads counters from the /sys/class/infiniband/<device>/ports/*/counters/ and /sys/class/infiniband/<device>/ports/*/hw_counters/ directories and submits them as metrics. For the integration to work, the Agent must have permission to access and read the counters in these directories.
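
The directory layout described above can be explored with a short script before enabling the check, for example to confirm that the user running the Agent can read every counter file. The following is a minimal sketch, not the check's actual implementation: the function name read_port_counters and its return shape are hypothetical, and it assumes each counter file holds a single integer.

```python
from pathlib import Path


def read_port_counters(base="/sys/class/infiniband"):
    """Walk <base>/<device>/ports/<port>/{counters,hw_counters}.

    Returns (values, unreadable):
      values     maps (device, port, kind, counter_name) to an int
      unreadable lists files that could not be read or parsed
    """
    values = {}
    unreadable = []
    base_path = Path(base)
    if not base_path.is_dir():
        # No InfiniBand subsystem exposed at this path.
        return values, unreadable
    for device in sorted(base_path.iterdir()):
        ports_dir = device / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            for kind in ("counters", "hw_counters"):
                counter_dir = port / kind
                if not counter_dir.is_dir():
                    continue
                for counter_file in sorted(counter_dir.iterdir()):
                    key = (device.name, port.name, kind, counter_file.name)
                    try:
                        values[key] = int(counter_file.read_text().strip())
                    except (OSError, ValueError):
                        # Permission problem, unsupported counter, or
                        # non-numeric content.
                        unreadable.append(str(counter_file))
    return values, unreadable
```

Running read_port_counters() as the same user as the Agent and inspecting the second return value gives a quick list of files that user cannot read.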

Installation

The InfiniBand check is included in the Datadog Agent package. No additional installation is needed on your server.

Configuration

  1. To start collecting your InfiniBand performance data, create and edit the infiniband.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory. See the sample infiniband.d/conf.yaml for all available configuration options.

  2. This check works with minimal configuration. Optional parameters let you control where the Agent looks for data and which data it collects when the default behavior is not what you want. Options include setting the directory where counters reside, excluding specific devices/ports, and skipping or adding counters for collection.

init_config:
instances:
  -
    ## @param infiniband_path - string - optional - default: /sys/class/infiniband
    ## The path to the infiniband directory.
    #
    # infiniband_path: /sys/class/infiniband

    ## @param exclude_devices - list of strings - optional
    ## A list of devices to exclude from the check. Devices are located in the infiniband directory,
    ## which is /sys/class/infiniband by default.
    #
    # exclude_devices:
    #   - mlx5_0
    #   - efa0
    #   - ib1

    ## @param additional_counters - list of strings - optional
    ## A list of additional counters to collect. The counter names are the files in which the counter
    ## values are stored. These are located inside /sys/class/infiniband/<device>/ports/<port>/counters.
    #
    # additional_counters:
    #   - additional_counter
    #   - rx_mpwqe_frag

    ## @param additional_hw_counters - list of strings - optional
    ## A list of additional hardware counters to collect. The counter names are the files in which the
    ## counter values are stored. These are located inside
    ## /sys/class/infiniband/<device>/ports/<port>/hw_counters.
    #
    # additional_hw_counters:
    #   - additional_hw_counter
    #   - rx_mpwqe_frag

    ## @param exclude_counters - list of strings - optional
    ## A list of counters to exclude from the check.
    #
    # exclude_counters:
    #   - duplicate_request
    #   - lifespan

    ## @param exclude_hw_counters - list of strings - optional
    ## A list of hardware counters to exclude from the check.
    #
    # exclude_hw_counters:
    #   - VL15_dropped
    #   - link_downed

  3. Restart the Agent.
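
To make the interaction between the include and exclude options concrete, the sketch below shows one plausible way they could combine. It is an illustration under stated assumptions, not the check's source code: the helper names select_counters and select_devices are hypothetical, and the rule that an exclusion wins over an addition is an assumption this page does not confirm.

```python
def select_counters(default_counters, additional=(), excluded=()):
    """Merge default and additional counter names, then drop excluded ones.

    Assumption: a name listed in both `additional` and `excluded` is
    treated as excluded. Order is preserved and duplicates are removed.
    """
    excluded_set = set(excluded)
    selected = []
    for name in list(default_counters) + list(additional):
        if name not in excluded_set and name not in selected:
            selected.append(name)
    return selected


def select_devices(found_devices, exclude_devices=()):
    """Keep only the devices that are not named in exclude_devices."""
    excluded_set = set(exclude_devices)
    return [device for device in found_devices if device not in excluded_set]
```

For example, select_counters(["port_rcv_data", "link_downed"], additional=["rx_mpwqe_frag"], excluded=["link_downed"]) returns ["port_rcv_data", "rx_mpwqe_frag"], and select_devices(["mlx5_0", "mlx5_1"], exclude_devices=["mlx5_0"]) returns ["mlx5_1"].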

Validation

Run the Agent’s status subcommand and look for infiniband under the Checks section.

Data Collected

Metrics

infiniband.VL15_dropped
(gauge)
Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) of the port
Shown as packet
infiniband.VL15_dropped.count
(count)
Number of new VL15 packets dropped due to resource limitations since the last metric submission
Shown as packet
infiniband.excessive_buffer_overrun_errors
(gauge)
Number of excessive buffer overrun errors
Shown as error
infiniband.excessive_buffer_overrun_errors.count
(count)
Number of new excessive buffer overrun errors since the last metric submission
Shown as error
infiniband.link_downed
(gauge)
Number of times the Port Training state machine has failed the link error recovery process and downed the link
Shown as occurrence
infiniband.link_downed.count
(count)
Number of additional times the Port Training state machine has downed the link since the last metric submission
Shown as occurrence
infiniband.link_error_recovery
(gauge)
Number of times the Port Training state machine has successfully completed the link error recovery process
Shown as occurrence
infiniband.link_error_recovery.count
(count)
Number of new successful link error recoveries since the last metric submission
Shown as occurrence
infiniband.local_link_integrity_errors
(gauge)
Number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors
Shown as error
infiniband.local_link_integrity_errors.count
(count)
Number of additional times local physical errors exceeded the threshold since the last metric submission
Shown as error
infiniband.multicast_rcv_packets
(gauge)
Number of multicast packets received, including multicast packets containing errors (legacy)
Shown as packet
infiniband.multicast_rcv_packets.count
(count)
Number of new multicast packets received since the last metric submission (legacy)
Shown as packet
infiniband.multicast_xmit_packets
(gauge)
Number of multicast packets transmitted on all VLs from the port (legacy)
Shown as packet
infiniband.multicast_xmit_packets.count
(count)
Number of new multicast packets transmitted since the last metric submission (legacy)
Shown as packet
infiniband.phys_state
(gauge)
Physical link state
infiniband.port_multicast_rcv_packets
(gauge)
Number of multicast packets received
Shown as packet
infiniband.port_multicast_rcv_packets.count
(count)
Number of new multicast packets received since the last metric submission
Shown as packet
infiniband.port_multicast_xmit_packets
(gauge)
Number of multicast packets transmitted on all VLs from the port
Shown as packet
infiniband.port_multicast_xmit_packets.count
(count)
Number of new multicast packets transmitted since the last metric submission
Shown as packet
infiniband.port_rcv_constraint_errors
(gauge)
Number of packets received on the switch physical port that are discarded
Shown as error
infiniband.port_rcv_constraint_errors.count
(count)
Number of new packets discarded on receive since the last metric submission
Shown as error
infiniband.port_rcv_data
(gauge)
Number of data octets, divided by 4, received on all VLs from the port
Shown as byte
infiniband.port_rcv_data.count
(count)
Number of new data octets (divided by 4) received since the last metric submission
Shown as byte
infiniband.port_rcv_data_64
(gauge)
Number of data octets, divided by 4, received on all VLs from the port
Shown as byte
infiniband.port_rcv_data_64.count
(count)
Number of new data octets (divided by 4) received since the last metric submission
Shown as byte
infiniband.port_rcv_discards
(gauge)
Number of discarded received packets
Shown as packet
infiniband.port_rcv_discards.count
(count)
Number of new discarded received packets since the last metric submission
Shown as packet
infiniband.port_rcv_errors
(gauge)
Number of packets containing an error that were received on the port
Shown as error
infiniband.port_rcv_errors.count
(count)
Number of new error packets received since the last metric submission
Shown as error
infiniband.port_rcv_packets
(gauge)
Number of packets received (this may include packets containing errors)
Shown as packet
infiniband.port_rcv_packets.count
(count)
Number of new packets received since the last metric submission
Shown as packet
infiniband.port_rcv_packets_64
(gauge)
Number of packets received (64-bit counter)
Shown as packet
infiniband.port_rcv_packets_64.count
(count)
Number of new packets received since the last metric submission (64-bit counter)
Shown as packet
infiniband.port_rcv_remote_physical_errors
(gauge)
Number of packets marked with the EBP delimiter received on the port
Shown as error
infiniband.port_rcv_remote_physical_errors.count
(count)
Number of new packets with EBP delimiter received since the last metric submission
Shown as error
infiniband.port_rcv_switch_relay_errors
(gauge)
Number of packets received on the port that were discarded because they could not be forwarded by the switch relay
Shown as error
infiniband.port_rcv_switch_relay_errors.count
(count)
Number of new packets discarded due to switch relay forwarding failure since the last metric submission
Shown as error
infiniband.port_unicast_rcv_packets
(gauge)
Number of unicast packets received, including unicast packets containing errors
Shown as packet
infiniband.port_unicast_rcv_packets.count
(count)
Number of new unicast packets received since the last metric submission
Shown as packet
infiniband.port_unicast_xmit_packets
(gauge)
Number of unicast packets transmitted on all VLs from the port
Shown as packet
infiniband.port_unicast_xmit_packets.count
(count)
Number of new unicast packets transmitted since the last metric submission
Shown as packet
infiniband.port_xmit_constraint_errors
(gauge)
Number of packets not transmitted from the switch physical port
Shown as error
infiniband.port_xmit_constraint_errors.count
(count)
Number of new packets not transmitted due to constraints since the last metric submission
Shown as error
infiniband.port_xmit_data
(gauge)
Number of data octets, divided by 4, transmitted on all VLs from the port
Shown as byte
infiniband.port_xmit_data.count
(count)
Number of new data octets (divided by 4) transmitted since the last metric submission
Shown as byte
infiniband.port_xmit_data_64
(gauge)
64-bit transmitted data volume
Shown as byte
infiniband.port_xmit_data_64.count
(count)
Change in 64-bit transmitted data volume since the last metric submission
Shown as byte
infiniband.port_xmit_discards
(gauge)
Number of outbound packets discarded by the port because the port is down or congested
Shown as packet
infiniband.port_xmit_discards.count
(count)
Number of new outbound packets discarded since the last metric submission
Shown as packet
infiniband.port_xmit_packets
(gauge)
Number of packets transmitted on all VLs from this port
Shown as packet
infiniband.port_xmit_packets.count
(count)
Number of new packets transmitted since the last metric submission
Shown as packet
infiniband.port_xmit_packets_64
(gauge)
Number of packets transmitted (64-bit counter)
Shown as packet
infiniband.port_xmit_packets_64.count
(count)
Number of new packets transmitted since the last metric submission (64-bit counter)
Shown as packet
infiniband.port_xmit_wait
(gauge)
Number of ticks during which the port had data to transmit but no data was sent
infiniband.port_xmit_wait.count
(count)
Number of new transmission wait ticks since the last metric submission
infiniband.rdma.duplicate_request
(gauge)
Number of received packets that were duplicate requests. A duplicate request is a request that had been previously executed
Shown as error
infiniband.rdma.duplicate_request.count
(count)
Number of new received packets that were duplicate requests since the last metric submission
Shown as error
infiniband.rdma.implied_nak_seq_err
(gauge)
Number of times the requester detected an ACK with a PSN larger than the expected PSN for an RDMA read or response
Shown as error
infiniband.rdma.implied_nak_seq_err.count
(count)
Number of new ACKs with PSN larger than expected since the last metric submission
Shown as error
infiniband.rdma.lifespan
(gauge)
The maximum period in ms which defines the aging of the counter reads
Shown as millisecond
infiniband.rdma.lifespan.count
(count)
Change in maximum aging period since the last metric submission
Shown as millisecond
infiniband.rdma.link_down_events_phy
(gauge)
Number of physical link down events
Shown as occurrence
infiniband.rdma.link_down_events_phy.count
(count)
Number of new physical link down events since the last metric submission
Shown as occurrence
infiniband.rdma.local_ack_timeout_err
(gauge)
Number of times a QP's ack timer expired for RC, XRC, and DCT QPs at the sender side
Shown as error
infiniband.rdma.local_ack_timeout_err.count
(count)
Number of new QP ack timer expirations since the last metric submission
Shown as error
infiniband.rdma.np_cnp_sent
(gauge)
Number of CNP packets sent by the Notification Point when it noticed congestion experienced
Shown as packet
infiniband.rdma.np_cnp_sent.count
(count)
Number of new CNP packets sent due to congestion since the last metric submission
Shown as packet
infiniband.rdma.np_ecn_marked_roce_packets
(gauge)
Number of RoCEv2 packets received by the notification point which were marked for experiencing congestion
Shown as packet
infiniband.rdma.np_ecn_marked_roce_packets.count
(count)
Number of new congestion-marked RoCEv2 packets received since the last metric submission
Shown as packet
infiniband.rdma.out_of_buffer
(gauge)
Number of drops that occurred due to a lack of WQEs for the associated QPs
Shown as error
infiniband.rdma.out_of_buffer.count
(count)
Number of new drops due to a lack of WQEs since the last metric submission
Shown as error
infiniband.rdma.out_of_sequence
(gauge)
Number of out of sequence packets received
Shown as error
infiniband.rdma.out_of_sequence.count
(count)
Number of new out of sequence packets received since the last metric submission
Shown as error
infiniband.rdma.packet_seq_err
(gauge)
Number of received NAK sequence error packets. The QP retry limit was not exceeded
Shown as error
infiniband.rdma.packet_seq_err.count
(count)
Number of new NAK sequence error packets received since the last metric submission
Shown as error
infiniband.rdma.rdma_read_bytes
(gauge)
Number of bytes read in RDMA operations
Shown as byte
infiniband.rdma.rdma_read_bytes.count
(count)
Number of new bytes read in RDMA operations since the last metric submission
Shown as byte
infiniband.rdma.rdma_read_resp_bytes
(gauge)
Number of bytes in RDMA read responses
Shown as byte
infiniband.rdma.rdma_read_resp_bytes.count
(count)
Number of new bytes in RDMA read responses since the last metric submission
Shown as byte
infiniband.rdma.rdma_read_wr_err
(gauge)
Number of RDMA read work request errors
Shown as error
infiniband.rdma.rdma_read_wr_err.count
(count)
Number of new RDMA read work request errors since the last metric submission
Shown as error
infiniband.rdma.rdma_read_wrs
(gauge)
Number of RDMA read work requests
Shown as request
infiniband.rdma.rdma_read_wrs.count
(count)
Number of new RDMA read work requests since the last metric submission
Shown as request
infiniband.rdma.rdma_write_bytes
(gauge)
Number of bytes written in RDMA operations
Shown as byte
infiniband.rdma.rdma_write_bytes.count
(count)
Number of new bytes written in RDMA operations since the last metric submission
Shown as byte
infiniband.rdma.rdma_write_recv_bytes
(gauge)
Number of bytes received in RDMA write operations
Shown as byte
infiniband.rdma.rdma_write_recv_bytes.count
(count)
Number of new bytes received in RDMA write operations since the last metric submission
Shown as byte
infiniband.rdma.rdma_write_wr_err
(gauge)
Number of RDMA write work request errors
Shown as error
infiniband.rdma.rdma_write_wr_err.count
(count)
Number of new RDMA write work request errors since the last metric submission
Shown as error
infiniband.rdma.rdma_write_wrs
(gauge)
Number of RDMA write work requests
Shown as request
infiniband.rdma.rdma_write_wrs.count
(count)
Number of new RDMA write work requests since the last metric submission
Shown as request
infiniband.rdma.recv_bytes
(gauge)
Number of bytes received in work requests
Shown as byte
infiniband.rdma.recv_bytes.count
(count)
Number of new bytes received in work requests since the last metric submission
Shown as byte
infiniband.rdma.recv_wrs
(gauge)
Number of receive work requests
Shown as request
infiniband.rdma.recv_wrs.count
(count)
Number of new receive work requests since the last metric submission
Shown as request
infiniband.rdma.req_cqe_error
(gauge)
Number of completion queue entry errors (requester)
Shown as error
infiniband.rdma.req_cqe_error.count
(count)
Number of new completion queue entry errors (requester) since the last metric submission
Shown as error
infiniband.rdma.req_cqe_flush_error
(gauge)
Number of completion queue flush errors (requester)
Shown as error
infiniband.rdma.req_cqe_flush_error.count
(count)
Number of new completion queue flush errors (requester) since the last metric submission
Shown as error
infiniband.rdma.req_remote_access_errors
(gauge)
Number of remote access errors (requester)
Shown as error
infiniband.rdma.req_remote_access_errors.count
(count)
Number of new remote access errors (requester) since the last metric submission
Shown as error
infiniband.rdma.req_remote_invalid_request
(gauge)
Number of invalid remote requests
Shown as request
infiniband.rdma.req_remote_invalid_request.count
(count)
Number of new invalid remote requests since the last metric submission
Shown as request
infiniband.rdma.resp_cqe_error
(gauge)
Number of completion queue entry errors (responder)
Shown as error
infiniband.rdma.resp_cqe_error.count
(count)
Number of new completion queue entry errors (responder) since the last metric submission
Shown as error
infiniband.rdma.resp_cqe_flush_error
(gauge)
Number of completion queue flush errors (responder)
Shown as error
infiniband.rdma.resp_cqe_flush_error.count
(count)
Number of new completion queue flush errors (responder) since the last metric submission
Shown as error
infiniband.rdma.resp_local_length_error
(gauge)
Number of local length errors (responder)
Shown as error
infiniband.rdma.resp_local_length_error.count
(count)
Number of new local length errors (responder) since the last metric submission
Shown as error
infiniband.rdma.resp_remote_access_errors
(gauge)
Number of remote access errors (responder)
Shown as error
infiniband.rdma.resp_remote_access_errors.count
(count)
Number of new remote access errors (responder) since the last metric submission
Shown as error
infiniband.rdma.rnr_nak_retry_err
(gauge)
Number of RNR NAK retry errors
Shown as error
infiniband.rdma.rnr_nak_retry_err.count
(count)
Number of new RNR NAK retry errors since the last metric submission
Shown as error
infiniband.rdma.roce_adp_retrans
(gauge)
Number of adaptive retransmissions for RoCE traffic
Shown as occurrence
infiniband.rdma.roce_adp_retrans.count
(count)
Number of new adaptive retransmissions for RoCE traffic since the last metric submission
Shown as occurrence
infiniband.rdma.roce_adp_retrans_to
(gauge)
Number of times RoCE traffic reached timeout due to adaptive retransmission
Shown as occurrence
infiniband.rdma.roce_adp_retrans_to.count
(count)
Number of new RoCE traffic timeouts due to adaptive retransmission since the last metric submission
Shown as occurrence
infiniband.rdma.roce_slow_restart
(gauge)
Number of times RoCE slow restart was used
Shown as occurrence
infiniband.rdma.roce_slow_restart.count
(count)
Number of new RoCE slow restart uses since the last metric submission
Shown as occurrence
infiniband.rdma.roce_slow_restart_cnps
(gauge)
Number of times RoCE slow restart generated CNP packets
Shown as occurrence
infiniband.rdma.roce_slow_restart_cnps.count
(count)
Number of new CNP packets generated by RoCE slow restart since the last metric submission
Shown as occurrence
infiniband.rdma.roce_slow_restart_trans
(gauge)
Number of times RoCE slow restart changed state to slow restart
Shown as occurrence
infiniband.rdma.roce_slow_restart_trans.count
(count)
Number of new RoCE slow restart state changes since the last metric submission
Shown as occurrence
infiniband.rdma.rp_cnp_handled
(gauge)
Number of CNP packets handled
Shown as packet
infiniband.rdma.rp_cnp_handled.count
(count)
Number of new CNP packets handled since the last metric submission
Shown as packet
infiniband.rdma.rp_cnp_ignored
(gauge)
Number of CNP packets ignored
Shown as packet
infiniband.rdma.rp_cnp_ignored.count
(count)
Number of new CNP packets ignored since the last metric submission
Shown as packet
infiniband.rdma.rx_atomic_requests
(gauge)
Number of received atomic RDMA requests
Shown as request
infiniband.rdma.rx_atomic_requests.count
(count)
Number of new received atomic RDMA requests since the last metric submission
Shown as request
infiniband.rdma.rx_buff_alloc_err
(gauge)
Number of receive buffer allocation errors
Shown as error
infiniband.rdma.rx_buff_alloc_err.count
(count)
Number of new receive buffer allocation errors since the last metric submission
Shown as error
infiniband.rdma.rx_bytes
(gauge)
Number of bytes received
Shown as byte
infiniband.rdma.rx_bytes.count
(count)
Number of new bytes received since the last metric submission
Shown as byte
infiniband.rdma.rx_cqe_compress_blks
(gauge)
Number of compressed completion queue blocks
Shown as block
infiniband.rdma.rx_cqe_compress_blks.count
(count)
Number of new compressed completion queue blocks since the last metric submission
Shown as block
infiniband.rdma.rx_cqe_compress_pkts
(gauge)
Number of compressed completion queue packets
Shown as packet
infiniband.rdma.rx_cqe_compress_pkts.count
(count)
Number of new compressed completion queue packets since the last metric submission
Shown as packet
infiniband.rdma.rx_dct_connect
(gauge)
Number of received connection requests for the associated DCTs
Shown as connection
infiniband.rdma.rx_dct_connect.count
(count)
Number of new DCT connection requests received since the last metric submission
Shown as connection
infiniband.rdma.rx_drops
(gauge)
Number of dropped packets
Shown as packet
infiniband.rdma.rx_drops.count
(count)
Number of new dropped packets since the last metric submission
Shown as packet
infiniband.rdma.rx_icrc_encapsulated
(gauge)
Number of RoCE packets with ICRC errors
Shown as packet
infiniband.rdma.rx_icrc_encapsulated.count
(count)
Number of new RoCE packets with ICRC errors since the last metric submission
Shown as packet
infiniband.rdma.rx_mpwqe_filler
(gauge)
Number of MPWQE filler events
Shown as event
infiniband.rdma.rx_mpwqe_filler.count
(count)
Number of new MPWQE filler events since the last metric submission
Shown as event
infiniband.rdma.rx_mpwqe_frag
(gauge)
Number of MPWQE fragment events
Shown as event
infiniband.rdma.rx_mpwqe_frag.count
(count)
Number of new MPWQE fragment events since the last metric submission
Shown as event
infiniband.rdma.rx_out_of_buffer
(gauge)
Number of out of buffer events on receive
Shown as event
infiniband.rdma.rx_out_of_buffer.count
(count)
Number of new out of buffer events on receive since the last metric submission
Shown as event
infiniband.rdma.rx_pkts
(gauge)
Number of packets received
Shown as packet
infiniband.rdma.rx_pkts.count
(count)
Number of new packets received since the last metric submission
Shown as packet
infiniband.rdma.rx_read_requests
(gauge)
Number of read requests received
Shown as request
infiniband.rdma.rx_read_requests.count
(count)
Number of new read requests received since the last metric submission
Shown as request
infiniband.rdma.rx_vport_multicast_bytes
(gauge)
Number of multicast bytes received on virtual port
Shown as byte
infiniband.rdma.rx_vport_multicast_bytes.count
(count)
Number of new multicast bytes received on virtual port since the last metric submission
Shown as byte
infiniband.rdma.rx_vport_multicast_packets
(gauge)
Number of multicast packets received on virtual port
Shown as packet
infiniband.rdma.rx_vport_multicast_packets.count
(count)
Number of new multicast packets received on virtual port since the last metric submission
Shown as packet
infiniband.rdma.rx_vport_unicast_bytes
(gauge)
Number of unicast bytes received on virtual port
Shown as byte
infiniband.rdma.rx_vport_unicast_bytes.count
(count)
Number of new unicast bytes received on virtual port since the last metric submission
Shown as byte
infiniband.rdma.rx_vport_unicast_packets
(gauge)
Number of unicast packets received on virtual port
Shown as packet
infiniband.rdma.rx_vport_unicast_packets.count
(count)
Number of new unicast packets received on virtual port since the last metric submission
Shown as packet
infiniband.rdma.rx_wqe_err
(gauge)
Number of work queue entry errors on receive
Shown as error
infiniband.rdma.rx_wqe_err.count
(count)
Number of new work queue entry errors on receive since the last metric submission
Shown as error
infiniband.rdma.rx_write_requests
(gauge)
Number of write requests received
Shown as request
infiniband.rdma.rx_write_requests.count
(count)
Number of new write requests received since the last metric submission
Shown as request
infiniband.rdma.send_bytes
(gauge)
Number of bytes sent
Shown as byte
infiniband.rdma.send_bytes.count
(count)
Number of new bytes sent since the last metric submission
Shown as byte
infiniband.rdma.send_wrs
(gauge)
Number of send work requests
Shown as request
infiniband.rdma.send_wrs.count
(count)
Number of new send work requests since the last metric submission
Shown as request
infiniband.rdma.tx_bytes
(gauge)
Number of bytes transmitted
Shown as byte
infiniband.rdma.tx_bytes.count
(count)
Number of new bytes transmitted since the last metric submission
Shown as byte
infiniband.rdma.tx_pkts
(gauge)
Number of packets transmitted on all VLs from this port
Shown as packet
infiniband.rdma.tx_pkts.count
(count)
Number of new packets transmitted since the last metric submission
Shown as packet
infiniband.rdma.tx_vport_unicast_bytes
(gauge)
Number of unicast bytes transmitted on virtual port
Shown as byte
infiniband.rdma.tx_vport_unicast_bytes.count
(count)
Number of new unicast bytes transmitted on virtual port since the last metric submission
Shown as byte
infiniband.rdma.tx_vport_unicast_packets
(gauge)
Number of unicast packets transmitted on virtual port
Shown as packet
infiniband.rdma.tx_vport_unicast_packets.count
(count)
Number of new unicast packets transmitted on virtual port since the last metric submission
Shown as packet
infiniband.state
(gauge)
Port state
infiniband.symbol_error
(gauge)
Number of minor link errors detected on one or more physical lanes
Shown as error
infiniband.symbol_error.count
(count)
Number of new minor link errors detected since the last metric submission
Shown as error
infiniband.unicast_rcv_packets
(gauge)
Number of unicast packets received, including unicast packets containing errors (legacy)
Shown as packet
infiniband.unicast_rcv_packets.count
(count)
Number of new unicast packets received since the last metric submission (legacy)
Shown as packet
infiniband.unicast_xmit_packets
(gauge)
Number of unicast packets transmitted on all VLs from the port (legacy)
Shown as packet
infiniband.unicast_xmit_packets.count
(count)
Number of new unicast packets transmitted since the last metric submission (legacy)
Shown as packet
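
Most counters in the list above appear twice: as a gauge holding the counter's current value and as a .count metric holding the increase since the last submission. The sketch below illustrates that relationship and shows how a delta of the data counters, which the descriptions give as octets divided by 4, could be turned into a byte rate. Both function names are hypothetical, the reset handling is an assumption, and whether the Agent rescales values before submitting them is not stated on this page, so check real values before relying on the factor of 4.

```python
def to_gauge_and_count(current, previous):
    """Return (gauge, count) for one counter read.

    gauge is the raw counter value; count is the increase since the
    previous read. Assumption: on the first read (previous is None) or
    after a counter reset (current < previous) the delta is reported as 0.
    """
    if previous is None or current < previous:
        return current, 0
    return current, current - previous


def throughput_bytes_per_second(delta_words, interval_seconds):
    """Convert a delta expressed in 4-octet units into bytes per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return delta_words * 4 / interval_seconds
```

For example, to_gauge_and_count(150, 100) returns (150, 50), and a port_rcv_data delta of 250 over a 10-second interval gives throughput_bytes_per_second(250, 10) == 100.0.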

Events

The InfiniBand integration does not include any events.

Service Checks

The InfiniBand integration does not include any service checks.

Troubleshooting

Need help? Contact Datadog support.