InfiniBand Input Plugin
This plugin gathers statistics for all InfiniBand devices and ports on the system. The port counters are read from /sys/class/infiniband/<dev>/ports/<port>/counters/, and the RDMA counters are read from /sys/class/infiniband/<dev>/ports/<port>/hw_counters/.
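Introduced in: Telegraf v1.14.0
Tags: network
OS support: linux
Global configuration options
In addition to the plugin-specific settings, the plugin supports global and plugin configuration settings for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.
Configuration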
# Gets counters from all InfiniBand cards and ports installed
# This plugin ONLY supports Linux
[[inputs.infiniband]]
# no configuration
## Collect RDMA counters
# gather_rdma = false
Metrics
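The actual metrics depend on the InfiniBand device. The plugin uses a simple mapping from counter name to counter value.
Nvidia provides information about the collected counters.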
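The counter-to-value mapping can be illustrated with a short Python sketch. This is not the plugin's implementation (Telegraf is written in Go); it only assumes the standard Linux sysfs layout <dev>/ports/<port>/counters, where each file is named after a counter and holds one integer. The function names read_counters and collect are hypothetical.

```python
from pathlib import Path


def read_counters(counter_dir):
    """Map each counter file name in counter_dir to its integer value."""
    counters = {}
    for entry in sorted(Path(counter_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip counters that cannot be read or are not plain integers.
            continue
    return counters


def collect(sysfs_root="/sys/class/infiniband"):
    """Yield (device, port, fields) for every port found under sysfs_root."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            counters_dir = port / "counters"
            if counters_dir.is_dir():
                yield dev.name, port.name, read_counters(counters_dir)


if __name__ == "__main__":
    for device, port, fields in collect():
        print(device, port, fields)
```

On a host without InfiniBand hardware the script prints nothing, because the sysfs root does not exist. Pointing the inner lookup at hw_counters instead of counters would read the RDMA counters in the same way.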
When counters are selected, the plugin emits the following fields:
- infiniband
Tags
- device
- port
Fields
InfiniBand counters
- excessive_buffer_overrun_errors (integer)
- link_downed (integer)
- link_error_recovery (integer)
- local_link_integrity_errors (integer)
- multicast_rcv_packets (integer)
- multicast_xmit_packets (integer)
- port_rcv_constraint_errors (integer)
- port_rcv_data (integer)
- port_rcv_errors (integer)
- port_rcv_packets (integer)
- port_rcv_remote_physical_errors (integer)
- port_rcv_switch_relay_errors (integer)
- port_xmit_constraint_errors (integer)
- port_xmit_data (integer)
- port_xmit_discards (integer)
- port_xmit_packets (integer)
- port_xmit_wait (integer)
- symbol_error (integer)
- unicast_rcv_packets (integer)
- unicast_xmit_packets (integer)
- VL15_dropped (integer)
InfiniBand RDMA counters
- duplicate_request (integer)
- implied_nak_seq_err (integer)
- lifespan (integer)
- local_ack_timeout_err (integer)
- np_cnp_sent (integer)
- np_ecn_marked_roce_packets (integer)
- out_of_buffer (integer)
- out_of_sequence (integer)
- packet_seq_err (integer)
- req_cqe_error (integer)
- req_cqe_flush_error (integer)
- req_remote_access_errors (integer)
- req_remote_invalid_request (integer)
- resp_cqe_error (integer)
- resp_cqe_flush_error (integer)
- resp_local_length_error (integer)
- resp_remote_access_errors (integer)
- rnr_nak_retry_err (integer)
- roce_adp_retrans (integer)
- roce_adp_retrans_to (integer)
- roce_slow_restart (integer)
- roce_slow_restart_cnps (integer)
- roce_slow_restart_trans (integer)
- rp_cnp_handled (integer)
- rp_cnp_ignored (integer)
- rx_atomic_requests (integer)
- rx_icrc_encapsulated (integer)
- rx_read_requests (integer)
- rx_write_requests (integer)
Example output
infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 port_xmit_data=85378896588i,VL15_dropped=0i,port_rcv_packets=34914071i,port_rcv_data=34600185253i,port_xmit_discards=0i,link_downed=0i,local_link_integrity_errors=0i,symbol_error=0i,link_error_recovery=0i,multicast_rcv_packets=0i,multicast_xmit_packets=0i,unicast_xmit_packets=82002535i,excessive_buffer_overrun_errors=0i,port_rcv_switch_relay_errors=0i,unicast_rcv_packets=34914071i,port_xmit_constraint_errors=0i,port_rcv_errors=0i,port_xmit_wait=0i,port_rcv_remote_physical_errors=0i,port_rcv_constraint_errors=0i,port_xmit_packets=82002535i 1737652060000000000
infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 local_ack_timeout_err=0i,lifespan=10i,out_of_buffer=0i,resp_remote_access_errors=0i,resp_local_length_error=0i,np_cnp_sent=0i,roce_slow_restart=0i,rx_read_requests=6000i,duplicate_request=0i,resp_cqe_error=0i,rx_write_requests=19000i,roce_slow_restart_cnps=0i,rx_icrc_encapsulated=0i,rnr_nak_retry_err=0i,roce_adp_retrans=0i,out_of_sequence=0i,req_remote_access_errors=0i,roce_slow_restart_trans=0i,req_remote_invalid_request=0i,req_cqe_error=0i,resp_cqe_flush_error=0i,packet_seq_err=0i,roce_adp_retrans_to=0i,np_ecn_marked_roce_packets=0i,rp_cnp_handled=0i,implied_nak_seq_err=0i,rp_cnp_ignored=0i,req_cqe_flush_error=0i,rx_atomic_requests=0i 1737652060000000000
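Each output line is in InfluxDB line protocol: a measurement name with comma-separated tags, then the fields, then a nanosecond timestamp. The following Python sketch splits one such line into its parts. It is a deliberately simplified parser, not a full line-protocol implementation: it assumes no escaped commas or spaces and only integer fields (the i suffix), which holds for the output shown here. The function name parse_line and the shortened sample line are illustrative.

```python
def parse_line(line):
    """Split one line-protocol record into (measurement, tags, fields, timestamp).

    Simplified: assumes no escaped commas or spaces and integer-only fields.
    """
    head, field_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        if not value.endswith("i"):
            raise ValueError(f"expected an integer field, got {pair!r}")
        fields[key] = int(value[:-1])  # drop the trailing "i" type marker
    return measurement, tags, fields, int(timestamp)


# Shortened version of the first example line, for illustration only.
sample = (
    "infiniband,device=mlx5_bond_0,host=hop-r640-12,port=1 "
    "port_xmit_data=85378896588i,port_rcv_data=34600185253i,link_downed=0i "
    "1737652060000000000"
)
measurement, tags, fields, ts = parse_line(sample)
print(measurement, tags["device"], tags["port"], fields["port_xmit_data"])
```

The device and port tags identify which port a record belongs to, so the fields of two records with the same tags can be compared across timestamps.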