Telegraf v1.5.0+

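S.M.A.R.T. Input Plugin

This plugin gathers SMART (Self-Monitoring, Analysis and Reporting Technology) information from storage devices using the smartmontools package. It also supports NVMe devices through the nvme-cli package.

This plugin requires the smartmontools package to be installed on your system, plus the nvme-cli package for NVMe devices. The smartctl and nvme commands must be executable by Telegraf.

Introduced in: Telegraf v1.5.0
Tags: hardware, system
OS support: all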

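Global configuration options

Plugins support additional global and plugin configuration settings for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.

Configuration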

# Read metrics from storage devices supporting S.M.A.R.T.
[[inputs.smart]]
    ## Optionally specify the path to the smartctl executable
    # path_smartctl = "/usr/bin/smartctl"

    ## Optionally specify the path to the nvme-cli executable
    # path_nvme = "/usr/bin/nvme"

    ## Optionally specify if vendor specific attributes should be propagated for NVMe disk case
    ## ["auto-on"] - automatically find and enable additional vendor specific disk info
    ## ["vendor1", "vendor2", ...] - e.g. "Intel" enable additional Intel specific disk info
    # enable_extensions = ["auto-on"]

    ## On most platforms, the CLI utilities used require root access.
    ## Setting 'use_sudo' to true will make use of sudo to run smartctl or nvme-cli.
    ## Sudo must be configured to allow the telegraf user to run smartctl or nvme-cli
    ## without a password.
    # use_sudo = false

    ## Adds an extra tag "device_type", which can be used to differentiate
    ## multiple disks behind the same controller (e.g., MegaRAID).
    # tag_with_device_type = false

    ## Skip checking disks in this power mode. Defaults to
    ## "standby" to not wake up disks that have stopped rotating.
    ## See --nocheck in the man pages for smartctl.
    ## smartctl version 5.41 and 5.42 have faulty detection of
    ## power mode and might require changing this value to
    ## "never" depending on your disks.
    # nocheck = "standby"

    ## Gather all returned S.M.A.R.T. attribute metrics and the detailed
    ## information from each drive into the 'smart_attribute' measurement.
    # attributes = false

    ## Optionally specify devices to exclude from reporting if disks auto-discovery is performed.
    # excludes = [ "/dev/pass6" ]

    ## Optionally specify devices and device type, if unset
    ## a scan (smartctl --scan and smartctl --scan -d nvme) for S.M.A.R.T. devices will be done
    ## and all found will be included except for the excluded in excludes.
    # devices = [ "/dev/ada0 -d atacam", "/dev/nvme0"]

    ## Timeout for the cli command to complete.
    # timeout = "30s"

    ## Optionally call smartctl and nvme-cli with a specific concurrency policy.
    ## By default, smartctl and nvme-cli are called in separate threads (goroutines) to gather disk attributes.
    ## Some devices (e.g. disks in RAID arrays) may have access limitations that require sequential reading of
    ## SMART data - one individual array drive at a time. In such a case, please set this configuration option
    ## to "sequential" to get readings for all drives.
    ## valid options: concurrent, sequential
    # read_method = "concurrent"

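Permissions

Note that this plugin invokes smartctl and nvme-cli, which may require additional permissions to run successfully. Depending on the user/group permissions of the telegraf user executing this plugin, you may need to use sudo.

Add the following to your Telegraf configuration: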

[[inputs.smart]]
  use_sudo = true

You will also need to update your sudoers file:

$ visudo
# For smartctl add the following lines:
Cmnd_Alias SMARTCTL = /usr/bin/smartctl
telegraf  ALL=(ALL) NOPASSWD: SMARTCTL
Defaults!SMARTCTL !logfile, !syslog, !pam_session

# For nvme-cli add the following lines:
Cmnd_Alias NVME = /path/to/nvme
telegraf  ALL=(ALL) NOPASSWD: NVME
Defaults!NVME !logfile, !syslog, !pam_session

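You can also create a wrapper script that runs smartctl or nvme with sudo. In that case, set path_smartctl or path_nvme in the configuration to execute this script.

SMART-specific attributes

SMART is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports various indicators of drive reliability, with the aim of anticipating hardware failures.

SMART information is split into separate measurements: smart_device holds general information, while smart_attribute stores detailed attribute information, provided that attributes = true is set in the plugin configuration.

If no devices are specified, the plugin scans for SMART devices with the following command: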

smartctl --scan

Metrics are reported from the following smartctl command:

smartctl --info --attributes --health -n <nocheck> --format=brief <device>
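
To illustrate how the configuration options map onto this command, here is a minimal Python sketch that assembles the argument list for one device. The function build_smartctl_argv and its parameters are hypothetical and not part of Telegraf (which is written in Go). The sketch only mirrors the command shown above together with the path_smartctl, nocheck, use_sudo, and devices options, and it assumes sudo is called non-interactively with -n.

```python
import shlex


def build_smartctl_argv(device, nocheck="standby",
                        path_smartctl="/usr/bin/smartctl", use_sudo=False):
    """Assemble the smartctl invocation shown above for one `devices` entry.

    `device` may carry a device type, e.g. "/dev/ada0 -d atacam",
    so it is split into separate arguments.
    """
    argv = [path_smartctl, "--info", "--attributes", "--health",
            "-n", nocheck, "--format=brief"]
    argv += shlex.split(device)
    if use_sudo:
        # Assumption: -n makes sudo fail instead of prompting for a password.
        argv = ["sudo", "-n"] + argv
    return argv


print(build_smartctl_argv("/dev/ada0 -d atacam", use_sudo=True))
```

Building the command as a list rather than a single string avoids shell quoting problems when device paths contain unusual characters.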

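This plugin supports smartmontools version 5.41 and above, but v5.41 and v5.42 may require setting nocheck; see the comments in the sample configuration. In addition, NVMe support was introduced in version 6.5.

To enable SMART on a storage device, run: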

smartctl -s on <device>

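NVMe vendor-specific attributes

For NVMe disks, the plugin can use the nvme-cli command line utility, which provides easy access to vendor-specific attributes. This plugin supports nvme-cli version 1.5 and above. If nvme-cli is not present, NVMe vendor-specific metrics cannot be obtained.

Vendor-specific SMART metrics for NVMe disks may be reported from the following nvme command: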

nvme <vendor> smart-log-add <device>

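Note that nvme-cli vendor plugins may use differing naming conventions and report formats.

To see which plugin extensions are installed (this depends on the nvme-cli version), look at the bottom of the output of: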

nvme help

To gather the disk vendor ID (vid), use id-ctrl:

nvme id-ctrl <device>

The association between a vid and a company can be found in the members list.

Whether a device is NVMe or non-NVMe is determined using:

smartctl --scan

and

smartctl --scan -d nvme
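
The following Python sketch shows one way the two scan results could be combined. It is an illustration under assumptions, not the plugin's actual code: the helper names parse_scan and split_by_type are hypothetical, the sample scan text is made up for the example, and the set difference reflects the fact that newer smartctl versions may also list NVMe devices in the plain scan.

```python
def parse_scan(output):
    """Return the device part of each `smartctl --scan` line (text before '#')."""
    devices = []
    for line in output.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            devices.append(entry)
    return devices


def split_by_type(scan_output, nvme_scan_output):
    """Split scanned devices into (non_nvme, nvme) lists."""
    nvme = parse_scan(nvme_scan_output)
    nvme_set = set(nvme)
    # Anything the NVMe-only scan also reported is treated as NVMe.
    non_nvme = [d for d in parse_scan(scan_output) if d not in nvme_set]
    return non_nvme, nvme


# Illustrative scan output, not captured from a real system.
SCAN = """/dev/sda -d scsi # /dev/sda, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""
NVME_SCAN = "/dev/nvme0 -d nvme # /dev/nvme0, NVMe device\n"

print(split_by_type(SCAN, NVME_SCAN))
```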

Metrics

smart_device
- tags
  - capacity
  - device
  - device_type (only emitted if tag_with_device_type is set to true)
  - enabled
  - model
  - serial_no
  - wwn
- fields
  - exit_status
  - health_ok
  - media_wearout_indicator
  - percent_lifetime_remain
  - read_error_rate
  - seek_error
  - temp_c
  - udma_crc_errors
  - wear_leveling_count

smart_attribute
- tags
  - capacity
  - device
  - device_type (only emitted if tag_with_device_type is set to true)
  - enabled
  - fail
  - flags
  - id
  - model
  - name
  - serial_no
  - wwn
- fields
  - exit_status
  - raw_value
  - threshold
  - value
  - worst

Flags

The flags tag is interpreted as follows:

K auto-keep
C event count
R error rate
S speed/performance
O updated online
P prefailure warning
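
As a small illustration, the letters in a flags tag value can be expanded into their meanings. The helper decode_flags below is a hypothetical name, not something the plugin provides; the dash characters in values such as -O-RC- simply mark flags that are not set.

```python
FLAG_MEANINGS = {
    "P": "prefailure warning",
    "O": "updated online",
    "S": "speed/performance",
    "R": "error rate",
    "C": "event count",
    "K": "auto-keep",
}


def decode_flags(flags):
    """Expand a flags tag value into a list of meanings.

    Dashes (unset flags) and any unknown characters are skipped.
    """
    return [FLAG_MEANINGS[ch] for ch in flags if ch in FLAG_MEANINGS]


print(decode_flags("-O-RC-"))  # ['updated online', 'error rate', 'event count']
```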

Exit Status

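The exit_status field captures the exit status of the CLI utility command used, which is defined by a bit mask. For an interpretation of the bit mask, see the man page for smartctl or nvme-cli.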
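
For smartctl, the bit mask can be unpacked as in the following sketch. The bit descriptions are short paraphrases of the smartctl man page and should be checked against the version installed on your system; they do not apply to nvme-cli. The function name decode_smartctl_exit_status is hypothetical.

```python
# Index in the list = bit number in the smartctl exit status.
SMARTCTL_EXIT_BITS = [
    "command line did not parse",
    "device open failed or device is in a low-power mode",
    "a SMART or ATA command failed, or a checksum error occurred",
    "SMART status check returned DISK FAILING",
    "prefail attributes found at or below threshold",
    "attributes were at or below threshold at some time in the past",
    "device error log contains records of errors",
    "device self-test log contains records of errors",
]


def decode_smartctl_exit_status(status):
    """Return the description of every bit that is set in `status`."""
    return [msg for bit, msg in enumerate(SMARTCTL_EXIT_BITS)
            if status & (1 << bit)]


print(decode_smartctl_exit_status(64))
```

An exit_status of 0 yields an empty list, which matches the example output where exit_status=0i.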

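Device names

Device names such as /dev/sda are not persistent and may change after a reboot or system changes. Instead, you can use the World Wide Name (WWN) or the serial number to identify devices. On Linux, block devices can be referenced by WWN under /dev/disk/by-id/.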
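
To see which device node each WWN currently resolves to, a sketch like the following can be used on Linux. It assumes the common udev convention that WWN symlinks in /dev/disk/by-id/ are named with a wwn- prefix; the function name wwn_links is hypothetical.

```python
import os


def wwn_links(by_id_dir="/dev/disk/by-id"):
    """Map each wwn-* symlink name to the device node it currently points to."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        # Directory is absent on non-Linux systems; return an empty mapping.
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        if name.startswith("wwn-"):
            mapping[name] = os.path.realpath(os.path.join(by_id_dir, name))
    return mapping


for link, node in wwn_links().items():
    print(link, "->", node)
```

The symlink paths themselves can then be listed in the devices option so that the configuration survives device renaming.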

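Troubleshooting

If you expect to see more SMART metrics than this plugin shows, make sure you are using a version of the smartctl or nvme-cli utility that is able to gather the desired data. Also check your device's capabilities, because not all SMART metrics are mandatory. For example, the number of temperature sensors depends on the device specification.

If this plugin does not work as expected for your SMART-enabled device, run the following commands and include the output in a bug report.

For non-NVMe devices (from smartctl version >= 7.0 this will also return NVMe devices by default):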

smartctl --scan

For NVMe devices:

smartctl --scan -d nvme

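Run the following command, replacing NOCHECK with your configuration setting and DEVICE with the device name (which can be taken from the previous command):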

smartctl --info --health --attributes --tolerance=verypermissive --nocheck NOCHECK --format=brief -d DEVICE

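If you are trying to gather vendor-specific metrics, please provide the output of this command, replacing VENDOR and DEVICE to match your case: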

nvme VENDOR smart-log-add DEVICE

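If you have specified a devices array in the configuration file and Telegraf only shows data from one device, change the plugin configuration so that disk attributes are gathered sequentially rather than in separate threads (goroutines). To do this, find read_method in the plugin configuration and change it to sequential: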

    ## Optionally call smartctl and nvme-cli with a specific concurrency policy.
    ## By default, smartctl and nvme-cli are called in separate threads (goroutines) to gather disk attributes.
    ## Some devices (e.g. disks in RAID arrays) may have access limitations that require sequential reading of
    ## SMART data - one individual array drive at a time. In such a case, please set this configuration option
    ## to "sequential" to get readings for all drives.
    ## valid options: concurrent, sequential
    read_method = "sequential"
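
The difference between the two read_method values can be pictured with the following Python sketch. It is only an analogy: Telegraf uses goroutines, whereas this example uses a thread pool, and the names gather and read_device are hypothetical stand-ins for the per-device smartctl or nvme call.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(devices, read_device, read_method="concurrent"):
    """Read every device either one at a time or in parallel threads."""
    if read_method == "sequential":
        # One device at a time, e.g. for drives behind a RAID controller.
        return [read_device(d) for d in devices]
    if read_method != "concurrent":
        raise ValueError("valid options: concurrent, sequential")
    with ThreadPoolExecutor(max_workers=max(1, len(devices))) as pool:
        # map() returns results in the same order as `devices`.
        return list(pool.map(read_device, devices))


print(gather(["/dev/sda", "/dev/sdb"], lambda d: d + ": ok", "sequential"))
```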

Example output

smart_device,enabled=Enabled,host=mbpro.local,device=rdisk0,model=APPLE\ SSD\ SM0512F,serial_no=S1K5NYCD964433,wwn=5002538655584d30,capacity=500277790720 udma_crc_errors=0i,exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=40i 1502536854000000000
smart_attribute,capacity=500277790720,device=rdisk0,enabled=Enabled,fail=-,flags=-O-RC-,host=mbpro.local,id=199,model=APPLE\ SSD\ SM0512F,name=UDMA_CRC_Error_Count,serial_no=S1K5NYCD964433,wwn=5002538655584d30 exit_status=0i,raw_value=0i,threshold=0i,value=200i,worst=200i 1502536854000000000
smart_attribute,capacity=500277790720,device=rdisk0,enabled=Enabled,fail=-,flags=-O---K,host=mbpro.local,id=199,model=APPLE\ SSD\ SM0512F,name=Unknown_SSD_Attribute,serial_no=S1K5NYCD964433,wwn=5002538655584d30 exit_status=0i,raw_value=0i,threshold=0i,value=100i,worst=100i 1502536854000000000
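
The lines above are in InfluxDB line protocol. For readers who want to inspect such output in a script, here is a minimal parsing sketch. It handles the backslash-escaped spaces seen in the model tag but deliberately ignores quoted string fields, which do not occur in this sample; field values are kept as raw strings such as 40i. The function names are hypothetical, and the sample line is shortened from the output above.

```python
def split_unescaped(text, sep):
    """Split text on sep, ignoring separators preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape sequence intact for later unescaping.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    """Remove the backslash in front of escaped commas, equals signs, and spaces."""
    for ch in (",", "=", " "):
        text = text.replace("\\" + ch, ch)
    return text


def parse_line(line):
    """Return (measurement, tags, fields, timestamp) for one line."""
    head, fields, timestamp = split_unescaped(line.strip(), " ")
    measurement, *tag_pairs = split_unescaped(head, ",")
    tags = {}
    for pair in tag_pairs:
        key, value = split_unescaped(pair, "=")
        tags[unescape(key)] = unescape(value)
    field_map = dict(f.split("=", 1) for f in fields.split(","))
    return measurement, tags, field_map, int(timestamp)


SAMPLE = r"smart_device,device=rdisk0,model=APPLE\ SSD\ SM0512F health_ok=true,temp_c=40i 1502536854000000000"
print(parse_line(SAMPLE))
```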
