
Prometheus Input Plugin

This plugin gathers metrics from Prometheus metric endpoints, e.g. applications implementing such an endpoint or node-exporter instances. The plugin also supports various service-discovery methods.

Introduced in: Telegraf v0.1.5  Tags: applications, server  OS support: all

Global configuration options

In addition to the plugin-specific settings, plugins support additional global and plugin configuration settings. These can be used for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.

Secret-store support

This plugin supports secrets from secret-stores for the username, password and bearer_token_string options. See the secret-store documentation for more details on how to use them.

Configuration

# Read metrics from one or many prometheus clients
[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:9100/metrics"]

  ## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
  ## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
  ## Valid options: 1, 2
  # metric_version = 1

  ## Url tag name (tag containing scraped url. optional, default is "url")
  # url_tag = "url"

  ## Whether the timestamp of the scraped metrics will be ignored.
  ## If set to true, the gather time will be used.
  # ignore_timestamp = false

  ## Override content-type of the returned message
  ## Available options are for prometheus:
  ##   text, protobuf-delimiter, protobuf-compact, protobuf-text,
  ## and for openmetrics:
  ##   openmetrics-text, openmetrics-protobuf
  ## By default the content-type of the response is used.
  # content_type_override = ""

  ## An array of Kubernetes services to scrape metrics from.
  # kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]

  ## Kubernetes config file to create client from.
  # kube_config = "/path/to/kubernetes.config"

  ## Scrape Pods
  ## Enable scraping of k8s pods. Further settings as to which pods to scrape
  ## are determined by the 'method' option below. When enabled, the default is
  ## to use annotations to determine whether to scrape or not.
  # monitor_kubernetes_pods = false

  ## Scrape Pods Method
  ## annotations: default, looks for specific pod annotations documented below
  ## settings: only look for pods matching the settings provided, not
  ##   annotations
  ## settings+annotations: looks at pods that match annotations using the user
  ##   defined settings
  # monitor_kubernetes_pods_method = "annotations"

  ## Scrape Pods 'annotations' method options
  ## If the method is set to 'annotations' or 'settings+annotations', these
  ## annotation flags are looked for:
  ## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
  ##     use 'prometheus.io/scrape=false' annotation to opt-out entirely.
  ## - prometheus.io/scheme: If the metrics endpoint is secured then you will
  ##     need to set this to 'https' & most likely set the tls config
  ## - prometheus.io/path: If the metrics path is not /metrics, define it with
  ##     this annotation
  ## - prometheus.io/port: If port is not 9102 use this annotation

  ## Scrape Pods 'settings' method options
  ## When using 'settings' or 'settings+annotations', the default values for
  ## annotations can be modified with the following options:
  # monitor_kubernetes_pods_scheme = "http"
  # monitor_kubernetes_pods_port = "9102"
  # monitor_kubernetes_pods_path = "/metrics"

  ## Get the list of pods to scrape with either the scope of
  ## - cluster: the kubernetes watch api (default, no need to specify)
  ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
  # pod_scrape_scope = "cluster"

  ## Only for node scrape scope: node IP of the node that telegraf is running on.
  ## Either this config or the environment variable NODE_IP must be set.
  # node_ip = "10.180.1.1"

  ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
  ## Default is 60 seconds.
  # pod_scrape_interval = 60

  ## Content length limit
  ## When set, telegraf will drop responses with length larger than the configured value.
  ## Default is "0KB" which means unlimited.
  # content_length_limit = "0KB"

  ## Restricts Kubernetes monitoring to a single namespace
  ##   ex: monitor_kubernetes_pods_namespace = "default"
  # monitor_kubernetes_pods_namespace = ""
  ## The name of the label for the pod that is being scraped.
  ## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
  # pod_namespace_label_name = "namespace"
  # label selector to target pods which have the label
  # kubernetes_label_selector = "env=dev,app=nginx"
  # field selector to target pods
  # eg. To scrape pods on a specific node
  # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"

  ## Filter which pod annotations and labels will be added to metric tags
  #
  # pod_annotation_include = ["annotation-key-1"]
  # pod_annotation_exclude = ["exclude-me"]
  # pod_label_include = ["label-key-1"]
  # pod_label_exclude = ["exclude-me"]

  # Cache refresh interval to set the interval for re-sync of the pod list.
  # Default is 60 minutes.
  # cache_refresh_interval = 60

  ## Use bearer token for authorization. ('bearer_token' takes priority)
  # bearer_token = "/path/to/bearer/token"
  ## OR
  # bearer_token_string = "abc_123"

  ## HTTP Basic Authentication username and password. ('bearer_token' and
  ## 'bearer_token_string' take priority)
  # username = ""
  # password = ""

  ## Optional custom HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## Specify timeout duration for slower prometheus clients (default is 5s)
  # timeout = "5s"

  ## This option is now used by the HTTP client to set the header response
  ## timeout, not the overall HTTP timeout.
  # response_timeout = "5s"

  ## HTTP Proxy support
  # use_system_proxy = false
  # http_proxy_url = ""

  ## Optional TLS Config
  # tls_ca = /path/to/cafile
  # tls_cert = /path/to/certfile
  # tls_key = /path/to/keyfile

  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Use the given name as the SNI server name on each URL
  # tls_server_name = "myhost.example.org"

  ## TLS renegotiation method, choose from "never", "once", "freely"
  # tls_renegotiation_method = "never"

  ## Enable/disable TLS
  ## Set to true/false to enforce TLS being enabled/disabled. If not set,
  ## enable TLS only if any of the other options are specified.
  # tls_enable = true

  ## This option allows you to report the status of prometheus requests.
  # enable_request_metrics = false

  ## Scrape Services available in Consul Catalog
  # [inputs.prometheus.consul]
  #   enabled = true
  #   agent = "http://localhost:8500"
  #   query_interval = "5m"

  #   [[inputs.prometheus.consul.query]]
  #     name = "a service name"
  #     tag = "a service tag"
  #     url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
  #     [inputs.prometheus.consul.query.tags]
  #       host = "{{.Node}}"

  ## Scrape Hosts available with http service discovery
  # [inputs.prometheus.http_service_discovery]
  #   enabled = false
  #   url = "http://localhost:9000/service-discovery"
  #   query_interval = "5m"

  ## Control pod scraping based on pod namespace annotations
  ## Pass and drop here act like tagpass and tagdrop, but instead
  ## of filtering metrics they filter pod candidates for scraping
  #[inputs.prometheus.namespace_annotation_pass]
  # annotation_key = ["value1", "value2"]
  #[inputs.prometheus.namespace_annotation_drop]
  # some_annotation_key = ["dont-scrape"]

urls can also contain a unix socket. If a different path is required (default is /metrics for both http[s] and unix), add it as a path query parameter as follows: unix:///var/run/prometheus.sock?path=/custom/metrics
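
For example, a unix-socket scrape with a custom metrics path could be configured like this sketch (the socket path is illustrative):

```toml
[[inputs.prometheus]]
  ## Scrape a service listening on a unix socket; the 'path' query
  ## parameter overrides the default '/metrics' endpoint path.
  urls = ["unix:///var/run/prometheus.sock?path=/custom/metrics"]
```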

Metric Format Configuration

The metric_version setting controls how telegraf translates prometheus-format metrics into telegraf metrics. There are two options.

With metric_version = 1, the prometheus metric name becomes the telegraf metric name. Prometheus labels become telegraf tags. Prometheus values become telegraf field values. The fields have generic keys based on the type of the prometheus metric. This option produces metrics that are dense (not sparse). Denseness is useful for some outputs, including those that are more efficient with row-oriented data.

metric_version = 2 differs in a few ways. The prometheus metric name becomes the telegraf field key. Metrics hold more than one value, and the field keys aren't generic. The resulting metrics are sparse, but for some outputs they may be easier to process or query, including those that are more efficient with column-oriented data. The telegraf metric name is the same for all metrics in the input instance. It can be set with the name_override setting and defaults to "prometheus". To have multiple metric names, you can use multiple instances of the plugin, each with its own name_override.

metric_version = 2 uses the same histogram format as the histogram aggregator.

The example output sections below show samples for both options.

When using this plugin along with the prometheus_client output, use the same option in both to ensure metrics are round-tripped without modification.
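
As a sketch of such a round-trip setup (this assumes the prometheus_client output accepts a matching metric_version option; the listen address is illustrative):

```toml
## Keep metric_version identical on input and output so metrics
## pass through without structural changes.
[[inputs.prometheus]]
  urls = ["http://localhost:9100/metrics"]
  metric_version = 2

[[outputs.prometheus_client]]
  listen = ":9273"
  metric_version = 2
```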

Kubernetes Service Discovery

URLs listed in the kubernetes_services parameter will be expanded by looking up all A records assigned to the hostname as described in Kubernetes DNS service discovery.

This method can be used to locate all Kubernetes headless services.

Kubernetes scraping

Enabling this option will allow the plugin to scrape Kubernetes pods for prometheus annotations. You can run this plugin inside your Kubernetes cluster, or use a kubeconfig file to determine where to monitor. The following annotations are currently supported:

  • prometheus.io/scrape Enable scraping for this pod.
  • prometheus.io/scheme If the metrics endpoint is secured then you will need to set this to https and most likely set the tls config. (default 'http')
  • prometheus.io/path Override the path for the metrics endpoint on the service. (default '/metrics')
  • prometheus.io/port Used to override the port. (default 9102)
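
On a pod, these annotations might be set like the following sketch (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                     # illustrative name
  annotations:
    prometheus.io/scrape: "true"   # opt this pod in to scraping
    prometheus.io/scheme: "http"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "9102"
spec:
  containers:
    - name: my-app
      image: my-app:latest         # illustrative image
```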

Using the monitor_kubernetes_pods_namespace option allows you to limit which pods are scraped.

The pod_namespace_label_name setting allows you to change the label name used for the namespace of the pod being scraped. The default is namespace, but this will overwrite a label with the name namespace from a scraped metric.

Using pod_scrape_scope = "node" allows more scalable scraping of pods: it will only scrape pods on the node telegraf is running on, fetching the pod list locally from the node's kubelet. This requires running Telegraf on every node of the cluster. Note that either node_ip must be specified in the config, or the environment variable NODE_IP must be set to the host IP. The latter can be done in the yaml of the pod running telegraf:

env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

If using node-level scrape scope, pod_scrape_interval specifies how often (in seconds) the pod list used for scraping should be updated. If not specified, the default is 60 seconds.

The pod running telegraf will need the proper rbac configuration in order to be allowed to call the k8s api to discover and watch pods in the cluster. A typical configuration will create a service account, a cluster role with the appropriate rules, and a cluster role binding to tie the cluster role to the service account. Example of a configuration for cluster-level discovery:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: telegraf-k8s-role-{{.Release.Name}}
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
---
# Rolebinding for namespace to cluster-admin
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: telegraf-k8s-role-{{.Release.Name}}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: telegraf-k8s-role-{{.Release.Name}}
subjects:
- kind: ServiceAccount
  name: telegraf-k8s-{{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: telegraf-k8s-{{ .Release.Name }}

Consul Service Discovery

Enabling this option and configuring a consul agent url will allow the plugin to query the consul catalog for available services. Using the query_interval, the plugin will periodically query the consul catalog for services with name and tag and refresh the list of scraped urls. It can use the information from the catalog to build the scraped url and additional tags from a template.

Multiple consul queries can be configured, each for a different service. The following example fields can be used in url or tag templates:

  • Node
  • Address
  • NodeMeta
  • ServicePort
  • ServiceAddress
  • ServiceTags
  • ServiceMeta

For a full list of available fields and their types, see struct CatalogService in https://github.com/hashicorp/consul/blob/master/api/catalog.go

HTTP Service Discovery

Enabling this option and configuring a url will allow the plugin to query the given http service discovery endpoint for available hosts. Using the query_interval, the plugin will periodically query the endpoint for services and refresh the list of scraped urls. It can use the information from the response to build the scraped urls and additional tags.

See the prometheus documentation for more information about the http service discovery format.
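
For reference, the endpoint is expected to answer with Prometheus' HTTP SD JSON format, roughly like this sketch (targets and labels are illustrative):

```json
[
  {
    "targets": ["10.0.10.2:9100", "10.0.10.3:9100"],
    "labels": {
      "env": "prod"
    }
  }
]
```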

Bearer Token

If set, the file specified by the bearer_token parameter will be read on each interval and its contents will be appended to the Bearer string in the Authorization header.

Usage for Caddy HTTP server

Steps to monitor Caddy with Telegraf's Prometheus input plugin:

[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:2019/metrics"]

This is the default URL where Caddy serves its data. For more details, please read the Caddy Prometheus documentation.

Metrics

Measurement names are based on the metric family, and a tag is created for each label. Values are added to fields based on the metric type.

All metrics receive the url tag indicating the related URL specified in the Telegraf configuration. If using Kubernetes service discovery, the address tag is also added indicating the discovered IP address.

  • prometheus_request
    • tags:
      • url
      • address
    • fields:
      • response_time (float, seconds)
      • content_length (int, response body length)

Example Output

Source

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.4545e-05
go_gc_duration_seconds{quantile="0.25"} 7.6999e-05
go_gc_duration_seconds{quantile="0.5"} 0.000277935
go_gc_duration_seconds{quantile="0.75"} 0.000706591
go_gc_duration_seconds{quantile="1"} 0.000706591
go_gc_duration_seconds_sum 0.00113607
go_gc_duration_seconds_count 4
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 15
# HELP cpu_usage_user Telegraf collected metric
# TYPE cpu_usage_user gauge
cpu_usage_user{cpu="cpu0"} 1.4112903225816156
cpu_usage_user{cpu="cpu1"} 0.702106318955865
cpu_usage_user{cpu="cpu2"} 2.0161290322588776
cpu_usage_user{cpu="cpu3"} 1.5045135406226022

Output

go_gc_duration_seconds,url=http://example.org:9273/metrics 1=0.001336611,count=14,sum=0.004527551,0=0.000057965,0.25=0.000083812,0.5=0.000286537,0.75=0.000365303 1505776733000000000
go_goroutines,url=http://example.org:9273/metrics gauge=21 1505776695000000000
cpu_usage_user,cpu=cpu0,url=http://example.org:9273/metrics gauge=1.513622603430151 1505776751000000000
cpu_usage_user,cpu=cpu1,url=http://example.org:9273/metrics gauge=5.829145728641773 1505776751000000000
cpu_usage_user,cpu=cpu2,url=http://example.org:9273/metrics gauge=2.119071644805144 1505776751000000000
cpu_usage_user,cpu=cpu3,url=http://example.org:9273/metrics gauge=1.5228426395944945 1505776751000000000
prometheus_request,result=success,url=http://example.org:9273/metrics content_length=179013i,http_response_code=200i,response_time=0.051521601 1505776751000000000

Output (when metric_version = 2)

prometheus,quantile=1,url=http://example.org:9273/metrics go_gc_duration_seconds=0.005574303 1556075100000000000
prometheus,quantile=0.75,url=http://example.org:9273/metrics go_gc_duration_seconds=0.0001046 1556075100000000000
prometheus,quantile=0.5,url=http://example.org:9273/metrics go_gc_duration_seconds=0.0000719 1556075100000000000
prometheus,quantile=0.25,url=http://example.org:9273/metrics go_gc_duration_seconds=0.0000579 1556075100000000000
prometheus,quantile=0,url=http://example.org:9273/metrics go_gc_duration_seconds=0.0000349 1556075100000000000
prometheus,url=http://example.org:9273/metrics go_gc_duration_seconds_count=324,go_gc_duration_seconds_sum=0.091340353 1556075100000000000
prometheus,url=http://example.org:9273/metrics go_goroutines=15 1556075100000000000
prometheus,cpu=cpu0,url=http://example.org:9273/metrics cpu_usage_user=1.513622603430151 1505776751000000000
prometheus,cpu=cpu1,url=http://example.org:9273/metrics cpu_usage_user=5.829145728641773 1505776751000000000
prometheus,cpu=cpu2,url=http://example.org:9273/metrics cpu_usage_user=2.119071644805144 1505776751000000000
prometheus,cpu=cpu3,url=http://example.org:9273/metrics cpu_usage_user=1.5228426395944945 1505776751000000000
prometheus_request,result=success,url=http://example.org:9273/metrics content_length=179013i,http_response_code=200i,response_time=0.051521601 1505776751000000000

Output with timestamp included

Below is an example of a Prometheus metric which includes a timestamp:

# TYPE test_counter counter
test_counter{label="test"} 1 1685443805885

Telegraf will generate the following metric:

test_counter,address=127.0.0.1,label=test counter=1 1685443805885000000

when using the standard configuration:

[[inputs.prometheus]]
  ## An array of urls to scrape metrics from.
  urls = ["http://localhost:2019/metrics"]

Please note: Metrics produced by Prometheus endpoints carry millisecond precision, and the default Telegraf agent-level precision setting reduces this to seconds. Change the agent- or plugin-level precision setting to milliseconds or smaller to report metric timestamps with full precision.
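
A minimal sketch of raising the precision at the agent level so millisecond timestamps are preserved:

```toml
[agent]
  ## Round collected timestamps to milliseconds instead of the
  ## default of seconds.
  precision = "1ms"
```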

