Amazon CloudWatch Statistics Input Plugin

This plugin gathers metric statistics from Amazon CloudWatch.

Introduced in: Telegraf v0.12.1; Tags: cloud; OS support: all

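Amazon Authentication

This plugin uses a credential chain to authenticate with the CloudWatch API endpoint. The plugin attempts to authenticate in the following order.

  1. Assumed credentials via STS if the role_arn attribute is specified (source credentials are evaluated from the subsequent rules)
  2. Explicit credentials from the access_key, secret_key, and token attributes
  3. Shared profile from the profile attribute
  4. Environment variables
  5. Shared credentials
  6. EC2 instance profile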
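
As a rough illustration of the order above, the following Python sketch reports which credential source would be tried first for a given set of options. The function name first_credential_source, the dictionary layout, the shared_file_exists flag, and the choice to check the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are assumptions made for this example. The plugin is not written in Python, and its actual resolution logic may differ in detail.

```python
def first_credential_source(cfg, env, shared_file_exists=False):
    """Return a label for the first rule in the documented order that applies.

    cfg -- plugin options as a dict mirroring the TOML keys (hypothetical layout)
    env -- environment variables, for example dict(os.environ)
    shared_file_exists -- whether a shared credentials file is available (assumed flag)
    """
    if cfg.get("role_arn"):
        # The role is assumed via STS; the source credentials for that call
        # come from whichever of the later rules applies.
        rest = {k: v for k, v in cfg.items() if k != "role_arn"}
        source = first_credential_source(rest, env, shared_file_exists)
        return "sts_assume_role <- " + source
    if cfg.get("access_key") and cfg.get("secret_key"):
        return "explicit"
    if cfg.get("profile"):
        return "shared_profile"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    if shared_file_exists:
        return "shared_credentials"
    return "ec2_instance_profile"
```

With no options set and nothing in the environment, the sketch falls through to the EC2 instance profile, which matches the last rule in the list.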

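Global configuration options

Plugins support additional global and plugin configuration settings for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.

Configuration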

# Pull Metric Statistics from Amazon CloudWatch
[[inputs.cloudwatch]]
  ## Amazon Region
  region = "us-east-1"

  ## Amazon Credentials
  ## Credentials are loaded in the following order
  ## 1) Web identity provider credentials via STS if role_arn and
  ##    web_identity_token_file are specified
  ## 2) Assumed credentials via STS if role_arn is specified
  ## 3) explicit credentials from 'access_key' and 'secret_key'
  ## 4) shared profile from 'profile'
  ## 5) environment variables
  ## 6) shared credentials file
  ## 7) EC2 Instance Profile
  # access_key = ""
  # secret_key = ""
  # token = ""
  # role_arn = ""
  # web_identity_token_file = ""
  # role_session_name = ""
  # profile = ""
  # shared_credential_file = ""

  ## If you are using CloudWatch cross-account observability, you can
  ## set IncludeLinkedAccounts to true in a monitoring account
  ## and collect metrics from the linked source accounts
  # include_linked_accounts = false

  ## Endpoint to make request against, the correct endpoint is automatically
  ## determined and this option should only be set if you wish to override the
  ## default.
  ##   ex: endpoint_url = "https://:8000"
  # endpoint_url = ""

  ## Set http_proxy
  # use_system_proxy = false
  # http_proxy_url = "https://:8888"

  ## The minimum period for Cloudwatch metrics is 1 minute (60s). However not
  ## all metrics are made available to the 1 minute period. Some are collected
  ## at 3 minute, 5 minute, or larger intervals.
  ## See https://aws.amazon.com/cloudwatch/faqs/#monitoring.
  ## Note that if a period is configured that is smaller than the minimum for a
  ## particular metric, that metric will not be returned by the Cloudwatch API
  ## and will not be collected by Telegraf.
  #
  ## Requested CloudWatch aggregation Period (required)
  ## Must be a multiple of 60s.
  period = "5m"

  ## Collection Delay (required)
  ## Must account for metrics availability via CloudWatch API
  delay = "5m"

  ## Recommended: use metric 'interval' that is a multiple of 'period' to avoid
  ## gaps or overlap in pulled data
  interval = "5m"

  ## Recommended if "delay" and "period" are both within 3 hours of request
  ## time. Invalid values will be ignored. Recently Active feature will only
  ## poll for CloudWatch ListMetrics values that occurred within the last 3h.
  ## If enabled, it will reduce total API usage of the CloudWatch ListMetrics
  ## API and require less memory to retain.
  ## Do not enable if "period" or "delay" is longer than 3 hours, as it will
  ## not return data more than 3 hours old.
  ## See https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_ListMetrics.html
  # recently_active = "PT3H"

  ## Configure the TTL for the internal cache of metrics.
  # cache_ttl = "1h"

  ## Metric Statistic Namespaces, wildcards are allowed
  # namespaces = ["*"]

  ## Metric Format
  ## This determines the format of the produced metrics. 'sparse', the default,
  ## will produce a unique field for each statistic. 'dense' will report all
  ## statistics in a field called value and have a metric_name tag
  ## defining the name of the statistic. See the plugin README for examples.
  # metric_format = "sparse"

  ## Maximum requests per second. Note that the global default AWS rate limit
  ## is 50 reqs/sec, so if you define multiple namespaces, these should add up
  ## to a maximum of 50.
  ## See http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_limits.html
  # ratelimit = 25

  ## Timeout for http requests made by the cloudwatch client.
  # timeout = "5s"

  ## Batch Size
  ## The size of each batch to send requests to Cloudwatch. 500 is the
  ## suggested largest size. If a request gets too large (413 errors), consider
  ## reducing this amount.
  # batch_size = 500

  ## Namespace-wide statistic filters. These allow fewer queries to be made to
  ## cloudwatch.
  # statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
  # statistic_exclude = []

  ## Metrics to Pull
  ## Defaults to all Metrics in Namespace if nothing is provided
  ## Refreshes Namespace available metrics every 1h
  #[[inputs.cloudwatch.metrics]]
  #  names = ["Latency", "RequestCount"]
  #
  #  ## Statistic filters for Metric.  These allow for retrieving specific
  #  ## statistics for an individual metric.
  #  # statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
  #  # statistic_exclude = []
  #
  #  ## Dimension filters for Metric.
  #  ## All dimensions defined for the metric names must be specified in order
  #  ## to retrieve the metric statistics.
  #  ## 'value' has wildcard / 'glob' matching support such as 'p-*'.
  #  [[inputs.cloudwatch.metrics.dimensions]]
  #    name = "LoadBalancerName"
  #    value = "p-example"

Note that the namespace option is deprecated in favor of the namespaces list option.
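
Several constraints are stated only in the comments of the sample configuration: period must be a multiple of 60s, the rate limits of all configured instances should add up to at most the default 50 requests per second, and recently_active should not be enabled when period or delay exceeds 3 hours. The Python sketch below collects these into a pre-flight check. The function check_settings and its arguments (durations given in seconds, one ratelimit value per plugin instance) are hypothetical and not part of Telegraf.

```python
THREE_HOURS = 3 * 60 * 60  # seconds


def check_settings(period_s, delay_s, ratelimits, recently_active=False):
    """Return a list of problems found; an empty list means none were found.

    period_s, delay_s -- 'period' and 'delay' expressed in seconds
    ratelimits -- one 'ratelimit' value per configured plugin instance
    recently_active -- True if recently_active = "PT3H" is set
    """
    problems = []
    if period_s < 60 or period_s % 60 != 0:
        problems.append("period must be a multiple of 60s")
    if sum(ratelimits) > 50:
        problems.append("combined ratelimit exceeds the default 50 reqs/sec")
    if recently_active and (period_s > THREE_HOURS or delay_s > THREE_HOURS):
        problems.append("recently_active should not be enabled when "
                        "period or delay is longer than 3 hours")
    return problems


# Values from the sample configuration: period = "5m", delay = "5m", ratelimit = 25
print(check_settings(300, 300, [25]))
```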

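Requirements and Terminology

Plugin configuration uses CloudWatch concepts and access patterns so that any CloudWatch metric can be monitored.

  • region must be a valid AWS region
  • period must be a valid CloudWatch period
  • namespaces must be a list of valid CloudWatch namespace values
  • names must be valid CloudWatch metric names
  • dimensions must be valid CloudWatch dimension name/value pairs

Omitting a dimension value, or specifying it as '*', retrieves all available metrics that contain a dimension with the specified name. If more than one dimension is specified, the metric must contain all of the configured dimensions; the value of a wildcard dimension is ignored.

Example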

[[inputs.cloudwatch]]
  period = "1m"
  interval = "5m"

  [[inputs.cloudwatch.metrics]]
    names = ["Latency"]

    ## Dimension filters for Metric (optional)
    [[inputs.cloudwatch.metrics.dimensions]]
      name = "LoadBalancerName"
      value = "p-example"

    [[inputs.cloudwatch.metrics.dimensions]]
      name = "AvailabilityZone"
      value = "*"

If the following ELBs are available:

  • name: p-example, availabilityZone: us-east-1a
  • name: p-example, availabilityZone: us-east-1b
  • name: q-example, availabilityZone: us-east-1a
  • name: q-example, availabilityZone: us-east-1b

Then 2 metrics will be output:

  • name: p-example, availabilityZone: us-east-1a
  • name: p-example, availabilityZone: us-east-1b

If the AvailabilityZone wildcard dimension is omitted, a single metric (name: p-example) is exported, containing the values aggregated across the ELB's availability zones.
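
The selection in this example can be sketched in Python as follows. This is only an approximation of the described behavior, not the plugin's code: the helper is_selected is hypothetical, Python's fnmatch stands in for the plugin's glob matching, and an empty value is treated the same as '*'.

```python
from fnmatch import fnmatchcase


def is_selected(metric_dims, configured):
    """Decide whether one CloudWatch metric passes the dimension filters.

    metric_dims -- {dimension name: value} for the metric
    configured -- list of (name, value) filters; '' or '*' matches any value
    """
    for name, pattern in configured:
        if name not in metric_dims:
            return False  # every configured dimension must be present
        if pattern not in ("", "*") and not fnmatchcase(metric_dims[name], pattern):
            return False
    return True


elbs = [
    {"LoadBalancerName": "p-example", "AvailabilityZone": "us-east-1a"},
    {"LoadBalancerName": "p-example", "AvailabilityZone": "us-east-1b"},
    {"LoadBalancerName": "q-example", "AvailabilityZone": "us-east-1a"},
    {"LoadBalancerName": "q-example", "AvailabilityZone": "us-east-1b"},
]
filters = [("LoadBalancerName", "p-example"), ("AvailabilityZone", "*")]

selected = [m for m in elbs if is_selected(m, filters)]
for m in selected:
    print(m["LoadBalancerName"], m["AvailabilityZone"])
```

Only the two p-example entries pass, one per availability zone, because the wildcard accepts any AvailabilityZone value while LoadBalancerName must match.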

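To maximize efficiency and savings, consider making fewer requests by increasing interval while keeping period at the duration over which you want metrics reported. The example above requests metrics from CloudWatch every 5 minutes but outputs five metrics timestamped one minute apart.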
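
The arithmetic behind this trade-off is simple enough to sketch in Python. The helper names below are hypothetical, durations are given in seconds, and collection cycles are counted rather than billed API calls, since the number of calls per cycle depends on how many metrics are requested.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def datapoints_per_collection(interval_s, period_s):
    """Number of period-sized data points covered by one collection interval."""
    if period_s <= 0 or interval_s % period_s != 0:
        raise ValueError("interval should be a multiple of period")
    return interval_s // period_s


def collections_per_day(interval_s):
    """How many times per day the plugin would collect at this interval."""
    return SECONDS_PER_DAY // interval_s


# period = "1m", interval = "5m", as in the example configuration
print(datapoints_per_collection(300, 60))
print(collections_per_day(300), "cycles/day versus", collections_per_day(60))
```

Collecting every 5 minutes instead of every minute cuts the number of collection cycles to a fifth while still covering five one-minute data points per cycle.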

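Restrictions and Limitations

  • CloudWatch metrics are not available instantly via the CloudWatch API. Adjust your collection delay to account for this lag in metric availability, based on your Monitoring Subscription level.
  • CloudWatch API usage incurs cost. See GetMetricData Pricing.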

Metrics

Each monitored CloudWatch namespace is recorded as a measurement with fields for each available metric statistic. Namespaces and metrics are represented in snake_case.
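
The naming can be approximated with a short Python sketch. The regular expressions and the replacement of "/" with "_" are assumptions chosen to reproduce the names seen in this document's sample output (for example cloudwatch_aws_elb and load_balancer_name); the plugin's own conversion may handle other inputs differently.

```python
import re


def snake_case(name):
    """Convert CamelCase to snake_case (approximation, see assumptions above)."""
    # Split before a capitalized word that follows a non-underscore character.
    step = re.sub(r"([^_])([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a following capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def measurement_name(namespace):
    """Build the measurement name for a CloudWatch namespace."""
    # Assumption: "/" in the namespace becomes "_" before snake-casing.
    return "cloudwatch_" + snake_case(namespace.replace("/", "_"))


print(measurement_name("AWS/ELB"))
print(snake_case("LoadBalancerName"))
```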

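Sparse Metrics

By default, the metrics generated by this plugin are sparse. Use the metric_format option to override this setting.

Sparse metrics produce a set of fields for every AWS metric.

  • cloudwatch_{namespace}
    • Fields
      • {metric}_sum (metric Sum value)
      • {metric}_average (metric Average value)
      • {metric}_minimum (metric Minimum value)
      • {metric}_maximum (metric Maximum value)
      • {metric}_sample_count (metric SampleCount value)

For example: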

cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,type=API call_count_maximum=1,call_count_minimum=1,call_count_sum=8,call_count_sample_count=8,call_count_average=1 1715097720000000000

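Dense Metrics

Dense metrics are generated when metric_format is set to dense.

Dense metrics reuse the same fields for every AWS metric and use a tag named metric_name, whose value is the AWS metric name, to tell the AWS metrics apart.

  • cloudwatch_{namespace}
    • Tags
      • metric_name (AWS metric name)
    • Fields
      • sum (metric Sum value)
      • average (metric Average value)
      • minimum (metric Minimum value)
      • maximum (metric Maximum value)
      • sample_count (metric SampleCount value)

For example: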

cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,metric_name=call_count,type=API sum=6,sample_count=6,average=1,maximum=1,minimum=1 1715097840000000000
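
The difference between the two layouts can be sketched in Python. The functions to_sparse and to_dense and the input dictionary layout are hypothetical; they only mirror the field and tag naming described above and are not taken from the plugin.

```python
def to_sparse(stats):
    """Flatten {metric: {statistic: value}} into one field set.

    Each field is named {metric}_{statistic}, as in the sparse layout.
    """
    return {
        f"{metric}_{stat}": value
        for metric, values in stats.items()
        for stat, value in values.items()
    }


def to_dense(stats):
    """Return one (tags, fields) pair per AWS metric, as in the dense layout."""
    return [
        ({"metric_name": metric}, dict(values))
        for metric, values in stats.items()
    ]


stats = {"call_count": {"sum": 8, "sample_count": 8, "average": 1}}
print(to_sparse(stats))
print(to_dense(stats))
```

Sparse output puts all metrics of a namespace into one point with many fields, while dense output emits one point per AWS metric with a fixed set of field names.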

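Tags

Each measurement is tagged with the following identifiers to uniquely identify the associated metric. Tag dimension names are represented in snake_case.

  • All measurements have the following tags
    • region (CloudWatch region)
    • {dimension-name} (CloudWatch dimension value, one per metric dimension)
  • If include_linked_accounts is set to true, the following tag is also provided
    • account (ID of the account the metric belongs to)

Troubleshooting

You can use the aws cli to get a list of available metrics and dimensions: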

aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1
aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1 --metric-name CPUCreditBalance

If the expected metrics are not returned, you can try fetching them manually for a short period of time:

aws cloudwatch get-metric-data \
  --start-time 2018-07-01T00:00:00Z \
  --end-time 2018-07-01T00:15:00Z \
  --metric-data-queries '[
  {
    "Id": "avgCPUCreditBalance",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [
          {
            "Name": "InstanceId",
            "Value": "i-deadbeef"
          }
        ]
      },
      "Period": 300,
      "Stat": "Average"
    },
    "Label": "avgCPUCreditBalance"
  }
]'
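
When several metrics need to be checked this way, writing the JSON by hand becomes error-prone. The Python sketch below builds the same --metric-data-queries payload with the standard json module. The helper build_query is hypothetical; the key names are copied from the command above.

```python
import json


def build_query(query_id, namespace, metric_name, dimensions,
                period_s=300, stat="Average"):
    """Build one entry for the --metric-data-queries argument.

    dimensions -- {dimension name: value}
    """
    return {
        "Id": query_id,
        "MetricStat": {
            "Metric": {
                "Namespace": namespace,
                "MetricName": metric_name,
                "Dimensions": [
                    {"Name": name, "Value": value}
                    for name, value in dimensions.items()
                ],
            },
            "Period": period_s,
            "Stat": stat,
        },
        "Label": query_id,
    }


payload = json.dumps(
    [build_query("avgCPUCreditBalance", "AWS/EC2", "CPUCreditBalance",
                 {"InstanceId": "i-deadbeef"})],
    indent=2,
)
print(payload)
```

The printed JSON can be passed as the value of --metric-data-queries in place of the inline literal.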

Example Output

See the discussion above about sparse vs dense metrics for more details.

cloudwatch_aws_elb,load_balancer_name=p-example,region=us-east-1 latency_average=0.004810798017284538,latency_maximum=0.1100282669067383,latency_minimum=0.0006084442138671875,latency_sample_count=4029,latency_sum=19.382705211639404 1459542420000000000
