Telegraf v0.12.1+

Amazon CloudWatch Statistics Input Plugin

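This plugin gathers metric statistics from Amazon CloudWatch.

Introduced in: Telegraf v0.12.1
Tags: cloud
OS support: all

Amazon Authentication

This plugin uses a credential chain to authenticate with the CloudWatch API endpoint. The plugin attempts to authenticate in the following order:

Assumed credentials via STS if the role_arn attribute is specified (source credentials are evaluated from subsequent rules)
Explicit credentials from the access_key, secret_key, and token attributes
Shared profile from the profile attribute
Environment variables
Shared credentials
EC2 instance profile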
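
As a minimal sketch of the first rule in the chain, the fragment below assumes a role via STS and lets the source credentials fall through to a shared profile. The role ARN, session name, and profile name are placeholders invented for illustration; only the option names are taken from the configuration reference on this page.

```toml
[[inputs.cloudwatch]]
  region = "us-east-1"

  ## Hypothetical role to assume via STS (placeholder ARN).
  role_arn = "arn:aws:iam::123456789012:role/telegraf-cloudwatch-read"
  ## Hypothetical session name for the assumed role.
  role_session_name = "telegraf"
  ## Source credentials for the STS call; here they come from a shared
  ## profile (placeholder name) because no explicit keys are set.
  profile = "monitoring"

  period = "5m"
  delay = "5m"
  interval = "5m"
  namespaces = ["AWS/ELB"]
```

Leaving access_key, secret_key, and token unset is what lets the chain move on to the profile rule in this sketch.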

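Global configuration options

Plugins support additional global and plugin configuration settings for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.

Configuration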

# Pull Metric Statistics from Amazon CloudWatch
[[inputs.cloudwatch]]
  ## Amazon Region
  region = "us-east-1"

  ## Amazon Credentials
  ## Credentials are loaded in the following order
  ## 1) Web identity provider credentials via STS if role_arn and
  ##    web_identity_token_file are specified
  ## 2) Assumed credentials via STS if role_arn is specified
  ## 3) explicit credentials from 'access_key' and 'secret_key'
  ## 4) shared profile from 'profile'
  ## 5) environment variables
  ## 6) shared credentials file
  ## 7) EC2 Instance Profile
  # access_key = ""
  # secret_key = ""
  # token = ""
  # role_arn = ""
  # web_identity_token_file = ""
  # role_session_name = ""
  # profile = ""
  # shared_credential_file = ""

  ## If you are using CloudWatch cross-account observability, you can
  ## set IncludeLinkedAccounts to true in a monitoring account
  ## and collect metrics from the linked source accounts
  # include_linked_accounts = false

  ## Endpoint to make request against, the correct endpoint is automatically
  ## determined and this option should only be set if you wish to override the
  ## default.
  ##   ex: endpoint_url = "https://:8000"
  # endpoint_url = ""

  ## Set http_proxy
  # use_system_proxy = false
  # http_proxy_url = "https://:8888"

  ## The minimum period for Cloudwatch metrics is 1 minute (60s). However not
  ## all metrics are made available to the 1 minute period. Some are collected
  ## at 3 minute, 5 minute, or larger intervals.
  ## See https://aws.amazon.com/cloudwatch/faqs/#monitoring.
  ## Note that if a period is configured that is smaller than the minimum for a
  ## particular metric, that metric will not be returned by the Cloudwatch API
  ## and will not be collected by Telegraf.
  #
  ## Requested CloudWatch aggregation Period (required)
  ## Must be a multiple of 60s.
  period = "5m"

  ## Collection Delay (required)
  ## Must account for metrics availability via CloudWatch API
  delay = "5m"

  ## Recommended: use metric 'interval' that is a multiple of 'period' to avoid
  ## gaps or overlap in pulled data
  interval = "5m"

  ## Recommended if "delay" and "period" are both within 3 hours of request
  ## time. Invalid values will be ignored. Recently Active feature will only
  ## poll for CloudWatch ListMetrics values that occurred within the last 3h.
  ## If enabled, it will reduce total API usage of the CloudWatch ListMetrics
  ## API and require less memory to retain.
  ## Do not enable if "period" or "delay" is longer than 3 hours, as it will
  ## not return data more than 3 hours old.
  ## See https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_ListMetrics.html
  # recently_active = "PT3H"

  ## Configure the TTL for the internal cache of metrics.
  # cache_ttl = "1h"

  ## Metric Statistic Namespaces, wildcards are allowed
  # namespaces = ["*"]

  ## Metric Format
  ## This determines the format of the produced metrics. 'sparse', the default,
  ## will produce a unique field for each statistic. 'dense' will report all
  ## statistics in a field called value and have a metric_name tag
  ## defining the name of the statistic. See the plugin README for examples.
  # metric_format = "sparse"

  ## Maximum requests per second. Note that the global default AWS rate limit
  ## is 50 reqs/sec, so if you define multiple namespaces, these should add up
  ## to a maximum of 50.
  ## See http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_limits.html
  # ratelimit = 25

  ## Timeout for http requests made by the cloudwatch client.
  # timeout = "5s"

  ## Batch Size
  ## The size of each batch to send requests to Cloudwatch. 500 is the
  ## suggested largest size. If a request gets too large (413 errors), consider
  ## reducing this amount.
  # batch_size = 500

  ## Namespace-wide statistic filters. These allow fewer queries to be made to
  ## cloudwatch.
  # statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
  # statistic_exclude = []

  ## Metrics to Pull
  ## Defaults to all Metrics in Namespace if nothing is provided
  ## Refreshes Namespace available metrics every 1h
  #[[inputs.cloudwatch.metrics]]
  #  names = ["Latency", "RequestCount"]
  #
  #  ## Statistic filters for Metric.  These allow for retrieving specific
  #  ## statistics for an individual metric.
  #  # statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
  #  # statistic_exclude = []
  #
  #  ## Dimension filters for Metric.
  #  ## All dimensions defined for the metric names must be specified in order
  #  ## to retrieve the metric statistics.
  #  ## 'value' has wildcard / 'glob' matching support such as 'p-*'.
  #  [[inputs.cloudwatch.metrics.dimensions]]
  #    name = "LoadBalancerName"
  #    value = "p-example"

Note that the namespace option is deprecated in favor of the namespaces list option.
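
A sketch of that migration, assuming a configuration that previously set the singular option to one namespace. The namespace values are examples only.

```toml
[[inputs.cloudwatch]]
  region = "us-east-1"
  period = "5m"
  delay = "5m"
  interval = "5m"

  ## Deprecated singular form, shown commented out for comparison:
  # namespace = "AWS/ELB"

  ## List form; one plugin instance can cover several namespaces.
  namespaces = ["AWS/ELB", "AWS/EC2"]
```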

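Requirements and Terminology

The plugin configuration uses CloudWatch concepts and access patterns so that any CloudWatch metric can be monitored.

region must be a valid AWS region value
period must be a valid CloudWatch period value
namespaces must be a list of valid CloudWatch namespace values
names must be valid CloudWatch metric names
dimensions must be valid CloudWatch dimension name/value pairs

Omitting a dimension value or specifying it as '*' retrieves all available metrics that contain a dimension with the specified name. If multiple dimensions are specified, a metric must contain all of the configured dimensions; the value of a wildcard dimension is ignored.

Example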

[[inputs.cloudwatch]]
  period = "1m"
  interval = "5m"

  [[inputs.cloudwatch.metrics]]
    names = ["Latency"]

    ## Dimension filters for Metric (optional)
    [[inputs.cloudwatch.metrics.dimensions]]
      name = "LoadBalancerName"
      value = "p-example"

    [[inputs.cloudwatch.metrics.dimensions]]
      name = "AvailabilityZone"
      value = "*"

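If the following ELBs are available:

name: p-example, availabilityZone: us-east-1a
name: p-example, availabilityZone: us-east-1b
name: q-example, availabilityZone: us-east-1a
name: q-example, availabilityZone: us-east-1b

Then 2 metrics will be output:

name: p-example, availabilityZone: us-east-1a
name: p-example, availabilityZone: us-east-1b

If the AvailabilityZone wildcard dimension were omitted, a single metric (name: p-example) would be exported, containing the ELB's values aggregated across availability zones.

To maximize efficiency and savings, consider making fewer requests by increasing interval while keeping period at the duration you want metrics reported for. The example above requests metrics from CloudWatch every 5 minutes but outputs five metrics timestamped one minute apart.

Restrictions and Limitations

CloudWatch metrics are not available instantly via the CloudWatch API. You should adjust your collection delay to account for this lag in metrics availability, based on your monitoring subscription level.
CloudWatch API usage incurs cost; see GetMetricData pricing.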
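
One way to account for that lag is sketched below, under the assumption that the metrics of interest are published at five-minute resolution. That figure is an assumption for illustration; check the resolution your metrics are actually published at before copying the values.

```toml
[[inputs.cloudwatch]]
  region = "us-east-1"
  namespaces = ["AWS/EC2"]

  ## Aggregation period; must be a multiple of 60s.
  period = "5m"
  ## Allow time for datapoints to become available through the API.
  delay = "5m"
  ## A multiple of period, to avoid gaps or overlap in pulled data.
  interval = "5m"

  ## Both period and delay are well under 3 hours, so restricting
  ## ListMetrics to recently active metrics is safe here and reduces
  ## API usage.
  recently_active = "PT3H"
```

If expected datapoints are missing, increasing delay is a reasonable first adjustment to try.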

Metrics

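Each monitored CloudWatch namespace records a measurement with fields for each available metric statistic. Namespaces and metrics are represented in snake_case.

Sparse Metrics

By default, the metrics generated by this plugin are sparse. Use the metric_format option to override this setting.

Sparse metrics produce a set of fields for every AWS metric.

cloudwatch_{namespace}
- fields
  - {metric}_sum (metric Sum value)
  - {metric}_average (metric Average value)
  - {metric}_minimum (metric Minimum value)
  - {metric}_maximum (metric Maximum value)
  - {metric}_sample_count (metric SampleCount value)

For example: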

cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,type=API call_count_maximum=1,call_count_minimum=1,call_count_sum=8,call_count_sample_count=8,call_count_average=1 1715097720000000000

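Dense Metrics

Dense metrics are generated when metric_format is set to dense.

Dense metrics reuse the same fields for every AWS metric and distinguish between AWS metrics with a tag called metric_name, whose value is the AWS metric name.

cloudwatch_{namespace}
- tags
  - metric_name (AWS metric name)
- fields
  - sum (metric Sum value)
  - average (metric Average value)
  - minimum (metric Minimum value)
  - maximum (metric Maximum value)
  - sample_count (metric SampleCount value)

For example: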

cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,metric_name=call_count,type=API sum=6,sample_count=6,average=1,maximum=1,minimum=1 1715097840000000000
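
A minimal configuration sketch that would select the dense layout for a namespace like the one in the sample line above. The namespace and timing values are assumptions chosen for illustration, not the settings that produced that sample.

```toml
[[inputs.cloudwatch]]
  region = "us-east-1"
  ## Assumed namespace; it is reported as the cloudwatch_aws_usage measurement.
  namespaces = ["AWS/Usage"]

  period = "1m"
  delay = "5m"
  interval = "5m"

  ## Switch from the default 'sparse' layout to 'dense'.
  metric_format = "dense"
```

The dense layout can be convenient when downstream queries filter on the metric_name tag rather than selecting per-metric field names.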

Troubleshooting

You can use the aws cli to get a list of available metrics and dimensions:

aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1
aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1 --metric-name CPUCreditBalance

If the expected metrics are not returned, you can try fetching them manually for a short period of time:

aws cloudwatch get-metric-data \
  --start-time 2018-07-01T00:00:00Z \
  --end-time 2018-07-01T00:15:00Z \
  --metric-data-queries '[
  {
    "Id": "avgCPUCreditBalance",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [
          {
            "Name": "InstanceId",
            "Value": "i-deadbeef"
          }
        ]
      },
      "Period": 300,
      "Stat": "Average"
    },
    "Label": "avgCPUCreditBalance"
  }
]'

Example Output

See the discussion above for more details on sparse versus dense metrics.

cloudwatch_aws_elb,load_balancer_name=p-example,region=us-east-1 latency_average=0.004810798017284538,latency_maximum=0.1100282669067383,latency_minimum=0.0006084442138671875,latency_sample_count=4029,latency_sum=19.382705211639404 1459542420000000000
