Amazon CloudWatch Statistics Input Plugin
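This plugin will pull metric statistics from Amazon CloudWatch.
Introduced in: Telegraf v0.12.1 | Tags: cloud | OS support: all
Amazon Authentication
This plugin uses a credential chain to authenticate with the CloudWatch API endpoint. The plugin attempts to authenticate in the following order: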
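- Web identity provider credentials via STS, if role_arn and web_identity_token_file are specified
- Assumed credentials via STS, if the role_arn attribute is specified (source credentials are evaluated from the subsequent rules)
- Explicit credentials from the access_key, secret_key, and token attributes
- Shared profile from the profile attribute
- Environment variables
- Shared credentials
- EC2 instance profile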
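The credential order above (also spelled out in the configuration comments below) behaves like a first-match lookup. The following is a minimal Python sketch of that lookup, not Telegraf's implementation. The function name credential_source, its parameters, and the returned labels are all hypothetical. The environment-variable check assumes the standard AWS SDK variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```python
from typing import Mapping


def credential_source(cfg: Mapping[str, str], env: Mapping[str, str],
                      has_shared_file: bool, on_ec2: bool) -> str:
    """Return a label for the first credential source that applies.

    Hypothetical model of the documented order, not Telegraf's actual code.
    """
    # 1) Web identity: needs both role_arn and web_identity_token_file.
    if cfg.get("role_arn") and cfg.get("web_identity_token_file"):
        return "web identity via STS"
    # 2) Assume role: the credentials used to call STS come from the later rules.
    if cfg.get("role_arn"):
        rest = {k: v for k, v in cfg.items() if k != "role_arn"}
        return "assume role via STS using " + credential_source(
            rest, env, has_shared_file, on_ec2)
    # 3) Explicit keys from the plugin configuration.
    if cfg.get("access_key") and cfg.get("secret_key"):
        return "explicit credentials"
    # 4) Named profile from the shared configuration.
    if cfg.get("profile"):
        return "shared profile"
    # 5) Standard AWS SDK environment variables (assumed names).
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 6) Shared credentials file.
    if has_shared_file:
        return "shared credentials file"
    # 7) EC2 instance profile.
    if on_ec2:
        return "EC2 instance profile"
    raise LookupError("no credential source available")
```

For example, credential_source({"role_arn": "...", "profile": "dev"}, {}, False, False) returns "assume role via STS using shared profile". This reflects the rule that source credentials for the STS call are evaluated from the subsequent rules.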
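Global configuration options
In addition to the plugin-specific settings, plugins support additional global and plugin configuration settings. These are used for tasks such as modifying metrics, tags, and fields, creating aliases, and configuring plugin ordering. See CONFIGURATION.md for more details.
Configuration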
# Pull Metric Statistics from Amazon CloudWatch
[[inputs.cloudwatch]]
## Amazon Region
region = "us-east-1"
## Amazon Credentials
## Credentials are loaded in the following order
## 1) Web identity provider credentials via STS if role_arn and
## web_identity_token_file are specified
## 2) Assumed credentials via STS if role_arn is specified
## 3) explicit credentials from 'access_key' and 'secret_key'
## 4) shared profile from 'profile'
## 5) environment variables
## 6) shared credentials file
## 7) EC2 Instance Profile
# access_key = ""
# secret_key = ""
# token = ""
# role_arn = ""
# web_identity_token_file = ""
# role_session_name = ""
# profile = ""
# shared_credential_file = ""
## If you are using CloudWatch cross-account observability, you can
## set IncludeLinkedAccounts to true in a monitoring account
## and collect metrics from the linked source accounts
# include_linked_accounts = false
## Endpoint to make request against, the correct endpoint is automatically
## determined and this option should only be set if you wish to override the
## default.
## ex: endpoint_url = "https://:8000"
# endpoint_url = ""
## Set http_proxy
# use_system_proxy = false
# http_proxy_url = "https://:8888"
## The minimum period for CloudWatch metrics is 1 minute (60s). However, not
## all metrics are made available at the 1 minute period. Some are collected
## at 3 minute, 5 minute, or larger intervals.
## See https://aws.amazon.com/cloudwatch/faqs/#monitoring.
## Note that if a period is configured that is smaller than the minimum for a
## particular metric, that metric will not be returned by the Cloudwatch API
## and will not be collected by Telegraf.
#
## Requested CloudWatch aggregation Period (required)
## Must be a multiple of 60s.
period = "5m"
## Collection Delay (required)
## Must account for metrics availability via CloudWatch API
delay = "5m"
## Recommended: use metric 'interval' that is a multiple of 'period' to avoid
## gaps or overlap in pulled data
interval = "5m"
## Recommended if "delay" and "period" are both within 3 hours of request
## time. Invalid values will be ignored. Recently Active feature will only
## poll for CloudWatch ListMetrics values that occurred within the last 3h.
## If enabled, it will reduce total API usage of the CloudWatch ListMetrics
## API and require less memory to retain.
## Do not enable if "period" or "delay" is longer than 3 hours, as it will
## not return data more than 3 hours old.
## See https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_ListMetrics.html
# recently_active = "PT3H"
## Configure the TTL for the internal cache of metrics.
# cache_ttl = "1h"
## Metric Statistic Namespaces, wildcards are allowed
# namespaces = ["*"]
## Metric Format
## This determines the format of the produced metrics. 'sparse', the default,
## will produce a unique field for each statistic. 'dense' will report each
## statistic in an unprefixed field (sum, average, ...) and add a metric_name
## tag holding the AWS metric name. See the plugin README for examples.
# metric_format = "sparse"
## Maximum requests per second. Note that the global default AWS rate limit
## is 50 reqs/sec, so if you define multiple namespaces, these should add up
## to a maximum of 50.
## See http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_limits.html
# ratelimit = 25
## Timeout for http requests made by the cloudwatch client.
# timeout = "5s"
## Batch Size
## The size of each batch of requests sent to CloudWatch. 500 is the
## suggested largest size. If a request gets too large (413 errors), consider
## reducing this amount.
# batch_size = 500
## Namespace-wide statistic filters. These allow fewer queries to be made to
## cloudwatch.
# statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
# statistic_exclude = []
## Metrics to Pull
## Defaults to all Metrics in Namespace if nothing is provided
## Refreshes Namespace available metrics every 1h
#[[inputs.cloudwatch.metrics]]
# names = ["Latency", "RequestCount"]
#
# ## Statistic filters for Metric. These allow for retrieving specific
# ## statistics for an individual metric.
# # statistic_include = ["average", "sum", "minimum", "maximum", "sample_count"]
# # statistic_exclude = []
#
# ## Dimension filters for Metric.
# ## All dimensions defined for the metric names must be specified in order
# ## to retrieve the metric statistics.
# ## 'value' has wildcard / 'glob' matching support such as 'p-*'.
# [[inputs.cloudwatch.metrics.dimensions]]
# name = "LoadBalancerName"
# value = "p-example"
Please note that the namespace option is deprecated in favor of the namespaces list option.
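The configuration comments above impose several constraints:
- period must be a multiple of 60s.
- interval should be a multiple of period.
- recently_active should not be enabled when period or delay exceeds 3 hours.
- ratelimit values across plugin instances should add up to at most 50 requests per second.

The Python sketch below checks these rules before deployment. The function name and messages are hypothetical. ratelimits is assumed to hold one value per [[inputs.cloudwatch]] block. The 50 reqs/sec figure is the default limit quoted in the comments, not a value read from AWS.

```python
from datetime import timedelta
from typing import Iterable, List

AWS_DEFAULT_RATE_LIMIT = 50                  # reqs/sec, as quoted in the config comments
RECENTLY_ACTIVE_WINDOW = timedelta(hours=3)  # ListMetrics RecentlyActive window


def check_cloudwatch_config(period: timedelta, delay: timedelta,
                            interval: timedelta, recently_active: bool = False,
                            ratelimits: Iterable[int] = ()) -> List[str]:
    """Return warnings for the timing and rate-limit rules in the comments."""
    problems: List[str] = []
    if period <= timedelta(0) or period % timedelta(seconds=60) != timedelta(0):
        problems.append("period must be a positive multiple of 60s")
    elif interval % period != timedelta(0):
        # Only checked when period is valid, which also avoids modulo by zero.
        problems.append("interval should be a multiple of period")
    if recently_active and (period > RECENTLY_ACTIVE_WINDOW
                            or delay > RECENTLY_ACTIVE_WINDOW):
        problems.append("recently_active only sees metrics from the last 3h")
    if sum(ratelimits) > AWS_DEFAULT_RATE_LIMIT:
        problems.append("combined ratelimit exceeds 50 requests/sec")
    return problems
```

With the values from the sample configuration (period, delay, and interval all "5m"), check_cloudwatch_config returns an empty list.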
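Requirements and Terminology
The plugin configuration uses CloudWatch concepts and access patterns, so that any CloudWatch metric can be monitored.
- region must be a valid AWS region value
- period must be a valid CloudWatch period value
- namespaces must be a list of valid CloudWatch namespace values
- names must be valid CloudWatch metric names
- dimensions must be valid CloudWatch dimension name/value pairs
Omitting a dimension value, or specifying it as '*', retrieves all available metrics that contain a dimension with the specified name. If more than one dimension is specified, the metric must contain all of the configured dimensions; the value of a wildcard dimension is ignored.
Example: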
[[inputs.cloudwatch]]
period = "1m"
interval = "5m"
[[inputs.cloudwatch.metrics]]
names = ["Latency"]
## Dimension filters for Metric (optional)
[[inputs.cloudwatch.metrics.dimensions]]
name = "LoadBalancerName"
value = "p-example"
[[inputs.cloudwatch.metrics.dimensions]]
name = "AvailabilityZone"
value = "*"
If the following ELBs are available:
- name: p-example, availabilityZone: us-east-1a
- name: p-example, availabilityZone: us-east-1b
- name: q-example, availabilityZone: us-east-1a
- name: q-example, availabilityZone: us-east-1b
Then 2 metrics will be output:
- name: p-example, availabilityZone: us-east-1a
- name: p-example, availabilityZone: us-east-1b
If the AvailabilityZone wildcard dimension were omitted, a single metric (name: p-example) would be exported, containing the aggregate values of the ELB across availability zones.
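The ELB example can be reproduced with a small Python sketch of the dimension-selection rules. This is not Telegraf's code. The helper is_selected and the sample data are hypothetical, and glob matching uses Python's fnmatch as a stand-in for the plugin's own glob support. The sketch assumes a metric is selected only when its dimension names are exactly the configured ones. That assumption is consistent with the example: without the AvailabilityZone filter, only the aggregate p-example metric is returned.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

Filter = List[Tuple[str, Optional[str]]]  # (dimension name, glob value or None)


def is_selected(metric_dims: Dict[str, str], filters: Filter) -> bool:
    """Sketch of the documented dimension rules (assumed exact name match)."""
    # The metric must carry exactly the configured dimension names.
    if set(metric_dims) != {name for name, _ in filters}:
        return False
    for name, pattern in filters:
        # An omitted value or '*' accepts any value of that dimension.
        if pattern not in (None, "*") and not fnmatchcase(metric_dims[name], pattern):
            return False
    return True


# Hypothetical dimension sets that ListMetrics might report for Latency.
elb_latency = [
    {"LoadBalancerName": "p-example", "AvailabilityZone": "us-east-1a"},
    {"LoadBalancerName": "p-example", "AvailabilityZone": "us-east-1b"},
    {"LoadBalancerName": "q-example", "AvailabilityZone": "us-east-1a"},
    {"LoadBalancerName": "q-example", "AvailabilityZone": "us-east-1b"},
    {"LoadBalancerName": "p-example"},  # aggregate across zones
    {"LoadBalancerName": "q-example"},
]

with_zone = [d for d in elb_latency
             if is_selected(d, [("LoadBalancerName", "p-example"),
                                ("AvailabilityZone", "*")])]
without_zone = [d for d in elb_latency
                if is_selected(d, [("LoadBalancerName", "p-example")])]
```

with_zone holds the two per-zone p-example entries. without_zone holds only the aggregate entry.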
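To maximize efficiency and savings, consider making fewer requests by increasing interval while keeping period at the duration at which you want metrics reported. The example above requests metrics from CloudWatch every 5 minutes but outputs five metrics timestamped one minute apart.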
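The arithmetic behind "every 5 minutes, five points one minute apart" can be sketched in Python. This is a simplified, hypothetical model of the collection window, not Telegraf's code:
- The window ends delay before the current time.
- The first run covers a single period.
- Each later run resumes where the previous window ended.

Window boundaries are assumed to already align with period.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def next_window(now: datetime, delay: timedelta, period: timedelta,
                prev_end: Optional[datetime]) -> Tuple[datetime, datetime]:
    """Hypothetical window model: end lags now by delay; later runs resume."""
    end = now - delay
    start = end - period if prev_end is None else prev_end
    return start, end


def datapoint_times(start: datetime, end: datetime,
                    period: timedelta) -> List[datetime]:
    """Start time of each full period that fits in [start, end)."""
    times = []
    t = start
    while t + period <= end:
        times.append(t)
        t += period
    return times


# Example settings: period = "1m", interval = "5m", and an assumed delay of 5m.
period, delay = timedelta(minutes=1), timedelta(minutes=5)
# Previous run at 12:05 ended its window at 12:00; this run happens at 12:10.
start, end = next_window(datetime(2024, 5, 7, 12, 10), delay, period,
                         prev_end=datetime(2024, 5, 7, 12, 0))
points = datapoint_times(start, end, period)
```

Here points covers 12:00 through 12:04, so one request yields five one-minute datapoints.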
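Restrictions and Limitations
- CloudWatch metrics are not available instantly via the CloudWatch API. Adjust your collection delay to account for this lag in metric availability, based on your monitoring subscription level.
- CloudWatch API usage incurs costs; see GetMetricData pricing.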
Metrics
Each monitored CloudWatch namespace records a measurement with a field for each available metric statistic. Namespace and metric names are represented in snake_case.
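The names used in the example outputs follow this conversion: AWS/ELB becomes cloudwatch_aws_elb, and CallCount becomes call_count. The Python sketch below approximates it with regular expressions. The exact rules Telegraf applies are not given here, so these rules, and the helper names snake_case and measurement_name, are assumptions.

```python
import re


def snake_case(name: str) -> str:
    """Approximate CamelCase / namespace to snake_case conversion (assumed rules)."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)  # LoadBalancer -> Load_Balancer
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)  # CPUCredit -> CPU_Credit
    s = re.sub(r"[/\s]+", "_", s)                      # AWS/ELB -> AWS_ELB
    return re.sub(r"_+", "_", s).lower()


def measurement_name(namespace: str) -> str:
    """Measurement name for a namespace, e.g. AWS/Usage -> cloudwatch_aws_usage."""
    return "cloudwatch_" + snake_case(namespace)
```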
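Sparse Metrics
By default, metrics generated by this plugin are sparse. Use the metric_format option to override this setting.
Sparse metrics produce a set of fields for every AWS metric: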
- cloudwatch_{namespace}
  - Fields
    - {metric}_sum (metric Sum value)
    - {metric}_average (metric Average value)
    - {metric}_minimum (metric Minimum value)
    - {metric}_maximum (metric Maximum value)
    - {metric}_sample_count (metric SampleCount value)
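The sparse layout can be sketched as a Python function that prefixes each statistic with the metric name. sparse_metric is a hypothetical helper. It assumes the namespace and metric names are already in snake_case and returns plain dictionaries rather than Telegraf metric objects. The usage values come from the example line that follows.

```python
from typing import Dict, Tuple

Stats = Dict[str, float]  # statistic name (snake_case) -> value


def sparse_metric(namespace: str, metric: str, tags: Dict[str, str],
                  stats: Stats) -> Tuple[str, Dict[str, str], Stats]:
    """Sparse layout: one field per statistic, prefixed with the metric name."""
    fields = {f"{metric}_{stat}": value for stat, value in stats.items()}
    return f"cloudwatch_{namespace}", dict(tags), fields


tags = {"class": "None", "resource": "GetSecretValue",
        "service": "Secrets Manager", "type": "API"}
stats = {"maximum": 1, "minimum": 1, "sum": 8, "sample_count": 8, "average": 1}
name, out_tags, fields = sparse_metric("aws_usage", "call_count", tags, stats)
```

This produces the fields call_count_sum, call_count_average, and so on under the measurement cloudwatch_aws_usage.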
For example:
cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,type=API call_count_maximum=1,call_count_minimum=1,call_count_sum=8,call_count_sample_count=8,call_count_average=1 1715097720000000000
Dense Metrics
Dense metrics are generated when metric_format is set to dense.
Dense metrics reuse the same fields for every AWS metric. They distinguish between AWS metrics with a tag called metric_name, whose value is the AWS metric name:
- cloudwatch_{namespace}
  - Tags
    - metric_name (AWS metric name)
  - Fields
    - sum (metric Sum value)
    - average (metric Average value)
    - minimum (metric Minimum value)
    - maximum (metric Maximum value)
    - sample_count (metric SampleCount value)
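The dense layout can be sketched the same way: statistic fields keep their plain names, and the AWS metric name moves into a metric_name tag. dense_metric is a hypothetical helper that assumes snake_case inputs. The usage values come from the example line that follows.

```python
from typing import Dict, Tuple

Stats = Dict[str, float]  # statistic name (snake_case) -> value


def dense_metric(namespace: str, metric: str, tags: Dict[str, str],
                 stats: Stats) -> Tuple[str, Dict[str, str], Stats]:
    """Dense layout: unprefixed statistic fields plus a metric_name tag."""
    all_tags = {**tags, "metric_name": metric}  # copy; caller's tags untouched
    return f"cloudwatch_{namespace}", all_tags, dict(stats)


base = {"class": "None", "resource": "GetSecretValue",
        "service": "Secrets Manager", "type": "API"}
stats = {"sum": 6, "sample_count": 6, "average": 1, "maximum": 1, "minimum": 1}
name, tags, fields = dense_metric("aws_usage", "call_count", base, stats)
```

Every AWS metric in the namespace then shares the same five field names, so queries select a metric by filtering on the metric_name tag.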
For example:
cloudwatch_aws_usage,class=None,resource=GetSecretValue,service=Secrets\ Manager,metric_name=call_count,type=API sum=6,sample_count=6,average=1,maximum=1,minimum=1 1715097840000000000
Tags
Each measurement is tagged with the following identifiers to uniquely identify the associated metric. Tag dimension names are represented in snake_case.
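- All measurements have the following tags:
  - region (CloudWatch region)
  - {dimension-name} (CloudWatch dimension value, one per metric dimension)
- If include_linked_accounts is set to true, the following tag is also provided:
  - account (the ID of the account where the metric is located)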
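The tag set described above can be sketched in Python. The helper names build_tags and snake_case are hypothetical. The snake_case conversion is an approximation, since Telegraf's exact rules are not given here. Dimension values are assumed to pass through unchanged, as in the example output, where load_balancer_name=p-example.

```python
import re
from typing import Dict, Optional


def snake_case(name: str) -> str:
    # Approximate CamelCase to snake_case conversion (assumed rules).
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)
    return s.lower()


def build_tags(region: str, dimensions: Dict[str, str],
               include_linked_accounts: bool = False,
               account_id: Optional[str] = None) -> Dict[str, str]:
    """region + one snake_case tag per dimension (+ account if enabled)."""
    tags = {"region": region}
    for name, value in dimensions.items():
        tags[snake_case(name)] = value  # only the tag key is converted
    if include_linked_accounts and account_id:
        tags["account"] = account_id
    return tags
```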
Troubleshooting
You can use the AWS CLI to get a list of available metrics and dimensions:
aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1
aws cloudwatch list-metrics --namespace AWS/EC2 --region us-east-1 --metric-name CPUCreditBalance
If the expected metrics are not returned, you can try fetching them manually for a short time range:
aws cloudwatch get-metric-data \
--start-time 2018-07-01T00:00:00Z \
--end-time 2018-07-01T00:15:00Z \
--metric-data-queries '[
{
"Id": "avgCPUCreditBalance",
"MetricStat": {
"Metric": {
"Namespace": "AWS/EC2",
"MetricName": "CPUCreditBalance",
"Dimensions": [
{
"Name": "InstanceId",
"Value": "i-deadbeef"
}
]
},
"Period": 300,
"Stat": "Average"
},
"Label": "avgCPUCreditBalance"
}
]'
Example Output
See the discussion of sparse versus dense metrics above for more details.
cloudwatch_aws_elb,load_balancer_name=p-example,region=us-east-1 latency_average=0.004810798017284538,latency_maximum=0.1100282669067383,latency_minimum=0.0006084442138671875,latency_sample_count=4029,latency_sum=19.382705211639404 1459542420000000000