Analyze data with pandas

Use pandas, the Python data analysis library, to process, analyze, and visualize data stored in an InfluxDB Clustered database.

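pandas is an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
For details, see the pandas documentation.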

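- Install prerequisites
- Install pandas
- Use PyArrow to convert query results to pandas
- Use pandas to analyze data
  - View data information and statistics
  - Downsample time series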

Install prerequisites

The examples in this guide assume a Python virtual environment and the InfluxDB v3 influxdb3-python Python client library. For more information, see how to use Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library, which provides Python bindings for Apache Arrow.

Install pandas

To use pandas, you need to install and import the pandas library.

In your terminal, use pip to install pandas in your active Python virtual environment:

```
pip install pandas
```
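To confirm from Python that the packages this guide relies on are installed in the active environment, a small check like the following can help. It is a sketch that uses only the standard library's importlib.metadata; the names it checks (influxdb3-python, pandas, pyarrow) are the PyPI distribution names mentioned in this guide.

```python
# check_install.py
# A minimal sketch: report which packages used in this guide are installed
# in the active Python environment.
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each distribution name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["influxdb3-python", "pandas", "pyarrow"]).items():
        print(f"{name}: {version or 'not installed'}")
```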

Use PyArrow to convert query results to pandas

The following steps use Python, influxdb3-python, and pyarrow to query InfluxDB and stream Arrow data to a pandas DataFrame.

In your editor, copy and paste the following code into a new file, for example, pandas-example.py:

```python
# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
# pandas must be installed; Table.to_pandas() depends on it.
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

print(dataframe)
```

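Replace the following configuration values:
- DATABASE_NAME: the name of the database to query
- DATABASE_TOKEN: a database token with read permission on the specified database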
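If you prefer not to hard-code these values in the file, one option is to read them from environment variables. The sketch below assumes hypothetical variable names INFLUX_HOST, INFLUX_DATABASE, and INFLUX_TOKEN; they are a local convention for this example, not something influxdb3-python defines.

```python
# config_from_env.py
# A minimal sketch, assuming the hypothetical environment variables
# INFLUX_HOST, INFLUX_DATABASE, and INFLUX_TOKEN hold the connection settings.
import os


def client_settings(environ=None):
    """Collect InfluxDBClient3 keyword arguments from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [key for key in ("INFLUX_DATABASE", "INFLUX_TOKEN") if not environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Fall back to the placeholder host used in this guide.
        "host": environ.get("INFLUX_HOST", "https://cluster-host.com"),
        "database": environ["INFLUX_DATABASE"],
        "token": environ["INFLUX_TOKEN"],
    }


# Usage (requires influxdb3-python and a reachable cluster):
#   from influxdb_client_3 import InfluxDBClient3
#   client = InfluxDBClient3(**client_settings())
```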
In your terminal, use the Python interpreter to run the file:
```
python pandas-example.py
```

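The example calls the following methods:

- InfluxDBClient3.query(): sends the query request and returns a pyarrow.Table that contains all the Arrow record batches from the response stream
- pyarrow.Table.to_pandas(): creates a pandas.DataFrame from the data in the PyArrow Table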


Next, use pandas to analyze the data.

View data information and statistics

The following example shows how to use pandas DataFrame methods to transform and summarize data stored in InfluxDB Clustered.

```python
# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
# pandas must be installed; Table.to_pandas() depends on it.
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

# Print information about the results DataFrame,
# including the index dtype and columns, non-null values, and memory usage.
# info() prints its summary directly, so it isn't wrapped in print().
dataframe.info()

# Calculate descriptive statistics that summarize the distribution of the results.
print(dataframe.describe())

# Extract a DataFrame column.
print(dataframe['temp'])

# Print the DataFrame in Markdown format.
# to_markdown() requires the optional tabulate package (pip install tabulate).
print(dataframe.to_markdown())
```

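Replace the following configuration values:

- DATABASE_NAME: the name of the InfluxDB database to query
- DATABASE_TOKEN: a database token with read permission on the specified database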
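The same summary methods can be tried offline on a hand-built DataFrame. This sketch uses hypothetical home-like rows rather than query results, and groupby() is an extra summary that the example above does not use.

```python
# describe-offline.py
# A minimal offline sketch with hypothetical `home`-like rows.
import pandas as pd

dataframe = pd.DataFrame(
    {
        "room": ["Kitchen", "Living Room", "Kitchen", "Living Room"],
        "temp": [21.0, 21.1, 23.0, 21.4],
        "hum": [35.9, 35.9, 36.2, 36.0],
    }
)

# describe() summarizes numeric columns by default:
# count, mean, std, min, 25%, 50%, 75%, and max.
stats = dataframe.describe()
print(stats)

# Selecting a single column returns a pandas.Series.
temps = dataframe["temp"]
print(temps)

# Average temperature per room (an extra summary, not part of the guide's example).
room_means = dataframe.groupby("room")["temp"].mean()
print(room_means)
```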

Downsample time series

The pandas library provides extensive functionality for working with time series data.
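For instance, once the time values form a DatetimeIndex, .loc accepts date strings for time-based selection. A minimal sketch with hypothetical readings:

```python
# time-indexing-offline.py
# A minimal sketch with hypothetical readings indexed by time.
import pandas as pd

dataframe = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2022-01-01 08:00", "2022-01-01 09:00", "2022-01-01 10:00", "2022-01-02 08:00"]
        ),
        "temp": [21.0, 23.0, 22.7, 21.5],
    }
).set_index("time")

# Label slicing on a DatetimeIndex includes both endpoints.
morning = dataframe.loc["2022-01-01 08:00":"2022-01-01 09:00"]

# Partial-string indexing: a date string selects every row on that day.
jan_1 = dataframe.loc["2022-01-01"]

print(morning)
print(jan_1)
```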

The pandas.DataFrame.resample() method downsamples and upsamples data into time-based groups. For example:

```python
# pandas-example.py

...

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex.
# "1h" is the current alias; pandas 2.2 and later deprecate the uppercase "1H".
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())
```
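A self-contained sketch of the same idea, using hypothetical readings instead of query results, shows both directions: downsampling with mean() and upsampling with ffill().

```python
# resample-offline.py
# A minimal sketch with hypothetical temperature readings.
import pandas as pd

dataframe = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2022-01-01 08:00", "2022-01-01 08:30", "2022-01-01 09:00", "2022-01-01 09:45"]
        ),
        "temp": [21.0, 21.4, 23.0, 22.6],
    }
).set_index("time")

# Downsample: average the readings in each 1-hour bin (bins are labeled by their start time).
hourly_mean = dataframe["temp"].resample("1h").mean()
print(hourly_mean)

# Upsample: split the hourly result into 30-minute bins and carry the last value forward.
half_hourly = hourly_mean.resample("30min").ffill()
print(half_hourly)
```

ffill() is only one way to fill the new bins; asfreq() leaves them as NaN, and interpolate() estimates values between known points.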


For more detail and examples, see the pandas documentation.
