Use pandas to analyze data

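Use pandas, the Python data analysis library, to process, analyze, and visualize data stored in an InfluxDB Clustered database.

The pandas documentation describes pandas as an open source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.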

Install prerequisites
Install pandas
Use PyArrow to convert query results to pandas
Use pandas to analyze data
- View data information and statistics
- Downsample time series

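Install prerequisites

The examples in this guide assume a Python virtual environment and the influxdb3-python client library for InfluxDB 3. For more information, see how to get started using Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library, which provides Python bindings for Apache Arrow.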

Install pandas

To use pandas, you need to install and import the pandas library.

In your terminal, use pip to install pandas in your active Python virtual environment:

pip install pandas

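Use PyArrow to convert query results to pandas

The following steps use Python, influxdb3-python, and pyarrow to query InfluxDB and stream Arrow data to a pandas DataFrame.

In your editor, copy the following code and paste it into a new file, for example, pandas-example.py: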

# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

print(dataframe)

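Replace the following configuration values:
- DATABASE_NAME: the name of the database to query
- DATABASE_TOKEN: a database token with read permission on the specified database

In your terminal, use the Python interpreter to run the file: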
```
python pandas-example.py
```

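The example calls the following methods:

- InfluxDBClient3.query(): sends the query request and returns a pyarrow.Table that contains all the Arrow record batches from the response stream.
- pyarrow.Table.to_pandas(): creates a pandas.DataFrame from the data in the PyArrow Table.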


Next, use pandas to analyze data.

View data information and statistics

The following example shows how to use pandas DataFrame methods to transform and summarize data stored in InfluxDB Clustered.

# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://cluster-host.com",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

# Print information about the results DataFrame,
# including the index dtype and columns, non-null values, and memory usage.
dataframe.info()

# Calculate descriptive statistics that summarize the distribution of the results.
print(dataframe.describe())

# Extract a DataFrame column.
print(dataframe['temp'])

# Print the DataFrame in Markdown format.
# Note: to_markdown() requires the optional `tabulate` package.
print(dataframe.to_markdown())

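Replace the following configuration values:

- DATABASE_NAME: the name of the InfluxDB database to query
- DATABASE_TOKEN: a database token with read permission on the specified database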
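
The summary methods can also be tried on a locally built DataFrame, which is convenient when no cluster is reachable. The sketch below uses hypothetical sample values whose column names mirror the `home` example; `summary` is an illustrative variable name.

```python
import pandas as pd

# Hypothetical sample data (not real query results).
dataframe = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01T08:00:00",
        "2024-01-01T09:00:00",
        "2024-01-01T10:00:00",
        "2024-01-01T11:00:00",
    ]),
    "room": ["Kitchen", "Kitchen", "Living Room", "Living Room"],
    "temp": [21.0, 23.0, 21.8, 22.2],
})

# Print column dtypes, non-null counts, and memory usage.
dataframe.info()

# Summarize a single numeric column: count, mean, std, min, quartiles, max.
summary = dataframe["temp"].describe()
print(summary)
```

`describe()` returns a pandas Series here, so individual statistics can be read by label, for example `summary["mean"]`.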

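Downsample time series

The pandas library provides extensive features for working with time series data.

The pandas.DataFrame.resample() method downsamples and upsamples data into time-based groups, for example: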

# pandas-example.py

...

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex
# (the lowercase "h" alias replaces "H", which is deprecated as of pandas 2.2)
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())
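
Because the snippet above elides the query step, here is a self-contained sketch of the same resampling idea on hypothetical readings taken every 30 minutes. It shows both directions mentioned earlier: downsampling to hourly means and upsampling to 15-minute intervals. Forward fill is only one possible way to fill the new rows when upsampling and is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical readings every 30 minutes (not real query results).
dataframe = pd.DataFrame({
    "time": pd.date_range("2024-01-01 08:00", periods=4, freq="30min"),
    "temp": [21.0, 23.0, 22.0, 24.0],
})

# Use the `time` column as a DatetimeIndex so that resample() can group by time.
dataframe = dataframe.set_index("time")

# Downsample: average the readings that fall in each 1-hour group.
hourly_mean = dataframe["temp"].resample("1h").mean()
print(hourly_mean)

# Upsample: create 15-minute rows and carry the last known value forward.
upsampled = dataframe["temp"].resample("15min").ffill()
print(upsampled)
```

Downsampling reduces four readings to two hourly averages, while upsampling expands them to seven rows that span the same time range.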


For more details and examples, see the pandas documentation.
