Use the PyArrow library to analyze data

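Use PyArrow to read and analyze query results from an InfluxDB cluster. The PyArrow library provides efficient tools for computing, aggregating, serializing, and converting data in Arrow format.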

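Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process, and move data fast.
The Arrow Python bindings (also named "PyArrow") have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
Source: PyArrow documentation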

- Install prerequisites
- Read query results using PyArrow
- Analyze data using PyArrow
  - Group and aggregate data

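Install prerequisites

The examples in this guide assume that you're using a Python virtual environment and the InfluxDB v3 influxdb3-python Python client library. For more information, see how to get started using Python to query InfluxDB.

Installing influxdb3-python also installs the pyarrow library, which provides the Python bindings for Apache Arrow.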
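
To confirm that both packages are available in the active virtual environment, a quick check along the following lines can help. This is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution names `influxdb3-python` and `pyarrow` are assumed to match the names used with pip.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the packages that this guide relies on.
for name in ("influxdb3-python", "pyarrow"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports "not installed", install influxdb3-python in the virtual environment before running the examples that follow.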

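Read query results using PyArrow

The following example shows how to use influxdb3-python and pyarrow to query InfluxDB and view Arrow data as a PyArrow Table.

In your editor, copy and paste the following sample code into a new file, for example, pyarrow-example.py: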

# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3

def querySQL():

  # Instantiate an InfluxDB client configured for a database
  client = InfluxDBClient3(
    "https://cluster-host.com",
    database="DATABASE_NAME",
    token="DATABASE_TOKEN")

  # Execute the query and retrieve all record batches in the
  # response stream, formatted as a PyArrow Table.
  table = client.query(
    '''SELECT *
      FROM home
      WHERE time >= now() - INTERVAL '90 days'
      ORDER BY time'''
  )

  client.close()

  # Return the table so that the caller can print it
  return table

print(querySQL())

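Replace the following configuration values:
- DATABASE_TOKEN: a database token with read permission on the database
- DATABASE_NAME: the name of the database to query

In your terminal, use the Python interpreter to run the file: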
```
python pyarrow-example.py
```

The InfluxDBClient3.query() method sends the query request and then returns a pyarrow.Table that contains all the Arrow record batches from the response stream.

Next, use PyArrow to analyze the data.

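Analyze data using PyArrow

Group and aggregate data

With a pyarrow.Table, you can use the values in a column as keys for grouping.

The following example shows how to query InfluxDB and then use PyArrow to group the table data and calculate an aggregate value for each group: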

# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3

def querySQL():

  # Instantiate an InfluxDB client configured for a database
  client = InfluxDBClient3(
    "https://cluster-host.com",
    database="DATABASE_NAME",
    token="DATABASE_TOKEN")

  # Execute the query to retrieve data
  # formatted as a PyArrow Table
  table = client.query(
    '''SELECT *
      FROM home
      WHERE time >= now() - INTERVAL '90 days'
      ORDER BY time'''
  )

  client.close()

  return table

table = querySQL()

# Use PyArrow to group rows by room and compute the mean temperature per room
print(table.group_by('room').aggregate([('temp', 'mean')]))

Replace the following:

- DATABASE_TOKEN: a database token with read permission on the database
- DATABASE_NAME: the name of the database to query

Example results:

pyarrow.Table
temp_mean: double
room: string
----
temp_mean: [[22.581987577639747,22.10807453416151]]
room: [["Kitchen","Living Room"]]

For more detailed examples, see the PyArrow documentation and the Apache Arrow Python Cookbook.
