Installing Polars #
Polars is a library, and installing it is as simple as invoking the package manager of the corresponding programming language.
pip install polars
# Or, for older CPUs without support for AVX2 (Advanced Vector Extensions 2)
pip install polars-lts-cpu
cargo add polars -F lazy
# Or Cargo.toml
[dependencies]
polars = { version = "x", features = ["lazy", ...] }
Big Index #
By default, Polars dataframes are limited to 2^32 (~4.3 billion) rows. Enable the big index feature to raise this limit to 2^64 (~18.4 quintillion) rows:
pip install polars-u64-idx
cargo add polars -F bigidx
# Or Cargo.toml
[dependencies]
polars = { version = "x", features = ["bigidx", ...] }
Legacy CPUs #
To install Polars for Python on an older CPU that does not support AVX (Advanced Vector Extensions), run:
pip install polars-lts-cpu
Importing Polars #
To use the Polars library, simply import it into your project:
import polars as pl
use polars::prelude::*;
Feature Flags #
The commands above install the core of Polars onto your system. However, depending on your use case, you may also want to install optional dependencies. These are made optional to minimize the footprint. The flags differ per programming language. Throughout the user guide, a feature that requires an extra dependency will be explicitly called out.
Python #
# Example
pip install 'polars[numpy,fsspec]'
All #
Flag | Description |
---|---|
all | Install all optional dependencies. |
GPU #
Flag | Description |
---|---|
gpu | Run queries on NVIDIA GPUs. |
Note
For more detailed instructions and prerequisites, see the GPU support section.
Interoperability #
Flag | Description |
---|---|
pandas | Convert data to and from pandas dataframes/series. |
numpy | Convert data to and from NumPy arrays. |
pyarrow | Convert data to and from PyArrow tables/arrays. |
pydantic | Convert data from Pydantic models to Polars. |
Excel #
Flag | Description |
---|---|
calamine | Read from Excel files with the calamine engine. |
openpyxl | Read from Excel files with the openpyxl engine. |
xlsx2csv | Read from Excel files with the xlsx2csv engine. |
xlsxwriter | Write to Excel files with the XlsxWriter engine. |
excel | Install all supported Excel engines. |
Databases #
Flag | Description |
---|---|
adbc | Read from and write to databases with the Arrow Database Connectivity (ADBC) engine. |
connectorx | Read from databases with the ConnectorX engine. |
sqlalchemy | Write to databases with the SQLAlchemy engine. |
database | Install all supported database engines. |
Cloud #
Flag | Description |
---|---|
fsspec | Read from and write to remote file systems. |
Other I/O #
Flag | Description |
---|---|
deltalake | Read from and write to Delta tables. |
iceberg | Read from Apache Iceberg tables. |
Other #
Flag | Description |
---|---|
async | Collect LazyFrames asynchronously. |
cloudpickle | Serialize user-defined functions. |
graph | Visualize LazyFrames as a graph. |
plot | Plot dataframes through the plot namespace. |
style | Style dataframes through the style namespace. |
timezone | Timezone support; only needed on Windows. |
Rust #
# Cargo.toml
[dependencies]
polars = { version = "0.26.1", features = ["lazy", "temporal", "describe", "json", "parquet", "dtype-datetime"] }
The opt-in features are:
- Additional data types:
  - dtype-date
  - dtype-datetime
  - dtype-time
  - dtype-duration
  - dtype-i8
  - dtype-i16
  - dtype-u8
  - dtype-u16
  - dtype-categorical
  - dtype-struct
- lazy - Lazy API.
- regex - Use regexes in column selection.
- dot_diagram - Create dot diagrams from lazy logical plans.
- sql - Pass SQL queries to Polars.
- streaming - Be able to process datasets that are larger than memory.
- random - Generate arrays with randomly sampled values.
- ndarray - Convert from DataFrame to ndarray.
- temporal - Conversions between Chrono and Polars for temporal data types.
- timezones - Activate timezone support.
- strings - Extra string utilities for StringChunked:
  - string_pad - for pad_start, pad_end, zfill.
  - string_to_integer - for parse_int.
- object - Support for generic ChunkedArrays called ObjectChunked<T> (generic over T). These are downcastable from Series through the Any trait.
- Performance related:
  - nightly - Several nightly-only features, such as SIMD and specialization.
  - performant - More fast paths, at the cost of slower compile times.
  - bigidx - Activate this feature if you expect far more than 2^32 rows. This allows Polars to scale up well beyond that by using u64 as an index. Polars will be a bit slower with this feature activated, as many data structures become less cache efficient.
  - cse - Activate the common subplan elimination optimization.
- IO related:
  - serde - Support for serde serialization and deserialization. Can be used for JSON and other serde-supported serialization formats.
  - serde-lazy - Support for serde serialization and deserialization. Can be used for JSON and other serde-supported serialization formats.
  - parquet - Read the Apache Parquet format.
  - json - JSON serialization.
  - ipc - Arrow's IPC format serialization.
  - decompress - Automatically infer the compression of CSVs and decompress them. Supported compressions:
    - gzip
    - zlib
    - zstd
- DataFrame operations:
  - dynamic_group_by - Group by based on a time window instead of predefined keys. Also activates rolling-window group-by operations.
  - sort_multiple - Allow sorting a dataframe on multiple columns.
  - rows - Create dataframes from rows and extract rows from dataframes. Also activates the pivot and transpose operations.
  - join_asof - ASOF join, to join on the nearest keys instead of an exact equality match.
  - cross_join - Create the Cartesian product of two dataframes.
  - semi_anti_join - SEMI and ANTI joins.
  - row_hash - Utility to hash dataframe rows to UInt64Chunked.
  - diagonal_concat - Diagonal concatenation, combining different schemas.
  - dataframe_arithmetic - Arithmetic between dataframes and other dataframes or series.
  - partition_by - Split into multiple dataframes partitioned by groups.
- Series/expression operations:
  - is_in - Check for membership in Series.
  - zip_with - Zip two Series/ChunkedArrays.
  - round_series - Round the underlying float types of a Series.
  - repeat_by - Repeat an element in an array a number of times specified by another array.
  - is_first_distinct - Check if an element is the first unique value.
  - is_last_distinct - Check if an element is the last unique value.
  - checked_arithmetic - Checked arithmetic, returning None on invalid operations.
  - dot_product - Dot/inner product on series and expressions.
  - concat_str - Concatenate string data in linear time.
  - reinterpret - Utility to reinterpret bits as signed/unsigned.
  - take_opt_iter - Take from a Series with Iterator<Item=Option<usize>>.
  - mode - Return the most frequently occurring value(s).
  - cum_agg - The cum_sum, cum_min, and cum_max aggregations.
  - rolling_window - Rolling window functions, like rolling_mean.
  - interpolate - Interpolate None values.
  - extract_jsonpath - Run jsonpath queries on StringChunked.
  - list - List utils:
    - list_gather - Take sublists by multiple indices.
  - rank - Ranking algorithms.
  - moment - Kurtosis and skew statistics.
  - ewma - Exponential moving-average windows.
  - abs - Get the absolute values of a Series.
  - arange - Range operation on a Series.
  - product - Compute the product of a Series.
  - diff - The diff operation.
  - pct_change - Compute change percentages.
  - unique_counts - Count unique values in expressions.
  - log - Logarithms for Series.
  - list_to_struct - Convert List to Struct data types.
  - list_count - Count elements in lists.
  - list_eval - Apply expressions over list elements.
  - cumulative_eval - Apply expressions over cumulatively increasing windows.
  - arg_where - Get the indices where a condition holds.
  - search_sorted - Find the indices where elements should be inserted to maintain order.
  - offset_by - Add an offset to dates that takes months and leap years into account.
  - trigonometry - Trigonometric functions.
  - sign - Compute the sign (positive, negative, or zero) of each element in a Series.
  - propagate_nans - NaN-propagating min/max aggregations.
- DataFrame pretty printing:
  - fmt - Activate DataFrame formatting.