Arrow producer/consumer #
Using pyarrow #
Polars can move data in and out of Arrow with zero copies. This can be done either via pyarrow or natively. We start with the pyarrow approach:
{{code_block(‘user-guide/misc/arrow’,’to_arrow’,[])}}
pyarrow.Table
foo: int64
bar: large_string
----
foo: [[1,2,3]]
bar: [["ham","spam","jam"]]
Or if you want to ensure the output is zero-copy:
{{code_block(‘user-guide/misc/arrow’,’to_arrow_zero’,[])}}
pyarrow.Table
foo: int64
bar: string_view
----
foo: [[1,2,3]]
bar: [["ham","spam","jam"]]
Importing from pyarrow can be achieved with pl.from_arrow.
Using the Arrow PyCapsule Interface #
As of Polars v1.3, Polars implements the Arrow PyCapsule Interface, a protocol for sharing Arrow data across Python libraries.
Exporting data from Polars to pyarrow #
To convert a Polars DataFrame to a pyarrow.Table, use the pyarrow.table constructor:
!!! note
This requires pyarrow v15 or higher.
{{code_block(‘user-guide/misc/arrow_pycapsule’,’to_arrow’,[])}}
pyarrow.Table
foo: int64
bar: string_view
----
foo: [[1,2,3]]
bar: [["ham","spam","jam"]]
To convert a Polars Series to a pyarrow.ChunkedArray, use the pyarrow.chunked_array constructor.
{{code_block(‘user-guide/misc/arrow_pycapsule’,’to_arrow_series’,[])}}
[
[
1,
2,
3
]
]
You can also pass a Series to the pyarrow.array constructor to create a contiguous array. Note that this will not be zero-copy if the underlying Series has multiple chunks.
{{code_block(‘user-guide/misc/arrow_pycapsule’,’to_arrow_array_rechunk’,[])}}
[
1,
2,
3
]
Importing data from pyarrow to Polars #
We can pass the pyarrow Table back to Polars by using the polars.DataFrame constructor:
{{code_block(‘user-guide/misc/arrow_pycapsule’,’to_polars’,[])}}
shape: (3, 2)
┌─────┬──────┐
│ foo ┆ bar  │
│ --- ┆ ---  │
│ i64 ┆ str  │
╞═════╪══════╡
│ 1   ┆ ham  │
│ 2   ┆ spam │
│ 3   ┆ jam  │
└─────┴──────┘
Similarly, we can pass the pyarrow ChunkedArray or Array back to Polars by using the polars.Series constructor:
{{code_block(‘user-guide/misc/arrow_pycapsule’,’to_polars_series’,[])}}
shape: (3,)
Series: '' [i64]
[
1
2
3
]
Usage with other arrow libraries #
There’s a growing list of libraries that support the PyCapsule Interface directly. Polars Series and DataFrame objects work automatically with every such library.
For library maintainers #
If you’re developing a library that you wish to integrate with Polars, it’s suggested to implement the Arrow PyCapsule Interface yourself. This comes with a number of benefits:
- Zero-copy exchange for both Polars Series and DataFrame.
- No required dependency on pyarrow.
- No direct dependency on Polars.
- Harder to cause memory leaks than handling pointers as raw integers.
- Automatic zero-copy integration with other PyCapsule Interface-supported libraries.
Using Polars directly #
Polars can also export to and import from the Arrow C Data Interface directly. This is recommended for libraries that don’t support the Arrow PyCapsule Interface and want to interoperate with Polars without requiring a pyarrow installation.
- To export ArrowArray C structs, Polars exposes Series._export_arrow_to_c.
- To import an ArrowArray C struct, Polars exposes Series._import_arrow_from_c.