Channel: Active questions tagged python - Stack Overflow

Polars UDF - returning and concatenating dataframes

I have a polars dataframe that contains arguments to functions.

import polars as pl

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6.0, 7.0, 8.0],
        "ham": ["a", "b", "c"],
    }
)

shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘

I want to apply a UDF to each row that performs a calculation and returns a dataframe for that row. Each returned dataframe has the same schema but a varying number of rows. The result should be a single dataframe.

e.g., simplified dummy example:

I tried this:

def myUDF(row_tuple):
    foo, bar, ham = row_tuple
    result = pl.DataFrame({"a": foo + bar, "b": ham})
    return (result,)

df.map_rows(myUDF)

shape: (3, 1)
┌────────────────┐
│ column_0       │
│ ---            │
│ object         │
╞════════════════╡
│ shape: (1, 2)  │
│ ┌─────┬─────┐  │
│ │ a …          │
│ shape: (1, 2)  │
│ ┌─────┬─────┐  │
│ │ a …          │
│ shape: (1, 2)  │
│ ┌──────┬─────┐ │
│ │ a…           │
└────────────────┘

The following seems to work, but it means putting everything in a Python dictionary, and I'm worried about performance.

def myUDF(row_tuple):
    foo, bar, ham = row_tuple
    result = pl.DataFrame({"a": foo + bar, "b": ham})
    return (result.to_dict(),)

df.map_rows(myUDF).unnest("column_0").explode("a", "b")

shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ f64  ┆ str │
╞══════╪═════╡
│ 7.0  ┆ a   │
│ 9.0  ┆ b   │
│ 11.0 ┆ c   │
└──────┴─────┘

What's the "correct" way of doing this in polars?

