I have a polars dataframe that contains arguments to functions.
import polars as pl

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6.0, 7.0, 8.0],
        "ham": ["a", "b", "c"],
    }
)
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6.0 ┆ a   │
│ 2   ┆ 7.0 ┆ b   │
│ 3   ┆ 8.0 ┆ c   │
└─────┴─────┴─────┘
I want to apply a UDF to each row. The UDF performs a calculation and returns a dataframe for that row. Each returned dataframe has the same schema but a varying number of rows. The final result should be a single dataframe containing all of the returned rows.
Here is a simplified dummy example of what I tried:
def myUDF(row_tuple):
    foo, bar, ham = row_tuple
    result = pl.DataFrame({"a": foo + bar, "b": ham})
    return (result,)
df.map_rows(myUDF)

shape: (3, 1)
┌────────────────┐
│ column_0       │
│ ---            │
│ object         │
╞════════════════╡
│ shape: (1, 2)  │
│ ┌─────┬─────┐  │
│ │ a   …        │
│ shape: (1, 2)  │
│ ┌─────┬─────┐  │
│ │ a   …        │
│ shape: (1, 2)  │
│ ┌──────┬─────┐ │
│ │ a    …       │
└────────────────┘
The following seems to work, but it means converting every per-row result into a Python dictionary, and I'm worried about performance.
def myUDF(row_tuple):
    foo, bar, ham = row_tuple
    result = pl.DataFrame({"a": foo + bar, "b": ham})
    return (result.to_dict(),)

df.map_rows(myUDF).unnest("column_0").explode("a", "b")

shape: (3, 2)
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ f64  ┆ str │
╞══════╪═════╡
│ 7.0  ┆ a   │
│ 9.0  ┆ b   │
│ 11.0 ┆ c   │
└──────┴─────┘
What's the "correct" way of doing this in Polars?