I am currently "converting" from pandas to polars as I really like the API. This question is a more general version of a previous question of mine (see here).
I have the following dataframe:

    import polars as pl

    # Dummy data
    df = pl.DataFrame({
        "Buy_Signal": [1, 0, 1, 0, 1, 0, 0],
        "Returns": [0.01, 0.02, 0.03, 0.02, 0.01, 0.00, -0.01],
    })

I ultimately want to do aggregations on column `Returns`, conditional on different intervals, which are given by column `Buy_Signal`. In the above case, each interval runs from a 1 in `Buy_Signal` to the end of the dataframe. The resulting dataframe should therefore look like this:

| group | Returns |
|------:|--------:|
|   u32 |     f64 |
|     1 |    0.01 |
|     1 |    0.02 |
|     1 |    0.03 |
|     1 |    0.02 |
|     1 |    0.01 |
|     1 |     0.0 |
|     1 |   -0.01 |
|     2 |    0.03 |
|     2 |    0.02 |
|     2 |    0.01 |
|     2 |     0.0 |
|     2 |   -0.01 |
|     3 |    0.01 |
|     3 |     0.0 |
|     3 |   -0.01 |

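To pin down the intended semantics, here is a minimal plain-Python sketch of the logic. The helper name `expand_intervals` is purely illustrative (it is not part of polars), and the loop is not meant to be efficient; it only spells out how the rows of the expected output are produced.

```python
def expand_intervals(signals, values):
    """Return (group, value) pairs: one run per signal == 1,
    covering every value from that row to the end of the data."""
    rows = []
    group = 0
    for start, signal in enumerate(signals):
        if signal == 1:
            group += 1  # each 1 opens a new, overlapping interval
            rows.extend((group, value) for value in values[start:])
    return rows


buy_signal = [1, 0, 1, 0, 1, 0, 0]
returns = [0.01, 0.02, 0.03, 0.02, 0.01, 0.00, -0.01]

rows = expand_intervals(buy_signal, returns)
for group, value in rows:
    print(group, value)
```

The intervals overlap, so the output has more rows than the input: 7 rows for group 1, 5 for group 2, and 3 for group 3.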
One approach posted as an answer to my previous question is the following:

    # Build overlapping group index
    idx = df.select(
        index=pl.when(pl.col("Buy_Signal") == 1)
        .then(pl.int_ranges(pl.int_range(pl.len()), pl.len()))
    ).explode(pl.col("index")).drop_nulls().cast(pl.UInt32)

    # Join index with original data
    df = (
        df.with_row_index()
        .join(idx, on="index")
        .with_columns(
            group=(pl.col("index") == pl.col("index").max())
            .shift().cum_sum().backward_fill() + 1
        )
        .select(["group", "Returns"])
    )
    df

Question: are there other good solutions to this problem?
By good I mean (i) readable and/or (ii) fast.
My actual problem contains much larger datasets.
Thanks