
Polars dataframe: overlapping groups

I am currently "converting" from pandas to polars because I really like the API. This question is a more general version of a previous question of mine (see here).

I have the following dataframe:

    import polars as pl

    # Dummy data
    df = pl.DataFrame({
        "Buy_Signal": [1, 0, 1, 0, 1, 0, 0],
        "Returns": [0.01, 0.02, 0.03, 0.02, 0.01, 0.00, -0.01],
    })

Ultimately, I want to aggregate the Returns column over different intervals, which are defined by the Buy_Signal column. In the example above, each interval runs from a 1 in Buy_Signal to the end of the dataframe, so the intervals overlap. The resulting dataframe should therefore look like this:

| group | Returns |
|------:|--------:|
|   u32 |     f64 |
|     1 |    0.01 |
|     1 |    0.02 |
|     1 |    0.03 |
|     1 |    0.02 |
|     1 |    0.01 |
|     1 |     0.0 |
|     1 |   -0.01 |
|     2 |    0.03 |
|     2 |    0.02 |
|     2 |    0.01 |
|     2 |     0.0 |
|     2 |   -0.01 |
|     3 |    0.01 |
|     3 |     0.0 |
|     3 |   -0.01 |
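To make the intended grouping rule concrete, here is a minimal plain-Python sketch of it (no polars involved). The function name `overlapping_groups` and the list variables are just illustrative names, and this is only meant as a reference for the expected output, not as a fast solution.

```python
# Same dummy data as above, as plain Python lists.
buy_signal = [1, 0, 1, 0, 1, 0, 0]
returns = [0.01, 0.02, 0.03, 0.02, 0.01, 0.00, -0.01]


def overlapping_groups(signal, values):
    """Return (group, value) pairs.

    Every 1 in `signal` opens a new group that runs from that row
    to the last row, so later groups overlap with earlier ones.
    """
    rows = []
    group = 0
    for start, flag in enumerate(signal):
        if flag == 1:
            group += 1
            # Copy every value from the signal row to the end into this group.
            rows.extend((group, value) for value in values[start:])
    return rows


expected = overlapping_groups(buy_signal, returns)
for group, value in expected:
    print(group, value)
```

Group 1 starts at row 0 and has 7 rows, group 2 starts at row 2 and has 5 rows, and group 3 starts at row 4 and has 3 rows, which matches the table above.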

One approach posted as an answer to my previous question is the following:

    # Build overlapping group index
    idx = df.select(index=
              pl.when(pl.col("Buy_Signal") == 1)
              .then(pl.int_ranges(pl.int_range(pl.len()), pl.len()))
    ).explode(pl.col("index")).drop_nulls().cast(pl.UInt32)

    # Join index with original data
    df = (df.with_row_index()
        .join(idx, on="index")
        .with_columns(group = (pl.col("index") == pl.col("index").max())
                    .shift().cum_sum().backward_fill() + 1)
        .select(["group", "Returns"])
    )
    df

Question: are there other good solutions to this problem?

By "good" I mean (i) readable and/or (ii) fast.

My actual problem contains much larger datasets.

Thanks

