Channel: Active questions tagged python - Stack Overflow

Python - Rolling Indexing in Polars library?

I'd like to ask if anyone knows how to do rolling indexing in Polars. I have tried a few solutions that did not work for me (I'll show them below).

What I'd like to do: index the number of occurrences within the past X days, by Name.

Example: let's say I'd like to index occurrences within the past 2 days:

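| Name    | Date     | Counter |
|---------|----------|---------|
| John    | 1 Jan 23 | 1       |
| John    | 1 Jan 23 | 2       |
| John    | 1 Jan 23 | 3       |
| John    | 1 Jan 23 | 4       |
| John    | 2 Jan 23 | 5       |
| John    | 2 Jan 23 | 6       |
| John    | 2 Jan 23 | 7       |
| John    | 2 Jan 23 | 8       |
| John    | 3 Jan 23 | 5       |
| John    | 3 Jan 23 | 6       |
| New Guy | 1 Jan 23 | 1       |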

In this case, the counter restarts at 1 from the beginning of the past X days (e.g. for 3 Jan 23 with X = 2, counting starts at 1 from 2 Jan 23), or when a new name is encountered.
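To pin down the expected output, here is a minimal plain-Python sketch of the counter described above. It is a reference for the semantics only, not a Polars solution. It assumes the rows are already sorted by date within each name and that "past 2 days" means the current date plus the day before; `rolling_counter` and its `days` parameter are names made up for this illustration.

```python
from collections import defaultdict, deque
from datetime import date, timedelta


def rolling_counter(rows, days=2):
    """Return one counter per (name, date) row, in input order.

    Assumes rows are sorted by date within each name.
    """
    period = timedelta(days=days)
    window = defaultdict(deque)  # name -> dates of rows still inside the window
    counters = []
    for name, day in rows:
        seen = window[name]
        # Drop rows that fell out of the half-open window (day - period, day].
        while seen and seen[0] <= day - period:
            seen.popleft()
        seen.append(day)
        # The counter is the row's position among the rows left in the window.
        counters.append(len(seen))
    return counters


# Sample data from the table above.
rows = (
    [("John", date(2023, 1, 1))] * 4
    + [("John", date(2023, 1, 2))] * 4
    + [("John", date(2023, 1, 3))] * 2
    + [("New Guy", date(2023, 1, 1))]
)

print(rolling_counter(rows))
# [1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 1]
```

Each row is appended and removed at most once, so this runs in a single pass over the data rather than rescanning earlier rows for every row.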

What I've tried:

df.groupby_rolling(index_column='Date', period='2d', by='Name', check_sorted=False).agg((pl.col("Date").rank(method='ordinal')).alias("Counter"))

The above does not work because it outputs:

| Name    | Date     | Counter |
|---------|----------|---------|
| John    | 1 Jan 23 | 1,2,3,4 |
| John    | 1 Jan 23 | 1,2,3,4 |
| John    | 1 Jan 23 | 1,2,3,4 |
| John    | 1 Jan 23 | 1,2,3,4 |
| John    | 2 Jan 23 | 1...8   |
| John    | 2 Jan 23 | 1...8   |
| John    | 2 Jan 23 | 1...8   |
| John    | 2 Jan 23 | 1...8   |
| John    | 3 Jan 23 | 1...6   |
| John    | 3 Jan 23 | 1...6   |
| New Guy | 1 Jan 23 | 1       |

df.with_columns(Counter=pl.col("Mask").rolling_sum(window_size='2d', by="Date"))

Here "Mask" is a helper column I made that contains only 1s. I tried to sum it over the window, but it outputs:

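| Name | Date     | Mask | Counter |
|------|----------|------|---------|
| John | 1 Jan 23 | 1    | 4       |
| John | 1 Jan 23 | 1    | 4       |
| John | 1 Jan 23 | 1    | 4       |
| John | 1 Jan 23 | 1    | 4       |
| John | 2 Jan 23 | 1    | 8       |
| John | 2 Jan 23 | 1    | 8       |
| John | 2 Jan 23 | 1    | 8       |
| John | 2 Jan 23 | 1    | 8       |
| John | 3 Jan 23 | 1    | 6       |
| John | 3 Jan 23 | 1    | 6       |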

It also cannot handle "New Guy" correctly, because rolling_sum cannot take by=["Name", "Date"].

df.with_columns(Counter=pl.col("Date").rank(method='ordinal').over(["Name", "Date"]))

The above code works correctly, but can only be used for indexing within the same day (i.e. period="1d")
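The rolling sum and the per-day rank each seem to supply one piece of the wanted value: the window total counts every row in the window, including later rows on the same day, while the rank gives the position within the day. A plain-Python sketch of how the two could be combined follows. It assumes the dates are whole days with no time component, and `decompose` is a hypothetical helper name, not a Polars function.

```python
from collections import Counter
from datetime import date, timedelta


def decompose(rows, days=2):
    """Return (window_total, counter) for each (name, date) row."""
    per_day = Counter(rows)  # (name, date) -> number of rows on that day
    rank_in_day = Counter()  # running position within each (name, date)
    result = []
    for name, day in rows:
        rank_in_day[(name, day)] += 1
        # What the rolling sum over the mask reports: every row in the window.
        window_total = sum(
            per_day[(name, day - timedelta(days=k))] for k in range(days)
        )
        # Swap the same-day total for the row's rank within that day.
        counter = window_total - per_day[(name, day)] + rank_in_day[(name, day)]
        result.append((window_total, counter))
    return result


rows = (
    [("John", date(2023, 1, 1))] * 4
    + [("John", date(2023, 1, 2))] * 4
    + [("John", date(2023, 1, 3))] * 2
    + [("New Guy", date(2023, 1, 1))]
)

for (name, day), (total, counter) in zip(rows, decompose(rows)):
    print(name, day, total, counter)
```

For the sample data, the window totals come out as 4, 8 and 6 for John (matching the rolling_sum output above), and the combined counter matches the wanted column. Whether the same arithmetic can be written as a single Polars expression depends on the Polars version in use.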

Additional notes: I also did this in Excel, and with a brute-force "for" loop. Both worked perfectly, but they struggled with huge amounts of data.

What I read: some references that may help with answers (most didn't work for me because they use a fixed rolling window instead of a dynamic window by "Date"):

How to implement rolling rank in Polars version 0.19

https://github.com/pola-rs/polars/issues/4808

How to do group_by_rolling grouped by day by hour in polars in Python?

How to groupby and rolling in polars?

https://docs.pola.rs/py-polars/html/reference/series/api/polars.Series.rank.html

https://docs.pola.rs/py-polars/html/reference/dataframe/api/polars.DataFrame.groupby_rolling.html

