Channel: Active questions tagged python - Stack Overflow

Pyspark Group By Date Range


I have a sample PySpark DataFrame that can be created like this:

sample_df = spark.createDataFrame([
    ('2020-01-01', '2021-01-01', 1),
    ('2020-02-01', '2021-02-01', 1),
    ('2021-01-15', '2022-01-15', 2),
    ('2022-01-15', '2023-01-15', 2),
    ('2022-02-01', '2023-02-01', 3),
    ('2022-03-01', '2023-03-01', 3),
    ('2023-03-01', '2024-03-01', 4),
], ['item_date', 'max_window', 'expected_grouping_index'])

After sorting by item_date, I want to assume the first item starts a grouping. Any following item whose item_date is less than or equal to the first item's max_window (max_window is always the item_date plus the same fixed number of days across the entire df, about 365 days in this example) will be given the same grouping_index.

If an item does not fall inside the grouping, it starts a new grouping and is given another arbitrary grouping_index. All following items are then assessed against that item's max_window, and so on.

The grouping_index is just a means to an end: I eventually want to keep only the first row in each group.
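To make the intended rule concrete, here is a plain-Python sketch of the sequential grouping logic described above (an illustration of the expected behavior on the sample data, not the PySpark solution being asked for; `assign_groups` is a hypothetical helper name):

```python
# Rows from the question's sample_df: (item_date, max_window, expected_grouping_index)
rows = [
    ('2020-01-01', '2021-01-01', 1),
    ('2020-02-01', '2021-02-01', 1),
    ('2021-01-15', '2022-01-15', 2),
    ('2022-01-15', '2023-01-15', 2),
    ('2022-02-01', '2023-02-01', 3),
    ('2022-03-01', '2023-03-01', 3),
    ('2023-03-01', '2024-03-01', 4),
]

def assign_groups(rows):
    """Walk rows in item_date order: a row joins the current group while its
    item_date is <= the group opener's max_window; otherwise it opens a new
    group. ISO date strings compare correctly as plain strings."""
    out = []
    group_idx = 0
    opener_max = None  # max_window of the row that opened the current group
    for item_date, max_window, _ in sorted(rows):
        if opener_max is None or item_date > opener_max:
            group_idx += 1
            opener_max = max_window
        out.append(group_idx)
    return out

print(assign_groups(rows))  # -> [1, 1, 2, 2, 3, 3, 4], matching expected_grouping_index
```

Because each group's boundary depends on the row that opened the previous group, the assignment is inherently sequential, which is what makes a pure window-function formulation non-trivial.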

How can I achieve this without a UDF or converting to a pandas DataFrame?

