I've had a look and can't find a solution to this issue. I want to calculate, at each date in the dataframe and within each subgroup, the rolling sum of the previous 30 days' worth of data. The data isn't daily; it's spaced fairly irregularly. I've been trying ChatGPT, which is getting in a twist over it.
Initially the suggestion was that I hadn't converted the Date column to datetime format, which the rolling calculation needs. I've now done that, but the code below still fails:
    import pandas as pd
    from datetime import datetime, timedelta
    import numpy as np

    # Create a dataset with irregularly spaced dates spanning two years
    np.random.seed(42)
    date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='10D')  # Every 10 days
    data = {'Date': np.random.choice(date_rng, size=30),
            'Group': np.random.choice(['A', 'B'], size=30),
            'Value': np.random.randint(1, 30, size=30)}
    df = pd.DataFrame(data)

    # Sort DataFrame by date
    df.sort_values(by='Date', inplace=True)
    df['Date'] = pd.to_datetime(df['Date'])

    # Calculate cumulative sum by group within the previous 30 days from each day
    df['RollingSum_Last30Days'] = df.groupby('Group')['Value'].transform(lambda x: x.rolling(window='30D', min_periods=1).sum())

I'm getting an error of:

    ValueError: window must be an integer 0 or greater

I've found conflicting comments online as to whether the format '30D' works in rolling windows, but I'm none the wiser as to a solution. Any help appreciated.
Running in VSCode in Python 3.11.8.
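
For what it's worth, here is a minimal sketch of one approach that may avoid the error. It assumes the cause is that an offset window such as '30D' needs a datetime-like index (or an `on=` column), whereas inside `transform` each group's `Value` series still carries the default integer index. The sketch sorts by `Group` and `Date`, moves `Date` into the index, and rolls within each group. Note that the default window includes the current row and excludes a row exactly 30 days earlier; whether that matches "previous 30 days" is an assumption.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
np.random.seed(42)
date_rng = pd.date_range(start='2022-01-01', end='2023-12-31', freq='10D')
df = pd.DataFrame({
    'Date': np.random.choice(date_rng, size=30),
    'Group': np.random.choice(['A', 'B'], size=30),
    'Value': np.random.randint(1, 30, size=30),
})

# Sort by group, then date, so each group's dates are monotonic and the
# grouped result comes back in the same row order as df
df = df.sort_values(['Group', 'Date']).reset_index(drop=True)

# A '30D' window needs a datetime-like index, so move Date into the index
rolled = (
    df.set_index('Date')
      .groupby('Group')['Value']
      .rolling('30D', min_periods=1)
      .sum()
)

# The result is indexed by (Group, Date); assign by position, which lines up
# only because df was sorted by Group and Date above
df['RollingSum_Last30Days'] = rolled.to_numpy()
print(df)
```

The positional assignment relies on the sort order; if the original row order matters, it may be safer to keep a copy of the original index and restore it afterwards.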