I've been racking my brain trying to figure out the best way to do this. I want to find the rolling sum of the previous 30 days but my 'day' column is not in datetime format.
Sample data:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'client': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                   'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
        'day': [319, 323, 336, 352, 379, 424, 461, 486, 496, 499,
                303, 334, 346, 373, 374, 395, 401, 408, 458, 492],
        'foo': [5.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0, np.nan, np.nan,
                8.0, 7.0, 22.0, np.nan, 13.0, np.nan, np.nan, 5.0, 11.0, np.nan],
    })

    >>> df
       client  day   foo
    0       A  319   5.0
    1       A  323   2.0
    2       A  336   NaN
    3       A  352   NaN
    4       A  379   NaN
    5       A  424   NaN
    6       A  461   NaN
    7       A  486   7.0
    8       A  496   NaN
    9       A  499   NaN
    10      B  303   8.0
    11      B  334   7.0
    12      B  346  22.0
    13      B  373   NaN
    14      B  374  13.0
    15      B  395   NaN
    16      B  401   NaN
    17      B  408   5.0
    18      B  458  11.0
    19      B  492   NaN
I want a new column showing, for each row, the rolling sum of 'foo' over the previous 30 days for that client.
So far I've tried:
df['foo_30day'] = df.groupby('client').rolling(30, on='day', min_periods=1)['foo'].sum().values
But it looks like it's taking the rolling sum of the last 30 rows.
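To illustrate the behavior described above, here is a minimal sketch on a few of client A's rows (the Series `s` is a cut-down, hypothetical stand-in for the real data). With an integer window, pandas counts rows rather than units of the index, so a window of 30 covers every row here even though the day labels span far more than 30 days.

```python
import numpy as np
import pandas as pd

# A few 'foo' values for client A, indexed by their integer 'day' labels.
s = pd.Series([5.0, 2.0, np.nan, 7.0], index=[319, 323, 336, 486])

# An integer window means "the last 30 rows", not "the last 30 days".
# There are only 4 rows, so every window reaches back to the first row.
row_window = s.rolling(30, min_periods=1).sum()
print(row_window)
```

The value at day 486 includes the 5.0 and 2.0 from days 319 and 323, which are well outside a 30-day span.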
I was also thinking of converting the 'day' column to a datetime format and then using rolling('30D'), but I'm not sure how, or even whether that's the best approach. I've also tried using a groupby reindex to stretch the 'day' column and then doing a simple rolling(30), but it's not working for me.
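For reference, the datetime idea could be sketched roughly as follows. This is only one possible approach, and it rests on some assumptions: that 'day' is a count of days from some common starting point (so only the differences between values matter), and that a window of the form (day - 30, day] is what is wanted. The `date` column and the `rolling_sum_30d` helper are names made up for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'client': ['A'] * 10 + ['B'] * 10,
    'day': [319, 323, 336, 352, 379, 424, 461, 486, 496, 499,
            303, 334, 346, 373, 374, 395, 401, 408, 458, 492],
    'foo': [5.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0, np.nan, np.nan,
            8.0, 7.0, 22.0, np.nan, 13.0, np.nan, np.nan, 5.0, 11.0, np.nan],
})

# Assumption: map the integer day counts onto calendar dates. The anchor
# (1970-01-01, the default origin) is arbitrary; only the gaps matter.
df['date'] = pd.to_datetime(df['day'], unit='D')

def rolling_sum_30d(group):
    """30-day rolling sum of 'foo' for one client, keyed by the original row labels."""
    ordered = group.sort_values('date')  # time-based windows need sorted dates
    rolled = ordered.set_index('date')['foo'].rolling('30D', min_periods=1).sum()
    return pd.Series(rolled.to_numpy(), index=ordered.index)

# Compute per client, then let pandas align the pieces back on the row labels.
df['foo_30day'] = pd.concat(
    [rolling_sum_30d(group) for _, group in df.groupby('client')]
)
print(df)
```

With `min_periods=1`, a row gets NaN only when every 'foo' value in its 30-day window is missing. Processing each client separately keeps the dates sorted within each window calculation, even though the 'day' values restart when the client changes.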
Any advice would be greatly appreciated.