Channel: Active questions tagged python - Stack Overflow

Rolling sum within 30 non-datetime days


I've been racking my brain trying to figure out the best way to do this. I want to find the rolling sum of the previous 30 days but my 'day' column is not in datetime format.

Sample data

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'client': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                   'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
        'day': [319, 323, 336, 352, 379, 424, 461, 486, 496, 499,
                303, 334, 346, 373, 374, 395, 401, 408, 458, 492],
        'foo': [5.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0, np.nan, np.nan,
                8.0, 7.0, 22.0, np.nan, 13.0, np.nan, np.nan, 5.0, 11.0, np.nan],
    })

    >>> df
       client  day   foo
    0       A  319   5.0
    1       A  323   2.0
    2       A  336   NaN
    3       A  352   NaN
    4       A  379   NaN
    5       A  424   NaN
    6       A  461   NaN
    7       A  486   7.0
    8       A  496   NaN
    9       A  499   NaN
    10      B  303   8.0
    11      B  334   7.0
    12      B  346  22.0
    13      B  373   NaN
    14      B  374  13.0
    15      B  395   NaN
    16      B  401   NaN
    17      B  408   5.0
    18      B  458  11.0
    19      B  492   NaN

I want a new column showing, for each row, the sum of 'foo' over the previous 30 days for that client.

So far I've tried:

df['foo_30day'] = df.groupby('client').rolling(30, on='day', min_periods=1)['foo'].sum().values

But it looks like this sums the last 30 rows rather than the last 30 days.
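A small check of that observation, as a sketch on made-up data (the `demo` frame and the `last_2_rows` column are hypothetical and not part of the sample above): with an integer window, `rolling` counts rows, so widely spaced days are still summed together.

```python
import pandas as pd

# Hypothetical three-row frame with large gaps between days.
demo = pd.DataFrame({"day": [1, 10, 100], "foo": [1.0, 2.0, 4.0]})

# An integer window means "this many rows": each result sums the current
# row and the one before it, regardless of how far apart the days are.
demo["last_2_rows"] = demo.rolling(2, on="day", min_periods=1)["foo"].sum()
print(demo)
```

The same applies to `rolling(30, ...)`: the window is 30 observations, not a span of 30 units in the 'day' column.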

I was also thinking of converting the 'day' column to a datetime format and then using rolling('30D'), but I'm not sure how to do that, or whether it's even the best approach. I've also tried using a groupby reindex to stretch the 'day' column and then doing a simple rolling(30), but it's not working for me.
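One possible way to try the datetime idea, as a minimal sketch rather than a definitive answer. It assumes the integer 'day' can be treated as a day offset from an arbitrary origin (the Unix epoch here, since only differences between days matter), that a helper column named `date` is acceptable, and that the window should include the current day (the pandas default for offset windows, covering day - 30 exclusive through day inclusive).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "client": ["A"] * 10 + ["B"] * 10,
    "day": [319, 323, 336, 352, 379, 424, 461, 486, 496, 499,
            303, 334, 346, 373, 374, 395, 401, 408, 458, 492],
    "foo": [5.0, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan, 7.0, np.nan, np.nan,
            8.0, 7.0, 22.0, np.nan, 13.0, np.nan, np.nan, 5.0, 11.0, np.nan],
})

# Time-based windows need increasing dates within each client, and sorting
# by client keeps the grouped result in the same row order as df.
df = df.sort_values(["client", "day"]).reset_index(drop=True)

# Assumption: interpret 'day' as a number of days since an arbitrary origin.
df["date"] = pd.to_datetime(df["day"], unit="D")

# With a DatetimeIndex, '30D' is a span of time rather than a row count.
rolled = (
    df.set_index("date")
      .groupby("client")["foo"]
      .rolling("30D")
      .sum()
)

# The result is ordered by client and then date, matching the sorted df.
df["foo_30day"] = rolled.to_numpy()
print(df[["client", "day", "foo", "foo_30day"]])
```

Rows whose window contains only NaN values of 'foo' come out as NaN. If the current day should be excluded from the window, the `closed` argument of `rolling` is the setting to look at.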

Any advice would be greatly appreciated.

