Channel: Active questions tagged python - Stack Overflow

Why does my pandas groupby ffill seem to fill across different groups?

df = df.query('Time >= @start_time and Time < @end_time')
df.loc[:, 'date'] = pd.to_datetime(df['date'])
df['ret'] = (df.groupby(['date', 'sym'])['Close']
               .ffill()
               .pct_change())

When I used this to calculate returns, I found that the first row of each group's 'ret' is not NaN. Instead it shows a huge number that cannot be the return of a one-minute bar. So I guess pandas is forward-filling across different groups, but I don't know how to fix it.

For example, my DataFrame's columns are date, sym, Time, and Close.

The result is supposed to be

             Value
30986938       NaN
30986939  0.000934
30986940  0.001386
30986941 -0.000461
30986942  0.000462
30986943 -0.000180
30986944  0.000180

but it gives

             Value
30986938 -0.148827
30986939  0.000934
30986940  0.001386
30986941 -0.000461
30986942  0.000462
30986943 -0.000180
30986944  0.000180
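The symptom can be reproduced on a toy frame. This is only a sketch: the dates, symbols, and prices below are made up and are not the data from the question. It suggests that the grouped ffill() returns an ordinary Series aligned to the original index, so the chained pct_change() is no longer grouped and compares the first row of one group with the last row of the previous group.

```python
import pandas as pd

# Hypothetical toy data: two symbols on the same date, two bars each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02"] * 4),
    "sym": ["A", "A", "B", "B"],
    "Close": [100.0, 101.0, 50.0, 50.5],
})

# The forward fill is done per group, but the result is a plain Series
# indexed like df; it carries no grouping information any more.
filled = df.groupby(["date", "sym"])["Close"].ffill()

# pct_change() therefore runs over the whole column in row order.
chained = filled.pct_change()

# Row 2 is the first bar of sym "B", yet it is not NaN: it is the change
# from the last "A" price (101.0) to the first "B" price (50.0).
print(chained)
```

On this toy frame the first row of "B" comes out at roughly -0.505, the same kind of implausible jump as the -0.148827 above.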

I tried apply/transform(lambda x: x.ffill()) and groupby(..., as_index=False). None of them worked.

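I found the bug. The ffill was never the problem: the chained .pct_change() runs on the whole ungrouped Series that ffill() returns, so it crosses group boundaries. Both steps have to run inside each group, so I should use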

df.groupby(['date', 'sym'])['Close'].apply(lambda x: x.ffill().pct_change())
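A variant of the same fix, sketched with transform instead of apply. The toy data is hypothetical. The reason for trying transform is that it returns a result aligned to the original index, which makes assigning back to df['ret'] straightforward. Depending on the pandas version, apply may instead prepend the group keys to the index of the result, in which case passing group_keys=False to groupby is one way to keep the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two symbols on one date, with one missing Close.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02"] * 6),
    "sym": ["A", "A", "A", "B", "B", "B"],
    "Close": [100.0, np.nan, 102.0, 50.0, 51.0, 51.0],
})

# The lambda is called once per (date, sym) group, so both the forward fill
# and the percentage change stay inside that group. transform returns a
# Series indexed like df, so it can be assigned directly.
df["ret"] = (
    df.groupby(["date", "sym"])["Close"]
      .transform(lambda s: s.ffill().pct_change())
)

print(df)
```

With this version the first row of every group is NaN, and the missing Close in group "A" is filled from the previous bar of "A" only.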
