df = df.query('Time >= @start_time and Time < @end_time')
df.loc[:, 'date'] = pd.to_datetime(df['date'])
df['ret'] = df.groupby(['date', 'sym'])['Close'].ffill().pct_change()
When I use this to calculate returns, the first row of each group's ['ret'] is not NaN. It shows a huge number that can't be the return of a minute bar. So I guess pandas is filling forward across different groups, but I don't know how to solve this.
For example, my DataFrame's columns are date, sym, Time, and Close.
The result is supposed to be
Index | Value |
---|---|
30986938 | NaN |
30986939 | 0.000934 |
30986940 | 0.001386 |
30986941 | -0.000461 |
30986942 | 0.000462 |
30986943 | -0.000180 |
30986944 | 0.000180 |
but it gives
Index | Value |
---|---|
30986938 | -0.148827 |
30986939 | 0.000934 |
30986940 | 0.001386 |
30986941 | -0.000461 |
30986942 | 0.000462 |
30986943 | -0.000180 |
30986944 | 0.000180 |
I tried using apply/transform(lambda x: x.ffill()) and groupby(, as_index=False). None of them worked.
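I found the bug. The grouped ffill() returns an ordinary Series, so the chained pct_change() runs over the whole column instead of within each group, and the first row of a group is compared with the last row of the previous one. Both steps have to run inside each group, so I should use
df.groupby(['date', 'sym'])['Close'].apply(lambda x: x.ffill().pct_change())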
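
To make the difference concrete, here is a minimal sketch on made-up data (the toy prices and the `keys` variable are assumptions for illustration, not the original dataset). It uses `transform` instead of `apply`, which is a swapped-in variant of the fix above: `transform` always returns a result aligned to the original index, whereas, depending on the pandas version, `apply` may prepend the group keys to the index and make the assignment back to `df['ret']` misalign. Passing `fill_method=None` is also a choice made here, since the forward fill is already done explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two symbols on one date, three minute bars each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02"] * 6),
    "sym": ["A", "A", "A", "B", "B", "B"],
    "Close": [100.0, np.nan, 110.0, 50.0, 55.0, np.nan],
})
keys = ["date", "sym"]

# Chained version: ffill() is done per group, but it returns a plain Series,
# so pct_change() then runs down the whole column and crosses group borders.
chained = df.groupby(keys)["Close"].ffill().pct_change(fill_method=None)

# Per-group version: both steps run inside each group, and transform
# keeps the original index, so the result assigns back cleanly.
df["ret"] = df.groupby(keys)["Close"].transform(
    lambda s: s.ffill().pct_change(fill_method=None)
)

print(pd.DataFrame({"chained": chained, "per_group": df["ret"]}))
```

In this toy frame, row 3 is the first bar of symbol B. The chained version compares it with the last bar of symbol A and reports a large negative return, while the per-group version gives NaN there, which is the behaviour the question expects.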