I have a dataframe of bookings for one room (columns: booking_id, check-in date and check-out date) that I want to transform into a time series indexed by every day of the year (index: days of the year, feature: booked or not).
I have calculated the duration of the bookings and reindexed the dataframe daily. Now I need to forward-fill the dataframe, but only a limited number of times: the duration of each booking.
I tried iterating through each row with ffill, but it applies to the entire dataframe, not to selected rows. Any idea how I can do that?
Here is my code:
    import numpy as np
    import pandas as pd

    # create dataframe
    data = [[1, '2019-01-01', '2019-01-02', 1],
            [2, '2019-01-03', '2019-01-07', 4],
            [3, '2019-01-10', '2019-01-13', 3]]
    df = pd.DataFrame(data, columns=['booking_id', 'check-in', 'check-out', 'duration'])

    # cast dates to datetime format
    df['check-in'] = pd.to_datetime(df['check-in'])
    df['check-out'] = pd.to_datetime(df['check-out'])

    # create timeseries indexed on check-in date
    df2 = df.set_index('check-in')

    # create new index and reindex timeseries
    idx = pd.date_range(min(df['check-in']), max(df['check-out']), freq='D')
    ts = df2.reindex(idx)

I have this:
                booking_id  check-out  duration
    2019-01-01         1.0 2019-01-02       1.0
    2019-01-02         NaN        NaT       NaN
    2019-01-03         2.0 2019-01-07       4.0
    2019-01-04         NaN        NaT       NaN
    2019-01-05         NaN        NaT       NaN
    2019-01-06         NaN        NaT       NaN
    2019-01-07         NaN        NaT       NaN
    2019-01-08         NaN        NaT       NaN
    2019-01-09         NaN        NaT       NaN
    2019-01-10         3.0 2019-01-13       3.0
    2019-01-11         NaN        NaT       NaN
    2019-01-12         NaN        NaT       NaN
    2019-01-13         NaN        NaT       NaN

I expect to have:
                booking_id  check-out  duration
    2019-01-01         1.0 2019-01-02       1.0
    2019-01-02         1.0 2019-01-02       1.0
    2019-01-03         2.0 2019-01-07       4.0
    2019-01-04         2.0 2019-01-07       4.0
    2019-01-05         2.0 2019-01-07       4.0
    2019-01-06         2.0 2019-01-07       4.0
    2019-01-07         NaN        NaT       NaN
    2019-01-08         NaN        NaT       NaN
    2019-01-09         NaN        NaT       NaN
    2019-01-10         3.0 2019-01-13       3.0
    2019-01-11         3.0 2019-01-13       3.0
    2019-01-12         3.0 2019-01-13       3.0
    2019-01-13         NaN        NaT       NaN
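
One possible approach, sketched here rather than offered as a definitive answer: forward-fill the whole frame without a limit, then keep only the days that fall before the forward-filled check-out date. This assumes the check-out day itself counts as free (a half-open interval), which matches bookings 2 and 3 in the expected table; under that assumption 2019-01-02 comes out as not booked, unlike the expected table. The names `filled`, `mask`, `result` and `booked` are illustrative only.

```python
import pandas as pd

# Same sample data as in the question
data = [
    [1, '2019-01-01', '2019-01-02', 1],
    [2, '2019-01-03', '2019-01-07', 4],
    [3, '2019-01-10', '2019-01-13', 3],
]
df = pd.DataFrame(data, columns=['booking_id', 'check-in', 'check-out', 'duration'])
df['check-in'] = pd.to_datetime(df['check-in'])
df['check-out'] = pd.to_datetime(df['check-out'])

# Daily index spanning the first check-in to the last check-out
idx = pd.date_range(df['check-in'].min(), df['check-out'].max(), freq='D')
ts = df.set_index('check-in').reindex(idx)

# Forward-fill every row with the most recent booking
filled = ts.ffill()

# Assumption: a day is booked only if it is strictly before the check-out date
mask = filled.index.to_series() < filled['check-out']

# Drop the days outside a booking, then restore the full daily index
# so that those days reappear as NaN/NaT rows
result = filled[mask].reindex(idx)

# Boolean "booked or not" feature, one value per day
booked = result['booking_id'].notna()
```

If the check-out day should count as booked instead, changing `<` to `<=` in the mask would be the corresponding adjustment.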