I have a dataframe, df_prices, which contains information about a set of objects. It has 3 columns: 'uid', 'date' and 'price', where 'uid' has multiple (repeated) entries:
    df_prices
    index  uid  date        price
    0      123  02-02-2000  11100
    1      123  01-01-2000  22200
    2      123  03-03-2000  44000
    3      123  04-04-2000  66000
    4      456  03-03-2000  77700
    5      456  02-02-2000  88800
    6      456  04-04-2000  66600
    ...    ...  ...         ...
    98     987  01-01-2005  12300
    99     987  04-04-2005  45600
    100    987  05-05-2005  78900
I want to group the dataframe by the 'uid' column, then sort each group by the 'date' column, and extract the 'price' on the oldest date.
The result will be saved in another dataframe with only 'uid' (with no duplicated entries) and 'price', which contains the price on the oldest date.
So, the needed result should be:
    df_oldest_prices
    index  uid  price
    0      123  22200
    1      456  88800
    ...    ...  ...
    40     987  12300
I have managed to achieve the result with an iterrows loop:
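    import numpy as np
    import pandas as pd

    uids = df_prices['uid']
    uids_unique = list(set(uids))

    # One row per unique uid; 'price' is filled in by the loop below
    df_oldest_prices = pd.DataFrame()
    df_oldest_prices['uid'] = uids_unique
    df_oldest_prices['price'] = np.nan

    for index, row in df_oldest_prices.iterrows():
        uid = row['uid']
        # All rows for this uid, sorted so the oldest date comes first
        group = df_prices.loc[df_prices['uid'] == uid]
        groupsorted = group.sort_values('date')
        firstrow = groupsorted.iloc[0]
        price = firstrow['price']
        df_oldest_prices.at[index, 'price'] = price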
However, I need a vectorized pandas solution, for speed.
I have carefully checked all the questions/answers in this question, however none of them is suitable for my task. In particular, I cannot use a pivot, because I don't need aggregated data. I need to sort the data by 'date', and then select a single element from the 'price' column: the one on the same row as the oldest date. So, no aggregation, no pivoting.
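To make the requirement concrete, here is a minimal sketch of one loop-free way this could be done: sort by date, then keep the first row for each 'uid'. It rebuilds only the first seven rows of df_prices shown above, and it assumes the dates are day-month-year strings (the sample does not say which order is meant), so they are parsed before sorting. The helper column name `_date` is made up for this illustration.

```python
import pandas as pd

# Sample rows copied from the df_prices listing above (first seven rows only)
df_prices = pd.DataFrame({
    "uid": [123, 123, 123, 123, 456, 456, 456],
    "date": ["02-02-2000", "01-01-2000", "03-03-2000", "04-04-2000",
             "03-03-2000", "02-02-2000", "04-04-2000"],
    "price": [11100, 22200, 44000, 66000, 77700, 88800, 66600],
})

# Assumption: dates are day-month-year strings; parse them so the sort is
# chronological rather than alphabetical
parsed = pd.to_datetime(df_prices["date"], format="%d-%m-%Y")

df_oldest_prices = (
    df_prices.assign(_date=parsed)            # hypothetical helper column
    .sort_values("_date", kind="mergesort")   # stable sort, oldest rows first
    .drop_duplicates("uid", keep="first")     # keep the oldest row per uid
    .sort_values("uid")
    [["uid", "price"]]
    .reset_index(drop=True)
)

print(df_oldest_prices)
```

No aggregation or pivot is involved here: each output row is an existing row of df_prices, selected whole, so the price always comes from the same row as the oldest date.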