
Grouping rows in a Pandas dataframe based on one column, sorting based on a second column, and selecting elements in a third column (not pivoting) [duplicate]


I have a dataframe, df_prices, which contains information about a set of objects. It has 3 columns: 'uid', 'date' and 'price', where 'uid' has multiple (repeated) entries:

df_prices
index   uid date        price
0       123 02-02-2000  11100
1       123 01-01-2000  22200
2       123 03-03-2000  44000
3       123 04-04-2000  66000
4       456 03-03-2000  77700
5       456 02-02-2000  88800
6       456 04-04-2000  66600
...     ... ...         ...
98      987 01-01-2005  12300
99      987 04-04-2005  45600
100     987 05-05-2005  78900

I want to group the dataframe by the 'uid' column, then sort each group by the 'date' column, and extract the 'price' on the "oldest" (earliest) date.

The result will be saved in another dataframe with only 'uid' (with no duplicated entries) and 'price', which contains the oldest price.

So, the needed result should be:

df_oldest_prices
index   uid   price
0       123   22200
1       456   88800
...     ...   ...
40      987   12300

I have managed to achieve the result with an iterrows loop:

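import numpy as np
import pandas as pd

# Build the result frame with one row per unique uid
uids = df_prices['uid']
uids_unique = list(set(uids))
df_oldest_prices = pd.DataFrame()
df_oldest_prices['uid'] = uids_unique
df_oldest_prices['price'] = np.nan
# For each uid, sort its rows by date and take the price from the first row
for index, row in df_oldest_prices.iterrows():
    uid = row['uid']
    group = df_prices.loc[df_prices['uid'] == uid]
    groupsorted = group.sort_values('date')
    firstrow = groupsorted.iloc[0]
    price = firstrow['price']
    df_oldest_prices.at[index, 'price'] = price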

However, I need vectorized pandas code for speed.

I have carefully checked all the questions/answers in this question; however, none of them is suitable for my task. In particular, I cannot use a pivot, because I don't need aggregated data. I need to sort the data by 'date' and then select a single element in the 'price' column: the one on the same row as the oldest date. So, no aggregation, no pivoting.
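For illustration, here is a minimal sketch of one vectorized way to pick, for each 'uid', the 'price' on the same row as the earliest 'date'. It is not necessarily the fastest option. The sample frame below only mirrors the first rows shown above, and the day-month-year reading of the date strings is an assumption; the names `parsed` and `oldest_idx` are made up for this example.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of df_prices shown above.
df_prices = pd.DataFrame({
    "uid": [123, 123, 123, 123, 456, 456, 456],
    "date": ["02-02-2000", "01-01-2000", "03-03-2000", "04-04-2000",
             "03-03-2000", "02-02-2000", "04-04-2000"],
    "price": [11100, 22200, 44000, 66000, 77700, 88800, 66600],
})

# Assumption: dates are day-month-year strings. Parsing them makes the
# comparison chronological instead of lexicographic.
parsed = pd.to_datetime(df_prices["date"], format="%d-%m-%Y")

# Index label of the earliest date within each uid group (no Python loop).
oldest_idx = parsed.groupby(df_prices["uid"]).idxmin()

# Select those rows, so the price comes from the same row as the oldest date.
df_oldest_prices = (
    df_prices.loc[oldest_idx, ["uid", "price"]]
    .reset_index(drop=True)
)
print(df_oldest_prices)
```

No aggregation of 'price' happens here: `idxmin` only locates the row, and `.loc` reads the price from it. Sorting by the parsed date and then calling `drop_duplicates(subset="uid", keep="first")` should give the same rows, ordered by date rather than by uid.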

