
How to fetch the latest row when joining two datasets, where the latest row must be earlier than the date from dataset 'A' [duplicate]

I have a dataset, generated every day, that has a date column, as shown below.

DF_A

ID  name  qty  date
1   abc   20   17/01/2022
1   abc   10   18/01/2022
2   def   10   24/01/2022
2   def   40   25/01/2022
2   def   67   26/01/2022

DF_B

ID  name  price_dt    price
1   abc   18/01/2022  23.56
1   abc   17/01/2022  10.56
1   abc   16/01/2022  44.33
1   abc   15/01/2022  56.11
2   def   25/01/2022  2.98
2   def   26/01/2022  4.92
2   def   27/01/2022  4.88
2   def   24/01/2022  3.33
2   def   23/01/2022  8.47
2   def   22/01/2022  3.89

I'm joining DF_A with DF_B, and for each DF_A row I only need the most recent price_dt record that is earlier than the date column. This can be done by joining the two DataFrames, sorting by price_dt in descending order, and dropping the duplicates, but the challenge is that the DataFrames are so large that a full join is not feasible. So I'm looking to reduce the rows in DF_B before joining.
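One way to shrink DF_B before any join can be sketched as follows: a DF_B row whose price_dt is not earlier than the latest date for its ID in DF_A can never be selected, so it can be dropped up front. This is only a sketch. The helper name `prune_prices` is hypothetical, the sample is a trimmed version of the data above plus an invented unmatched ID 3, and both date columns are assumed to be parsed as datetimes already.

```python
import pandas as pd


def prune_prices(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Drop df_b rows that can never precede any df_a date for the same ID."""
    # Latest date seen per ID in df_a.
    max_date = df_a.groupby("ID")["date"].max()
    # Look up each df_b row's cutoff; IDs absent from df_a map to NaT.
    cutoff = df_b["ID"].map(max_date)
    # Comparisons against NaT are False, so unmatched IDs are dropped too.
    return df_b[df_b["price_dt"] < cutoff]


# Trimmed sample: ID 1 from the question plus a hypothetical unmatched ID 3.
df_a = pd.DataFrame({
    "ID": [1, 1],
    "date": pd.to_datetime(["17/01/2022", "18/01/2022"], format="%d/%m/%Y"),
})
df_b = pd.DataFrame({
    "ID": [1, 1, 1, 1, 3],
    "price_dt": pd.to_datetime(
        ["18/01/2022", "17/01/2022", "16/01/2022", "15/01/2022", "16/01/2022"],
        format="%d/%m/%Y",
    ),
    "price": [23.56, 10.56, 44.33, 56.11, 1.00],
})

pruned = prune_prices(df_a, df_b)
print(pruned)
```

How much this saves depends on how far DF_B extends past the dates in DF_A; it only removes rows that are too late or belong to IDs that DF_A does not contain.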

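Code that I tried:

import pandas as pd

# This produces one row per matched pair (26 rows for the sample above), which does not scale to a larger dataset.
DF_C = pd.merge(DF_A, DF_B, on=['ID', 'name'], how='left')
# Keep only prices dated strictly before the row's date (assumes both columns are already parsed as datetimes).
DF_C = DF_C[DF_C['price_dt'] < DF_C['date']]
# Sort newest price first, then keep the first (latest) price per DF_A row.
Expected_DF = DF_C.sort_values(by=['price_dt'], ascending=False)
Expected_DF = Expected_DF.drop_duplicates(subset=['ID', 'name', 'date'], keep='first')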

Expected_DF:

ID  name  qty  date        price_dt    price
1   abc   20   17/01/2022  16/01/2022  44.33
1   abc   10   18/01/2022  17/01/2022  10.56
2   def   10   24/01/2022  23/01/2022  8.47
2   def   40   25/01/2022  24/01/2022  3.33
2   def   67   26/01/2022  25/01/2022  2.98

I'm looking for a feasible method that reduces memory usage, rather than fetching all the matching records from DF_B.
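One candidate worth trying is `pandas.merge_asof`, which matches each left row to the nearest earlier right row within a group instead of building every matching pair first. The sketch below rebuilds the sample frames from the question and assumes the dates are in day/month/year format; `allow_exact_matches=False` enforces the "strictly earlier than date" condition. Whether it fits the real data (dtypes, sort cost, available memory) is an assumption that would need checking.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed as day/month/year.
DF_A = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "name": ["abc", "abc", "def", "def", "def"],
    "qty": [20, 10, 10, 40, 67],
    "date": ["17/01/2022", "18/01/2022", "24/01/2022",
             "25/01/2022", "26/01/2022"],
})
DF_B = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "name": ["abc"] * 4 + ["def"] * 6,
    "price_dt": ["18/01/2022", "17/01/2022", "16/01/2022", "15/01/2022",
                 "25/01/2022", "26/01/2022", "27/01/2022", "24/01/2022",
                 "23/01/2022", "22/01/2022"],
    "price": [23.56, 10.56, 44.33, 56.11, 2.98, 4.92, 4.88, 3.33, 8.47, 3.89],
})
DF_A["date"] = pd.to_datetime(DF_A["date"], format="%d/%m/%Y")
DF_B["price_dt"] = pd.to_datetime(DF_B["price_dt"], format="%d/%m/%Y")

# merge_asof requires both frames to be sorted by their "on" keys.
left = DF_A.sort_values("date")
right = DF_B.sort_values("price_dt")

result = pd.merge_asof(
    left,
    right,
    left_on="date",
    right_on="price_dt",
    by=["ID", "name"],           # match only within the same ID and name
    direction="backward",        # look for the latest earlier price_dt
    allow_exact_matches=False,   # price_dt must be strictly before date
).sort_values(["ID", "date"]).reset_index(drop=True)

print(result)
```

On the sample data this reproduces the five rows of Expected_DF. A DF_A row with no earlier price for its ID would be kept, with missing values in price_dt and price, since `merge_asof` behaves like a left join.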

