I have two datasets: A and B.
If the Year, ID, delivery, type, and vendor columns of a row in dataset A all match the Tags, ID, qtr, TYPE, and msc columns of a row in dataset B, then I want to replace the project_id of the matching row in dataset A with the Project Name from the corresponding row in dataset B. Otherwise, I don't want to change the project_id in A.
dataset A:
Year  ID  delivery  Gen  type  vendor  project_id
2022  BR  Q2 2022   L    aa    d       BR2 aa1 Q2 2022 - L
2022  BR  Q2 2022   L    dd    d       BR2 dd1 Q2 2022 - L
2022  BR  Q2 2022   L    dd    d       BR2 dd2 Q2 2022 - L
2022  BR  Q3 2022   L    bb    d       BR2 bb1 Q3 2022 - L
2022  BR  Q4 2022   L    aa    d       BR2 aa1 Q4 2022 - L
2022  BR  Q4 2022   L    dd    nd      BR2 dd1 Q4 2022 - L

dataset B:
Project Name           Tags  ID     qtr      TYPE  msc  NUM
BB H_AA01 Q4 2022      2022  BOLOL  Q4 2022  aa    d    01
BR2 H_DD_nd02 Q4 2022  2022  BR     Q4 2022  dd    nd   02
BR2 BB01 Q3.2022       2022  BR     Q3 2022  bb    d    01
BR2 H_DD01 Q2 2022     2022  BR     Q2 2022  dd    d    01
BR2 H_DD02 Q2 2022     2022  BR     Q2 2022  dd    d    02
BR2 H_AA01 Q2 2022     2022  BR     Q2 2022  aa    d    01

desired result:
Year  ID  delivery  Gen  type  vendor  project_id
2022  BR  Q2 2022   L    aa    d       BR2 H_AA01 Q2 2022
2022  BR  Q2 2022   L    dd    d       BR2 H_DD01 Q2 2022
2022  BR  Q2 2022   L    dd    d       BR2 H_DD02 Q2 2022
2022  BR  Q3 2022   L    bb    d       BR2 BB01 Q3.2022
2022  BR  Q4 2022   L    aa    d       BR2 aa1 Q4 2022 - L
2022  BR  Q4 2022   L    dd    nd      BR2 H_DD_nd02 Q4 2022

Here is my current attempt:
df_merged = pd.merge(
    df_A,
    df_B[['Project Name', 'Tags', 'ID', 'qtr', 'TYPE', 'msc']],
    how='left',
    left_on=['Year', 'ID', 'delivery', 'type', 'vendor'],
    right_on=['Tags', 'ID', 'qtr', 'TYPE', 'msc'],
)

# Replacing 'project_id' in A with 'Project Name' from B where there is a match
df_merged['project_id'] = df_merged['Project Name'].combine_first(df_merged['project_id'])

# Dropping unnecessary columns from the merge
df_final = df_merged.drop(['Project Name', 'Tags', 'qtr', 'TYPE', 'msc'], axis=1)

However, the above script blows up my dataset, creating unnecessary rows and multiple columns.
The final output should have the same number of rows and columns as the original dataset A. The only difference is that the project_id column is updated where a match exists. How do I properly perform this operation?
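For reference, here is a minimal sketch of one possible approach, not a definitive solution. In the sample data the key (2022, BR, Q2 2022, dd, d) occurs twice in A and twice in B, so a plain left merge pairs every copy with every copy and yields four rows for that key instead of two. The sketch assumes that duplicated keys should be paired in order of appearance (first occurrence in A with first occurrence in B, and so on); that pairing rule is my assumption, and the helper column name `_occ` is hypothetical. It also assumes Year and Tags share the same dtype.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df_A = pd.DataFrame({
    "Year": [2022] * 6,
    "ID": ["BR"] * 6,
    "delivery": ["Q2 2022", "Q2 2022", "Q2 2022", "Q3 2022", "Q4 2022", "Q4 2022"],
    "Gen": ["L"] * 6,
    "type": ["aa", "dd", "dd", "bb", "aa", "dd"],
    "vendor": ["d", "d", "d", "d", "d", "nd"],
    "project_id": [
        "BR2 aa1 Q2 2022 - L", "BR2 dd1 Q2 2022 - L", "BR2 dd2 Q2 2022 - L",
        "BR2 bb1 Q3 2022 - L", "BR2 aa1 Q4 2022 - L", "BR2 dd1 Q4 2022 - L",
    ],
})
df_B = pd.DataFrame({
    "Project Name": [
        "BB H_AA01 Q4 2022", "BR2 H_DD_nd02 Q4 2022", "BR2 BB01 Q3.2022",
        "BR2 H_DD01 Q2 2022", "BR2 H_DD02 Q2 2022", "BR2 H_AA01 Q2 2022",
    ],
    "Tags": [2022] * 6,
    "ID": ["BOLOL", "BR", "BR", "BR", "BR", "BR"],
    "qtr": ["Q4 2022", "Q4 2022", "Q3 2022", "Q2 2022", "Q2 2022", "Q2 2022"],
    "TYPE": ["aa", "dd", "bb", "dd", "dd", "aa"],
    "msc": ["d", "nd", "d", "d", "d", "d"],
    "NUM": ["01", "02", "01", "01", "02", "01"],
})

keys_A = ["Year", "ID", "delivery", "type", "vendor"]
keys_B = ["Tags", "ID", "qtr", "TYPE", "msc"]

# Number repeated keys within each frame (0, 1, ...) so that duplicates
# pair up one-to-one instead of forming a cross product.
# ASSUMPTION: duplicates correspond in order of appearance.
a = df_A.assign(_occ=df_A.groupby(keys_A).cumcount())
b = df_B[keys_B + ["Project Name"]].assign(_occ=df_B.groupby(keys_B).cumcount())
b = b.rename(columns=dict(zip(keys_B, keys_A)))  # align key names with A

# validate raises a MergeError if the keys are unexpectedly not unique.
merged = a.merge(b, how="left", on=keys_A + ["_occ"], validate="one_to_one")

# Take 'Project Name' where a match exists, otherwise keep the old project_id.
df_final = df_A.copy()
df_final["project_id"] = (
    merged["Project Name"].combine_first(merged["project_id"]).to_numpy()
)

print(df_final)
```

Because each (key, `_occ`) combination is unique on both sides, the left merge returns exactly one row per row of A, and only project_id is written back, so the shape of A is unchanged. If the pairing should instead follow the NUM column or the number embedded in project_id, that value would need to be used as the extra key in place of the counter.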