Channel: Active questions tagged python - Stack Overflow

Transform and replace values in an existing column if multiple matching values exist, using pandas


I have two datasets: A and B.

If a row in dataset A matches a row in dataset B on all five column pairs (A's Year, ID, delivery, type, and vendor against B's Tags, ID, qtr, TYPE, and msc), I want to replace that row's project_id in A with the Project Name from the matching row in B. Otherwise, the project_id in A should stay unchanged.
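In pandas terms, this five-column rule is an equi-join on paired key columns. A minimal sketch, assuming the key columns hold comparable types (Year and Tags both integers) and that each key combination occurs at most once in B: rename B's key columns to A's names, merge on them, then fall back to the old project_id where nothing matched. The two-row frames are an illustrative excerpt of the question's data, and `KEY_MAP` / `KEYS` are hypothetical helper names.

```python
import pandas as pd

# Hypothetical helper: B's key column -> A's key column, in the pairing described above.
KEY_MAP = {"Tags": "Year", "ID": "ID", "qtr": "delivery", "TYPE": "type", "msc": "vendor"}
KEYS = list(KEY_MAP.values())

# Excerpt of the question's data (Gen column omitted for brevity).
df_A = pd.DataFrame({
    "Year": [2022, 2022],
    "ID": ["BR", "BR"],
    "delivery": ["Q3 2022", "Q4 2022"],
    "type": ["bb", "aa"],
    "vendor": ["d", "d"],
    "project_id": ["BR2 bb1 Q3 2022 - L", "BR2 aa1 Q4 2022 - L"],
})
df_B = pd.DataFrame({
    "Project Name": ["BR2 BB01 Q3.2022", "B H_AA01 Q4 2022"],
    "Tags": [2022, 2022],
    "ID": ["BR", "BOLOL"],
    "qtr": ["Q3 2022", "Q4 2022"],
    "TYPE": ["bb", "aa"],
    "msc": ["d", "d"],
})

# Put B's keys under A's names so a plain on= merge means "all five columns match".
lookup = df_B.rename(columns=KEY_MAP)[KEYS + ["Project Name"]]

# validate="many_to_one" raises MergeError if a key combination repeats in B.
out = df_A.merge(lookup, on=KEYS, how="left", validate="many_to_one")

# Use Project Name where a match was found, otherwise keep the original project_id.
out["project_id"] = out["Project Name"].fillna(out["project_id"])
out = out.drop(columns="Project Name")
print(out["project_id"].tolist())
# ['BR2 BB01 Q3.2022', 'BR2 aa1 Q4 2022 - L']
```

The second row keeps its old value because B's matching candidate has ID BOLOL rather than BR.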

dataset A:

    Year  ID  delivery  Gen  type  vendor  project_id
    2022  BR  Q2 2022   L    aa    d       BR2 aa1 Q2 2022 - L
    2022  BR  Q2 2022   L    dd    d       BR2 dd1 Q2 2022 - L
    2022  BR  Q2 2022   L    dd    d       BR2 dd2 Q2 2022 - L
    2022  BR  Q3 2022   L    bb    d       BR2 bb1 Q3 2022 - L
    2022  BR  Q4 2022   L    aa    d       BR2 aa1 Q4 2022 - L
    2022  BR  Q4 2022   L    dd    nd      BR2 dd1 Q4 2022 - L

dataset B:

    Project Name           Tags  ID     qtr      TYPE  msc  NUMB
    B H_AA01 Q4 2022       2022  BOLOL  Q4 2022  aa    d    01
    BR2 H_DD_nd02 Q4 2022  2022  BR     Q4 2022  dd    nd   02
    BR2 BB01 Q3.2022       2022  BR     Q3 2022  bb    d    01
    BR2 H_DD01 Q2 2022     2022  BR     Q2 2022  dd    d    01
    BR2 H_DD02 Q2 2022     2022  BR     Q2 2022  dd    d    02
    BR2 H_AA01 Q2 2022     2022  BR     Q2 2022  aa    d    01

desired result:

    Year  ID  delivery  Gen  type  vendor  project_id
    2022  BR  Q2 2022   L    aa    d       BR2 H_AA01 Q2 2022
    2022  BR  Q2 2022   L    dd    d       BR2 H_DD01 Q2 2022
    2022  BR  Q2 2022   L    dd    d       BR2 H_DD02 Q2 2022
    2022  BR  Q3 2022   L    bb    d       BR2 BB01 Q3.2022
    2022  BR  Q4 2022   L    aa    d       BR2 aa1 Q4 2022 - L
    2022  BR  Q4 2022   L    dd    nd      BR2 H_DD_nd02 Q4 2022

Here is my current attempt:

    df_merged = pd.merge(
        df_A,
        df_B[['Project Name', 'Tags', 'ID', 'qtr', 'TYPE', 'msc']],
        how='left',
        left_on=['Year', 'ID', 'delivery', 'type', 'vendor'],
        right_on=['Tags', 'ID', 'qtr', 'TYPE', 'msc'],
    )

    # Replacing 'project_id' in A with 'Project Name' from B where there is a match
    df_merged['project_id'] = df_merged['Project Name'].combine_first(df_merged['project_id'])

    # Dropping unnecessary columns from the merge
    df_final = df_merged.drop(['Project Name', 'Tags', 'qtr', 'TYPE', 'msc'], axis=1)

However, the script above blows up my dataset, creating extra rows and extra columns.
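The extra rows most likely come from duplicate keys. In the sample data, the combination (2022, BR, Q2 2022, dd, d) appears twice in A and twice in B, and a merge pairs every left row with every right row that shares its key, so those two rows turn into 2 x 2 = 4. A small sketch that reproduces this with just the duplicated rows, plus two ways to detect it: `DataFrame.duplicated` on B's key columns, and merge's `validate` argument, which raises `pandas.errors.MergeError` when the right-hand keys are not unique.

```python
import pandas as pd

# Only the rows of A and B that share the duplicated key (taken from the question).
df_A = pd.DataFrame({
    "Year": [2022, 2022],
    "ID": ["BR", "BR"],
    "delivery": ["Q2 2022", "Q2 2022"],
    "type": ["dd", "dd"],
    "vendor": ["d", "d"],
    "project_id": ["BR2 dd1 Q2 2022 - L", "BR2 dd2 Q2 2022 - L"],
})
df_B = pd.DataFrame({
    "Project Name": ["BR2 H_DD01 Q2 2022", "BR2 H_DD02 Q2 2022"],
    "Tags": [2022, 2022],
    "ID": ["BR", "BR"],
    "qtr": ["Q2 2022", "Q2 2022"],
    "TYPE": ["dd", "dd"],
    "msc": ["d", "d"],
})

left_keys = ["Year", "ID", "delivery", "type", "vendor"]
right_keys = ["Tags", "ID", "qtr", "TYPE", "msc"]

# Each left row matches both right rows, so 2 rows become 2 x 2 = 4.
blown_up = pd.merge(df_A, df_B, how="left", left_on=left_keys, right_on=right_keys)
print(len(df_A), "->", len(blown_up))  # 2 -> 4

# Detection 1: flag every row of B whose key combination is not unique.
dup_in_B = df_B.duplicated(subset=right_keys, keep=False)
print(dup_in_B.tolist())  # [True, True]

# Detection 2: ask merge to enforce "each A row matches at most one B row".
try:
    pd.merge(df_A, df_B, how="left", left_on=left_keys, right_on=right_keys,
             validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)  # True
```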

The final output should have the same number of rows and columns as the original dataset A; the only difference should be the updated project_id column. How do I perform this operation correctly?
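Because the two dd rows in A carry identical keys, a plain merge cannot tell which of them should receive H_DD01 and which H_DD02. The desired result pairs them by position (first with first, second with second). A sketch under that assumption: number each repeat of a key with `groupby(...).cumcount()` on both sides, add that counter to the join keys so every key becomes unique, merge with `how="left"` and `validate="one_to_one"`, then fall back to the old project_id. `_occ` is a hypothetical helper column name, and Year and Tags are assumed to be integers in both frames. If the pairing should follow a rule other than row order, the counter would have to be replaced by that rule.

```python
import pandas as pd

df_A = pd.DataFrame({
    "Year": [2022] * 6,
    "ID": ["BR"] * 6,
    "delivery": ["Q2 2022", "Q2 2022", "Q2 2022", "Q3 2022", "Q4 2022", "Q4 2022"],
    "Gen": ["L"] * 6,
    "type": ["aa", "dd", "dd", "bb", "aa", "dd"],
    "vendor": ["d", "d", "d", "d", "d", "nd"],
    "project_id": [
        "BR2 aa1 Q2 2022 - L", "BR2 dd1 Q2 2022 - L", "BR2 dd2 Q2 2022 - L",
        "BR2 bb1 Q3 2022 - L", "BR2 aa1 Q4 2022 - L", "BR2 dd1 Q4 2022 - L",
    ],
})
df_B = pd.DataFrame({
    "Project Name": [
        "B H_AA01 Q4 2022", "BR2 H_DD_nd02 Q4 2022", "BR2 BB01 Q3.2022",
        "BR2 H_DD01 Q2 2022", "BR2 H_DD02 Q2 2022", "BR2 H_AA01 Q2 2022",
    ],
    "Tags": [2022] * 6,
    "ID": ["BOLOL", "BR", "BR", "BR", "BR", "BR"],
    "qtr": ["Q4 2022", "Q4 2022", "Q3 2022", "Q2 2022", "Q2 2022", "Q2 2022"],
    "TYPE": ["aa", "dd", "bb", "dd", "dd", "aa"],
    "msc": ["d", "nd", "d", "d", "d", "d"],
    "NUMB": ["01", "02", "01", "01", "02", "01"],
})

left_keys = ["Year", "ID", "delivery", "type", "vendor"]
right_keys = ["Tags", "ID", "qtr", "TYPE", "msc"]

# Number repeated keys 0, 1, 2, ... in row order on each side, so the n-th
# duplicate in A pairs with the n-th duplicate in B (assumed pairing rule).
a = df_A.assign(_occ=df_A.groupby(left_keys).cumcount())
b = df_B.assign(_occ=df_B.groupby(right_keys).cumcount())

# Rename B's key columns to A's names and keep only what the merge needs.
b = b.rename(columns=dict(zip(right_keys, left_keys)))[left_keys + ["_occ", "Project Name"]]

# With the counter, keys are unique on both sides, so no rows are multiplied;
# validate="one_to_one" raises MergeError if that ever stops being true.
merged = a.merge(b, on=left_keys + ["_occ"], how="left", validate="one_to_one")

# Take Project Name where a match exists, otherwise keep the original project_id.
merged["project_id"] = merged["Project Name"].fillna(merged["project_id"])

# Same columns, same row order, and same index as the original A.
result = merged[list(df_A.columns)]
result.index = df_A.index
print(result["project_id"].tolist())
```

A left merge keeps A's row order, so the result lines up with the original frame row for row; only project_id differs.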

