How to Populate Null Values in Columns After Outer Join in Python Pandas

My goal is to join two dataframes from different sources in Python using Pandas, and then fill the null values in each column with the corresponding values from that same column.

The dataframes have similar columns, but some text/object columns may hold different values because the data sources differ. For instance, the "Name" column in one dataframe might contain "Nick M." while in the other it's "Nick Mason". However, certain columns such as "Date" (formatted as YYYY-MM-DD), "Order ID" (numeric), and "Employee ID" (numeric) have consistent values across both dataframes, and these are the columns we join on. Note that some columns may exist in only one of the dataframes, but they should be filled as well.

import pandas as pd

# Create DataFrame df1
df1_data = {
    'Date (df1)': ['2024-03-18', '2024-03-18', '2024-03-18', '2024-03-18', '2024-03-18', "2024-03-19", "2024-03-19"],
    'Order Id (df1)': [1, 2, 3, 4, 5, 1, 2],
    'Employee Id (df1)': [825, 825, 825, 825, 825, 825, 825],
    'Name (df1)': ['Nick M.', 'Nick M.', 'Nick M.', 'Nick M.', 'Nick M.', 'Nick M.', 'Nick M.'],
    'Region (df1)': ['SD', 'SD', 'SD', 'SD', 'SD', 'SD', 'SD'],
    'Value (df1)': [25, 37, 18, 24, 56, 77, 25]
}
df1 = pd.DataFrame(df1_data)

# Create DataFrame df2
df2_data = {
    'Date (df2)': ['2024-03-18', '2024-03-18', '2024-03-18', "2024-03-19", "2024-03-19", "2024-03-19", "2024-03-19"],
    'Order Id (df2)': [1, 2, 3, 1, 2, 3, 4],
    'Employee Id (df2)': [825, 825, 825, 825, 825, 825, 825],
    'Name (df2)': ['Nick Mason', 'Nick Mason', 'Nick Mason', 'Nick Mason', 'Nick Mason', 'Nick Mason', 'Nick Mason'],
    'Region (df2)': ['San Diego', 'San Diego', 'San Diego', 'San Diego', 'San Diego', 'San Diego', 'San Diego'],
    'Value (df2)': [25, 37, 19, 22, 17, 9, 76]
}
df2 = pd.DataFrame(df2_data)

# Combine DataFrames
outer_joined_df = pd.merge(
    df1,
    df2,
    how = 'outer',
    left_on = ['Date (df1)', 'Employee Id (df1)', "Order Id (df1)"],
    right_on = ['Date (df2)', 'Employee Id (df2)', "Order Id (df2)"]
)

# Display the result
outer_joined_df

Here is the joined dataframe. The null values highlighted in yellow should be filled.

[Image: outer-joined dataframe, null values highlighted in yellow]

I tried the code below. It works as expected for the Date, Order Id and Employee Id columns (they are identical across the two dataframes and we join on them), but not for the others, because those may hold different values. The logic is: if a value is null, fill it with the value from the same row of the specified counterpart column. Since the two sources can spell the same thing differently, the filled column becomes messy and ends up with multiple variations of the same value.
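To make the problem concrete, here is a minimal sketch on a toy frame (the three rows are made up for illustration and are not the real data). It shows how `combine_first` pulls the other source's spelling into the column wherever the first one is null:

```python
import pandas as pd

# Toy stand-in for two text columns after an outer join (illustrative values only).
toy = pd.DataFrame({
    "Name (df1)": ["Nick M.", None, "Nick M."],
    "Name (df2)": ["Nick Mason", "Nick Mason", None],
})

# combine_first keeps the caller's value and falls back to the other
# column only where the caller is null.
filled_df1 = toy["Name (df1)"].combine_first(toy["Name (df2)"])

print(filled_df1.tolist())  # ['Nick M.', 'Nick Mason', 'Nick M.']
```

The filled "Name (df1)" column now contains two spellings of the same person, which is exactly the mix I want to avoid.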

outer_joined_df['Date (df1)'] = outer_joined_df['Date (df1)'].combine_first(outer_joined_df['Date (df2)'])
outer_joined_df['Date (df2)'] = outer_joined_df['Date (df2)'].combine_first(outer_joined_df['Date (df1)'])
outer_joined_df['Order Id (df1)'] = outer_joined_df['Order Id (df1)'].combine_first(outer_joined_df['Order Id (df2)'])
outer_joined_df['Order Id (df2)'] = outer_joined_df['Order Id (df2)'].combine_first(outer_joined_df['Order Id (df1)'])
outer_joined_df['Employee Id (df1)'] = outer_joined_df['Employee Id (df1)'].combine_first(outer_joined_df['Employee Id (df2)'])
outer_joined_df['Employee Id (df2)'] = outer_joined_df['Employee Id (df2)'].combine_first(outer_joined_df['Employee Id (df1)'])
outer_joined_df['Name (df1)'] = outer_joined_df['Name (df1)'].combine_first(outer_joined_df['Name (df2)'])
outer_joined_df['Name (df2)'] = outer_joined_df['Name (df2)'].combine_first(outer_joined_df['Name (df1)'])
outer_joined_df['Region (df1)'] = outer_joined_df['Region (df1)'].combine_first(outer_joined_df['Region (df2)'])
outer_joined_df['Region (df2)'] = outer_joined_df['Region (df2)'].combine_first(outer_joined_df['Region (df1)'])

Here is the output:

[Image: dataframe after the combine_first calls]

As you can see, it populated the data, but not the way I want.

Output I need:

[Image: desired output]
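For reference, a minimal sketch of the logic I am after. It rests on one assumption: a null in a source's text column should be filled from non-null rows of that same column for the same Employee Id, so "Name (df1)" only ever holds df1's spelling. The frames are shortened versions of the ones above, and the group-wise `transform("first")` step is just one possible way to express this, not a confirmed solution:

```python
import pandas as pd

# Shortened versions of df1/df2 from the question (fewer rows, no Value column).
df1 = pd.DataFrame({
    "Date (df1)": ["2024-03-18", "2024-03-18", "2024-03-19"],
    "Order Id (df1)": [1, 4, 1],
    "Employee Id (df1)": [825, 825, 825],
    "Name (df1)": ["Nick M."] * 3,
    "Region (df1)": ["SD"] * 3,
})
df2 = pd.DataFrame({
    "Date (df2)": ["2024-03-18", "2024-03-19", "2024-03-19"],
    "Order Id (df2)": [1, 1, 3],
    "Employee Id (df2)": [825, 825, 825],
    "Name (df2)": ["Nick Mason"] * 3,
    "Region (df2)": ["San Diego"] * 3,
})

merged = pd.merge(
    df1, df2, how="outer",
    left_on=["Date (df1)", "Employee Id (df1)", "Order Id (df1)"],
    right_on=["Date (df2)", "Employee Id (df2)", "Order Id (df2)"],
)

# Step 1: key columns agree across sources, so coalescing them is safe.
for key in ["Date", "Order Id", "Employee Id"]:
    left, right = f"{key} (df1)", f"{key} (df2)"
    merged[left] = merged[left].combine_first(merged[right])
    merged[right] = merged[left]

# Step 2 (assumption): fill each text column from its own non-null values,
# looked up per employee, so the two sources' spellings never mix.
for col in ["Name (df1)", "Name (df2)", "Region (df1)", "Region (df2)"]:
    per_employee = merged.groupby("Employee Id (df1)")[col].transform("first")
    merged[col] = merged[col].fillna(per_employee)

print(merged)
```

With this approach "Name (df1)" stays "Nick M." on every row and "Name (df2)" stays "Nick Mason". If an employee has several different values within one source's column, `first` picks the first non-null one, which may or may not be the right choice.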

