I am using pandas 2.1.3.
I am trying to join two DataFrames on multiple index levels, and one of the index levels contains NaN values. A minimal reproducible example:
    import numpy as np
    import pandas as pd

    a = pd.DataFrame({'idx_a': ['A', 'A', 'B'],
                      'idx_b': ['alpha', 'beta', 'gamma'],
                      'idx_c': [1.0, 1.0, 1.0],
                      'x': [10, 20, 30]}).set_index(['idx_a', 'idx_b', 'idx_c'])

    b = pd.DataFrame({'idx_b': ['gamma', 'delta', 'epsilon', np.nan, np.nan],
                      'idx_c': [1.0, 1.0, 1.0, 1.0, 1.0],
                      'y': [100, 200, 300, 400, 500]}).set_index(['idx_b', 'idx_c'])

    c = a.join(b, how='inner', on=['idx_b', 'idx_c'])

    print(a)
                        x
    idx_a idx_b idx_c
    A     alpha 1.0    10
          beta  1.0    20
    B     gamma 1.0    30

    print(b)
                     y
    idx_b   idx_c
    gamma   1.0    100
    delta   1.0    200
    epsilon 1.0    300
    NaN     1.0    400
            1.0    500

    print(c)
                        x    y
    idx_a idx_b idx_c
    B     gamma 1.0    30  100
                1.0    30  400
                1.0    30  500
I would have expected:
    print(c)
                        x    y
    idx_a idx_b idx_c
    B     gamma 1.0    30  100
Why is `join` matching on the `NaN` values?