Pandas one-hot encoding with multiple like columns

I have several 'condition' columns in a dataset. These columns are all eligible to receive the same coded input. This is only to allow multiple conditions to be associated with a single record - which column the code winds up in carries no meaning.

In the sample below there are only 5 unique values across the 3 condition columns, although if you consider each column separately, there are 3 unique values in each. So when I apply one-hot encoding to these variables together I get 9 new columns, but I only want 5 (one for each unique value in the collective set of columns).
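The counting in the previous paragraph can be checked directly. This is a minimal sketch that assumes micro.csv matches the sample table; the frame is rebuilt inline here (a hypothetical stand-in for the file), with blank cells as NaN, which is what `read_csv` produces for empty fields.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copy of the sample from micro.csv; blank cells -> NaN
df = pd.DataFrame({
    "cond1": ["I219", "I500", "I48", "I219", "I219"],
    "cond2": ["E119", np.nan, "I500", "E119", "I48"],
    "cond3": ["I48", np.nan, "F171", "I500", np.nan],
    "target": [1, 0, 1, 0, 0],
})
cond_cols = ["cond1", "cond2", "cond3"]

# Distinct codes per column (nunique ignores NaN by default)
per_column = df[cond_cols].nunique()
print(per_column.tolist())  # [3, 3, 3] -> 9 dummy columns if encoded per column

# Distinct codes across all three columns taken together, NaN dropped
all_codes = [c for c in pd.unique(df[cond_cols].to_numpy().ravel()) if pd.notna(c)]
print(sorted(all_codes))  # ['E119', 'F171', 'I219', 'I48', 'I500']
```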

Here is a sample of the original data:

| cond1 | cond2 | cond3 | target |
|-------|-------|-------|--------|
| I219  | E119  | I48   | 1      |
| I500  |       |       | 0      |
| I48   | I500  | F171  | 1      |
| I219  | E119  | I500  | 0      |
| I219  | I48   |       | 0      |

Here's what I tried:

```python
import pandas as pd

df = pd.read_csv('micro.csv', dtype='object')
df['cond1'] = pd.Categorical(df['cond1'])
df['cond2'] = pd.Categorical(df['cond2'])
df['cond3'] = pd.Categorical(df['cond3'])
dummies = pd.get_dummies(df[['cond1', 'cond2', 'cond3']], prefix='cond')
dummies
```

Which gives me:

| cond_I219 | cond_I48 | cond_I500 | cond_E119 | cond_I48 | cond_I500 | cond_F171 | cond_I48 | cond_I500 |
|-----------|----------|-----------|-----------|----------|-----------|-----------|----------|-----------|
| 1         | 0        | 0         | 1         | 0        | 0         | 0         | 1        | 0         |
| 0         | 0        | 1         | 0         | 0        | 0         | 0         | 0        | 0         |
| 0         | 1        | 0         | 0         | 0        | 1         | 1         | 0        | 0         |
| 1         | 0        | 0         | 1         | 0        | 0         | 0         | 0        | 1         |
| 1         | 0        | 0         | 0         | 1        | 0         | 0         | 0        | 0         |

So I end up with multiple dummy columns for any code that appears in more than one condition column (I48 and I500). I would like only a single column per code so I can check for correlations between individual codes and my target variable.
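One way to merge the duplicate columns, sketched here under the assumption that micro.csv matches the sample (rebuilt inline as a hypothetical stand-in), is to keep the same `get_dummies` call and then collapse columns that share a name. Transposing turns the column names into the row index, `groupby(level=0, sort=False).max()` yields 1 wherever the code appears in any condition column, and transposing back restores the layout. `sort=False` keeps the names in order of first appearance.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copy of the sample from micro.csv; blank cells -> NaN
df = pd.DataFrame({
    "cond1": ["I219", "I500", "I48", "I219", "I219"],
    "cond2": ["E119", np.nan, "I500", "E119", "I48"],
    "cond3": ["I48", np.nan, "F171", "I500", np.nan],
    "target": [1, 0, 1, 0, 0],
})
cond_cols = ["cond1", "cond2", "cond3"]

# Same shared prefix as in the question, so repeated codes give repeated names
dummies = pd.get_dummies(df[cond_cols], prefix="cond")
print(dummies.columns.duplicated().sum())  # 4 duplicate names (2x I48, 2x I500)

# Collapse same-named columns: max over a name is 1 if any copy is 1
combined = dummies.T.groupby(level=0, sort=False).max().T.astype(int)
print(combined)
```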

Is there a way to do this? This is the result I'm after:

| cond_I219 | cond_I48 | cond_I500 | cond_E119 | cond_F171 |
|-----------|----------|-----------|-----------|-----------|
| 1         | 1        | 0         | 1         | 0         |
| 0         | 0        | 1         | 0         | 0         |
| 0         | 1        | 1         | 0         | 1         |
| 1         | 0        | 1         | 1         | 0         |
| 1         | 1        | 0         | 0         | 0         |
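A different route that skips `get_dummies` entirely, again only a sketch assuming the inline frame below mirrors micro.csv: collect every distinct code across the condition columns, then, for each code, flag the rows where it appears in any of those columns. Column position is ignored by construction, which matches the point that the column a code lands in carries no meaning. Codes are sorted here, so the column order differs from the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical inline copy of the sample from micro.csv; blank cells -> NaN
df = pd.DataFrame({
    "cond1": ["I219", "I500", "I48", "I219", "I219"],
    "cond2": ["E119", np.nan, "I500", "E119", "I48"],
    "cond3": ["I48", np.nan, "F171", "I500", np.nan],
    "target": [1, 0, 1, 0, 0],
})
cond_cols = ["cond1", "cond2", "cond3"]

# Every distinct code across all condition columns, NaN dropped, sorted
codes = sorted(c for c in pd.unique(df[cond_cols].to_numpy().ravel()) if pd.notna(c))

# One 0/1 column per code: 1 if the code appears in ANY condition column
# (NaN never equals a code, so blank cells contribute nothing)
encoded = pd.DataFrame(
    {f"cond_{code}": df[cond_cols].eq(code).any(axis=1).astype(int) for code in codes},
    index=df.index,
)
print(encoded)
```

With one column per code, `encoded.corrwith(df["target"])` returns the Pearson correlation of each code indicator with `target`.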
