Channel: Active questions tagged python - Stack Overflow

Replicating records in Pandas based on some condition and efficiently [duplicate]

I have a pandas data frame with records like the below:

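import pandas as pd

# Note: 'Applct_Id_3' had only five values as originally posted, which raises
# "All arrays must be of the same length". A trailing None is added here as an
# assumption so that the frame can be built.
df = pd.DataFrame({
    'APPN': [1001, 1002, 1003, 1004, 1005, 1006],
    'Applct_Id_1': ['A', 'B', 'C', 'D', None, 'F'],
    'Applct_Id_2': [None, 'E', 'F', None, 'G', None],
    'Applct_Id_3': ['W', 'Z', None, 'Y', None, None],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Age': [25, 30, 35, 40, 45, 50],
})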

Every APPN value is meant to be unique. Each APPN can carry several applicant IDs in the columns Applct_Id_1, Applct_Id_2 and Applct_Id_3. For example, 1001 has A (Applct_Id_1) and W (Applct_Id_3); its Applct_Id_2 is None, so it is not of interest. I want to replicate the row for 1001 once per non-null ID, that is, once for Applct_Id_1 and once for Applct_Id_3. My idea is to create a new column called ID_Number that holds one of those ID values per row, with the rest of the row copied for that APPN. The number of copies will differ between APPNs: only APPNs with more than one recorded Applct_Id need replicating. In the end I want something like this, using 1001 as the example:

new_df = pd.DataFrame({'APPN': [1001, 1001], 'ID_Number': ['A', 'W'], 'Name': ['Alice', 'Alice'], 'Age': [25, 25]})

How can I do this efficiently in pandas? I'll be dealing with about 400K records.
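
One possible way to get the layout described above is to reshape from wide to long with `DataFrame.melt` and then drop the empty IDs. This is a sketch under assumptions, not a definitive answer: the sixth `Applct_Id_3` value is assumed to be None (only five were posted), and the names `id_cols` and `long_df` are illustrative. It relies only on vectorised pandas calls, so it avoids a Python-level loop over rows.

```python
import pandas as pd

df = pd.DataFrame({
    'APPN': [1001, 1002, 1003, 1004, 1005, 1006],
    'Applct_Id_1': ['A', 'B', 'C', 'D', None, 'F'],
    'Applct_Id_2': [None, 'E', 'F', None, 'G', None],
    # Assumption: the last value is padded with None to reach six entries.
    'Applct_Id_3': ['W', 'Z', None, 'Y', None, None],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Age': [25, 30, 35, 40, 45, 50],
})

# Hypothetical helper name: the columns that hold applicant IDs.
id_cols = ['Applct_Id_1', 'Applct_Id_2', 'Applct_Id_3']

long_df = (
    # One row per (APPN, ID column) pair; the ID values land in 'ID_Number'.
    df.melt(id_vars=['APPN', 'Name', 'Age'], value_vars=id_cols,
            value_name='ID_Number')
      # Discard the pairs where no applicant ID was recorded.
      .dropna(subset=['ID_Number'])
      # The source column name is not needed in the result.
      .drop(columns='variable')
      # A stable sort groups rows by APPN and keeps Applct_Id_1..3 order within each.
      .sort_values('APPN', kind='stable')
      .reset_index(drop=True)
)

# Match the column order of the desired output.
long_df = long_df[['APPN', 'ID_Number', 'Name', 'Age']]

print(long_df)
```

With this approach an APPN with a single recorded ID keeps one row, an APPN with several IDs gets one row per ID, and an APPN with no IDs at all would disappear from the result, so such rows would need separate handling if they must be kept.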

