I would like to convert one column of data to multiple columns in dataframe based on certain values/conditions.
Please find the code to generate the input dataframe
df1 = pd.DataFrame({'VARIABLE':['studyid',1,'age_interview', 65,'Gender','1.Male','2.Female','Ethnicity','1.Chinese','2.Indian','3.Malay']})
The data looks like as shown below
Please note that I may not know the column names in advance. But it usually follows this format. What I have shown above is a sample data and real data might have around 600-700 columns and data arranged in this fashion
What I would like to do is convert values which start with non-digits(characters) as new columns in dataframe. It can be a new dataframe.
I attempted to write a for loop but failed to due to the below error. Can you please help me achieve this outcome.
for i in range(3,len(df1)):#str(df1['VARIABLE'][i].contains('^\d')) if (df1['VARIABLE'][i].astype(str).contains('^\d') == True):
Through the above loop, I was trying to check whether first char is a digit, if yes, then retain it as a value (ex: 1,2,3 etc) and if it's a character (ex:gender, ethnicity etc), then create a new column. But guess this is an incorrect and lengthy approach
For example, in the above example, the columns would be studyid,age_interview,Gender,Ethnicity.
The final output would look like this
Can you please let me know if there is an elegant approach to do this?