I have a DataFrame like the one below, with many person IDs (more than 1000 persons), a varying number of transactions per person, and many variables (more than 1000 variables):
| Person_ID | transaction_ID | variable_1 | variable_2 | variable_3 | variable_X |
|---|---|---|---|---|---|
| person1 | transaction1 | 123 | 0 | 1 | abc |
| person1 | transaction2 | 456 | 1 | 0 | def |
| person1 | transaction3 | 123 | 0 | 1 | abc |
| personx | transaction1 | 123 | 0 | 1 | abc |
| personx | transaction2 | 456 | 0 | 1 | def |
I want to pad the beginning of every Person_ID group with rows whose values (apart from Person_ID) are all -10, so that every group ends up with exactly 6 rows, like this:
| Person_ID | transaction_ID | variable_1 | variable_2 | variable_3 | variable_X |
|---|---|---|---|---|---|
| person1 | -10 | -10 | -10 | -10 | -10 |
| person1 | -10 | -10 | -10 | -10 | -10 |
| person1 | -10 | -10 | -10 | -10 | -10 |
| person1 | transaction1 | 123 | 0 | 1 | abc |
| person1 | transaction2 | 456 | 1 | 0 | def |
| person1 | transaction3 | 123 | 0 | 1 | abc |
| personx | -10 | -10 | -10 | -10 | -10 |
| personx | -10 | -10 | -10 | -10 | -10 |
| personx | -10 | -10 | -10 | -10 | -10 |
| personx | -10 | -10 | -10 | -10 | -10 |
| personx | transaction1 | 123 | 0 | 1 | abc |
| personx | transaction2 | 456 | 0 | 1 | def |
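A minimal pandas sketch of the padding described above, assuming no group has more than 6 rows and that the key column is named `Person_ID` as in the tables (the code further down uses lowercase `person_id`, so the `key` argument may need adjusting). The helper name `pad_groups` and the sample frame are illustrative, not from the original.

```python
import pandas as pd


def pad_groups(df, key="Person_ID", target=6, fill=-10):
    """Prepend `fill` rows so that each `key` group has `target` rows.

    Hypothetical helper. Groups that already have `target` or more rows
    are kept as-is (they are not truncated).
    """
    pieces = []
    # sort=False keeps the groups in their original order of appearance.
    for person, group in df.groupby(key, sort=False):
        n_pad = target - len(group)
        if n_pad > 0:
            # Every column gets `fill`, then the key column is set back to the person ID.
            pad = pd.DataFrame(fill, index=range(n_pad), columns=df.columns)
            pad[key] = person
            pieces.append(pad)
        pieces.append(group)
    # One concat at the end avoids re-copying the frame on every iteration.
    return pd.concat(pieces, ignore_index=True)


# Sample input rebuilt from the first table; variable_X stands in for the other columns.
df = pd.DataFrame({
    "Person_ID": ["person1", "person1", "person1", "personx", "personx"],
    "transaction_ID": ["transaction1", "transaction2", "transaction3",
                       "transaction1", "transaction2"],
    "variable_1": [123, 456, 123, 123, 456],
    "variable_2": [0, 1, 0, 0, 0],
    "variable_3": [1, 0, 1, 1, 1],
    "variable_X": ["abc", "def", "abc", "abc", "def"],
})

padded = pad_groups(df)
print(padded)
```

Collecting the pieces in a list and calling `pd.concat` once keeps this reasonable for 1000+ persons. Note that writing -10 into string columns such as `transaction_ID` makes them object dtype, which matches the desired output but may matter later.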
Here is the code I tried (updated to use `pd.concat`), followed by the warning it produced:
```python
df2 = pd.DataFrame([[''] * len(newdf.columns)], columns=newdf.columns)
df2
for row in newdf.groupby('person_id')['transaction_id']:
    x=newdf.groupby('person_id')['person_id'].nunique()
    if x.any() < 6:
        newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)
```

```
RuntimeWarning: '<' not supported between instances of 'int' and 'tuple', sort order is undefined for incomparable objects.
  newdf=pd.concat([newdf, df2*(6-x)], ignore_index=True)
```

It appended several NaN rows to the bottom of the DataFrame, but not in between the groups as needed. Thank you in advance, as I am a beginner.
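To see why the attempt does not pad per group, this sketch (toy data; the lowercase column names follow the code above) compares what `nunique()` and `size()` return. `nunique()` counts distinct person IDs within each group, which is always 1, and `.any()` collapses the whole Series to a single boolean, so `x.any() < 6` is effectively `True < 6` and is always true.

```python
import pandas as pd

# Toy frame using the lowercase column names from the attempted code.
df = pd.DataFrame({
    "person_id": ["person1", "person1", "person1", "personx", "personx"],
    "transaction_id": ["transaction1", "transaction2", "transaction3",
                       "transaction1", "transaction2"],
})

unique_ids = df.groupby("person_id")["person_id"].nunique()
row_counts = df.groupby("person_id").size()

print(unique_ids.tolist())         # [1, 1]: one distinct ID per group, not a row count
print(row_counts.tolist())         # [3, 2]: number of rows per group
print(unique_ids.any())            # True: a single bool for the whole Series
print(unique_ids.any() < 6)        # True: True counts as 1, so this never varies per group
print((6 - row_counts).tolist())   # [3, 4]: pad rows each group actually needs
```

Two further points apply to the original loop: `pd.concat([newdf, ...])` always stacks the new rows after the existing ones, so it cannot place anything between groups; and `df2*(6-x)` does not repeat rows. Multiplying a DataFrame by a Series aligns the Series index (here, the person IDs) with the DataFrame's columns, which is a plausible source of the NaN values.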