I have a df that contains a subject ID and an item ID. An item ID may appear multiple times for one subject ID and multiple item IDs may be assigned to one subject ID.

       subid itemid
    0   0001   1111
    1   0001   1112
    2   0001   1113
    3   0001   1113
    4   0002   1114
    5   0002   1114

I also have a dictionary where each key is a subject ID and each value is all the item IDs that have been assigned to that subject.

    dict = {'0001': ['1111', '1112'], '0002': ['1114']}

I want to iterate through each row of the df and check a) whether the subid is found in the dictionary and b) if yes, whether the itemid is assigned to that subid in the dictionary. If the answer to either question is no, I want to remove that row from the df. In the above example, I'd want rows 2 and 3 removed because 0001: '1113' does not appear in the dictionary.
I know I'm way off on this. I started by trying to create a for loop using either df.iterrows() or df.index. I don't know if this is the right way to go about it, or where to go next. I get the "unhashable type: 'Series'" error for the code below. Any help is appreciated.

    for index, row in df.iterrows():
        if df['subid'] in dict:
            if df['itemid'] in dict:
                continue
            else:
                df.drop(index, inplace=True)
        else:
            df.drop(index, inplace=True)
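
For reference, the error comes from `df['subid']` being the whole column (a Series) rather than the current row's value, and a Series cannot be used as a dictionary key. Below is a minimal sketch of one way the filtering described above could be done without dropping rows inside a loop. It assumes both IDs are stored as strings and that each dictionary value is a list of item IDs; the names `assigned`, `keep` and `filtered` are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the example above; IDs are assumed to be strings
df = pd.DataFrame({
    "subid": ["0001", "0001", "0001", "0001", "0002", "0002"],
    "itemid": ["1111", "1112", "1113", "1113", "1114", "1114"],
})

# Assumed shape of the lookup: subject ID -> list of assigned item IDs
# (called `assigned` here so the built-in `dict` is not shadowed)
assigned = {"0001": ["1111", "1112"], "0002": ["1114"]}

# One boolean per row: True only when the subid is a key in the lookup
# and the itemid is in that subid's list; unknown subids fall back to ()
keep = [
    item in assigned.get(sub, ())
    for sub, item in zip(df["subid"], df["itemid"])
]

# Select the rows to keep instead of mutating df while iterating over it
filtered = df.loc[keep]
print(filtered)
```

With the sample data, rows 2 and 3 are filtered out and rows 0, 1, 4 and 5 remain. Building a boolean list and selecting with it avoids modifying the DataFrame while iterating over it.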