Channel: Active questions tagged python - Stack Overflow

Filtering a python DataFrame based on whether two column values in each row are found within a dictionary


I have a df that contains a subject ID and an item ID. An item ID may appear multiple times for one subject ID and multiple item IDs may be assigned to one subject ID.

  subid itemid
0  0001   1111
1  0001   1112
2  0001   1113
3  0001   1113
4  0002   1114
5  0002   1114

I also have a dictionary where each key is a subject ID and each value is all the item IDs that have been assigned to that subject.

dict = {'0001': ['1111', '1112'], '0002': ['1114']}

I want to iterate through each row of the df and check a) whether the subid is found in the dictionary and b) if yes, whether the itemid is assigned to that subid in the dictionary. If the answer to either question is no, I want to remove that row from the df. In the above example, I'd want rows 2 and 3 removed because 0001: '1113' does not appear in the dictionary.
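
To make the intended result concrete, here is a minimal sketch of that two-part check written as a boolean mask rather than a row-by-row drop. It assumes both ID columns hold strings (so leading zeros such as 0001 survive) and that the dictionary maps each subid to a list of itemids. The name `assigned` is a hypothetical stand-in for `dict`, chosen only to avoid shadowing the built-in.

```python
import pandas as pd

# Sample data mirroring the question; IDs are assumed to be strings.
df = pd.DataFrame({
    "subid":  ["0001", "0001", "0001", "0001", "0002", "0002"],
    "itemid": ["1111", "1112", "1113", "1113", "1114", "1114"],
})

# Hypothetical name; plays the role of `dict` in the question.
assigned = {"0001": ["1111", "1112"], "0002": ["1114"]}

# One boolean per row: True only when the subid is a key in the
# dictionary AND the itemid is listed under that subid.
mask = [
    item in assigned.get(sub, ())
    for sub, item in zip(df["subid"], df["itemid"])
]

# Keep only the rows that passed both checks.
filtered = df[mask]
print(filtered)
```

With the sample data, rows 2 and 3 are the ones filtered out, which matches the expected outcome described above.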

I know I'm way off on this. I started by trying to create a for loop using either df.iterrows() or df.index. I don't know if this is the right way to go about it, or where to go next. I get the "unhashable type: 'Series'" error for the code below. Any help is appreciated.

for index, row in df.iterrows():
    if df['subid'] in dict:
        if df['itemid'] in dict:
            continue
        else:
            df.drop(index, inplace=True)
    else:
        df.drop(index, inplace=True)
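
A likely cause of the error is that `df['subid']` inside the loop refers to the whole column (a Series), not the current row's value, and a dictionary membership test has to hash its operand. The sketch below reproduces the failure and then shows one possible loop-based fix that reads values from `row` and checks the itemid against the list stored under that subid. It assumes string IDs and list values in the dictionary. `assigned` and `to_drop` are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "subid":  ["0001", "0001", "0001", "0001", "0002", "0002"],
    "itemid": ["1111", "1112", "1113", "1113", "1114", "1114"],
})

# Hypothetical name; plays the role of `dict` in the question.
assigned = {"0001": ["1111", "1112"], "0002": ["1114"]}

# Reproduce the problem: the whole column cannot be used as a dict key.
error_message = None
try:
    df["subid"] in assigned
except TypeError as exc:
    error_message = str(exc)

# Collect the labels of failing rows first, then drop them in one call,
# instead of mutating the DataFrame while iterating over it.
to_drop = []
for index, row in df.iterrows():
    allowed = assigned.get(row["subid"])  # None if subid is not a key
    if allowed is None or row["itemid"] not in allowed:
        to_drop.append(index)

result = df.drop(index=to_drop)
print(result)
```

Note that the original inner check `df['itemid'] in dict` would test the itemid against the dictionary's keys (the subids) even once the Series issue is resolved, so the lookup under the matching subid is needed either way.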
