Channel: Active questions tagged python - Stack Overflow

Finding overlapping rows with 2 or more identical values

EDIT: I have hopefully clarified the problem and corrected the first DataFrame to match the result DataFrame.

Example DataFrame:

import pandas as pd

df = pd.DataFrame({
    'recipe': ['meal 1', 'meal 2', 'meal 3', 'meal 4', 'meal 5'],
    'vegetable': ['carrot', 'carrot', 'beets', 'carrot', 'artichoke'],
    'fruit': ['banana', 'apple', 'banana', 'banana', 'banana'],
    'protein': ['beef', 'chicken', 'beef', 'fish', 'fish'],
    'calories': [10, 50, 100, 150, 200],
})

Assuming the DataFrame is ordered (here by calories, ascending), I'm trying to add a new column named 'master meal'. This column will contain the name of the first recipe that shares a significant overlap in ingredients with the current recipe. A significant overlap is defined as sharing at least two ingredients.
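To make the overlap rule concrete, here is a minimal sketch that counts shared ingredients for every pair of rows. It assumes that "sharing an ingredient" means holding the same value in the same column (vegetable with vegetable, and so on), which is how the example data is laid out; the names `ingredient_cols`, `codes`, `shared`, and `significant` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'recipe': ['meal 1', 'meal 2', 'meal 3', 'meal 4', 'meal 5'],
    'vegetable': ['carrot', 'carrot', 'beets', 'carrot', 'artichoke'],
    'fruit': ['banana', 'apple', 'banana', 'banana', 'banana'],
    'protein': ['beef', 'chicken', 'beef', 'fish', 'fish'],
    'calories': [10, 50, 100, 150, 200],
})

# Assumption: ingredients are compared column by column.
ingredient_cols = ['vegetable', 'fruit', 'protein']

# Encode each column as integer codes so comparisons run on a numeric array.
codes = np.column_stack([pd.factorize(df[c])[0] for c in ingredient_cols])

# shared[i, j] = number of ingredient columns where rows i and j hold the same value.
shared = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)

# significant[i, j] is True when rows i and j share at least two ingredients.
significant = shared >= 2
```

Note that the pairwise matrix needs memory proportional to the square of the row count, so it is probably only practical for moderately sized frames.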

If a recipe has already been used as a 'master meal' or has a 'master meal' assigned to it, it should not be considered for subsequent rows.

In this example, the result would be:

df = pd.DataFrame({
    'recipe': ['meal 1', 'meal 2', 'meal 3', 'meal 4', 'meal 5'],
    'vegetable': ['carrot', 'carrot', 'beets', 'carrot', 'artichoke'],
    'fruit': ['banana', 'apple', 'banana', 'banana', 'banana'],
    'protein': ['beef', 'chicken', 'beef', 'fish', 'fish'],
    'calories': [10, 50, 100, 150, 200],
    'master meal': ['meal 1', None, 'meal 1', 'meal 1', None],
})

(i.e., 'meal 5' won't get its master meal value set to 'meal 4' because 'meal 4' has already been tagged.)
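One reading of these rules that reproduces the result above is a greedy pass: walk the rows in order, let each still-untagged row act as a candidate master, and tag every later untagged row that shares at least two ingredients with it. The sketch below is only that interpretation, not a confirmed solution; `assign_master_meals` and its parameters are hypothetical names, and it assumes ingredients are compared column by column. It still loops over rows, but each comparison against the rest of the frame is a single NumPy operation.

```python
import numpy as np
import pandas as pd

def assign_master_meals(df, ingredient_cols, name_col='recipe', min_shared=2):
    """Greedy master-meal assignment (one interpretation of the rules)."""
    # Integer codes per column; note that missing values all get code -1,
    # so two missing ingredients would count as a match here.
    codes = np.column_stack([pd.factorize(df[c])[0] for c in ingredient_cols])
    names = df[name_col].to_numpy()
    n = len(df)
    master = np.full(n, None, dtype=object)
    available = np.ones(n, dtype=bool)  # rows not yet tagged or used

    for i in range(n):
        if not available[i]:
            continue  # already tagged by an earlier master
        available[i] = False
        # Number of matching ingredient columns between row i and every row.
        shared = (codes == codes[i]).sum(axis=1)
        matches = available & (shared >= min_shared)
        if matches.any():
            master[matches] = names[i]
            master[i] = names[i]  # the master is tagged with its own name
            available[matches] = False
    return master

df = pd.DataFrame({
    'recipe': ['meal 1', 'meal 2', 'meal 3', 'meal 4', 'meal 5'],
    'vegetable': ['carrot', 'carrot', 'beets', 'carrot', 'artichoke'],
    'fruit': ['banana', 'apple', 'banana', 'banana', 'banana'],
    'protein': ['beef', 'chicken', 'beef', 'fish', 'fish'],
    'calories': [10, 50, 100, 150, 200],
})

df['master meal'] = assign_master_meals(df, ['vegetable', 'fruit', 'protein'])
```

Because tagged rows are skipped, the outer loop only does real work for rows that become candidate masters, which may help when many recipes cluster around a few masters.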

I was able to build something with apply() where I compared each row to the rest of the DataFrame, but as you can imagine, it was too slow when applied to a bigger dataset. I scratched my head all day trying to find a vectorized approach, without success.

Maybe you have a better idea? I don't know how to avoid looping through the DataFrame or, if looping is unavoidable, how to do it efficiently.

