
Group by on entire dataframe vs group by on subset of columns of dataframe


I'm working in a codebase where I see a lot of groupby usage like this, operating on a subset of the columns of df:

df[cols].groupby(some_column).nunique()[column2extract]

where cols includes some_column and column2extract, and in most cases cols = [some_column, column2extract].

Functionally, I think this is equivalent to

df.groupby(some_column).nunique()[column2extract]

Is there some advantage to the former that I should be aware of? I see this often throughout this codebase, and I feel I may be missing something.

Actually, I think the two are only equivalent when cols = [some_column, column2extract], and not necessarily equivalent when cols contains additional columns.
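One way to test this is on a toy frame. The sketch below is only an illustration: the frame and the column names key, val, and extra are made up, standing in for some_column, column2extract, and any other column in df. On this data the extracted column comes out the same whether or not other columns are present, because nunique is computed per column. The practical difference is that the full-frame version counts unique values for every remaining column before all but one are discarded.

```python
import pandas as pd

# Hypothetical data: "key" plays the role of some_column,
# "val" the role of column2extract, "extra" is an unrelated column.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1, 1, 2, 3, 3],
    "extra": [10, 20, 30, 40, 50],
})

cols = ["key", "val"]

# Pattern seen in the codebase: subset the columns first, then group.
subset_result = df[cols].groupby("key").nunique()["val"]

# Group the whole frame, then pick the column out of the result.
full_result = df.groupby("key").nunique()["val"]

# Select the column on the groupby object, so only "val" is aggregated.
direct_result = df.groupby("key")["val"].nunique()

print(subset_result.equals(full_result))    # compare the first two patterns
print(subset_result.equals(direct_result))  # compare with column selection on the groupby
```

If the goal is only to avoid aggregating unneeded columns, selecting the column on the groupby object (the third form) is a common way to get that without building an intermediate df[cols].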

