I'm working in a codebase where I see a lot of `groupby` usage like this, operating on a subset of the columns of `df`:
```python
df[cols].groupby(some_column).nunique()[column2extract]
```
where `cols` includes `some_column` and `column2extract`, and in most cases `cols = [some_column, column2extract]`.
Functionally, I think this is equivalent to:
```python
df.groupby(some_column).nunique()[column2extract]
```
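A quick way to compare the two forms is to run both on a small frame. This is only a sketch: the column names `key`, `val`, and `other` are hypothetical stand-ins for `some_column`, `column2extract`, and a column of `df` that is not in `cols`.

```python
import pandas as pd

# Toy frame; "key", "val" and "other" are hypothetical column names.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1, 1, 2, 3, 3],
    "other": [5, 6, 7, 7, 8],  # a column of df that is not in cols
})

some_column = "key"
column2extract = "val"
cols = [some_column, column2extract]

# Pattern from the codebase: select the columns first, then group.
subset_first = df[cols].groupby(some_column).nunique()[column2extract]

# Simplified form: group the whole frame, then pick the column.
group_all = df.groupby(some_column).nunique()[column2extract]

print(subset_first)
print(subset_first.equals(group_all))  # expected: True
```

On this data both forms give one unique `val` for group `a` and two for group `b`; the difference is that the second form also computes `nunique` for `other` before discarding it.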
Is there some advantage to the former that I should be aware of? The pattern shows up throughout the codebase, and I feel I may be missing something.
Actually, I think the two are only equivalent when `cols = [some_column, column2extract]`, and not necessarily when `cols` contains additional columns.
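One way to probe the extra-column case is to compare three variants on a toy frame where `cols` carries a column beyond `[some_column, column2extract]` and `df` has yet another column outside `cols`. The names `other` and `note` are hypothetical. On this particular data the three results match, but a single example cannot rule out a difference in general.

```python
import pandas as pd

# Toy frame; "other" and "note" are hypothetical extra columns.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "val": [1, 1, 2, 3, 3],
    "other": [10.0, None, 30.0, 40.0, 40.0],  # includes a missing value
    "note": ["x", "y", "x", "y", "z"],        # not in cols at all
})

some_column, column2extract = "key", "val"

# cols with an additional column beyond [some_column, column2extract]
cols = [some_column, column2extract, "other"]

wide = df[cols].groupby(some_column).nunique()[column2extract]
narrow = df[[some_column, column2extract]].groupby(some_column).nunique()[column2extract]
whole = df.groupby(some_column).nunique()[column2extract]

# nunique counts unique values per column, so on this data the extra
# columns do not change the count extracted for column2extract.
print(wide.equals(narrow), wide.equals(whole))
```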