Channel: Active questions tagged python - Stack Overflow

How to add a column or change data in each group after using group by in Pandas?


I am using pandas to process some data. After a group by, the simplified format of the DataFrame is [MMSI (Vessel_ID), BaseTime, Location, Speed, Course, ...].

I use

for MMSI, group in grouped_df:
    print(MMSI)
    print(group)

to print the data.

For example, one group of data is:

             MMSI         BaseDateTime       LAT        LON  SOG  COG
1507  538007509.0  2022-12-08T00:02:25  49.29104 -123.19135  0.0  9.6
1508  538007509.0  2022-12-08T00:05:25  49.29102 -123.19138  0.0  9.6

I want to add a column holding the time difference between each point and the next one.

Below is the output I want:

             MMSI         BaseDateTime       LAT        LON  SOG  COG   Time-diff
1507  538007509.0  2022-12-08T00:02:25  49.29104 -123.19135  0.0  9.6   3.0(hours)
1508  538007509.0  2022-12-08T00:05:25  49.29102 -123.19138  0.0  9.6   Na
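A minimal sketch of one way such a column might be computed without looping over the groups, assuming BaseDateTime can be parsed with pd.to_datetime and that the difference is wanted in hours. The two-row frame is rebuilt from the sample above; note that its timestamps are three minutes apart, so the computed value is 0.05 hours rather than the 3.0 shown in the desired output.

```python
import pandas as pd

# Two sample rows from the question; only the columns needed here
df = pd.DataFrame(
    {
        "MMSI": [538007509.0, 538007509.0],
        "BaseDateTime": pd.to_datetime(
            ["2022-12-08T00:02:25", "2022-12-08T00:05:25"]
        ),
    },
    index=[1507, 1508],
)

df = df.sort_values(["MMSI", "BaseDateTime"])

# diff(-1) is "current minus next" within each vessel, so negate it
to_next = -df.groupby("MMSI")["BaseDateTime"].diff(-1)

# Convert the timedelta to hours; the last row of each vessel stays NaN
df["Time-diff"] = to_next.dt.total_seconds() / 3600
print(df)
```

Because the result of the grouped diff keeps the original index, it can be assigned straight onto df.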

So I use the code below to try to get the result:

for MMSI, group in grouped_df:
    group = group.sort_values(by='BaseDateTime')
    group['new-time'] = group.shift(-1)['BaseDateTime']
    group.dropna()
    for x in group.index:
        group.loc[x,'time-diff'] = get_timediff(group.loc[x,'new-time'],group.loc[x,'BaseDateTime']) # A function to calculate the time difference
    group['GROUP'] = group['time-diff'].fillna(np.inf).ge(2).cumsum()
    # When Time-diff >= 2hours split them into different group

Inside the loop I can print group['GROUP'] and group['time-diff'], but the new columns are gone when I access grouped_df again. There is also a warning saying that my group in grouped_df is just a copy of a slice from a DataFrame, and it recommends using .loc[row_indexer,col_indexer] = value instead. In this case, though, I don't know how to use .loc to reach the specific [row, col].
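The changes are lost because each group handed out by the loop is a separate DataFrame, so assigning to it never touches the original. One possible way to use .loc here is to write through the parent frame, using the group's index labels as the row indexer. This is only a sketch: the small df and the inline subtraction (standing in for get_timediff) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample: one vessel with two unsorted rows, one with a single row
df = pd.DataFrame(
    {
        "MMSI": [538007509.0, 538007509.0, 111.0],
        "BaseDateTime": pd.to_datetime(
            ["2022-12-08T00:05:25", "2022-12-08T00:02:25", "2022-12-08T01:00:00"]
        ),
    }
)
df["time-diff"] = float("nan")

for mmsi, group in df.groupby("MMSI"):
    group = group.sort_values("BaseDateTime")
    # Hours from each row to the next row of the same vessel
    delta = group["BaseDateTime"].shift(-1) - group["BaseDateTime"]
    hours = delta.dt.total_seconds() / 3600
    # Row indexer = the group's index labels, column indexer = the target column
    df.loc[hours.index, "time-diff"] = hours.to_numpy()

print(df)
```

After the loop, df itself holds the new values, so nothing has to be read back from the groupby object.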

At the very beginning, I tried to use

grouped_df['new-time'] = grouped_df.shift(-1)['BaseDateTime']
grouped_df.dropna()

But it shows

'DataFrameGroupBy' object does not support item assignment
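The error arises because the groupby object is not a table that columns can be attached to. A hedged sketch of the same idea with the assignment aimed at the original DataFrame instead; the three-row df is made up for illustration. dropna also returns a new frame rather than modifying in place, so its result has to be kept.

```python
import pandas as pd

# Hypothetical sample with two vessels
df = pd.DataFrame(
    {
        "MMSI": [538007509.0, 538007509.0, 111.0],
        "BaseDateTime": pd.to_datetime(
            ["2022-12-08T00:02:25", "2022-12-08T00:05:25", "2022-12-08T00:10:00"]
        ),
    }
)

df = df.sort_values(["MMSI", "BaseDateTime"])
grouped_df = df.groupby("MMSI")

# The shifted result is indexed like df, so it is assigned to df, not grouped_df
df["new-time"] = grouped_df["BaseDateTime"].shift(-1)

# dropna returns a new DataFrame; keep the result in a variable
cleaned = df.dropna(subset=["new-time"])
print(cleaned)
```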

My current solution is to create an empty_df and then concatenate the groups from grouped_df one by one, like this:

df['time-diff'] = pd.Series(dtype='float64')
df['GROUP'] = pd.Series(dtype='int')
grouped_df = df.groupby('MMSI')
for MMSI, group in grouped_df:
    # ... as the same as the code above
    group = group.sort_values(by='BaseDateTime')
    group['new-time'] = group.shift(-1)['BaseDateTime']
    group.dropna()
    for x in group.index:
        group.loc[x,'time-diff'] = get_timediff(group.loc[x,'new-time'],group.loc[x,'BaseDateTime']) # A function to calculate the time difference
    group['GROUP'] = group['time-diff'].fillna(np.inf).ge(2).cumsum()
    # ... as the same as the code above
    frames = [empty_df, group]
    empty_df = pd.concat(frames)

I am not satisfied with this solution, but I haven't found a proper way to change the values in grouped_df.
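For comparison, a sketch of how the GROUP column might be built directly on df, with no empty_df and no repeated concat. It assumes the gap is measured to the previous row of the same vessel (the loop above measures it to the next row), and that a gap of 2 hours or more, or the first row of a vessel, starts a new segment. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample: vessel 1 has a 2.5 hour gap before its third row
df = pd.DataFrame(
    {
        "MMSI": [1, 1, 1, 2, 2],
        "BaseDateTime": pd.to_datetime(
            [
                "2022-12-08T00:00:00",
                "2022-12-08T00:30:00",
                "2022-12-08T03:00:00",
                "2022-12-08T00:00:00",
                "2022-12-08T01:00:00",
            ]
        ),
    }
)

df = df.sort_values(["MMSI", "BaseDateTime"])

# Hours since the previous row of the same vessel (NaN on each first row)
gap = df.groupby("MMSI")["BaseDateTime"].diff()
df["time-diff"] = gap.dt.total_seconds() / 3600

# A row starts a new segment if it is the first row or the gap is >= 2 hours
starts_segment = df["time-diff"].fillna(float("inf")).ge(2).astype(int)

# Running count of segment starts, restarted for every vessel
df["GROUP"] = starts_segment.groupby(df["MMSI"]).cumsum()
print(df)
```

With this layout, df.groupby(['MMSI', 'GROUP']) would then iterate over the split tracks.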

I'm now trying to use the solution from this question to handle the DataFrame before group by.

Can someone help me?

