
Pandas groupby two columns and create a new column based on difference in days from third column

I have a dataframe with a creation_timestamp column and a personal_id column. Each personal_id has one or more application_ids. An application_id can span several rows, but they all share the same creation_timestamp.

I want to create a column that holds the number of days between consecutive application_ids for a given personal_id. All the offer rows within an application_id should show the same value, i.e. the days since that person's previous application.

Here's the code I've tried to modify in a bunch of ways:

# Calculate days between consecutive applications for each personal_id
df['days_between_applications'] = (
    df.groupby(['personal_id'])['creation_timestamp']
    .diff()
    .dt.days
)

Here's a preview of the current result:

         personal_id   application_id  creation_timestamp          days_between_applications
1007     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  NaN
1008     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  0.0
1010     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  0.0
1006     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  0.0
7094153  c1de3552cfb0  1f64dd61aee2    2023-08-07 11:01:45.588173  1533.0
7094147  c1de3552cfb0  1f64dd61aee2    2023-08-07 11:01:45.588173  0.0

This is what I'm trying to achieve:

         personal_id   application_id  creation_timestamp          days_between_applications
1007     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  NaN
1008     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  NaN
1010     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  NaN
1006     c1de3552cfb0  e18e5d77199f    2019-05-16 06:53:57.817842  NaN
7094153  c1de3552cfb0  1f64dd61aee2    2023-08-07 11:01:45.588173  1533.0
7094147  c1de3552cfb0  1f64dd61aee2    2023-08-07 11:01:45.588173  1533.0

I've tried forward filling, sorting and then forward filling, grouping by both personal_id and application_id, and so on, but nothing has worked.
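One direction that might work, sketched here on made-up data: the row-wise diff() produces 0.0 for repeated rows of the same application, so it may help to first reduce the frame to one row per (personal_id, application_id), compute the diff there, and then broadcast the result back onto every row. The sample values and the helper names `apps` and `per_app` below are assumptions for illustration, not part of the original dataframe.

```python
import pandas as pd

# Hypothetical sample data: ids and timestamps are invented for illustration.
df = pd.DataFrame({
    "personal_id": ["p1", "p1", "p1", "p1", "p1", "p2"],
    "application_id": ["a1", "a1", "a2", "a2", "a3", "b1"],
    "creation_timestamp": pd.to_datetime([
        "2023-01-01 08:00:00", "2023-01-01 08:00:00",
        "2023-01-11 09:00:00", "2023-01-11 09:00:00",
        "2023-01-14 09:00:00",
        "2023-02-01 12:00:00",
    ]),
})

# Keep one row per application, ordered by time within each person.
apps = (
    df.drop_duplicates(subset=["personal_id", "application_id"])
      .sort_values(["personal_id", "creation_timestamp"])
      [["personal_id", "application_id", "creation_timestamp"]]
)

# Whole days since the same person's previous application (NaN for the first).
apps["days_between_applications"] = (
    apps.groupby("personal_id")["creation_timestamp"].diff().dt.days
)

# Series indexed by (personal_id, application_id), one value per application.
per_app = apps.set_index(["personal_id", "application_id"])[
    "days_between_applications"
]

# Broadcast the per-application value onto every row; join keeps df's index.
df = df.join(per_app, on=["personal_id", "application_id"])

print(df)
```

Using join with on= keeps the original row index, which merge would reset. If df already contains a days_between_applications column from an earlier attempt, it would need to be dropped first, otherwise join raises an error about overlapping column names.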

