Channel: Active questions tagged python - Stack Overflow

Python: mark inner pairs of values as null when the outer pair of values is the same


enter image description here

I want to derive derive_this_column with the help of the duplicate_type and lat_lng columns.

The existing logic finds the duplicate values of lat_lng and labels them first_duplicate and last_duplicate based on their order of occurrence.

However, the first and last rows have the same lat_lng pair, so they are marked as first_duplicate and last_duplicate. Within that range there are two more first_duplicate/last_duplicate pairs, which cause a data discrepancy. So whenever I find a first_duplicate/last_duplicate pair inside an outer first_duplicate/last_duplicate pair that has the same lat_lng values, I want to mark the inner pair as null.

I tried this code, but it marks everything as null.

    import pandas as pd

    RK_df['first_duplicate'] = RK_df['duplicate_type'].eq('first_duplicate') & RK_df['duplicate_type'].notna()
    RK_df['last_duplicate'] = RK_df['duplicate_type'].eq('last_duplicate') & RK_df['duplicate_type'].notna()

    # Find pairs of first and last duplicates with the same lat_lng
    # (select several columns with a list; tuple-style selection fails in recent pandas)
    duplicates_within_range = RK_df.groupby('lat_lng')[['first_duplicate', 'last_duplicate']].transform('sum')

    # Mark duplicates_within_range as null if both first_duplicate and last_duplicate are present
    RK_df['derive_this_column'] = RK_df.apply(
        lambda row: 'null'
        if row['first_duplicate'] and row['last_duplicate']
        and duplicates_within_range.loc[row.name, 'first_duplicate'] > 1
        else row['duplicate_type'],
        axis=1,
    )

    # Drop the intermediate columns used for calculation
    RK_df.drop(['first_duplicate', 'last_duplicate'], axis=1, inplace=True)
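For reference, here is a minimal sketch of the behaviour I am after. The image is not reproduced here, so the sample rows are made up and the nesting rule is an assumption: a pair counts as "inner" when its first_duplicate row comes after, and its last_duplicate row comes before, those of some other lat_lng value. The helper name `null_nested_pairs` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def null_nested_pairs(df, key="lat_lng", label="duplicate_type"):
    """Return a copy of df[label] with nested first/last pairs set to NaN.

    Assumption: each key value has at most one first/last pair.
    """
    # Row positions, so the logic does not depend on the index labels.
    pos = pd.Series(np.arange(len(df)), index=df.index)
    marked = df[label].isin(["first_duplicate", "last_duplicate"])

    # Start and end position of the marked rows for each key value.
    spans = pos[marked].groupby(df.loc[marked, key]).agg(["min", "max"])

    # A span is nested if another span starts before it and ends after it.
    nested = np.array(
        [
            ((spans["min"] < lo) & (spans["max"] > hi)).any()
            for lo, hi in zip(spans["min"], spans["max"])
        ],
        dtype=bool,
    )
    nested_keys = spans.index[nested]

    out = df[label].copy()
    out[marked & df[key].isin(nested_keys)] = np.nan
    return out


# Made-up sample: the outer pair "A" encloses the pairs "B" and "C".
RK_df = pd.DataFrame(
    {
        "lat_lng": ["A", "B", "B", "C", "C", "A", "D"],
        "duplicate_type": [
            "first_duplicate", "first_duplicate", "last_duplicate",
            "first_duplicate", "last_duplicate", "last_duplicate", None,
        ],
    }
)
RK_df["derive_this_column"] = null_nested_pairs(RK_df)
print(RK_df)
```

With this sample, only the outer "A" rows keep their labels; the "B" and "C" rows become NaN. The pairwise comparison is quadratic in the number of distinct marked lat_lng values, which is probably fine for small data but may need a sorted sweep for large frames.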
