Channel: Active questions tagged python - Stack Overflow

How to optimize regrouping code for a pandas DataFrame


I want to optimize code that regroups my pandas DataFrame (dk) by its join column:

```python
import pandas as pd

dk = pd.DataFrame({'Point': {0: 15, 1: 16, 2: 16, 3: 17, 4: 17, 5: 18, 6: 18, 7: 19, 8: 20},
                   'join': {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4}})
```

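If two groups with different join values share at least one Point, both groups should get the same join value, and this should carry on across the whole DataFrame, so chains of overlapping groups collapse into one. I did it with this simple code: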

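```python
# Start with every row labelled by its own join
dk['new'] = dk['join'].copy()
for i in dk.index:
    for j in range(i + 1, dk.shape[0]):
        # Rows i and j share a Point, so their join groups must be merged
        if dk.loc[i, 'Point'] == dk.loc[j, 'Point']:
            dk.loc[j, 'new'] = dk.loc[i, 'join']
            # Relabel every row of j's original join group with i's current label
            dk.loc[dk['join'] == dk.loc[j, 'join'], 'new'] = dk.loc[i, 'new']
```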

The result I want:

```python
df = pd.DataFrame({'Point': {0: 15, 1: 16, 2: 16, 3: 17, 4: 17, 5: 18, 6: 18, 7: 19, 8: 20},
                   'join': {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4},
                   'new': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 4}})
```
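The merge rule described in this question amounts to finding connected components: each join group is a node, and two groups are linked when they share a Point. A minimal sketch with a union-find (disjoint-set) structure follows; `regroup_joins` is a hypothetical helper name, and labelling each component by its smallest join value is an assumption that happens to match the expected output for this sample. It makes one pass over the rows instead of the quadratic nested loop. It also covers a case the loop can miss: with Point = [1, 2, 1, 2] and join = [0, 1, 2, 2], join 2 shares a point with both join 0 and join 1, yet the loop ends with new = [0, 1, 1, 1], because the second match relabels group 2 without updating group 0.

```python
from typing import Hashable, Sequence


def regroup_joins(points: Sequence[Hashable], joins: Sequence[int]) -> list[int]:
    """Hypothetical helper: label each row with the smallest join in its group.

    Join groups are connected when they share at least one point, and the
    connection is transitive, so chains of overlapping groups merge.
    """
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        # Register x on first sight, then walk to the root with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller join as root, so the root is the component minimum
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    # First join seen for each point; any later join with that point is merged into it
    first_join_for_point: dict[Hashable, int] = {}
    for p, j in zip(points, joins):
        find(j)  # register every join, including isolated ones
        if p in first_join_for_point:
            union(first_join_for_point[p], j)
        else:
            first_join_for_point[p] = j

    return [find(j) for j in joins]
```

Applied to the DataFrame, this would be `dk['new'] = regroup_joins(dk['Point'].tolist(), dk['join'].tolist())`. The only Python-level loop is the single pass over rows, and path halving keeps each `find` close to constant time.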

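But I need to run this on data with more than 450k rows, and the nested loops (quadratic in the number of rows, with a full-column .loc update inside) do not scale to that size. Do you have any idea how to optimize it, or is there another module suited to this problem? Thanks in advance.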
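For several hundred thousand rows, one vectorized option is to let SciPy find the groups. The sketch below assumes NumPy and SciPy are installed, and `regroup_joins_sparse` is a hypothetical name. Each row becomes an edge between its join value and its Point value in a sparse bipartite graph; `scipy.sparse.csgraph.connected_components` labels the components, and each component is then mapped to its smallest join value (an assumption chosen to match the expected output).

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def regroup_joins_sparse(points, joins) -> np.ndarray:
    """Hypothetical helper: label each row with the smallest join in its group.

    Graph nodes are the distinct join values followed by the distinct
    Point values; every row adds one edge between its join and its Point.
    """
    join_vals, join_idx = np.unique(np.asarray(joins), return_inverse=True)
    point_vals, point_idx = np.unique(np.asarray(points), return_inverse=True)
    n_joins, n_points = len(join_vals), len(point_vals)
    n_nodes = n_joins + n_points

    # One edge per row: join node k <-> point node (n_joins + m)
    graph = coo_matrix(
        (np.ones(len(join_idx), dtype=np.int8), (join_idx, n_joins + point_idx)),
        shape=(n_nodes, n_nodes),
    )
    _, labels = connected_components(graph, directed=False)

    # Component of each distinct join, then the smallest join per component
    comp_of_join = labels[:n_joins]
    min_join = np.full(labels.max() + 1, join_vals.max(), dtype=join_vals.dtype)
    np.minimum.at(min_join, comp_of_join, join_vals)

    # Map every row's join to its component's smallest join
    return min_join[comp_of_join[join_idx]]
```

Usage would be `dk['new'] = regroup_joins_sparse(dk['Point'].to_numpy(), dk['join'].to_numpy())`. The graph construction and the component search run in compiled code rather than in Python loops, which is the main reason to prefer this over row-by-row iteration at this scale.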

