Channel: Active questions tagged python - Stack Overflow

Vectorize nested loops for pairwise distance calculation


How can I make the script below more efficient? This is a follow-up to my previous post, "Python nested loop issue".
It currently takes the best part of two hours to process input tables of about 15000 and 1500 rows. Manually processing my data in Excel takes me an order of magnitude less time - not ideal!

I understand that iterrows is a bad approach to the problem, and that vectorisation is the way forward, but I am a bit dumbfounded as to how it would work for the second for loop.

The following script extract takes two dataframes:

  • qinsy_file_2
  • segy_vlookup (ignore the naming on that one).

For every row in qinsy_file_2, it iterates through segy_vlookup to calculate the distance between the coordinates in each file. If this distance is less than or equal to a preset value (here named buffer), the qinsy_file_2 row is appended to a new dataframe, out_df, and the inner loop stops at that first match; otherwise the row is passed over.

    # Loop through Qinsy file
    for index_qinsy, row_qinsy in qinsy_file_2.iterrows():
        # Loop through SEGY navigation
        for index_segy, row_segy in segy_vlookup.iterrows():
            # Calculate distance between points
            if ((((segy_vlookup["CDP_X"][index_segy] - qinsy_file_2["CMP Easting"][index_qinsy])**2)
                    + ((segy_vlookup["CDP_Y"][index_segy] - qinsy_file_2["CMP Northing"][index_qinsy])**2))**0.5) <= buffer:
                # Append rows less than the distance modifier to the new dataframe
                out_df = pd.concat([out_df, row_qinsy])
                break
            else:
                pass
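For illustration, here is one way the nested loop above might be vectorised with NumPy broadcasting. This is only a sketch under assumptions: the function name rows_within_buffer and the small demo tables are hypothetical, the coordinate columns are assumed to be numeric with no missing values, and the goal is assumed to be "keep every qinsy_file_2 row that has at least one segy_vlookup point within buffer", which is what the break in the loop suggests.

```python
import numpy as np
import pandas as pd


def rows_within_buffer(qinsy, segy, buffer):
    """Return the rows of `qinsy` lying within `buffer` of any point in `segy`.

    Hypothetical helper; column names are taken from the question.
    """
    # Shape the Qinsy coordinates as (n, 1) and the SEGY coordinates as (1, m)
    # so that subtraction broadcasts to an (n, m) grid of all pairs.
    qx = qinsy["CMP Easting"].to_numpy()[:, None]
    qy = qinsy["CMP Northing"].to_numpy()[:, None]
    sx = segy["CDP_X"].to_numpy()[None, :]
    sy = segy["CDP_Y"].to_numpy()[None, :]

    # Euclidean distance for every (Qinsy row, SEGY row) pair.
    dist = np.hypot(sx - qx, sy - qy)

    # A Qinsy row is kept if at least one SEGY point is close enough,
    # which plays the role of the `break` on first match in the loop.
    mask = (dist <= buffer).any(axis=1)
    return qinsy[mask]


# Tiny made-up tables, only to show the call.
qinsy_file_2 = pd.DataFrame(
    {"CMP Easting": [0.0, 10.0, 100.0], "CMP Northing": [0.0, 10.0, 100.0]}
)
segy_vlookup = pd.DataFrame({"CDP_X": [1.0, 11.0], "CDP_Y": [1.0, 9.0]})

out_df = rows_within_buffer(qinsy_file_2, segy_vlookup, buffer=2.0)
```

With tables of about 15000 and 1500 rows, the intermediate grid holds 22.5 million distances, so memory use is noticeably higher than in the loop. If that becomes a problem, the same function could be applied to slices of qinsy_file_2 and the results concatenated.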

So far I have read through the following:

