Channel: Active questions tagged python - Stack Overflow

Multiprocessing in Python: Ensuring the Order of Queue Results


I am trying to run a loess regression across multiple processors, but when my code appends each chunk's dataframe to the queue, the chunks arrive out of order and the resulting graph looks awful.

from multiprocessing import Pool, Manager
import pandas as pd

def smooth_data_mp(data_frame):
    num_processes = 8
    chunk_size = 125000
    fraction = 125 / chunk_size
    print(data_frame.head())
    result_queue = Manager().Queue()
    with Pool(processes=num_processes) as pool:
        pool.starmap(process_data,
                     [(data_frame, i, chunk_size, fraction, result_queue)
                      for i in range(len(data_frame) // chunk_size)])
    # Collect results from the queue -- note that items come out in completion
    # order, not submission order
    result_list = pd.DataFrame(result_queue.get())
    while not result_queue.empty():
        result_list = pd.concat([result_list, result_queue.get()])
    return result_list

def process_data(dataframe, i, chunk_size, fraction, result_queue):
    start_frame = chunk_size * i
    end_frame = min(chunk_size * (i + 1), len(dataframe))  # ensure end_frame doesn't exceed the data length
    print(f'{start_frame}, {end_frame}')  # just for debugging
    new_data_frame = calculate_loess_on_subset(dataframe[start_frame:end_frame], chunk_size, fraction, i)
    result_queue.put(new_data_frame)

How do I ensure that each dataframe added to the queue in the process_data function is added in the order it occurs in the original dataset, rather than whenever its worker process happens to finish?

I have tried different queue types, a regular queue and a manager queue, but only the manager queue worked at all, and I'm still not sure how to fix the ordering problem.
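One way this is commonly resolved: `Pool.starmap` itself already returns one result per argument tuple, in the same order as the input list, so the shared queue can be dropped entirely and the workers can simply return their chunks. Below is a minimal, hedged sketch of that approach; `process_chunk` is a stand-in for the question's `calculate_loess_on_subset` step (here just a placeholder transformation so the sketch runs), and the small `chunk_size` is only for demonstration.

```python
from multiprocessing import Pool

import pandas as pd


def process_chunk(dataframe, i, chunk_size):
    # Work on one chunk and RETURN it instead of pushing it to a shared queue.
    start = chunk_size * i
    end = min(chunk_size * (i + 1), len(dataframe))
    # Placeholder for the real loess step on this slice.
    return dataframe.iloc[start:end] * 2


def smooth_data_mp(data_frame, num_processes=4, chunk_size=3):
    # Ceiling division so the final partial chunk is not silently dropped
    # (the original `len(df) // chunk_size` loses the tail of the data).
    n_chunks = -(-len(data_frame) // chunk_size)
    with Pool(processes=num_processes) as pool:
        # starmap blocks until all tasks finish and returns the results
        # in the order of the argument list, regardless of which worker
        # finished first -- so the chunks are already ordered.
        results = pool.starmap(
            process_chunk,
            [(data_frame, i, chunk_size) for i in range(n_chunks)],
        )
    return pd.concat(results)


if __name__ == "__main__":
    df = pd.DataFrame({"y": range(10)})
    out = smooth_data_mp(df)
    print(out["y"].tolist())  # chunks come back in original order
```

If a queue is genuinely required (e.g. for streaming results), an alternative is to `put((i, new_data_frame))` tuples and sort the collected results by `i` before concatenating; but letting `starmap` preserve the order is the simpler fix.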


