I have a long-running Python script that takes 1-2 hours to complete. It runs in a container with 4 GB of memory and 1 CPU.
The script fetches and processes data in a for loop. Something like the following:
```python
for i in ENOUGH_API_CALLS_TO_TAKE_2_HOURS:
    data = fetch_data()
    process_data(data)
```
The container crashes about halfway through execution because it runs out of memory, even though no individual API call comes close to retrieving 4 GB of data.
After debugging with `tracemalloc`, I think Python is slowly eating memory on each API call without releasing it back to the OS, and the process eventually crashes when it exceeds the container's memory limit.
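For reference, this is roughly how I took the measurements (a simplified sketch; `fetch_data`, `process_data`, and the iterable are stand-ins for my real code):

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for n, _ in enumerate(ENOUGH_API_CALLS_TO_TAKE_2_HOURS):
    data = fetch_data()
    process_data(data)
    if n % 50 == 0:
        # Show the ten allocation sites that grew the most since startup
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.compare_to(baseline, "lineno")[:10]:
            print(stat)
```

The top entries keep growing across iterations even though I don't hold references to the old data.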
I've read threads that suggest using multiprocessing so that memory is released back to the OS when a worker process exits. But I only have 1 CPU here, so I don't have a second core to work with.
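For context, the pattern those threads describe looks roughly like this: run each batch of work in a short-lived child process, so the OS reclaims all of its memory when the process exits. This is only a sketch of my understanding; `batches` is a hypothetical helper and `fetch_data`/`process_data` are my placeholders from above:

```python
from itertools import islice
from multiprocessing import Process

def batches(iterable, size):
    # Hypothetical helper: yield successive chunks of `size` items
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def do_batch(calls):
    # Runs in a child process; the OS reclaims all of the
    # process's memory when it exits
    for _ in calls:
        data = fetch_data()
        process_data(data)

for batch in batches(ENOUGH_API_CALLS_TO_TAKE_2_HOURS, 100):
    p = Process(target=do_batch, args=(batch,))
    p.start()
    p.join()  # one worker at a time, so only 1 CPU is ever busy
```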
Is there any other way to release memory back to the OS from inside my main thread?
Note that I've already tried `gc.collect()` without any success.
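Specifically, I forced a full collection at the end of every iteration (a sketch of what I ran):

```python
import gc

for i in ENOUGH_API_CALLS_TO_TAKE_2_HOURS:
    data = fetch_data()
    process_data(data)
    del data      # drop my only reference before collecting
    gc.collect()  # full collection, but the container's RSS still climbs
```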