I have been trying to merge small Parquet files, each with 10k rows; each set has 60-100 such files, so the merged Parquet file ends up with at least around 600k rows.
I have been using pandas concat, and it works fine when merging around 10-15 small files.
But since a set may consist of 50-100 files, the Python script gets killed for breaching the memory limit.
So I am looking for a memory-efficient way to merge any number of small Parquet files, up to around 100 files per set.
I used pandas read_parquet to read each file into an individual DataFrame and then combined them all with pd.concat.
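Roughly what the current script does (a minimal sketch; the directory and file names are just placeholders):

```python
import glob
import pandas as pd

# Current approach: load every small file into memory, then concatenate.
files = glob.glob("input_dir/*.parquet")          # placeholder path
dfs = [pd.read_parquet(f) for f in files]         # all DataFrames held in memory at once
merged = pd.concat(dfs, ignore_index=True)        # concat allocates another full copy
merged.to_parquet("merged.parquet")
```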
Is there a better library than pandas for this, or, if it is possible in pandas, how can it be done efficiently?
Time is not a constraint; it can run for a long time if needed.