Breaking up dataframe into chunks for a loop

I am using a loop (based on the answer to this question) to iteratively open several CSV files, transpose them, and concatenate them into a large dataframe. Each CSV file is 15 MB and has over 10,000 rows. There are over 1000 files. I am finding that the first 50 iterations finish within a few seconds, but then each one takes a minute. I wouldn't mind keeping my computer on overnight, but I may need to do this multiple times and I'm worried that it will get exponentially slower. Is there a more memory-efficient way to do this, such as breaking up df into chunks of 50 rows each and then concatenating all of them at the end?

In the following code, df is a dataframe of 1000 rows with columns indicating the folder and file name of each CSV.

    import os
    import pandas as pd

    merged_data = pd.DataFrame()
    count = 0
    for index, row in df.iterrows():
        folder_name = row['File ID'].strip()
        file_name = row['File Name'].strip()
        file_path = os.path.join(root_path, folder_name, file_name)
        file_data = pd.read_csv(file_path, names=['Case', f'{folder_name}_{file_name}'], sep='\t')
        # Transpose so each file becomes a single row, tagged with its folder/file id
        file_data_transposed = file_data.set_index('Case').T.reset_index(drop=True)
        file_data_transposed.insert(loc=0, column='folder_file_id', value=str(folder_name + '_' + file_name))
        # Append the new row to the growing dataframe
        merged_data = pd.concat([merged_data, file_data_transposed], axis=0, ignore_index=True)
        count = count + 1
        print(count)
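
For reference, here is a minimal sketch of the "concatenate at the end" idea described above, assuming the same df, root_path, and per-file read/transpose logic as in the loop: the transposed frames are collected in a list and pd.concat is called once, instead of growing merged_data on every iteration.

    import os
    import pandas as pd

    # Sketch only: accumulate per-file frames in a list, then concatenate once at the end.
    chunks = []
    for index, row in df.iterrows():
        folder_name = row['File ID'].strip()
        file_name = row['File Name'].strip()
        file_path = os.path.join(root_path, folder_name, file_name)
        file_data = pd.read_csv(file_path, names=['Case', f'{folder_name}_{file_name}'], sep='\t')
        file_data_transposed = file_data.set_index('Case').T.reset_index(drop=True)
        file_data_transposed.insert(loc=0, column='folder_file_id', value=f'{folder_name}_{file_name}')
        chunks.append(file_data_transposed)

    merged_data = pd.concat(chunks, axis=0, ignore_index=True)

The single concat at the end avoids copying the entire growing merged_data on every iteration, which is presumably why the per-loop time keeps increasing.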
