Channel: Active questions tagged python - Stack Overflow

Speed up filtering large number of files from a folder


I have a folder with a very large number of files that I would like to process. Before that, I need to filter the files and obtain only the .tif files from the directory. There is no nesting, so it's just one directory with all these files, but there are 475,000 files in total.

I've tried glob, iglob and os.scandir, but it seems like the issue is traversing the large number of files itself.

Using glob:

from glob import glob

path = r"\\path\to\the\directory\*.tif"
# This glob takes too long
files = glob(path)

Using iglob:

from glob import iglob

path = r"\\path\to\the\directory\*.tif"
# This iglob takes too long
files = iglob(path)
for file in files:
    <do something>

Using listdir:

from os import listdir

path = r"\\path\to\the\directory"
# This listdir takes too long
for file in listdir(path):
    if file.endswith(".tif"):
        <do something>

However, I have not been able to compare the time taken by these approaches, because each of them already runs for too long to finish.

One last constraint is that the files are on an NFS share, which adds latency as well, so I would like to know whether there is a way to speed up the process, for example with multiprocessing.

Thanks in advance.
