I have a folder with a very large number of files that I would like to process. But before that, I need to filter the files and obtain only the .tif files from the directory. There is no nesting, so it's just one directory with all these files, but there are a total of 475,000 files.
I've tried with glob, iglob and os.scandir but it seems like the issue is traversing the large number of files itself.
Using glob:

```python
from glob import glob

path = r"\\path\to\the\directory\*.tif"
# This glob takes too long
files = glob(path)
```

Using iglob:

```python
from glob import iglob

path = r"\\path\to\the\directory\*.tif"
# This iglob takes too long
files = iglob(path)
for file in files:
    <do something>
```

Using listdir:

```python
from os import listdir

path = r"\\path\to\the\directory"
# This listdir takes too long
for file in listdir(path):
    if file.endswith(".tif"):
        <do something>
```

However, I have not been able to properly time these approaches, because each of them already takes too long to finish.
One last constraint is that the files are on an NFS share, which adds latency as well, so I would like to know whether there is a way to speed up the process, for example with multiprocessing.
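For context, here is a minimal sketch of the kind of thing I have in mind: use os.scandir as a lazy generator so the filtering starts yielding names before the full listing is read, and feed those paths into a multiprocessing pool so per-file work overlaps with the slow NFS traversal. The process function and the demo directory are placeholders, not my real workload:

```python
import os
import tempfile
from multiprocessing import Pool

def process(path):
    # hypothetical per-file work; replace with the real processing
    return path

def tif_paths(dirpath):
    # os.scandir yields entries lazily, so downstream work can
    # start before the whole 475k-entry listing has been read
    with os.scandir(dirpath) as it:
        for entry in it:
            if entry.name.endswith(".tif"):
                yield entry.path

if __name__ == "__main__":
    # demo on a throwaway directory so the sketch runs anywhere;
    # in practice dirpath would be the NFS directory
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.tif", "b.tif", "c.txt"):
            open(os.path.join(d, name), "w").close()
        with Pool(2) as pool:
            # imap_unordered consumes the generator and hands paths
            # to workers as they arrive, in chunks
            results = list(pool.imap_unordered(process, tif_paths(d), chunksize=2))
        print(sorted(os.path.basename(p) for p in results))  # → ['a.tif', 'b.tif']
```

To be clear, I don't know whether this actually helps here, since the bottleneck may be the single directory listing itself rather than the per-file work.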
Thanks in advance.