Edited for clarity: I'm trying to get the paths of a certain file type within certain folders and subfolders. I have a list of folder names called bannedDir that I never want to enter during the walk. If any name from bannedDir appears, I want to skip that directory entirely, which I believe I have done. I also have a list of regular expressions called flaggedDir. If any of the patterns in flaggedDir match the root directory, I want to do something to everything below that root.
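For reference, `os.walk` only honours pruning when it runs top-down and the `dirs` list is modified in place, for example with slice assignment; rebinding `dirs` to a new list has no effect on where the walk goes. The sketch below checks that behaviour on a throwaway temporary tree. The function name `walk_skipping` and the set `banned_dirs` (standing in for bannedDir) are hypothetical.

```python
import os
import tempfile

def walk_skipping(top, banned_dirs):
    """Return every directory visited by os.walk, relative to `top`,
    without ever entering a directory whose name is in `banned_dirs`."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list os.walk recurses into, so banned
        # directories (and everything below them) are never visited.
        dirs[:] = [d for d in dirs if d not in banned_dirs]
        visited.append(os.path.relpath(root, top))
    return visited

with tempfile.TemporaryDirectory() as top:
    for sub in (("a", "deep"), ("b",), ("c",), ("d", "inner")):
        os.makedirs(os.path.join(top, *sub))
    visited = walk_skipping(top, {"a", "b"})

print(sorted(visited))
```

Because `a` is pruned at the top level, `a/deep` is never reached either, while `d/inner` is still walked.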
Below that root, I first want to exclude the directories listed in excludedDir, which consists of all entries from bannedDir and some entries from flaggedDir. Then I want to get the mtime of every .xls file (fileTypes) in the remaining folders, and store the maximum mtime from that list in a list I've called iniList.
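All of the steps above depend on deciding whether a given root is flagged. Since the code calls `flaggedDir.search(root)`, flaggedDir appears to be a single compiled pattern rather than a list. The sketch below shows one hedged way to build such a pattern from a list and test a root against it; `flagged_patterns`, `flagged` and `is_flagged` are hypothetical names. Matching each path component relative to the top directory, instead of searching the whole path, is a design choice of this sketch: a bare `search` also fires on letters inside unrelated names, such as the `d` in `data` below.

```python
import os
import re

# Hypothetical patterns standing in for the entries of flaggedDir.
flagged_patterns = ["c", "d"]

# Join the list into one alternation and compile it once.
flagged = re.compile("|".join(f"(?:{p})" for p in flagged_patterns))

def is_flagged(root, top):
    """True if any directory name between `top` and `root` fully matches
    one of the flagged patterns."""
    rel = os.path.relpath(root, top)
    return any(flagged.fullmatch(part) for part in rel.split(os.sep))

top = os.path.join("data", "Test")
print(bool(flagged.search(top)))                         # True: "data" contains "d"
print(is_flagged(top, top))                              # False
print(is_flagged(os.path.join(top, "d"), top))           # True
print(is_flagged(os.path.join(top, "d", "inner"), top))  # True
print(is_flagged(os.path.join(top, "docs"), top))        # False
```

With this check, every directory below a flagged one is also treated as flagged, which matches "everything below that root".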
Current code below.

    import os

    for root, dirs, files in os.walk(topDir, topdown=True):
        dirs[:] = [d for d in dirs if d not in bannedDir]
        if flaggedDir.search(root) is not None:
            dirs[:] = [d for d in dirs if d not in excludedDir]
            for name in files:
                if name.lower().endswith(fileTypes):
                    lastModif = []
                    timeIndex = []
                    fileLocation = os.path.join(root, name)
                    time = os.path.getmtime(fileLocation)
                    timeIndex.append(time)
                    lastModif.append([fileLocation, time])
                    if len(lastModif) > 0:
                        iniList.append(max(lastModif, key=lambda item: item[1]))

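In the loop above, `lastModif` and `timeIndex` are re-created for every matching file, so `max` only ever compares a single item and `iniList` receives one entry per file rather than one per directory; that may also be why some directories look as if they are processed more than once. A stripped-down illustration using hypothetical in-memory `(name, mtime)` pairs, with no file system involved:

```python
# Hypothetical (name, mtime) pairs standing in for the files of one directory.
files = [("file 3.xls", 5000), ("file 5.xls", 2000)]

# Pattern from the question: the list is re-created inside the loop,
# so max() only ever sees the current file.
per_file = []
for name, mtime in files:
    last_modif = [(name, mtime)]
    per_file.append(max(last_modif, key=lambda item: item[1]))

# Alternative: build the list once, then take the maximum after the loop.
last_modif = []
for name, mtime in files:
    last_modif.append((name, mtime))
per_dir = [max(last_modif, key=lambda item: item[1])] if last_modif else []

print(per_file)  # [('file 3.xls', 5000), ('file 5.xls', 2000)]
print(per_dir)   # [('file 3.xls', 5000)]
```

In the walk, the same idea means creating the candidate list once per `root`, before `for name in files:`, and calling `max` after that inner loop finishes.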
So, for example:

    topDir = [C:\\Test\]
    fileTypes = '.xls'
    bannedDir = [a,b]
    flaggedDir = [c,d]
    excludedDir = [a,b,c]

    dir a -- file 1.xls
    dir b -- file 5.exe
    dir c -- file 2.exe
    dir d -- file 3.xls, file 4.exe, file 5.xls
I should get only file 3.xls and file 5.xls, since directories a, b and c should be skipped. Of those two, I should keep only file 3.xls, because it has an mtime of 5000 whereas file 5.xls has an mtime of 2000.

My problem is that my code appears to traverse certain directories twice, and I'm not getting the maximum for each subdirectory. How do I fix this?
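A minimal end-to-end sketch of the behaviour described in this question, rebuilt on a temporary copy of the example tree. The function `collect_newest`, its parameter names and the component-wise `fullmatch` check on the path relative to `top` are assumptions of this sketch, not the original code. Only the mtimes of file 3.xls (5000) and file 5.xls (2000) come from the example; the other files keep whatever mtime they get on creation. Because excludedDir is applied only below a flagged root, as described, dir c is still visited from the (unflagged) top level here, but it contributes nothing since it only contains an .exe file.

```python
import os
import re
import tempfile

def collect_newest(top, file_types, banned, excluded, flagged):
    """Walk `top` once; for every flagged directory, record the newest file
    whose name ends with `file_types`. Returns a list of (path, mtime)."""
    results = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Never enter banned directories, at any depth.
        dirs[:] = [d for d in dirs if d not in banned]
        rel = os.path.relpath(root, top)
        if not any(flagged.fullmatch(part) for part in rel.split(os.sep)):
            continue
        # Below a flagged root, also skip the excluded directories.
        dirs[:] = [d for d in dirs if d not in excluded]
        # Gather every candidate in this directory first ...
        candidates = []
        for name in files:
            if name.lower().endswith(file_types):
                path = os.path.join(root, name)
                candidates.append((path, os.path.getmtime(path)))
        # ... then keep only the newest one, once per directory.
        if candidates:
            results.append(max(candidates, key=lambda item: item[1]))
    return results

with tempfile.TemporaryDirectory() as top:
    # (directory, file name, mtime or None to leave it unchanged)
    layout = [
        ("a", "file 1.xls", None),
        ("b", "file 5.exe", None),
        ("c", "file 2.exe", None),
        ("d", "file 3.xls", 5000),
        ("d", "file 4.exe", None),
        ("d", "file 5.xls", 2000),
    ]
    for sub, name, mtime in layout:
        os.makedirs(os.path.join(top, sub), exist_ok=True)
        path = os.path.join(top, sub, name)
        open(path, "w").close()
        if mtime is not None:
            os.utime(path, (mtime, mtime))  # set access and modification time
    found = collect_newest(
        top,
        ".xls",
        banned={"a", "b"},
        excluded={"a", "b", "c"},
        flagged=re.compile("c|d"),
    )
    names = [(os.path.basename(p), t) for p, t in found]

print(names)  # [('file 3.xls', 5000.0)]
```

The tree is walked exactly once, and `max` runs once per flagged directory, so file 5.xls is compared against file 3.xls instead of being appended on its own.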