Channel: Active questions tagged python - Stack Overflow

How to skip certain os.walk() directories and process some of those remaining in a special way?


Edited for clarity: I'm trying to get the location of files of a certain type within certain folders and subfolders. I have a list of folders, called bannedDir, that I never want to enter during the walk. If any name from bannedDir appears, I want to skip that directory entirely, which I believe I have done. I also have a list of regular expressions called flaggedDir. If any of the words from flaggedDir appear in the root directory, I want to do something to everything below that root.

What I want to do below that root is, first, skip any directories listed in excludedDir, which consists of all entries from bannedDir plus some entries from flaggedDir. Then I want to get the mtime of every xls file (or whatever fileTypes specifies) in the remaining folders. Finally, I want to store the maximum of those mtimes in a list I've called iniList.

Current code below.

    for root, dirs, files in os.walk(topDir, topdown=True):
        dirs[:] = [d for d in dirs if d not in bannedDir]
        if flaggedDir.search(root) is not None:
            dirs[:] = [d for d in dirs if d not in excludedDir]
        for name in files:
            if name.lower().endswith(fileTypes):
                lastModif = []
                timeIndex = []
                fileLocation = os.path.join(root, name)
                time = os.path.getmtime(fileLocation)
                timeIndex.append(time)
                lastModif.append([fileLocation, time])
        if len(lastModif) > 0:
            iniList.append(max(lastModif, key=lambda item: item[1]))

So, for example:

    topDir = [C:\\Test\]
    fileTypes = '.xls'
    bannedDir = [a,b]
    flaggedDir = [c,d]
    excludedDir = [a,b,c]

    dir a -- file 1.xls
    dir b -- file 5.exe
    dir c -- file 2.exe
    dir d -- file 3.xls, file 4.exe, file 5.xls

I should get only file 3.xls and file 5.xls, as dirs a, b and c should have been skipped. Then I should end up with just file 3.xls, as file 3 has an mtime of 5000 whereas file 5 has an mtime of 2000. My problem is that my code appears to traverse certain directories twice. I'm also not getting the maximum for each subdirectory. How do I fix this?
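
One possible way to get the behaviour described above is sketched here. It is not a definitive fix and rests on several assumptions: flaggedDir is a single compiled pattern (the code calls flaggedDir.search), files are collected only when root matches that pattern, excludedDir is applied only beneath a flagged directory, and one maximum is wanted per visited directory. The function name newest_per_directory, its parameter names, and the candidates list are hypothetical and not from the original code.

```python
import os
import re


def newest_per_directory(top_dir, file_types, banned, flagged, excluded):
    """Return [path, mtime] for the newest matching file in each flagged directory.

    Hypothetical helper: `flagged` is assumed to be a compiled regex, and
    `file_types` a suffix string or tuple of suffixes (str.endswith accepts both).
    """
    newest = []
    for root, dirs, files in os.walk(top_dir, topdown=True):
        # Prune in place so os.walk never descends into banned directories.
        dirs[:] = [d for d in dirs if d not in banned]

        if flagged.search(root) is None:
            # Not a flagged directory: keep walking, but collect nothing here.
            continue

        # Inside a flagged directory, prune the excluded names as well.
        dirs[:] = [d for d in dirs if d not in excluded]

        # Start a fresh list once per directory, not once per file.
        candidates = []
        for name in files:
            if name.lower().endswith(file_types):
                path = os.path.join(root, name)
                candidates.append([path, os.path.getmtime(path)])

        if candidates:
            newest.append(max(candidates, key=lambda item: item[1]))
    return newest
```

If the indentation of the current code is as it appears, two things would explain the symptoms. lastModif is reset for every matching file, so max() only ever sees the last file. It is also never cleared between directories, so a directory with no matching files appends the previous directory's entry again, which can look like a second traversal. Building the list once per directory, as in the sketch, avoids both.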

