Do I understand os.walk right?
os.walk
returns a generator, that creates a tuple of values (current_path, directories in current_path, files in current_path).
Every time the generator is called it will follow each directory recursively until no further sub-directories are available from the initial directory that walk was called upon.
As such,
os.walk('C:\dir1\dir2\startdir').next()[0] # returns 'C:\dir1\dir2\startdir'
os.walk('C:\dir1\dir2\startdir').next()[1] # returns all the dirs in 'C:\dir1\dir2\startdir'
os.walk('C:\dir1\dir2\startdir').next()[2] # returns all the files in 'C:\dir1\dir2\startdir'
So
import os.path
....
for path, directories, files in os.walk('C:\dir1\dir2\startdir'):
if file in files:
print('found %s' % os.path.join(path, file))
or this
def search_file(directory = None, file = None):
assert os.path.isdir(directory)
for cur_path, directories, files in os.walk(directory):
if file in files:
return os.path.join(directory, cur_path, file)
return None
or if you want to look for file you can do this:
import os
def search_file(directory = None, file = None):
assert os.path.isdir(directory)
current_path, directories, files = os.walk(directory).next()
if file in files:
return os.path.join(directory, file)
elif directories == '':
return None
else:
for new_directory in directories:
result = search_file(directory = os.path.join(directory, new_directory), file = file)
if result:
return result
return None
Minimal runnable example
This is how I like to learn stuff:
mkdir root
cd root
mkdir \
d0 \
d1 \
d0/d0_d1
touch \
f0 \
d0/d0_f0 \
d0/d0_f1 \
d0/d0_d1/d0_d1_f0
tree
Output:
.
├── d0
│ ├── d0_d1
│ │ └── d0_d1_f0
│ ├── d0_f0
│ └── d0_f1
├── d1
└── f0
main.py
#!/usr/bin/env python3
import os
for path, dirnames, filenames in os.walk('root'):
print('{} {} {}'.format(repr(path), repr(dirnames), repr(filenames)))
Output:
'root' ['d0', 'd1'] ['f0']
'root/d0' ['d0_d1'] ['d0_f0', 'd0_f1']
'root/d0/d0_d1' [] ['d0_d1_f0']
'root/d1' [] []
This makes everything clear:
-
path
is the root directory of each step -
dirnames
is a list of directory basenames in eachpath
-
filenames
is a list of file basenames in eachpath
Tested on Ubuntu 16.04, Python 3.5.2.
Modifying dirnames
changes the tree recursion
This is basically the only other thing you have to keep in mind.
E.g., if you do the following operations on dirnames
, it affects the traversal:
- sort
- filter
Walk file or directory
If the input to traverse is either a file or directory, you can handle it like this:
#!/usr/bin/env python3
import os
import sys
def walk_file_or_dir(root):
if os.path.isfile(root):
dirname, basename = os.path.split(root)
yield dirname, [], [basename]
else:
for path, dirnames, filenames in os.walk(root):
yield path, dirnames, filenames
for path, dirnames, filenames in walk_file_or_dir(sys.argv[1]):
print(path, dirnames, filenames)