Do I understand os.walk right?

os.walk returns a generator, that creates a tuple of values (current_path, directories in current_path, files in current_path).

Every time the generator is called it will follow each directory recursively until no further sub-directories are available from the initial directory that walk was called upon.

As such,

os.walk('C:\dir1\dir2\startdir').next()[0] # returns 'C:\dir1\dir2\startdir'
os.walk('C:\dir1\dir2\startdir').next()[1] # returns all the dirs in 'C:\dir1\dir2\startdir'
os.walk('C:\dir1\dir2\startdir').next()[2] # returns all the files in 'C:\dir1\dir2\startdir'

So

import os.path
....
for path, directories, files in os.walk('C:\dir1\dir2\startdir'):
     if file in files:
          print('found %s' % os.path.join(path, file))

or this

def search_file(directory = None, file = None):
    assert os.path.isdir(directory)
    for cur_path, directories, files in os.walk(directory):
        if file in files:
            return os.path.join(directory, cur_path, file)
    return None

or if you want to look for file you can do this:

import os
def search_file(directory = None, file = None):
    assert os.path.isdir(directory)
    current_path, directories, files = os.walk(directory).next()
    if file in files:
        return os.path.join(directory, file)
    elif directories == '':
        return None
    else:
        for new_directory in directories:
            result = search_file(directory = os.path.join(directory, new_directory), file = file)
            if result:
                return result
        return None

Minimal runnable example

This is how I like to learn stuff:

mkdir root
cd root
mkdir \
  d0 \
  d1 \
  d0/d0_d1
touch \
  f0 \
  d0/d0_f0 \
  d0/d0_f1 \
  d0/d0_d1/d0_d1_f0
tree

Output:

.
├── d0
│   ├── d0_d1
│   │   └── d0_d1_f0
│   ├── d0_f0
│   └── d0_f1
├── d1
└── f0

main.py

#!/usr/bin/env python3
import os
for path, dirnames, filenames in os.walk('root'):
    print('{} {} {}'.format(repr(path), repr(dirnames), repr(filenames)))

Output:

'root' ['d0', 'd1'] ['f0']
'root/d0' ['d0_d1'] ['d0_f0', 'd0_f1']
'root/d0/d0_d1' [] ['d0_d1_f0']
'root/d1' [] []

This makes everything clear:

  • path is the root directory of each step
  • dirnames is a list of directory basenames in each path
  • filenames is a list of file basenames in each path

Tested on Ubuntu 16.04, Python 3.5.2.

Modifying dirnames changes the tree recursion

This is basically the only other thing you have to keep in mind.

E.g., if you do the following operations on dirnames, it affects the traversal:

  • sort
  • filter

Walk file or directory

If the input to traverse is either a file or directory, you can handle it like this:

#!/usr/bin/env python3

import os
import sys

def walk_file_or_dir(root):
    if os.path.isfile(root):
        dirname, basename = os.path.split(root)
        yield dirname, [], [basename]
    else:
        for path, dirnames, filenames in os.walk(root):
            yield path, dirnames, filenames

for path, dirnames, filenames in walk_file_or_dir(sys.argv[1]):
    print(path, dirnames, filenames)