I am trying to do something to all the files under a given path. I don't want to collect all the file names beforehand then do something with them, so I tried this:

import os
import stat

def explore(p):
  s = ''
  list = os.listdir(p)
  for a in list:
    path = p + '/' + a
    stat_info = os.lstat(path )
    if stat.S_ISDIR(stat_info.st_mode):
     explore(path)
    else:
      yield path

if __name__ == "__main__":
  for x in explore('.'):
    print '-->', x

But this code skips over directories when it hits them, instead of yielding their contents. What am I doing wrong?


Solution 1:

Iterators do not work recursively like that. You have to re-yield each result, by replacing

explore(path)

with something like

for value in explore(path):
    yield value

Python 3.3 added the syntax yield from X, as proposed in PEP 380, to serve this purpose. With it you can do this instead:

yield from explore(path)

If you're using generators as coroutines, this syntax also supports the use of generator.send() to pass values back into the recursively-invoked generators. The simple for loop above would not.

Solution 2:

The problem is this line of code:

explore(path)

What does it do?

  • calls explore with the new path
  • explore runs, creating a generator
  • the generator is returned to the spot where explore(path) was executed . . .
  • and is discarded

Why is it discarded? It wasn't assigned to anything, it wasn't iterated over -- it was completely ignored.

If you want to do something with the results, well, you have to do something with them! ;)

The easiest way to fix your code is:

for name in explore(path):
    yield name

When you are confident you understand what's going on, you'll probably want to use os.walk() instead.

Once you have migrated to Python 3.3 (assuming all works out as planned) you will be able to use the new yield from syntax and the easiest way to fix your code at that point will be:

yield from explore(path)

Solution 3:

Use os.walk instead of reinventing the wheel.

In particular, following the examples in the library documentation, here is an untested attempt:

import os
from os.path import join

def hellothere(somepath):
    for root, dirs, files in os.walk(somepath):
        for curfile in files:
            yield join(root, curfile)


# call and get full list of results:
allfiles = [ x for x in hellothere("...") ]

# iterate over results lazily:
for x in hellothere("..."):
    print x