Iterate through folders, then subfolders and print filenames with path to text file

Charles' answer is good, but it can be made faster and simpler. Each item yielded by os.walk() (see the docs) is a tuple of three items:

  1. The working directory
  2. A list of strings naming any sub-directories present in the working directory
  3. A list of files present in the working directory
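
To make the three-item tuple concrete, here is a minimal sketch that builds a throwaway directory tree and prints what os.walk() yields for it. The names "top", "sub", "a.txt" and "b.txt" are made up for illustration and are not from the question.

```python
import os
import tempfile

# Build a small temporary tree so the example is reproducible.
# "top", "sub", "a.txt" and "b.txt" are hypothetical names.
base = tempfile.mkdtemp()
top = os.path.join(base, "top")
os.makedirs(os.path.join(top, "sub"))
for path in (os.path.join(top, "a.txt"), os.path.join(top, "sub", "b.txt")):
    open(path, "w").close()

# Each iteration yields (working directory, sub-directory names, file names).
walked = [(root, dirs, files) for root, dirs, files in os.walk(top)]
for root, dirs, files in walked:
    print(root, dirs, files)
```

The first tuple describes "top" itself (one sub-directory, one file) and the second describes "sub" (no sub-directories, one file).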

Knowing this, much of Charles' code can be condensed into a single for loop, which walks the tree only once instead of calling os.walk() again for each sub-directory:

import os

def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            r.append(os.path.join(root, name))
    return r
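
Since the question asks for the paths to end up in a text file, the same loop can write each path as it is found rather than collecting a list first. This is only a sketch: write_file_list and its parameters top and out_path are hypothetical names, not part of the original answer.

```python
import os

def write_file_list(top, out_path):
    """Write every file path under `top` to `out_path`, one per line.

    Returns the number of paths written.
    """
    count = 0
    with open(out_path, "w") as out:
        for root, dirs, files in os.walk(top):
            for name in files:
                out.write(os.path.join(root, name) + "\n")
                count += 1
    return count
```

A call might look like write_file_list("/some/folder", "file_list.txt"). If out_path lies inside top, the output file will probably show up in its own listing, so it is safer to write it somewhere else.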

Use os.walk(). The following returns a list of all files in "dir" and its subdirectories. The results can be manipulated to suit your needs:

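import os

def list_files(dir):
    r = []
    # os.walk() yields (directory, subdirectory names, file names) tuples;
    # x[0] is the directory path, so this collects "dir" and every subdirectory.
    subdirs = [x[0] for x in os.walk(dir)]
    for subdir in subdirs:
        # The first item yielded by os.walk(subdir) describes subdir itself;
        # index 2 is its list of file names.
        files = next(os.walk(subdir))[2]
        for file in files:
            r.append(os.path.join(subdir, file))
    return r

The built-in next() works in Python 2.6+ and Python 3. The original call, os.walk(subdir).next(), is Python 2 only; in Python 3 the method is named __next__().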


This lists only files with a specific extension. My sub-folders contain many files, but I am only interested in the Parquet files.

import os
dir = r'/home/output/'
def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            filepath = root + os.sep + name
            if filepath.endswith(".snappy.parquet"):
                r.append(os.path.join(root, name))
    return r
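
If the extension should not be hard-coded, the filter can be passed in as an argument. The sketch below is one possible variation, not the answer's own code; list_files_with_suffix and its parameters are hypothetical names. It relies on str.endswith() accepting a tuple of candidate suffixes.

```python
import os

def list_files_with_suffix(top, *suffixes):
    """Return paths under `top` whose file name ends with any of `suffixes`."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # `suffixes` is a tuple, which str.endswith() accepts directly.
            if name.endswith(suffixes):
                matches.append(os.path.join(root, name))
    return matches
```

For the case above, the call would be list_files_with_suffix('/home/output/', '.snappy.parquet').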