List directory tree structure in python?
I know that we can use os.walk() to list all sub-directories or all files in a directory. However, I would like to list the full directory tree content:
- Subdirectory 1:
    - file11
    - file12
    - Sub-sub-directory 11:
        - file111
        - file112
- Subdirectory 2:
    - file21
    - sub-sub-directory 21
    - sub-sub-directory 22
        - sub-sub-sub-directory 221
            - file 2211
How to best achieve this in Python?
Here's a function to do that with formatting:
import os

def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        # depth of root below startpath, counted in path separators
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * (level)
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
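The function above prints directly and derives the depth with a string replace, which can miscount if startpath is passed with a trailing separator. A minimal sketch of a variant that yields lines instead and computes depth with os.path.relpath follows; the name iter_tree_lines and the indent_width parameter are hypothetical, not part of the answer above.

```python
import os


def iter_tree_lines(startpath, indent_width=4):
    """Yield one indented line per directory and file under startpath."""
    startpath = os.path.normpath(startpath)  # drops any trailing separator
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # in-place sort makes the walk order deterministic
        rel = os.path.relpath(root, startpath)
        # relpath returns '.' for the start directory itself (depth 0)
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        yield '{}{}/'.format(' ' * indent_width * level, os.path.basename(root))
        for name in sorted(files):
            yield '{}{}'.format(' ' * indent_width * (level + 1), name)
```

Because it is a generator, the caller decides what to do with the lines, for example print('\n'.join(iter_tree_lines('.'))).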
Similar to the answers above, but for Python 3, arguably readable and arguably extensible:
from pathlib import Path


class DisplayablePath(object):
    display_filename_prefix_middle = '├──'
    display_filename_prefix_last = '└──'
    display_parent_prefix_middle = '    '
    display_parent_prefix_last = '│   '

    def __init__(self, path, parent_path, is_last):
        self.path = Path(str(path))
        self.parent = parent_path
        self.is_last = is_last
        if self.parent:
            self.depth = self.parent.depth + 1
        else:
            self.depth = 0

    @property
    def displayname(self):
        # directories get a trailing slash so they stand out from files
        if self.path.is_dir():
            return self.path.name + '/'
        return self.path.name

    @classmethod
    def make_tree(cls, root, parent=None, is_last=False, criteria=None):
        root = Path(str(root))
        criteria = criteria or cls._default_criteria

        displayable_root = cls(root, parent, is_last)
        yield displayable_root

        children = sorted(list(path
                               for path in root.iterdir()
                               if criteria(path)),
                          key=lambda s: str(s).lower())
        count = 1
        for path in children:
            is_last = count == len(children)
            if path.is_dir():
                yield from cls.make_tree(path,
                                         parent=displayable_root,
                                         is_last=is_last,
                                         criteria=criteria)
            else:
                yield cls(path, displayable_root, is_last)
            count += 1

    @classmethod
    def _default_criteria(cls, path):
        return True

    def displayable(self):
        if self.parent is None:
            return self.displayname

        _filename_prefix = (self.display_filename_prefix_last
                            if self.is_last
                            else self.display_filename_prefix_middle)

        parts = ['{!s} {!s}'.format(_filename_prefix,
                                    self.displayname)]

        # walk up the ancestors, adding one prefix column per level
        parent = self.parent
        while parent and parent.parent is not None:
            parts.append(self.display_parent_prefix_middle
                         if parent.is_last
                         else self.display_parent_prefix_last)
            parent = parent.parent

        return ''.join(reversed(parts))
Example usage:
paths = DisplayablePath.make_tree(Path('doc'))
for path in paths:
    print(path.displayable())
Example output:
doc/
├── _static/
│   ├── embedded/
│   │   ├── deep_file
│   │   └── very/
│   │       └── deep/
│   │           └── folder/
│   │               └── very_deep_file
│   └── less_deep_file
├── about.rst
├── conf.py
└── index.rst
Notes
- This uses recursion, so it will raise a RecursionError on really deep folder trees.
- The tree is lazily evaluated. It should behave well on really wide folder trees. Immediate children of a given folder are not lazily evaluated, though.
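If the recursion limit mentioned in the notes is a concern, the same kind of listing can be produced with an explicit stack. The following is only a sketch of that alternative, not the class above: the function name iter_tree and the constant names are made up for illustration, and it keeps the same behavior of materializing the immediate children of each folder.

```python
from pathlib import Path

SPACE, BRANCH, TEE, LAST = '    ', '│   ', '├── ', '└── '


def iter_tree(root):
    """Yield tree lines depth-first using an explicit stack, not recursion."""
    root = Path(root)
    yield root.name + '/'

    def children(directory):
        # sorted, materialized children paired with an "is last" flag
        entries = sorted(directory.iterdir(), key=lambda p: p.name.lower())
        return iter([(p, i == len(entries) - 1) for i, p in enumerate(entries)])

    # each stack entry: (iterator over one directory's children, line prefix)
    stack = [(children(root), '')]
    while stack:
        it, prefix = stack[-1]
        entry = next(it, None)
        if entry is None:  # this directory is exhausted
            stack.pop()
            continue
        path, is_last = entry
        is_dir = path.is_dir()
        yield prefix + (LAST if is_last else TEE) + path.name + ('/' if is_dir else '')
        if is_dir:
            stack.append((children(path), prefix + (SPACE if is_last else BRANCH)))
```

The stack depth grows with the folder depth, but it lives on the heap, so it is not bounded by Python's recursion limit.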
Edit:
- Added bonus! criteria callback for filtering paths.
List directory tree structure in Python?
We usually prefer to just use GNU tree, but we don't always have tree on every system, and sometimes Python 3 is available. A good answer here could be easily copy-pasted and not make GNU tree a requirement.
tree's output looks like this:
$ tree
.
├── package
│   ├── __init__.py
│   ├── __main__.py
│   ├── subpackage
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   └── module.py
│   └── subpackage2
│       ├── __init__.py
│       ├── __main__.py
│       └── module2.py
└── package2
    └── __init__.py

4 directories, 9 files
I created the above directory structure in my home directory under a directory I call pyscratch.
I also see other answers here that approach that sort of output, but I think we can do better, with simpler, more modern code and lazily evaluating approaches.
Tree in Python
To begin with, let's use an example that
- uses the Python 3 Path object
- uses the yield and yield from expressions (that create a generator function)
- uses recursion for elegant simplicity
- uses comments and some type annotations for extra clarity
from pathlib import Path

# prefix components:
space =  '    '
branch = '│   '
# pointers:
tee =    '├── '
last =   '└── '


def tree(dir_path: Path, prefix: str=''):
    """A recursive generator, given a directory Path object
    will yield a visual tree structure line by line
    with each line prefixed by the same characters
    """
    contents = list(dir_path.iterdir())
    # contents each get pointers that are ├── with a final └── :
    pointers = [tee] * (len(contents) - 1) + [last]
    for pointer, path in zip(pointers, contents):
        yield prefix + pointer + path.name
        if path.is_dir(): # extend the prefix and recurse:
            extension = branch if pointer == tee else space
            # i.e. space because last, └── , above so no more |
            yield from tree(path, prefix=prefix+extension)
and now:
for line in tree(Path.home() / 'pyscratch'):
    print(line)
prints:
├── package
│   ├── __init__.py
│   ├── __main__.py
│   ├── subpackage
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   └── module.py
│   └── subpackage2
│       ├── __init__.py
│       ├── __main__.py
│       └── module2.py
└── package2
    └── __init__.py
We do need to materialize each directory into a list because we need to know how long it is, but afterwards we throw the list away. For deep and broad recursion this should be lazy enough.
The above code, with the comments, should be sufficient to fully understand what we're doing here, but feel free to step through it with a debugger to better grok it if you need to.
More features
Now GNU tree gives us a couple of useful features that I'd like to have with this function:
- prints the subject directory name first (does so automatically, ours does not)
- prints the count of n directories, m files
- option to limit recursion, -L level
- option to limit to just directories, -d
Also, when there is a huge tree, it is useful to limit the iteration (e.g. with islice) to avoid locking up your interpreter with text, as at some point the output becomes too verbose to be useful. We can make this arbitrarily high by default - say 1000.
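To illustrate the islice idea in isolation, here is a small sketch that takes at most limit items from any generator and reports whether anything was left over. The names numbered_lines and take are hypothetical stand-ins, not part of the tree function.

```python
from itertools import islice


def numbered_lines():
    """Stand-in for a very long tree listing (hypothetical, never ends)."""
    n = 0
    while True:
        n += 1
        yield f'line {n}'


def take(lines, limit=1000):
    """Return (first `limit` items, whether more items remained)."""
    iterator = iter(lines)
    head = list(islice(iterator, limit))
    # pulling one more item tells us whether the output was cut short
    truncated = next(iterator, None) is not None
    return head, truncated
```

Since islice only pulls what it needs, the rest of a huge (or infinite) listing is never generated.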
So let's remove the previous comments and fill out this functionality:
from pathlib import Path
from itertools import islice

space =  '    '
branch = '│   '
tee =    '├── '
last =   '└── '


def tree(dir_path: Path, level: int=-1, limit_to_directories: bool=False,
         length_limit: int=1000):
    """Given a directory Path object print a visual tree structure"""
    dir_path = Path(dir_path) # accept string coercible to Path
    files = 0
    directories = 0
    def inner(dir_path: Path, prefix: str='', level=-1):
        nonlocal files, directories
        if not level:
            return # 0, stop iterating
        if limit_to_directories:
            contents = [d for d in dir_path.iterdir() if d.is_dir()]
        else:
            contents = list(dir_path.iterdir())
        pointers = [tee] * (len(contents) - 1) + [last]
        for pointer, path in zip(pointers, contents):
            if path.is_dir():
                yield prefix + pointer + path.name
                directories += 1
                extension = branch if pointer == tee else space
                yield from inner(path, prefix=prefix+extension, level=level-1)
            elif not limit_to_directories:
                yield prefix + pointer + path.name
                files += 1
    print(dir_path.name)
    iterator = inner(dir_path, level=level)
    for line in islice(iterator, length_limit):
        print(line)
    if next(iterator, None):
        print(f'... length_limit, {length_limit}, reached, counted:')
    print(f'\n{directories} directories' + (f', {files} files' if files else ''))
And now we can get the same sort of output as tree:
tree(Path.home() / 'pyscratch')
prints:
pyscratch
├── package
│   ├── __init__.py
│   ├── __main__.py
│   ├── subpackage
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   └── module.py
│   └── subpackage2
│       ├── __init__.py
│       ├── __main__.py
│       └── module2.py
└── package2
    └── __init__.py

4 directories, 9 files
And we can restrict to levels:
tree(Path.home() / 'pyscratch', level=2)
prints:
pyscratch
├── package
│   ├── __init__.py
│   ├── __main__.py
│   ├── subpackage
│   └── subpackage2
└── package2
    └── __init__.py

4 directories, 3 files
And we can limit the output to directories:
tree(Path.home() / 'pyscratch', level=2, limit_to_directories=True)
prints:
pyscratch
├── package
│   ├── subpackage
│   └── subpackage2
└── package2

4 directories
Retrospective
In retrospect, we could have used path.glob for matching. We could also perhaps use path.rglob for recursive globbing, but that would require a rewrite. We could also use itertools.tee instead of materializing a list of directory contents, but that could have negative tradeoffs and would probably make the code even more complex.
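As a rough idea of what an rglob-based rewrite might look like, here is a sketch that produces a plain indented listing (without the box-drawing pointers, which need to know whether an entry is the last in its directory). The function name rglob_listing is hypothetical, and unlike the generator above it sorts the whole result up front, so it is not lazy.

```python
from pathlib import Path


def rglob_listing(root, indent='    '):
    """Yield an indented listing of everything under root using rglob."""
    root = Path(root)
    yield root.name + '/'
    # sorting by relative path parts keeps each directory's contents
    # directly below the directory itself
    paths = sorted(root.rglob('*'), key=lambda p: p.relative_to(root).parts)
    for path in paths:
        depth = len(path.relative_to(root).parts)
        suffix = '/' if path.is_dir() else ''
        yield indent * depth + path.name + suffix
```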
Comments are welcome!
A solution without your indentation:
import os

# given_path is the directory you want to list
for path, dirs, files in os.walk(given_path):
    print(path)
    for f in files:
        print(f)
os.walk already does the top-down, depth-first walk you are looking for.
Ignoring the dirs list prevents the overlapping you mention.
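Although the loop above ignores dirs, in a top-down walk that list can also be edited in place to control where os.walk descends. A minimal sketch, assuming you want to skip dot-directories and get a stable order (the name walk_visible is made up for illustration):

```python
import os


def walk_visible(top):
    """Yield (directory, filenames) pairs, skipping dot-directories."""
    for path, dirs, files in os.walk(top):
        # Assigning to the slice mutates the list os.walk holds, so the
        # pruned directories are never descended into.
        dirs[:] = sorted(d for d in dirs if not d.startswith('.'))
        yield path, sorted(files)
```

Rebinding the name (dirs = ...) would not have this effect; only in-place changes are seen by os.walk.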