How can I search sub-folders using glob.glob module? [duplicate]
I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')
But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?
In Python 3.5 and newer use the new recursive **/
functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive
is set, **
followed by a path separator matches 0 or more subdirectories.
In earlier Python versions, glob.glob()
cannot list files in subdirectories recursively.
In that case I'd use os.walk()
combined with fnmatch.filter()
instead:
import os
import fnmatch
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
for dirpath, dirnames, files in os.walk(path)
for f in fnmatch.filter(files, '*.txt')]
This'll walk your directories recursively and return all absolute pathnames to matching .txt
files. In this specific case the fnmatch.filter()
may be overkill, you could also use a .endswith()
test:
import os
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
for dirpath, dirnames, files in os.walk(path)
for f in files if f.endswith('.txt')]
There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):
-
glob.glob('*.txt') :
matches all files ending in '.txt' in current directory -
glob.glob('*/*.txt') :
same as 1 -
glob.glob('**/*.txt') :
matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directory -
glob.glob('*.txt',recursive=True) :
same as 1 -
glob.glob('*/*.txt',recursive=True) :
same as 3 -
glob.glob('**/*.txt',recursive=True):
matches all files ending in '.txt' in the current directory and in all subdirectories
So it's best to always specify recursive=True.
To find files in immediate subdirectories:
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')
For a recursive version that traverse all subdirectories, you could use **
and pass recursive=True
since Python 3.5:
configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)
Both function calls return lists. You could use glob.iglob()
to return paths one by one. Or use pathlib
:
from pathlib import Path
path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir
Both methods return iterators (you can get paths one by one).
The glob2 package supports wild cards and is reasonably fast
code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)
On my laptop it takes approximately 2 seconds to match >60,000 file paths.
You can use Formic with Python 2.6
import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")
Disclosure - I am the author of this package.