Replace strings in files by Python, recursively in given directory and its subdirectories?

Solution 1:

Put all this code into a file called mass_replace. Under Linux or Mac OS X, you can do chmod +x mass_replace and then just run this. Under Windows, you can run it with python mass_replace followed by the appropriate arguments.

#!/usr/bin/python

import os
import re
import sys

# list of extensions to replace
DEFAULT_REPLACE_EXTENSIONS = None
# example: uncomment next line to only replace *.c, *.h, and/or *.txt
# DEFAULT_REPLACE_EXTENSIONS = (".c", ".h", ".txt")

def try_to_replace(fname, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    if replace_extensions:
        return fname.lower().endswith(replace_extensions)
    return True


def file_replace(fname, pat, s_after):
    # first, see if the pattern is even in the file.
    with open(fname) as f:
        if not any(re.search(pat, line) for line in f):
            return # pattern does not occur in file so we are done.

    # pattern is in the file, so perform replace operation.
    with open(fname) as f:
        out_fname = fname + ".tmp"
        out = open(out_fname, "w")
        for line in f:
            out.write(re.sub(pat, s_after, line))
        out.close()
        os.rename(out_fname, fname)


def mass_replace(dir_name, s_before, s_after, replace_extensions=DEFAULT_REPLACE_EXTENSIONS):
    pat = re.compile(s_before)
    for dirpath, dirnames, filenames in os.walk(dir_name):
        for fname in filenames:
            if try_to_replace(fname, replace_extensions):
                fullname = os.path.join(dirpath, fname)
                file_replace(fullname, pat, s_after)

if len(sys.argv) != 4:
    u = "Usage: mass_replace <dir_name> <string_before> <string_after>\n"
    sys.stderr.write(u)
    sys.exit(1)

mass_replace(sys.argv[1], sys.argv[2], sys.argv[3])

EDIT: I have changed the above code from the original answer. There are several changes. First, mass_replace() now calls re.compile() to pre-compile the search pattern; second, to check what extension the file has, we now pass in a tuple of file extensions to .endswith() rather than calling .endswith() three times; third, it now uses the with statement available in recent versions of Python; and finally, file_replace() now checks to see if the pattern is found within the file, and doesn't rewrite the file if the pattern is not found. (The old version would rewrite every file, changing the timestamps even if the output file was identical to the input file; this was inelegant.)

EDIT: I changed this to default to replacing every file, but with one line you can edit to limit it to particular extensions. I think replacing every file is a more useful out-of-the-box default. This could be extended with a list of extensions or filenames not to touch, options to make it case insensitive, etc.

EDIT: In a comment, @asciimo pointed out a bug. I edited this to fix the bug. str.endswith() is documented to accept a tuple of strings to try, but not a list. Fixed. Also, I made a couple of the functions accept an optional argument to let you pass in a tuple of extensions; it should be pretty easy to modify this to accept a command-line argument to specify which extensions.

Solution 2:

Do you really need regular expressions?

import os

def recursive_replace( root, pattern, replace )
    for dir, subdirs, names in os.walk( root ):
        for name in names:
            path = os.path.join( dir, name )
            text = open( path ).read()
            if pattern in text:
                open( path, 'w' ).write( text.replace( pattern, replace ) )

Solution 3:

Of course, if you just want to get it done without coding it up, use find and xargs:

find /home/noa/Desktop/codes -type f -print0 | \
xargs -0 sed --in-place "s/dbname=noa user=noa/dbname=masi user=masi"

(And you could likely do this with find's -exec or something as well, but I prefer xargs.)

Solution 4:

This is how I would find and replace strings in files using python. This is a simple little function that will recursively search a directories for a string and replace it with a string. You can also limit files with a certain file extension like the example below.

import os, fnmatch
def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)

This allows you to do something like:

findReplace("some_dir", "find this", "replace with this", "*.txt")