How to substitute text from files in git history?
I've always used a GUI-based Git client (SmartGit), so I don't have much experience with the Git command line.
However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:
git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all
I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.
Is there a better way to do this that won't corrupt my binary files?
Thanks.
EDIT:
I got mixed up with something. The actual code that caused binary files to get corrupted was:
$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"
The code at the top actually removed all files with my password strangely enough.
I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.
You should carefully follow these steps here: https://rtyley.github.io/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command:
$ java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git
The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line; note that the comments shouldn't be included):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
regex:\r(\n)==>$1 # Replace Windows newlines with Unix newlines
Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
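As a minimal sketch, the file above could be written from a shell like this, with the explanatory comments left out. The rules are copied from the example; the quoted heredoc delimiter is only my assumption about how you'd create the file, chosen so the shell doesn't expand $1 or the backslashes.

```shell
# Write replacements.txt with one rule per line and no trailing comments.
# The quoted 'EOF' delimiter keeps the shell from expanding $1 or backslashes.
cat > replacements.txt <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
regex:\r(\n)==>$1
EOF
```

Any text editor works just as well; the point is only that each line holds a bare rule.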
Full disclosure: I'm the author of the BFG Repo-Cleaner.
You can avoid touching undesired files by passing -name "pattern" to find.
This works for me:
git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
's/originalpassword/newpassword/g' {} \;"
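If you can't list every text extension by name, another option is to let grep decide which files are text. The following is only a sketch, assuming GNU grep, xargs and perl are installed; replace_in_text_files is a hypothetical helper name, not part of Git. It treats both strings literally, and I have only tried it on a plain directory, not inside filter-branch.

```shell
#!/usr/bin/env bash
# Sketch: replace a literal string only in files that grep classifies as text.
# replace_in_text_files is a hypothetical helper, not a Git command.
replace_in_text_files() {
  local old="$1" new="$2" dir="${3:-.}"
  # -r recurse, -I skip binary files, -l print only names of matching files,
  # -F treat the pattern as a fixed string, --null separate names with NUL bytes
  grep -rIlF --null -- "$old" "$dir" |
    # -0 reads NUL-separated names; -r skips the run when nothing matched.
    # \Q...\E makes perl match the old string literally instead of as a regex.
    OLD="$old" NEW="$new" xargs -0 -r perl -pi -e 's/\Q$ENV{OLD}\E/$ENV{NEW}/g'
}

# Example call (directory argument is optional and defaults to "."):
# replace_in_text_files originalpassword newpassword .
```

Since a tree filter runs in its own shell, it would be the pipeline itself, rather than the function, that goes into the --tree-filter string. Files without a match are never opened for writing, so binaries that merely sit in the tree are left alone.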
With Git 2.24 (Q4 2019), git filter-branch (and BFG) is deprecated.
newren/git-filter-repo does NOT do what you want.
It has an example that is ALMOST what you want in its example section:
cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt
with expressions.txt:
literal:originalpassword==>newpassword
However, WARNING: as Hasturkun adds in the comments:
Using --path-glob (or --path) causes git filter-repo to only keep files matching those specifications.
The functionality to only replace text in specific files is available in bfg-ish as -fi, or in the lint-history script.
Otherwise, it looks like this is only currently possible with a custom commit callback.
See newren/git-filter-repo issue 74.
Which makes sense, considering the --replace-text option is itself a blob callback.