What's the fastest way to edit hundreds of Git commit messages?

I have a fairly large Git repository with 1000s of commits, originally imported from SVN. Before I make my repo public, I'd like to clean up a few hundred commit messages that don't make sense in my new repo, as well as to remove all that git-svn informational text that got added.

I know that I can use 'git rebase -i' and then 'git commit --amend' to edit each individual commit message, but with hundreds of messages to be edited, that's a huge pain in the you-know-what.

Is there any faster way to edit all of these commit messages? Ideally I'd have every commit message listed in a single file where I could edit them all in one place.

Thanks!


This is an old question, but since nobody has mentioned git filter-branch, I'll add my two cents.

I recently had to mass-replace text in commit messages, swapping one block of text for another without changing the rest of each message. For instance, I had to replace Refs: #xxxxx with Refs: #22917.

I used git filter-branch like this:

git filter-branch --msg-filter 'sed "s/Refs: #xxxxx/Refs: #22917/g"' master..my_branch
  • I used the --msg-filter option to edit only the commit message, but you can use other filters to change files, edit the full commit information, etc.
  • I limited filter-branch to the commits that were not in master (master..my_branch), but you can apply it to your whole branch by omitting the commit range.

As suggested in the doc, try this on a copy of your branch. Hope that helps.
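
The question also asks about removing the text that git-svn adds to each message. Below is a minimal sketch of a filter for that, assuming those lines all start with git-svn-id: (the usual git-svn trailer). The function name strip_svn_metadata and the sample message are made up for illustration.

```python
import re

# Assumption: git-svn metadata appears as lines beginning with "git-svn-id:".
SVN_ID_LINE = re.compile(r"^git-svn-id:.*\n?", re.MULTILINE)


def strip_svn_metadata(message: str) -> str:
    """Drop git-svn-id lines and tidy the trailing blank lines they leave."""
    cleaned = SVN_ID_LINE.sub("", message)
    # End the message with exactly one newline.
    return cleaned.rstrip() + "\n"


# Hypothetical sample message, only to show the effect.
sample = (
    "Fix off-by-one in parser\n"
    "\n"
    "git-svn-id: https://svn.example.com/repo/trunk@123 0123abcd\n"
)
print(strip_svn_metadata(sample), end="")
```

To use this as a real filter, the script would end with sys.stdout.write(strip_svn_metadata(sys.stdin.read())) instead of the sample, because --msg-filter passes the old message on standard input and takes the new one from standard output.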


Sources used for the answer

  • When to use the command: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch
  • Command reference (with the list of options): https://git-scm.com/docs/git-filter-branch
  • Examples of rewrites: https://davidwalsh.name/update-git-commit-messages

This is easy to do as follows:

  • Perform the initial import.
  • Export all commits into text:

    git format-patch -10000
    

    The number should be larger than the total number of commits. This will create lots of files named NNNNN-commit-description.patch.

  • Edit these files with a script. (Do not touch anything in them except the commit message at the top.)
  • Copy or move the edited files to an empty Git repo or branch.
  • Import all edited commits back:

    git am *.patch
    

This works only with a single branch, but it works very well.
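
The "edit these files" step could look roughly like the sketch below. It assumes that everything above the first line consisting of just --- is the mail header plus the commit message, which is how format-patch lays out its files. The function names and the old/new strings are placeholders, not part of the original answer. Note that the replacement also covers the From: and Date: header lines, so pick a search string that cannot match there.

```python
from pathlib import Path


def edit_patch_message(patch_text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only above the first '---' separator line."""
    lines = patch_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.rstrip("\r\n") == "---":
            head = "".join(lines[:i]).replace(old, new)
            # Everything from the separator on (diffstat and diff) is kept as is.
            return head + "".join(lines[i:])
    # No separator found: leave the file untouched rather than guess.
    return patch_text


def edit_all_patches(directory: str, old: str, new: str) -> None:
    """Run edit_patch_message over every *.patch file in `directory`."""
    for path in sorted(Path(directory).glob("*.patch")):
        # surrogateescape lets bytes that are not valid UTF-8 round-trip unchanged.
        text = path.read_bytes().decode("utf-8", errors="surrogateescape")
        edited = edit_patch_message(text, old, new)
        path.write_bytes(edited.encode("utf-8", errors="surrogateescape"))
```

Calling edit_all_patches(".", "old string", "new string") in the directory that holds the patches would rewrite them in place, so it is worth running on a copy first.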


git-filter-repo (https://github.com/newren/git-filter-repo) is now the recommended tool. I used it like this:

PS C:\repository> git filter-repo --commit-callback '
>> msg = commit.message.decode(\"utf-8\")
>> newmsg = msg.replace(\"old string\", \"new string\")
>> commit.message = newmsg.encode(\"utf-8\")
>> ' --force
New history written in 328.30 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 087f91945a blah blah
Enumerating objects: 346091, done.
Counting objects: 100% (346091/346091), done.
Delta compression using up to 8 threads
Compressing objects: 100% (82068/82068), done.
Writing objects: 100% (346091/346091), done.
Total 346091 (delta 259364), reused 346030 (delta 259303), pack-reused 0
Completely finished after 443.37 seconds.
PS C:\repository>

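You probably don't want to copy the PowerShell prompt and output, so here is just the command. The \" escapes are there for PowerShell; in a POSIX shell such as bash, use plain " inside the single quotes instead: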

git filter-repo --commit-callback '
msg = commit.message.decode(\"utf-8\")
newmsg = msg.replace(\"old string\", \"new string\")
commit.message = newmsg.encode(\"utf-8\")
' --force

If you want to rewrite all the branches, don't use --refs HEAD. If you'd rather not use --force, you can run it on a fresh clone made with git clone --no-checkout. This got me started: https://blog.kawzeg.com/2019/12/19/git-filter-repo.html
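
The decode and encode steps in the callback are needed because git-filter-repo hands commit messages to callbacks as bytes, not str. Here is a minimal sketch of the same replacement done directly on bytes, written as an ordinary function so it can be tried outside a rewrite. The function name and sample strings are placeholders.

```python
def replace_in_message(message: bytes, old: bytes, new: bytes) -> bytes:
    """Return `message` with every occurrence of `old` replaced by `new`."""
    # bytes.replace behaves like str.replace, so no decode/encode round trip
    # is needed, and a message that is not valid UTF-8 cannot raise an error.
    return message.replace(old, new)


# Hypothetical sample message, only to show the effect.
sample = b"Fix parser\n\nRefs: old string\n"
print(replace_in_message(sample, b"old string", b"new string"))
```

git-filter-repo also has a --message-callback option whose body receives the message in a variable named message and returns the new value, so the body there could be as short as return message.replace(b"old string", b"new string").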


You can use git rebase -i and replace pick with reword (or just r). The rebase then stops at each of those commits, giving you a chance to edit the message.

The only disadvantages are that you don't see all messages at once and that you can't go back when you spot an error.
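
Changing pick to reword by hand gets tedious with hundreds of commits. A small sketch of doing that step with a script, assuming the todo list uses the usual "pick <hash> <subject>" lines; the function name is made up for this example:

```python
def mark_all_for_reword(todo_text: str) -> str:
    """Turn every 'pick' line of a rebase todo list into a 'reword' line."""
    out = []
    for line in todo_text.splitlines(keepends=True):
        # Comment lines start with '#', so they never match and stay as they are.
        if line.startswith("pick "):
            line = "reword " + line[len("pick "):]
        out.append(line)
    return "".join(out)


# Hypothetical todo list, only to show the effect.
sample_todo = (
    "pick a1b2c3d First commit\n"
    "pick d4e5f6a Second commit\n"
    "\n"
    "# p, pick <commit> = use commit\n"
)
print(mark_all_for_reword(sample_todo), end="")
```

Git calls the program named in the GIT_SEQUENCE_EDITOR environment variable with the path of the todo file, so a script that reads that file, applies this function, and writes it back could replace the manual edit. The rebase would still stop at every commit for the message itself.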


A simple way to do this is to use git filter-branch --msg-filter with a Python script.

The Python script would look something like this:

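import os
import re
import sys

# Match "Issue-" followed by 1 to 4 digits, case-insensitively.
pattern = re.compile(r"(?i)Issue-\d{1,4}")

# git filter-branch exports the ID of the commit being rewritten
# (unused here, but handy for per-commit rules).
commit_id = os.environ["GIT_COMMIT"]
message = sys.stdin.read()

# Replace each match with the plain word "Issue".
if message:
    message = pattern.sub("Issue", message)

# Write the message back without adding an extra trailing newline.
sys.stdout.write(message)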

The command line call you would make is git filter-branch -f --msg-filter "python /path/to/git-script.py"