Filtering a diff with a regular expression

It would be extremely handy to be able to filter a diff so that trivial changes are not displayed. I would like to supply a regular expression that is run on each line, plus a replacement string that uses the captured groups to generate a canonical form. If the old and new versions of a line produce the same canonical form, that line would be removed from the diff.

For example, I am working on a PHP code base where a significant number of array accesses are written as my_array[my_key] when they should be my_array["my_key"], to prevent issues if a my_key constant is defined. It would be useful to generate a diff that omits lines where the only change is the added quotes.

I can't change them all at once, as we don't have the resources to test the entire code base, so I am fixing this whenever I make a change to a function. How can I achieve this? Is there anything similar that I could use to get a comparable result? For example, a simpler method might be to skip the canonical form and just check whether the input is transformed into the output. BTW, I am using Git.


Solution 1:

grepdiff can be used to filter the hunks in the diff file.

$ git diff -U1 | grepdiff 'console' --output-matching=hunk

This shows only the hunks that contain a match for the given pattern, here "console".
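
To make the idea of hunk-level filtering concrete, here is a minimal Python sketch of roughly the same selection. It is not how grepdiff is implemented, and grepdiff's own matching rules may differ. The function name matching_hunks is made up for this example, and it assumes Git-style input where every file section starts with a "diff --git" line.

```python
import re

def matching_hunks(diff_text, pattern):
    """Return the hunks (lists of lines) whose body matches pattern.

    Hypothetical helper for illustration; assumes a Git-style unified diff.
    """
    regex = re.compile(pattern)
    hunks = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            current = None        # file header lines belong to no hunk
        elif line.startswith("@@"):
            current = [line]      # a hunk header starts a new hunk
            hunks.append(current)
        elif current is not None:
            current.append(line)  # body line (context, added or removed)
    # Keep a hunk if any body line (context included) matches.
    return [h for h in hunks if any(regex.search(l) for l in h[1:])]
```

Something like matching_hunks(text, r'console') would then return only the hunks mentioning "console", where text is the captured output of git diff -U1.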

Solution 2:

$ git diff --help

-G<regex>
    Look for differences whose added or removed line matches the given <regex>.

EDIT:

After some testing, I came up with something like:

git diff -b -w --word-diff-regex='.*\[[^"]*\]'

This produces output like:

diff --git a/test.php b/test.php
index 62a2de0..b76891f 100644
--- a/test.php
+++ b/test.php
@@ -1,3 +1,5 @@
<?php

{+$my_array[my_key]+} = "test";

?>
diff --git a/test1.php b/test1.php
index 62a2de0..6102fed 100644
--- a/test1.php
+++ b/test1.php
@@ -1,3 +1,5 @@
<?php

some_other_stuff();

?>

Maybe this will help you. I found it at http://www.rhinocerus.net/forum/lang-lisp/659593-git-word-diff-regex-lisp-source.html and there is more information in that thread.

EDIT2:

git diff -G'\[[A-Za-z_]*\]'

Solution 3:

There do not seem to be any options to Git's diff command that support what you want to do. However, you could use the GIT_EXTERNAL_DIFF environment variable and a custom script (or any executable written in your preferred scripting or programming language) to manipulate a patch.

I'll assume you are on Linux; if not, you could tweak this concept to suit your environment. Let's say you have a Git repo where HEAD has a file file05 that contains:

line 26662: $my_array[my_key]

And a file file06 that contains:

line 19768: $my_array[my_key]
line 19769: $my_array[my_key]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key]
line 19776: $my_array[my_key]

You change file05 to:

line 26662: $my_array["my_key"]

And you change file06 to:

line 19768: $my_array[my_key]
line 19769: $my_array["my_key"]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key2]
line 19776: $my_array[my_key]

Use the following shell script; let's call it mydiff.sh and place it somewhere in our PATH:

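#!/bin/bash
# Git calls this with: path old-file old-hex old-mode new-file new-hex new-mode
echo "$@"
# Re-diff the file word by word, then filter the result through awk.
git diff-files --patch --word-diff=porcelain "${5}" | awk '
# Remember a removed chunk and the line number it was seen on.
/^-./ {rec = FNR; prev = substr($0, 2);}
# An added chunk directly after a removed one: compare with the quotes stripped.
FNR == rec + 1 && /^\+./ {
    ln = substr($0, 2);
    gsub("\\[\"", "[", ln);
    gsub("\"\\]", "]", ln);
    if (prev == ln) {
        print " " ln;    # only quotes were added: show as unchanged
    } else {
        print "-" prev;  # a real change: keep both sides
        print "+" ln;
    }
}
# Print lines that are neither the latest removed chunk nor the line right after it.
FNR != rec && FNR != rec + 1 {print;}
'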

Executing the command:

GIT_EXTERNAL_DIFF=mydiff.sh git --no-pager diff

Will output:

file05 /tmp/r2aBca_file05 d86525edcf5ec0157366ea6c41bc6e4965b3be1e 100644 file05 0000000000000000000000000000000000000000 100644
index d86525e..c2180dc 100644
--- a/file05
+++ b/file05
@@ -1 +1 @@
 line 26662: 
 $my_array[my_key]
~
file06 /tmp/2lgz7J_file06 d84a44f9a9aac6fb82e6ffb94db0eec5c575787d 100644 file06 0000000000000000000000000000000000000000 100644
index d84a44f..bc27446 100644
--- a/file06
+++ b/file06
@@ -1,8 +1,8 @@
 line 19768: $my_array[my_key]
~
 line 19769: 
 $my_array[my_key]
~
 line 19770: $my_array[my_key]
~
 line 19771: $my_array[my_key]
~
 line 19772: $my_array[my_key]
~
 line 19773: $my_array[my_key]
~
 line 19775: 
-$my_array[my_key]
+$my_array[my_key2]
~
 line 19776: $my_array[my_key]
~

This output does not show changes for the added quotes in file05 and file06. The external diff script uses the git diff-files command to create the patch and filters the output through a GNU awk script to manipulate it. This sample script does not handle all the combinations of old and new files that GIT_EXTERNAL_DIFF can pass, nor does it output a valid patch, but it should be enough to get you started.
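
The same filtering step can be sketched in Python, which also covers the "simpler method" from the question: transform the old line and check whether it equals the new one. This is only an illustration under stated assumptions. The names BARE_KEY and drop_trivial_pairs are made up, the rule for what counts as a bare key is a guess, and it only pairs a removed line with an added line that follows it immediately.

```python
import re

# Assumed rule: a bare key such as [my_key] should become ["my_key"].
BARE_KEY = re.compile(r'\[([A-Za-z_]\w*)\]')

def drop_trivial_pairs(lines):
    """Turn a '-old' / '+new' pair into one context line when new is
    just old with the quotes added. Hypothetical helper for illustration."""
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("-") and nxt.startswith("+"):
            old, new = line[1:], nxt[1:]
            # Apply the transformation to the old text and compare.
            if BARE_KEY.sub(r'["\1"]', old) == new:
                out.append(" " + new)  # show the new text as unchanged
                i += 2
                continue
        out.append(line)               # anything else passes through
        i += 1
    return out
```

Unlike the awk version, a removed line that is not followed by an added line is kept rather than dropped.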

You could use Perl regular expressions, Python difflib or whatever you're comfortable with to implement an external diff tool that suits your needs.
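
As a rough starting point for the difflib route, the sketch below applies the canonical-form idea from the question: both versions of the file are canonicalised with a regular expression, the canonical lines are compared, and only the regions that still differ are reported using the original text. The names BARE_KEY, canonical and filtered_diff are invented for this example, and the regular expression is an assumption about what a bare key looks like, not a complete PHP parser.

```python
import difflib
import re

# Assumed canonicalisation rule: quote bare array keys, so that
# $a[my_key] and $a["my_key"] compare equal.
BARE_KEY = re.compile(r'\[([A-Za-z_][A-Za-z0-9_]*)\]')

def canonical(line):
    """Return the canonical form of one line (hypothetical rule)."""
    return BARE_KEY.sub(r'["\1"]', line)

def filtered_diff(old_lines, new_lines):
    """Return '-'/'+' lines for changes that survive canonicalisation."""
    old_canon = [canonical(l) for l in old_lines]
    new_canon = [canonical(l) for l in new_lines]
    matcher = difflib.SequenceMatcher(a=old_canon, b=new_canon, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical once canonicalised: a trivial change
        # Report the original, not the canonicalised, text.
        out.extend("-" + l for l in old_lines[i1:i2])
        out.extend("+" + l for l in new_lines[j1:j2])
    return out
```

The old lines could come from something like git show HEAD:file06 and the new lines from the working copy. With the file06 example above, only the my_key to my_key2 change on line 19775 should be reported.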