Is git good with binary files?

Is git good with binary files?

If I have a lot of uncompressed files that are modified often, and many compressed files that are never (or almost never) modified, would git handle that well? For example, if I insert or remove data in the middle of a file and insert data near the end, will git notice that as it does with text?

If git isn't good with binary files, what tool might I consider?


Solution 1:

Out of the box, git can easily add binary files to its index and store them efficiently, unless you make frequent updates to large incompressible files.
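If you want to check the storage claim against your own data, a rough sketch like the following might help. It assumes a POSIX shell with git, `head -c` and `/dev/urandom` available; the file name `data.bin` and the sizes are made up for the demo. It commits two versions of an incompressible file that differ by a short insertion in the middle, repacks, and reports the pack size.

```shell
#!/bin/sh
# Sketch: measure how much space two versions of a binary file take after repacking.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

# ~200 kB of random bytes stands in for an incompressible binary file.
head -c 200000 /dev/urandom > data.bin
git add data.bin
git commit -q -m "v1"

# Second version: the same bytes with a short run inserted in the middle.
{ head -c 100000 data.bin; printf 'INSERTED'; tail -c 100000 data.bin; } > data.new
mv data.new data.bin
git commit -q -am "v2"

# Repack so that git gets a chance to store one version as a delta of the other.
git gc --quiet
pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
echo "pack size: ${pack_kib} KiB for two versions of ~195 KiB each"
```

Compare the reported pack size with the size of a single copy of the file. If it is close to one copy, the second version was stored as a delta; if it is close to two, it was not. Results on real files may differ, so it is worth measuring rather than assuming.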

The problems begin when git needs to generate diffs and merges: git cannot generate meaningful diffs for binary files, nor merge them in any way that makes sense. So any merge, rebase or cherry-pick in which both sides changed the same binary file will stop with a conflict that you have to resolve manually, usually by picking one whole version of the file.

You need to decide whether changes to binary files are rare enough that you can live with the extra manual work they cause in the normal git workflow of merges, rebases and cherry-picks.
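To get a feel for that manual work, here is a minimal sketch in a scratch repository, assuming a POSIX shell with git installed. The file name `asset.bin` and the branch name `feature` are invented for the demo. Two branches change the same binary file, the merge stops, and the conflict is resolved by taking one side wholesale.

```shell
#!/bin/sh
# Sketch: a merge conflict on a binary file and one way to resolve it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

printf 'base\000data' > asset.bin          # the NUL byte makes git treat it as binary
git add asset.bin
git commit -q -m "add asset"
main=$(git symbolic-ref --short HEAD)      # whatever the default branch is called

git checkout -q -b feature
printf 'feature\000data' > asset.bin
git commit -q -am "feature edit"

git checkout -q "$main"
printf 'main\000data' > asset.bin
git commit -q -am "main edit"

# git cannot combine the two versions, so the merge exits non-zero.
if git merge -q feature >/dev/null 2>&1; then
  merge_status=clean
else
  merge_status=conflict
fi

# Resolve by taking one side as a whole; --theirs is the branch being merged in.
git checkout --theirs -- asset.bin
git add asset.bin
git commit -q -m "merge feature, keep its asset.bin"
echo "merge was: $merge_status"
```

Use `--ours` instead of `--theirs` to keep the version from the current branch. If the file needs a real merge, that has to happen in whatever tool produced it.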

Solution 2:

In addition to the other answers:

  • You can send a diff of a binary file using the so-called binary diff format (for example, as produced by git diff --binary). It is not human-readable, and it can only be applied if you have the exact preimage in your repository, i.e. it cannot be applied with any fuzz.
    An example:

    diff --git a/gitweb/git-favicon.png b/gitweb/git-favicon.png
    index de637c0608090162a6ce6b51d5f9bfe512cf8bcf..aae35a70e70351fe6dcb3e905e2e388cf0cb0ac3 100
    GIT binary patch
    delta 85
    zcmZ3&SUf?+pEJNG#Pt9J149GD|NsBH{?u>)*{Yr{jv*Y^lOtGJcy4sCvGS>LGzvuT
    nGSco!%*slUXkjQ0+{(x>@rZKt$^5c~Kn)C@u6{1-oD!M<s|Fj6
    
    delta 135
    zcmXS3!Z<;to+rR3#Pt9J149GDe=s<ftM(tr<t*@sEM{Qf76xHPhFNnYfP!|OE{-7;
    zjI0MY3OYE5upapO?DR{I1pyyR7cx(jY7y^{FfMCvb5IaiQM`NJfeQjFwttKJyJNq@
    hveI=@x=fAo=hV3$-MIWu9%vGSr>mdKI;RB2CICA_GnfDX
    
  • You can use the textconv setting of a diff driver, assigned through gitattributes, to have git diff show a human-readable diff for binary files or parts of them. For example, for *.jpg files it can be the difference in EXIF information; for PDF files it can be the difference between their text representations (as produced by pdftotext or a similar tool).
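Both points can be tried in a scratch repository. The sketch below assumes a POSIX shell with git and `od` available. The file name `blob.dat` and the driver name `dump` are made up, and `od -An -c` is only a stand-in converter; for real files you would plug in something like exiftool or pdftotext, if installed.

```shell
#!/bin/sh
# Sketch: binary patch output versus a textconv-based diff for the same change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # throwaway identity for the demo
git config user.name "Demo"

printf 'v1\000payload' > blob.dat          # the NUL byte makes git treat it as binary
git add blob.dat
git commit -q -m "add blob"
printf 'v2\000payload' > blob.dat          # uncommitted change to diff against

# First point: a plain diff only says the files differ,
# while --binary emits a "GIT binary patch" section that can be applied.
plain=$(git diff)
binary=$(git diff --binary)

# Second point: map *.dat to a diff driver (hypothetical name "dump")
# and tell git how to turn such files into text before diffing.
echo '*.dat diff=dump' > .gitattributes
git config diff.dump.textconv 'od -An -c'
text=$(git diff -- blob.dat)

printf '%s\n' "$text"
```

The textconv output is for reading only. A diff produced this way cannot be applied as a patch, so use the binary format when the change has to be sent to someone.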

HTH.

Solution 3:

If you have really large binary files, you can use git-annex to keep their contents outside the repository, so that git itself does not store the file data. Check out: http://git-annex.branchable.com/
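A first session might look roughly like this. It is only a sketch, assuming git-annex is installed; the repository description and the file name `big.bin` are made up. The guard makes the script a no-op where git-annex is missing.

```shell
#!/bin/sh
# Sketch: put one large file under git-annex in a scratch repository.
set -e
if command -v git-annex >/dev/null 2>&1; then
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.email "demo@example.com"   # throwaway identity for the demo
  git config user.name "Demo"

  git annex init "demo repo"                 # enable git-annex in this repository
  head -c 1000000 /dev/urandom > big.bin     # stand-in for a large binary file
  git annex add big.bin                      # content goes to the annex, not into git
  git commit -q -m "track big.bin with git-annex"
  annex_status=added
else
  annex_status=skipped
fi
echo "git-annex demo: $annex_status"
```

In a clone, `git annex get big.bin` fetches the content on demand and `git annex drop big.bin` frees the local copy again. The walkthrough on the site linked above covers remotes and the rest.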