Encrypt files before sending them to the cloud

Recently I switched to Git and picked Bitbucket to host my code.

Now, with the whole Snowden scandal, I decided to review my choices.

I really don't want to host my data on my own server; I'd rather keep using Bitbucket for that.

I have nothing but good words about Bitbucket themselves, but I just want to make sure my private data (source code) stays that way.

I know many folks used TrueCrypt with Dropbox. I wonder if something similar is feasible with Bitbucket (i.e. force Git to transparently and automatically encrypt source code and files before they are sent to Bitbucket). If so, could anyone share how to do that? I googled a lot but couldn't figure out the proper way to do it.


One solution is to use Git's smudge and clean filters. It has some serious downsides, though, which we will come to later.

You'll need to write two scripts that act as filters, i.e. read from standard input and write to standard output. From the documentation:

A filter driver consists of a clean command and a smudge command, either of which can be left unspecified. Upon checkout, when the smudge command is specified, the command is fed the blob object from its standard input, and its standard output is used to update the worktree file. Similarly, the clean command is used to convert the contents of worktree file upon checkin.

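For example, using openssl, we can write the file fooenc.sh:

#!/bin/sh
# clean filter: plaintext on stdin, ciphertext on stdout.
# Padding stays enabled so that files of any length can be encrypted.
# On OpenSSL 3.x, Blowfish (-bf) is only available through the legacy provider.
openssl enc -bf -pass pass:1KjeHD8d6YUI80bIIEAQ9iYr@njqLw3T

and foodec.sh:

#!/bin/sh
# smudge filter: ciphertext on stdin, plaintext on stdout.
openssl enc -bf -d -pass pass:1KjeHD8d6YUI80bIIEAQ9iYr@njqLw3T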

Note that these scripts should be kept outside of the repository and must be kept secret, since they contain the key! On the other hand, they are convenient because they don't ask for a passphrase every time they are called.
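Before wiring the scripts into Git, it may be worth checking that the two filter commands really are inverses of each other. The following is only a minimal sketch: the passphrase is a made-up placeholder, the function names enc and dec are hypothetical, and aes-256-cbc is swapped in for Blowfish on the assumption that it is available in a default OpenSSL build.

```shell
#!/bin/sh
# Hypothetical passphrase, for illustration only.
PASS='example-passphrase-not-for-real-use'

# Same shape as the clean filter: plaintext on stdin, ciphertext on stdout.
# stderr is discarded to hide OpenSSL's key-derivation warning.
enc() { openssl enc -aes-256-cbc -pass "pass:$PASS" 2>/dev/null; }

# Same shape as the smudge filter: ciphertext on stdin, plaintext on stdout.
dec() { openssl enc -aes-256-cbc -d -pass "pass:$PASS" 2>/dev/null; }

plain='int main(void) { return 0; }'

# Encrypt and immediately decrypt; only the final plaintext is captured.
roundtrip=$(printf '%s\n' "$plain" | enc | dec)

if [ "$roundtrip" = "$plain" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
```

If the round trip fails here, Git would corrupt files on checkout, so this is a cheap test to run first.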

A somewhat more secure alternative might be to use GPG.

In your repository's .git/config file, specify these filters (use full paths if the scripts are not on your PATH):

[filter "crypt"]
    clean = fooenc.sh
    smudge = foodec.sh

This is not a typo! As the documentation excerpt above says, clean runs on checkin and smudge runs on checkout, so this setup encrypts data on checkin and decrypts it on checkout.

Then, in the repository's .git/info/attributes file, apply this filter to all files:

* filter=crypt

As long as the filtering scripts are available, the working directory will contain readable files, but the Git objects will be encrypted.
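To see the effect end to end, here is a rough sketch that builds a throwaway repository in a temporary directory, registers the filter with git config (equivalent to editing .git/config by hand), commits one file, and compares the stored blob with the working copy. The passphrase, the file name main.c, and the use of aes-256-cbc instead of Blowfish are assumptions made for illustration.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Stand-in filter scripts (hypothetical passphrase, AES swapped in for Blowfish).
cat > "$work/fooenc.sh" <<'EOF'
#!/bin/sh
exec openssl enc -aes-256-cbc -pass pass:example-passphrase-not-for-real-use 2>/dev/null
EOF
cat > "$work/foodec.sh" <<'EOF'
#!/bin/sh
exec openssl enc -aes-256-cbc -d -pass pass:example-passphrase-not-for-real-use 2>/dev/null
EOF
chmod +x "$work/fooenc.sh" "$work/foodec.sh"

git init -q "$work/repo"
cd "$work/repo"

# Same settings as the .git/config and .git/info/attributes snippets.
git config filter.crypt.clean "$work/fooenc.sh"
git config filter.crypt.smudge "$work/foodec.sh"
mkdir -p .git/info
echo '* filter=crypt' > .git/info/attributes

echo 'secret source' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m 'add main.c'

# The stored blob should differ from the readable working file.
if git cat-file -p HEAD:main.c | cmp -s - main.c; then
    stored_matches_worktree=yes
else
    stored_matches_worktree=no
fi

# Delete and restore the file so that the smudge filter runs.
rm main.c
git checkout -q -- main.c
restored=$(cat main.c)

echo "stored blob equals working file: $stored_matches_worktree"
echo "restored working file: $restored"
```

The filter settings live in .git/config and .git/info/attributes, which are not pushed, so every clone has to repeat this configuration locally.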

Note that this precludes actually using the files on any machine that doesn't have the necessary scripts. So Bitbucket would only work as storage.

Now for the downside: this solution also makes tools like git diff, and everything that depends on them, useless, since Git's objects are now encrypted blobs.

Edit: There are utilities like git-crypt or git-encrypt to help you with encrypting your repo's contents.

And there is a solution to the diff problem: use a special filter for diffs, namely textconv with an additional script that decrypts the blobs before they are diffed.
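A textconv helper could look roughly like the sketch below. Git calls the helper with the path of a file to convert. Depending on the situation that file may hold ciphertext or already readable text, so this sketch checks for the "Salted__" header that openssl enc writes and passes everything else through unchanged. The function name crypt_textconv, the passphrase, and the aes-256-cbc cipher are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical passphrase, for illustration only.
PASS='example-passphrase-not-for-real-use'

# Print a readable version of the file given as $1.
crypt_textconv() {
    # Salted openssl enc output starts with the 8-byte magic "Salted__".
    if [ "$(head -c 8 "$1" 2>/dev/null)" = "Salted__" ]; then
        openssl enc -aes-256-cbc -d -pass "pass:$PASS" -in "$1" 2>/dev/null
    else
        # Already plaintext (for example a working tree file).
        cat "$1"
    fi
}

# Small demonstration on temporary files.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/plain.txt"
openssl enc -aes-256-cbc -pass "pass:$PASS" \
    -in "$tmp/plain.txt" -out "$tmp/blob.bin" 2>/dev/null

from_blob=$(crypt_textconv "$tmp/blob.bin")
from_worktree=$(crypt_textconv "$tmp/plain.txt")

echo "from encrypted blob: $from_blob"
echo "from plain file:     $from_worktree"
```

To hook such a helper into Git, save it as a script that ends with crypt_textconv "$1", point diff.crypt.textconv at it with git config, and change the attributes line to * filter=crypt diff=crypt.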