Is git-annex appropriate for my scenario?

I have a git repository with source code that I want to publish openly on GitHub.

However, I also have gigabytes of data that I don't want to publish or keep in the repo: the files are big, proprietary, "burdened" with copyrights, and so on. They are still logically "part of the same project", though, and I would like some control over their history (basically, what git already does).

Right now, I keep them in a "data" directory inside the repository, I have that directory ignored, and I have given up on getting them into git.

However, I have read about git-annex, and it seems it can do what I want. So I have two questions.

  • Is git-annex appropriate for me?
  • How exactly should I use git-annex for my scenario? That is, which commands should I use, and how?

    I have tried to read the official documentation, but it talks about use cases that I don't care about. I have the data on one computer only and I don't think I will be moving it soon (it's nice to have the possibility, but it's not why I want to use git-annex). Also, the documentation is pretty hard to read.


Solution 1:

Git-annex could indeed help you out with big binary blobs of data. However, I think you should consider not putting them in the same repository as your source code. Anyone cloning your repository would need to download lots of data, and it will be hard to reclaim space if those big files get updated at some point.

Therefore, I suggest taking a look at Git submodules and making /data a submodule that points to another repository containing mostly or only git-annex data.

I think this approach will help keep your source code repository clean and fast, while still giving you some degree of version control over the big binary blobs.
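A minimal sketch of that setup, assuming git and git-annex are installed. The function names and the repository description are made up for illustration, and the commands are wrapped in functions so that nothing runs until you call them:

```shell
#!/bin/sh
# Sketch only: assumes git and git-annex are installed.
# Function names and the repository description are hypothetical.

# Turn a directory of large files into its own git-annex repository.
# $1 is the directory, e.g. "data". File content is stored under
# .git/annex, and git itself tracks only small pointers to it.
init_data_repo() {
    (
        cd "$1" || exit 1
        git init &&
        git annex init "primary data store" &&
        git annex add . &&
        git commit -m "Add data files via git-annex"
    )
}

# Register the data repository as a submodule called "data".
# $1 is the URL of a private remote (a placeholder here). The public
# source repository then records that URL and a commit ID, not the data.
add_data_submodule() {
    git submodule add "$1" data &&
    git commit -m "Add data as a submodule"
}
```

From the top of the source repository this would be called as `init_data_repo data`, followed by `add_data_submodule` with the URL of wherever the data repository is hosted privately. Since "data" is currently ignored, its entry would probably have to be removed from .gitignore first, otherwise `git submodule add` is likely to refuse the path.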

edit/update: I think it actually does not make much of a difference whether you create a submodule for this or not. In the end, it's just git-annex, and users can download the files on demand; nothing downloads all the files by default when you clone.
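The on-demand part could look roughly like this inside a clone. It is a sketch that assumes git-annex is installed and that the clone's origin actually holds the file content; the function names are hypothetical:

```shell
#!/bin/sh
# Sketch only: assumes git-annex is installed and that a remote of this
# clone holds the annexed content. Function names are hypothetical.

# Download the content of one annexed file or directory on demand.
fetch_data() {
    git annex get "$1"
}

# Free the local disk space again. git-annex refuses to do this unless
# it can confirm that another repository still has a copy.
release_data() {
    git annex drop "$1"
}

# Show which repositories currently hold the content.
locate_data() {
    git annex whereis "$1"
}
```

For example, `fetch_data data/big.bin` would pull a single file (the path is only an illustration), and `release_data data/big.bin` would remove the local copy again while keeping the file in the history.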