Using Git for writing a thesis

I am planning to use Git while writing my thesis in LaTeX. Git was designed for software development, so would it be suitable for my requirements? If it is a good choice, which features of Git are particularly useful for writing a thesis? I would also like to know what precautions I should take before adopting a Git workflow. I am a complete beginner with Git, so where should I start?


Solution 1:

There are technical considerations and there are best practices. I will focus on the best practices, specifically for writing your thesis and/or papers. For the technical side, any Git tutorial will do.

  1. Define the directory structure for your thesis. You can change it later and use Git to track those changes, but a good structure from the start will make your life easier.

  2. Work with multiple files (use \include and/or \input in LaTeX). You can split them by chapter or section. This makes it easier to track changes that involve specific parts of your thesis (e.g. git log content/introduction.tex).

  3. Track only the files you are going to edit, not the auto-generated ones. A proper .gitignore file will help a lot, because LaTeX generates plenty of auxiliary files.

  4. As with programs, make micro-commits: one commit per idea/feature/fix/activity.

  5. Every time you commit, write a meaningful, high-level message that explains what you were trying to achieve with the change. After a week you might not remember what you were trying to accomplish.

  6. Keeping track of every activity/idea/fix [see (4) and (5)] helps you see how much you have done (using git log). You can base your progress reports for your supervisor(s) on git log. You can even share the repository with your supervisors through a web interface, so they can check what you have been doing on your thesis and know what to expect at the next meeting (this depends on how keen your supervisors are on following an RSS feed).

  7. Using Git can also help keep you in a good mood. Sometimes you will feel you have not done much, but a record of every change helps you keep things in perspective.

  8. For every progress report you send, create a tag. For the next report, you can check out both versions and run latexdiff on them. This is useful for tracking changes between the versions you submit for review, and it also helps you check whether you addressed the feedback on the previous report.
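For item 3, the usual first step is a .gitignore that keeps LaTeX's auxiliary files out of the repository. The minimal Python sketch below writes one.

- The pattern list is an assumption. It covers common outputs of pdflatex, BibTeX/biber, makeindex, SyncTeX and latexmk, and it is not exhaustive.
- It deliberately leaves out *.pdf, because that pattern would also hide PDF figures. If you want to ignore the compiled thesis, an anchored entry such as /thesis.pdf (a hypothetical file name) is safer.
- The is_ignored helper only approximates Git's matching, and only for these simple *.ext patterns.

```python
from fnmatch import fnmatch
from pathlib import Path, PurePosixPath

# Common auxiliary files from pdflatex, BibTeX/biber, makeindex, SyncTeX and latexmk.
# Assumed list, not exhaustive: add whatever else your packages generate.
LATEX_AUX_PATTERNS = [
    "*.aux", "*.log", "*.out", "*.toc", "*.lof", "*.lot",
    "*.bbl", "*.blg", "*.bcf", "*.run.xml",
    "*.idx", "*.ilg", "*.ind",
    "*.synctex.gz", "*.fls", "*.fdb_latexmk",
]


def is_ignored(path: str, patterns=LATEX_AUX_PATTERNS) -> bool:
    """Approximate gitignore matching for simple '*.ext' patterns, applied to the file name."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


def write_gitignore(repo_dir: str = ".") -> Path:
    """Write the patterns to repo_dir/.gitignore (overwrites an existing file)."""
    target = Path(repo_dir) / ".gitignore"
    target.write_text("\n".join(LATEX_AUX_PATTERNS) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"wrote {write_gitignore()}")
```

Run it once from the repository root. Then check with git status that only your own sources show up as untracked.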

Last but not least, I recommend reading "A successful Git branching model". It is a very short article on a Git workflow, and the same concepts apply when you write your thesis. For instance, if you are writing up an experiment, you can create a branch for it and merge it once it is "ready." If you have to revisit it later, it will be easier to see what the changes were and why they were made.

Solution 2:

When I was writing my PhD thesis,¹ I used git to manage the document and all its figures, and I'm very glad that I did. Not least, it makes it easy to write a script that graphs your progress as you go along ;) The chief advantages I found were:

  • Since git is a distributed version control system, it's easy to work on multiple machines. If you need the latest version from your laptop on your desktop machine, you can just pull directly from the laptop and work there. When you leave, you go to your laptop and pull from the desktop machine.
  • If you work on multiple machines, you effectively have a recent backup of your work (including its complete history), and if you want to create further backups you can just push to a new bare repository elsewhere (as VonC's answer points out).
  • You can make large changes to your document knowing that the previous version is securely stored, and that if you want to retrieve the old version, that's easy to do.
  • Being able to commit to your repository when you're offline is very useful, particularly since not having internet access makes it much easier to write ;) I also kept PDFs of all the papers I cited in the same repository to make it easier to work offline. This vastly inflated the repository, though, so some might advise against it.
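The progress graph mentioned above can come from a small script. Here is a minimal Python sketch of one.

- It assumes the thesis lives in a single file, with its path given relative to the repository root. For a multi-file thesis you would run it once per chapter and add the numbers up.
- The word counter is deliberately crude: it strips comments and command names but keeps command arguments. Treat the numbers as a trend rather than an exact count.

```python
import re
import subprocess
import sys

COMMENT = re.compile(r"(?<!\\)%.*")      # a % not preceded by a backslash starts a comment
COMMAND = re.compile(r"\\[A-Za-z]+\*?")  # control words such as \section, \cite or \chapter*
BRACKETS = re.compile(r"[{}\[\]]")       # braces and brackets around arguments


def count_words(tex: str) -> int:
    """Rough word count of LaTeX source: drop comments and command names, keep arguments."""
    text = COMMENT.sub("", tex)
    text = COMMAND.sub(" ", text)
    text = BRACKETS.sub(" ", text)
    return len(text.split())


def words_per_commit(path: str):
    """Yield (date, short hash, word count) for each commit that touched path, oldest first.

    path must be relative to the repository root, and the script run from that root.
    """
    log = subprocess.run(
        ["git", "log", "--reverse", "--date=short", "--format=%H %cd", "--", path],
        capture_output=True, encoding="utf-8", check=True,
    ).stdout
    for line in log.splitlines():
        commit, date = line.split()
        shown = subprocess.run(
            ["git", "show", f"{commit}:{path}"],
            capture_output=True, encoding="utf-8", errors="replace",
        )
        if shown.returncode == 0:  # skips the commit that deleted the file, if any
            yield date, commit[:8], count_words(shown.stdout)


if __name__ == "__main__":
    # "thesis.tex" is a hypothetical default; pass your own file as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "thesis.tex"
    print("date,commit,words")
    for date, commit, words in words_per_commit(target):
        print(f"{date},{commit},{words}")
```

It prints CSV with date, commit and word count, which you can load into any plotting tool or spreadsheet.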

The chief advice that I'd give:

  • Commit frequently, and always make sure that you keep the output of git status empty, either by adding files you need, or listing them in .gitignore. You don't want to risk having important files untracked.
  • Just to be safe, never use history-rewriting commands (e.g. git rebase), and never use git's dangerous commands such as git reset --hard and git checkout -f. No one will ever see your complete repository, so you don't need to care what the history looks like. It's much more important that you don't do anything that might lose your work (or make it harder to retrieve).
  • When you're looking at differences between your versions, use the --color-words option to git diff. Otherwise, your diffs will be line-based, and if you reformat a paragraph in LaTeX, it'll be hard to see what the real changes are. git diff --color-words ignores the line breaks and just shows the old words in red and the new words in green.
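To see why --color-words helps, here is a small Python sketch that imitates the two views with difflib. The input is a hypothetical paragraph in which one word was changed and the lines were then re-filled. This only illustrates the idea; it is not how Git computes its diffs. The bracket notation is the one git diff --word-diff=plain uses.

```python
import difflib

# Hypothetical example: "clear" became "modest" and the paragraph was re-filled.
OLD = "The results show a clear\nimprovement over the baseline\nin all three settings."
NEW = "The results show a modest\nimprovement over the baseline in\nall three settings."


def changed_lines(old: str, new: str) -> int:
    """How many lines a line-based diff marks as removed or added."""
    return sum(1 for d in difflib.ndiff(old.splitlines(), new.splitlines()) if d[:2] in ("- ", "+ "))


def word_changes(old: str, new: str) -> list[str]:
    """Word-level diff that ignores line breaks, written as [-removed-]{+added+}."""
    a, b = old.split(), new.split()
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes():
        if tag == "equal":
            continue
        removed = f"[-{' '.join(a[i1:i2])}-]" if i2 > i1 else ""
        added = f"{{+{' '.join(b[j1:j2])}+}}" if j2 > j1 else ""
        changes.append(removed + added)
    return changes


if __name__ == "__main__":
    print(changed_lines(OLD, NEW), "changed lines")  # every line differs after re-filling
    print(word_changes(OLD, NEW))                     # only the one edited word
```

The line-based count reports all six lines (three removed, three added), while the word-level view reduces the edit to [-clear-]{+modest+}.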

¹ ... with LyX rather than directly in LaTeX, but the issues are essentially the same.

Solution 3:

This is mainly just meant as a comment, but it turned out a bit too long, so I am posting it as an answer.

I used darcs for my Master's thesis, and have used RCS, CVS, SVN, and Git for many documentation and writing projects in the past. All of these tools provide the basic features I want: the ability to review my changes, go back in history, and check in "undo points" when I start writing something new.

There are tried-and-tested recommendations for writing documentation under version control. Using a text-only source format is important for getting sane diffs. In addition, a useful tip I picked up (IIRC from Kernighan, writing about keeping troff source in version control) is to make sure all lines are reasonably short. I tend to whack Enter often, aiming to keep each clause or idiom on its own line. That way the diff will be minimal if I later decide to revise that particular detail.
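The effect of that habit can be sketched in Python. The hypothetical sentence below gets one small revision, once with the source filled to a fixed width of 40 characters and once with one clause per line. LaTeX treats a single line break in the source as an ordinary space, so both layouts typeset identically; only the diffs differ. The clause splitter is a crude heuristic: it breaks after every comma and also after abbreviations such as "e.g.". In practice you would place the breaks by hand as you write.

```python
import difflib
import re
import textwrap

# Hypothetical sentence and revision, used only for illustration.
OLD = "We trained the model for ten epochs, using the default learning rate. Accuracy improved steadily."
NEW = "We trained the model for a total of twenty epochs, using the default learning rate. Accuracy improved steadily."


def one_clause_per_line(paragraph: str) -> str:
    """Break the source after '.', ';', ':' or ',' followed by whitespace (crude heuristic)."""
    flat = " ".join(paragraph.split())
    return re.sub(r"([.;:,])\s+", r"\1\n", flat)


def lines_changed(old: str, new: str) -> int:
    """Lines that a line-based diff reports as removed or added."""
    return sum(1 for d in difflib.ndiff(old.splitlines(), new.splitlines()) if d[:2] in ("- ", "+ "))


if __name__ == "__main__":
    filled = lines_changed(textwrap.fill(OLD, 40), textwrap.fill(NEW, 40))
    clauses = lines_changed(one_clause_per_line(OLD), one_clause_per_line(NEW))
    print(f"fixed-width fill: {filled} changed lines, one clause per line: {clauses}")
```

With fixed-width filling, the longer phrase pushes words onto the following lines, so every line of the paragraph shows up as changed. With one clause per line, only the edited line does.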

Solution 4:

Git will work. LaTeX is effectively source code, so it should be perfectly fine.

That said, Git, while awesome, has a slightly steep learning curve, because it supports a lot of things for collaborating with multiple people, handling diverging histories, etc. Its really big advantage is in handling merge conflicts (what happens if I change a file, someone else changes the same file, and we both try to upload/commit it to some server?).

If you just want to version your thesis, you are unlikely to even hit the conflicting merge case (since you are the only one editing it), let alone the multiple histories case.
I'd use something simpler like SVN, which, while worse at the two things I described, fits your needs and is easier to learn.

Also, git stores everything in a hidden .git directory at the top of your project folder. If you delete that folder, your data and its whole history are gone, unless you have pushed them to another repository.