Does Git prevent data degradation?
I read that ZFS and Btrfs use checksums to prevent data degradation, and that Git has integrity through hashing essentially everything with each commit.
I was going to use a Git server on a Linux NAS with Btrfs RAID 1 for storage, but if Git has integrity I guess this wouldn't be necessary (at least not if preventing data degradation is all I want).
Question: So does Git's integrity through hashing essentially everything with each commit prevent or help against bit-rot?
Git's hashing happens only when objects are written (i.e. when you add or commit files); from then on, the hashes are merely used to identify those objects. This in no way ensures the ongoing integrity of the files. Git repos can get corrupted and lose data. In fact, Git has a built-in command to detect this kind of loss, git fsck, but as the documentation says, you are responsible for restoring any corrupted data from backups.
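You can see both halves of this in practice with a throwaway repository: git fsck re-hashes every object and reports mismatches, but it has no way to repair them (repository and file names below are arbitrary):

```shell
# Set up a throwaway repository with one commit.
cd "$(mktemp -d)"
git init -q demo && cd demo
echo "hello" > file.txt
git add file.txt
git -c user.email=you@example.com -c user.name=you commit -qm "initial"

# On a clean repo, fsck recomputes every object's hash and stays silent.
git fsck --full

# Simulate bit-rot: overwrite a few bytes inside one loose object file.
obj=$(find .git/objects -type f | head -n 1)
chmod u+w "$obj"
printf 'xxxx' | dd of="$obj" bs=1 seek=10 conv=notrunc 2>/dev/null

# Now fsck detects the damage -- but it cannot undo it.
git fsck --full || true
```

The second fsck run prints an error and exits non-zero; at that point your only recourse is a backup or another clone.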
Depends on what you mean by "prevent".
(First of all, bit-rot is a term with multiple definitions. This question is not about code becoming unrunnable due to lack of maintenance.)
If you mean by "prevent" that it will likely detect corruption by decay of bits, yes, that will work. It will however not help to fix that corruption: the hashes only provide error detection, not correction.
This is generally what is meant by "integrity": The possibility to detect unauthorized/unintended manipulation of data, not the possibility to prevent or correct it.
You would generally still want a RAID1 together with backups (possibly implemented with ZFS snapshots or similar, I am not familiar with the ZFS semantics on RAID1 + snapshots), for several reasons:
if a disk fails fatally, you either need a RAID1 (or a recent backup) to restore your data; no error correction can correct for a whole disk failing, unless it has a full copy of the data (RAID1). For a short downtime, you essentially must have RAID1.
if you accidentally delete parts or whole of the repository, you need a backup (RAID1 doesn’t protect you since it immediately reflects the change to all devices)
Block-level RAID1 (e.g. via LVM or similar) with only two disks will not by itself protect you against silent decay of data, though: the RAID controller cannot know which of the two disks holds the correct data. You need additional information for that, like a checksum over the data. This is where the ZFS and btrfs checksums come in: they can be used (which is not to say that this is exactly how ZFS or btrfs handle it internally, I don't know) to determine which of the two disks holds the correct data.
prevent bit-rot
No, it does not, in no way at all. There is no RAID-like redundancy introduced by git. If the files in your .git directory suffer bit-rot, you will lose stuff just as usual.
help against bit-rot?
Yyyy...no. It does not help against bit-rot occurring, but it does help to detect it. At no point during normal use does it do so by its own account, though (it obviously verifies objects it actually reads, e.g. when you check something out, but not your whole history). You would have to set up a cron job that recalculates the hashes from the content and compares them to the stored object names. That is pretty trivial to do, as git object names literally are content hashes, and git fsck does exactly that for you. But when it detects bit-rot, there is nothing in particular it can do against it. Worse, since larger objects are delta-compressed together into packfiles, a single flipped bit in a pack can make a whole chunk of objects unreadable at once.
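The claim that object names are content hashes is easy to verify yourself: recomputing the hash by hand with git hash-object matches what Git stored (throwaway repository, arbitrary file name):

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
echo "hello" > file.txt
git add file.txt

# The object name Git stored for the blob in the index:
stored=$(git rev-parse :file.txt)
# The same hash, recomputed directly from the file content:
recomputed=$(git hash-object file.txt)

test "$stored" = "$recomputed" && echo "hashes match"   # prints "hashes match"
```

This is exactly the comparison git fsck performs for every object in the repository, which is what makes it a usable (detection-only) bit-rot check to run from cron.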