Why does git hash-object return a different hash than openssl sha1?
You see a difference because git hash-object doesn't just hash the bytes in the file: it prepends the string "blob ", then the file size in bytes as a decimal number, then a NUL byte, and hashes that header together with the file's contents. There are more details in this other answer on Stack Overflow:
- How to assign a Git SHA1's to a file without Git?
Or, to convince yourself, try something like:
$ echo -n hello | git hash-object --stdin
b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
$ printf 'blob 5\0hello' > test.txt
$ openssl sha1 test.txt
SHA1(test.txt)= b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0
The SHA1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space and the object length in bytes as decimal. This is separated from the data by a null byte.
So:
$ git hash-object foo.txt
f70f10e4db19068f79bc43844b49f3eece45c4e8
$ ( perl -e '$size = (-s shift); print "blob $size\x00"' foo.txt \
&& cat foo.txt ) | openssl sha1
f70f10e4db19068f79bc43844b49f3eece45c4e8
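The same calculation can be sketched without perl or Git. This is only an illustration: git_blob_id is a made-up helper name, not a Git command, and it assumes sha1sum from GNU coreutils is installed (on macOS, shasum can usually stand in for it).

```shell
# Hypothetical helper (not part of Git): print the ID Git would give a file as a blob.
git_blob_id() {
    file=$1
    # Size in bytes; tr strips the padding that some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Hash the header "blob <size>\0" and the raw contents as one stream.
    { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}
```

If foo.txt contains the single line "A", then git_blob_id foo.txt should print the same f70f10e4db19068f79bc43844b49f3eece45c4e8 as above.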
One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file"
4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"
You will find that you can in fact run git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new Git repository with no objects registered, because the empty tree is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.
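Both IDs can be reproduced by hashing just the header, since an empty object has no content after the NUL. A small sketch, assuming sha1sum from GNU coreutils is available; the variable names are only for illustration.

```shell
# Empty blob: the header "blob 0\0" with nothing after it.
empty_blob=$(printf 'blob 0\0' | sha1sum | cut -d ' ' -f 1)
# Empty tree: the header "tree 0\0" with no entries after it.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)
echo "empty blob: $empty_blob"
echo "empty tree: $empty_tree"
```

The only difference between the two inputs is the object type in the header, which is why the two empty objects get different IDs.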
Git stores objects as [Object Type, Object Length, delimiter (\0), Content]. In your case:
$ echo "A" | git hash-object --stdin
f70f10e4db19068f79bc43844b49f3eece45c4e8
Try calculating the hash as:
$ echo -e "blob 2\0A" | shasum
f70f10e4db19068f79bc43844b49f3eece45c4e8 -
Note the -e (needed so that bash's echo interprets the \0 escape) and the length of 2, which counts the newline that echo appends after "A".
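If echo -e is not available in your shell, printf is a more portable way to build the same input, and it makes the effect of the newline easy to see. A sketch, again assuming sha1sum from GNU coreutils; the variable names are illustrative.

```shell
# Content "A" plus newline: 2 bytes, so the header is "blob 2".
with_newline=$(printf 'blob 2\0A\n' | sha1sum | cut -d ' ' -f 1)
# Content "A" alone: 1 byte, so the header is "blob 1" and the ID changes.
without_newline=$(printf 'blob 1\0A' | sha1sum | cut -d ' ' -f 1)
echo "with newline:    $with_newline"
echo "without newline: $without_newline"
```

The first value should match the f70f10e4db19068f79bc43844b49f3eece45c4e8 shown above; the second differs because both the length in the header and the content are different.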