Why does pushing a commit of 4 files show "Writing hundreds of objects..."?

I am getting lost with Git. I made a commit containing 4 files. My repository on GitHub has around 60 files. I would like to push these 4 files to my remote repository.

But when I push this commit, I see in the progress logs that the command is writing hundreds of objects. This worries me a lot. I am afraid of overwriting some important files on my remote repository.

Here is what I see on the terminal:

E:\Dropbox\cff\Python\Projet_Compilation\Projet_debug3>git push origin 1effeafeaeed20a44a8acffcfd575e39da314f26:main
Enumerating objects: 277, done.
Counting objects: 100% (158/158), done.
Delta compression using up to 6 threads
Compressing objects: 100% (103/103), done.
Writing objects:  14% (15/103)

So I looked up the command that lists the files in my commit:

E:\Dropbox\cff\Python\Projet_Compilation\Projet_debug3>git diff-tree --no-commit-id --name-only -r dca7f4f
modules/robots/platform/Smartphone_platform_1.py
modules/robots/platform/Smartphone_platform_2.py
modules/robots/platform/Smartphone_platform_3.py
modules/robots/platform/Smartphone_platform_4.py

This command confirmed to me that only 4 files will be written to the remote repository.

So why do the command logs in the console show hundreds of objects being written? When I do it with PyCharm, I can even see the total size of these objects, which is a few megabytes. My 4 files are simple Python scripts that amount to maybe a few KB, that is all.

It is really confusing to me. Can anyone explain this, please?

The object counts that Git prints are not particularly useful to humans. They're not completely useless, they just have no clear and direct correspondence to what most humans really want to know.

In your case, what you wanted to know—though you did not know that you wanted to know this—is why git push was sending some very large commit(s) with very large file(s), and the answer to that is that git push works by commits, not files. Git itself works by commits, not files.

The commits are numbered. The numbers are random-looking, huge and ugly things represented in hexadecimal, that are pretty much useless to humans. But these numbers are how Git finds the commits. For human purposes, we hide the numbers behind names, such as branch names. The branch name has nothing to do with the commits themselves: it's just a way for us to have our Git software remember hash IDs for us.
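To make the "numbers" less mysterious, here is a minimal sketch of how an object ID can be derived from content. It assumes a repository using the default SHA-1 object format (newer repositories may use SHA-256 instead), and the helper name git_object_id is made up for this illustration:

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way SHA-1 based Git repositories do:
    a header of "<kind> <size>" plus a NUL byte, followed by the content."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# The ID depends only on the bytes, so identical content always gets the same ID
# and any change, however small, gets a completely different one.
print(git_object_id("blob", b""))
print(git_object_id("blob", b"print('v1')\n"))
print(git_object_id("blob", b"print('v2')\n"))
```

The first line printed should match what git hash-object reports for an empty file.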

Ultimately, each Git commit is—or represents anyway, to the point that you might as well think of it as being—made up of two parts:

Every commit contains a full snapshot of every file, in the form it had at the time you (or whoever) told Git to snapshot all the files. There are a bunch of caveats and "but"s here, but each commit is tantamount to a tar, rar, WinZip, or other archive of every file.
And, every commit contains some metadata, or information about the commit itself. This includes things like the name and email address of the person who made the commit—you, in this case, taken from your user.name and user.email settings.¹ There is a lot more in here, such as some date and time stamps, and the parts of a commit that make Git actually work, but we won't cover them here.
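One of those caveats is worth a rough sketch: Git stores each distinct file content once, so a new commit only adds objects for what changed, plus a new "tree" (directory listing) for every directory above a changed file, plus the commit object itself. The function below is a simplified model of that bookkeeping, not Git's own code; the name count_new_objects is hypothetical, and it assumes every changed file has content Git has not stored before:

```python
from pathlib import PurePosixPath


def count_new_objects(changed_paths):
    """Estimate the objects one commit adds (illustrative model only)."""
    blobs = set(changed_paths)
    trees = set()
    for path in changed_paths:
        # Every directory above a changed file gets a new tree object;
        # PurePosixPath reports the repository root as ".".
        trees.update(str(parent) for parent in PurePosixPath(path).parents)
    return {"blobs": len(blobs), "trees": len(trees), "commits": 1}


changed = [
    f"modules/robots/platform/Smartphone_platform_{i}.py" for i in range(1, 5)
]
counts = count_new_objects(changed)
print(counts, "total:", sum(counts.values()))
```

Under this model, a single commit touching the four files from the question adds 9 objects (4 blobs, 4 trees, 1 commit), so even in the simplest case the object count is higher than the file count.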

All parts of every commit are completely read-only. Once made, no commit can ever be changed. If you make a "bad" commit, you have to eject it off the end of the branch. Only end-of-branch commits can be ejected like this, so to eject a bad commit that's not already at the end, you have to eject multiple commits, all the way back to the bad one.

After that, you can start making "good" commits (that contain the files you want them to contain, and/or the metadata that you want, all in place of the bad ones). The new-and-improved replacement commits have different numbers from the originals. But, since we (humans) never actually look at the numbers, choosing to look at branch names instead, we never notice that this has happened because we have our Git refer to our new-and-improved commits through the name now.
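Why the replacements get different numbers, and why everything after a bad commit has to be redone, can be sketched as follows. This is a deliberately simplified model: real commit objects also carry author and committer lines with timestamps, the function name commit_id is hypothetical, and the tree ID is just a placeholder value:

```python
import hashlib
from typing import Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Hash a simplified commit body; the parent's ID is part of the hashed text."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += ["", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


TREE = "0" * 40  # placeholder tree ID, for illustration only

bad = commit_id(TREE, None, "add huge file by mistake")
child_of_bad = commit_id(TREE, bad, "add platform scripts")

fixed = commit_id(TREE, None, "initial commit without the huge file")
child_of_fixed = commit_id(TREE, fixed, "add platform scripts")

# Same message and same tree, but a different parent, so a different ID.
print(child_of_bad != child_of_fixed)
```

Because each commit records its parent's ID, replacing one commit changes the ID of every commit built on top of it.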

When you run git push, your Git calls up some other Git software (which might not even be Git: it might be, e.g., libgit2 or jgit or Go-Git or something, as long as it "speaks Git"). That other software works with some other repository. That other repository has its branch names. But your repository and their repository will share commits. They do this via the commit numbers (not the branch names). Your Git lists out any new commits for them, that they don't have yet, and then your Git will send those commits—along with their supporting internal objects—and you'll see those:

Counting objects: ... done

messages at this point. Their Git software may now inspect the commits for goodness, or whatever they care about, before accepting or rejecting your Git's request that they update one of their branch names to remember the newest commit (in addition to, or instead of, the old last commit their name remembered before).
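The "new commits that they don't have yet" step can be pictured as a set difference over the commit graph. The sketch below is a toy model with single-letter commit names standing in for hash IDs; the function names and the example history are assumptions made for illustration, and real Git negotiates this more cleverly:

```python
def ancestors(tip, parents):
    """Return the tip commit plus everything reachable through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def commits_to_send(local_tip, remote_tips, parents):
    """Commits reachable from what we push, minus what the remote already has."""
    remote_has = set()
    for tip in remote_tips:
        remote_has |= ancestors(tip, parents)
    return ancestors(local_tip, parents) - remote_has


# Hypothetical history: A <- B <- C <- D, where the remote only knows A.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(sorted(commits_to_send("D", ["A"], parents)))
```

In this model, pushing D also sends B and C, with all of their files and trees. A command such as git diff-tree on one commit lists only what that commit changed relative to its parent, so it says nothing about earlier commits the remote is also missing.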

If they don't like your commit(s), you may have to eject some off the end of your branch(es) and build new ones that they do like. If that involves removing some large file(s) from some early commits—these removals then propagate through the later commits as well—that's fine. There are also some tools to do this; see, e.g., How to remove/delete a large file from commit history in the Git repository?

¹Note that this is the only place that Git uses these settings. When you authenticate to some web site for a push or fetch operation, Git doesn't do the authentication itself: it pawns that off on some other program or library. Each other program-or-library has its own authentication methods, which—by design—do not involve looking at user.name.
