Definition of "downstream" and "upstream"
In terms of source control, you're downstream when you copy (clone, checkout, etc) from a repository. Information flowed "downstream" to you.
When you make changes, you usually want to send them back "upstream" so they make it into that repository so that everyone pulling from the same source is working with all the same changes. This is mostly a social issue of how everyone can coordinate their work rather than a technical requirement of source control. You want to get your changes into the main project so you're not tracking divergent lines of development.
Sometimes you'll read about package or release managers (the people, not the tool) talking about submitting changes to "upstream". That usually means they had to adjust the original sources so they could create a package for their system. They don't want to keep making those changes, so if they send them "upstream" to the original source, they shouldn't have to deal with the same issue in the next release.
When you read in git tag
man page:
One important aspect of git is it is distributed, and being distributed largely means there is no inherent "upstream" or "downstream" in the system.
, that simply means there is no absolute upstream repo or downstream repo.
Those notions are always relative between two repos and depends on the way data flows:
If "yourRepo" has declared "otherRepo" as a remote one, then:
- you are pulling from upstream "otherRepo" ("otherRepo" is "upstream from you", and you are "downstream for otherRepo").
- you are pushing to upstream ("otherRepo" is still "upstream", where the information now goes back to).
Note the "from" and "for": you are not just "downstream", you are "downstream from/for", hence the relative aspect.
The DVCS (Distributed Version Control System) twist is: you have no idea what downstream actually is, beside your own repo relative to the remote repos you have declared.
- you know what upstream is (the repos you are pulling from or pushing to)
- you don't know what downstream is made of (the other repos pulling from or pushing to your repo).
Basically:
In term of "flow of data", your repo is at the bottom ("downstream") of a flow coming from upstream repos ("pull from") and going back to (the same or other) upstream repos ("push to").
You can see an illustration in the git-rebase
man page with the paragraph "RECOVERING FROM UPSTREAM REBASE":
It means you are pulling from an "upstream" repo where a rebase took place, and you (the "downstream" repo) is stuck with the consequence (lots of duplicate commits, because the branch rebased upstream recreated the commits of the same branch you have locally).
That is bad because for one "upstream" repo, there can be many downstream repos (i.e. repos pulling from the upstream one, with the rebased branch), all of them having potentially to deal with the duplicate commits.
Again, with the "flow of data" analogy, in a DVCS, one bad command "upstream" can have a "ripple effect" downstream.
Note: this is not limited to data.
It also applies to parameters, as git commands (like the "porcelain" ones) often call internally other git commands (the "plumbing" ones). See rev-parse
man page:
Many git porcelainish commands take mixture of flags (i.e. parameters that begin with a dash '
-
') and parameters meant for the underlyinggit rev-list
command they use internally and flags and parameters for the other commands they use downstream ofgit rev-list
. This command is used to distinguish between them.
Upstream (as related to) Tracking
The term upstream also has some unambiguous meaning as comes to the suite of GIT tools, especially relative to tracking
For example :
$git rev-list --count --left-right "@{upstream}"...HEAD >4 12
will print (the last cached value of) the number of commits behind (left) and ahead (right) of your current working branch, relative to the (if any) currently tracking remote branch for this local branch. It will print an error message otherwise:
>error: No upstream branch found for ''
- As has already been said, you may have any number of remotes for one local repository, for example, if you fork a repository from github, then issue a 'pull request', you most certainly have at least two:
origin
(your forked repo on github) andupstream
(the repo on github you forked from). Those are just interchangeable names, only the 'git@...' url identifies them.
Your
.git/config
reads :[remote "origin"] fetch = +refs/heads/*:refs/remotes/origin/* url = [email protected]:myusername/reponame.git [remote "upstream"] fetch = +refs/heads/*:refs/remotes/upstream/* url = [email protected]:authorname/reponame.git
- On the other hand, @{upstream}'s meaning for GIT is unique :
it is 'the branch' (if any) on 'said remote', which is tracking the 'current branch' on your 'local repository'.
It's the branch you fetch/pull from whenever you issue a plain
git fetch
/git pull
, without arguments.
Let's say want to set the remote branch origin/master to be the tracking branch for the local master branch you've checked out. Just issue :
$ git branch --set-upstream master origin/master > Branch master set up to track remote branch master from origin.
This adds 2 parameters in
.git/config
:[branch "master"] remote = origin merge = refs/heads/master
now try (provided 'upstream' remote has a 'dev' branch)
$ git branch --set-upstream master upstream/dev > Branch master set up to track remote branch dev from upstream.
.git/config
now reads:[branch "master"] remote = upstream merge = refs/heads/dev
git-push(1)
Manual Page :-u --set-upstream
For every branch that is up to date or successfully pushed, add upstream (tracking) reference, used by argument-less git-pull(1) and other commands. For more information, see
branch.<name>.merge
in git-config(1).
git-config(1)
Manual Page :branch.<name>.merge
Defines, together with
branch.<name>.remote
, the upstream branch for the given branch. It tells git fetch/git pull/git rebase which branch to merge and can also affect git push (see push.default). \ (...)branch.<name>.remote
When in branch < name >, it tells git fetch and git push which remote to fetch from/push to. It defaults to origin if no remote is configured. origin is also used if you are not on any branch.
Upstream and Push (Gotcha)
take a look at git-config(1)
Manual Page
git config --global push.default upstream git config --global push.default tracking (deprecated)
This is to prevent accidental pushes to branches which you’re not ready to push yet.
That's a bit of informal terminology.
As far as Git is concerned, every other repository is just a remote.
Generally speaking, upstream is where you cloned from (the origin). Downstream is any project that integrates your work with other works.
The terms are not restricted to Git repositories.
For instance, Ubuntu is a Debian derivative, so Debian is upstream for Ubuntu.