GitHub API: Retrieve all commits for all branches for a repo
According to the V2 documentation, you can list all commits for a branch with:
commits/list/:user_id/:repository/:branch
I am not seeing the same functionality in the V3 documentation.
I would like to collect all branches using something like:
https://api.github.com/repos/:user/:repo/branches
And then iterate through them, pulling all commits for each. Alternatively, if there's a way to pull all commits for all branches for a repo directly, that would work just as well if not better. Any ideas?
UPDATE: I tried passing the branch :sha as a param as follows:
params = {:page => 1, :per_page => 100, :sha => b}
The problem is that when I do this, it doesn't page the results properly. I feel like we're approaching this incorrectly. Any thoughts?
Solution 1:
I have encountered the exact same problem. I did manage to acquire all the commits for all branches within a repository (probably not that efficient due to the API).
Approach to retrieve all commits for all branches in a repository
As you mentioned, first you gather all the branches:
# https://api.github.com/repos/:user/:repo/branches
https://api.github.com/repos/twitter/bootstrap/branches
The key that you are missing is that the v3 API for listing commits works from a reference commit (the sha parameter of the call that lists commits on a repository). So when you collect the branches, make sure you also pick up each branch's latest sha:
Trimmed result of branch API call for twitter/bootstrap
[
{
"commit": {
"url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
"sha": "8b19016c3bec59acb74d95a50efce70af2117382"
},
"name": "gh-pages"
},
{
"commit": {
"url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
"sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
},
"name": "master"
}
]
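For what it's worth, the branch list above can be boiled down to a name-to-sha map in a few lines. This is only a minimal sketch using Ruby's standard library: the helper name branch_heads is made up for illustration, and the JSON literal is the trimmed response shown above rather than a live API call.

```ruby
require 'json'

# Hypothetical helper: maps each branch name to the sha of its latest commit.
# `branches_json` is the raw body returned by GET /repos/:user/:repo/branches.
def branch_heads(branches_json)
  JSON.parse(branches_json).each_with_object({}) do |branch, heads|
    heads[branch['name']] = branch['commit']['sha']
  end
end

# Trimmed sample response for twitter/bootstrap (copied from the answer above)
sample = <<~BODY
  [
    {"commit": {"sha": "8b19016c3bec59acb74d95a50efce70af2117382"}, "name": "gh-pages"},
    {"commit": {"sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"}, "name": "master"}
  ]
BODY

branch_heads(sample).each { |name, sha| puts "#{name}: #{sha}" }
```

Each value in the resulting hash is the starting point for listing that branch's commits.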
Working with last commit's sha
As we can see, the two branches here have different shas; each one is the sha of the latest commit on that branch. What you can do now is walk through each branch, starting from its latest sha:
# With the sha parameter set to the branch's latest sha
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef
The API call above lists the 100 most recent commits on the master branch of twitter/bootstrap. To get the next 100 commits, you have to tell the API which commit to start from. We can use the sha of the last commit in the response (which is 7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call. Note that this commit then shows up again as the first entry of the next response:
# Next API call for commits (use the last commit's sha)
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa
Repeat this process until the response contains nothing older, that is, until the sha of the last commit in the response is the same as the sha parameter of the API call.
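The loop described above could look roughly like the sketch below. It is not a definitive implementation: walk_commits and fake_fetch are names invented here, the fetcher is a stand-in that serves a made-up linear history three commits at a time (instead of calling the API with per_page=100), and histories with merge commits may not page as neatly as this stub suggests.

```ruby
# Walks one branch backwards from `start_sha`, one page at a time.
# `fetch_page` is any callable that takes a sha and returns the parsed
# commit list for GET /repos/:user/:repo/commits?sha=<sha>
# (newest first, beginning with the commit `sha` itself).
def walk_commits(start_sha, fetch_page)
  shas = []
  sha = start_sha
  loop do
    page = fetch_page.call(sha).map { |commit| commit['sha'] }
    # Every page after the first begins with the commit we asked for,
    # which was already collected as the last entry of the previous page.
    page.shift unless shas.empty?
    shas.concat(page)
    # Stop when nothing older came back.
    break if page.empty? || page.last == sha
    sha = page.last
  end
  shas
end

# Stand-in for the API (assumption): a made-up linear history, newest first.
history = %w[a1 b2 c3 d4 e5 f6 g7]
fake_fetch = lambda do |sha|
  history[history.index(sha), 3].map { |s| { 'sha' => s } }
end

puts walk_commits('a1', fake_fetch).join(' ')
```

A real fetch_page would perform the GET request and return the parsed JSON array; keeping it as a parameter makes the paging logic easy to try out without touching the network.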
Next branch
That is it for one branch. Now you apply the same approach for the other branch (work from the latest sha).
There is a big issue with this approach: since branches share many identical commits, you will see the same commits over and over again as you move on to another branch.
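One way to cope with the repeated commits is to key everything by sha once all branches have been collected. A minimal sketch, assuming the commits are already in memory as parsed hashes; unique_commits is a made-up helper name and the shas in the example are placeholders, not real commits:

```ruby
# Hypothetical helper: merges per-branch commit lists into one list,
# keeping only the first occurrence of each sha.
# `commits_by_branch` maps a branch name to its array of commit hashes.
def unique_commits(commits_by_branch)
  commits_by_branch.values.flatten(1).uniq { |commit| commit['sha'] }
end

# Placeholder data: two branches that share the commits 'bbb' and 'ccc'
commits_by_branch = {
  'master'   => [{ 'sha' => 'aaa' }, { 'sha' => 'bbb' }, { 'sha' => 'ccc' }],
  'gh-pages' => [{ 'sha' => 'ddd' }, { 'sha' => 'bbb' }, { 'sha' => 'ccc' }]
}

puts unique_commits(commits_by_branch).map { |commit| commit['sha'] }.join(' ')
```

This does not save any API requests, it only removes the duplicates after the fact.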
I can imagine that there is a much more efficient way to accomplish this, yet this worked for me.
Solution 2:
I asked GitHub support this same question, and this is what they answered:
GETing /repos/:owner/:repo/commits should do the trick. You can pass the branch name in the sha parameter. For example, to get the first page of commits from the '3.0.0-wip' branch of the twitter/bootstrap repository, you would use the following curl request:
curl https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip
The docs also describe how to use pagination to get the remaining commits for this branch.
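As far as I know, the pagination they refer to works through the Link response header, which carries the URL of the next page when there is one. Below is a rough sketch of picking that URL out of the header. The helper name next_page_url is invented here, and the header value is an illustrative example (the page numbers are made up), not a captured response.

```ruby
# Hypothetical helper: returns the rel="next" URL from a Link response
# header, or nil when there is no further page.
def next_page_url(link_header)
  return nil if link_header.nil?
  pairs = link_header.scan(/<([^>]+)>;\s*rel="([^"]+)"/)
  url, _rel = pairs.find { |_, rel| rel == 'next' }
  url
end

# Illustrative header value (assumption, not a real response)
link = '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=2>; rel="next", ' \
       '<https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip&page=7>; rel="last"'

puts next_page_url(link)
```

You would keep requesting the returned URL until the helper gives back nil, which means the last page of the branch has been reached.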
As long as you are making authenticated requests, you can make up to 5,000 requests per hour.
I used the github gem (https://github.com/peter-murach/github) in my Rails app as follows:
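require 'github_api'

# owner, repo_name, start_date and end_date are assumed to be defined earlier
github_connection = Github.new :client_id => 'your_id', :client_secret => 'your_secret', :oauth_token => 'your_oauth_token'
# Map each branch name to the URL of its latest commit
branches_info = {}
all_branches = github_connection.repos.list_branches owner, repo_name
all_branches.body.each do |branch|
  branches_info[branch.name] = branch.commit.url
end
# List the commits of each branch, passing the branch name as :sha
# (:since and :until take ISO 8601 timestamps)
commits_list = []
branches_info.keys.each do |branch_name|
  commits_list.push(github_connection.repos.commits.list(owner, repo_name, :since => start_date, :until => end_date, :sha => branch_name))
end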