How to get GitLab commits of a file using python gitlab module?
We are trying to get the commits of each file in a GitLab repository using the python-gitlab module. We can get the commits of a repository, but not the commits of individual files in it. Can someone help us with this?
The commit history of a single file is not exposed through the GitLab API directly. Therefore, there is no direct functionality for this in the python-gitlab module.
However, you can obtain effectively the same information from the available APIs: either the repository commits API combined with the diff API, or the files blame API.
Using the commits API
For example, using the commits API, you can list all commits and their diffs, then associate file changes for each commit.
import gitlab
from collections import defaultdict

TOKEN = 'Your API Token'
gl = gitlab.Gitlab('https://gitlab.example.com', private_token=TOKEN)
project = gl.projects.get(1234)
commits = project.commits.list(all=True)

# file paths and a list of commits which create/modify/delete the file
file_map = defaultdict(list)
for c in commits:
    diff = c.diff()
    files_changed = set()
    for change in diff:
        files_changed.add(change['old_path'])
        files_changed.add(change['new_path'])
    for path in files_changed:
        file_map[path].append(c)

# show list of commits which modified README.md
print(file_map['README.md'])
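The grouping step in the example above can also be pulled out of the API calls, which makes it easy to check offline. This is only a minimal sketch: build_file_map and the sample data are hypothetical names made up for illustration, and it assumes each diff entry is a dict with old_path and new_path keys, as in the example above.

```python
from collections import defaultdict


def build_file_map(commit_diffs):
    """Map each file path to the ids of the commits whose diff touches it.

    commit_diffs: iterable of (commit_id, diff) pairs, where diff is a list
    of dicts with 'old_path' and 'new_path' keys (assumed shape).
    """
    file_map = defaultdict(list)
    for commit_id, diff in commit_diffs:
        # a rename contributes both its old and its new path
        paths = {change['old_path'] for change in diff}
        paths |= {change['new_path'] for change in diff}
        for path in sorted(paths):
            file_map[path].append(commit_id)
    return file_map


# made-up sample data, newest commit first
sample = [
    ("c3", [{"old_path": "README.md", "new_path": "docs/README.md"}]),
    ("c2", [{"old_path": "setup.py", "new_path": "setup.py"}]),
    ("c1", [{"old_path": "README.md", "new_path": "README.md"}]),
]
file_map = build_file_map(sample)
print(file_map["README.md"])
```

With python-gitlab, the input pairs could be produced from the commit list as ((c.id, c.diff()) for c in commits).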
Using the blame API
Using the commits API requires obtaining the diff for every commit, which can take a long time on large repositories.
If you're only interested in the commits which change a single file, traversing the blame tree can be more efficient. However, note that you may also miss commits (for example, commits in other branches or diverged trees) using this method.
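def search_blame(project, filename, base_ref=None):
    """Collect commit ids found by repeatedly blaming filename."""
    if base_ref is None:
        base_ref = project.default_branch
    commits = set()
    refs_to_check = [base_ref]
    seen = set()
    while refs_to_check:
        ref = refs_to_check.pop()
        if ref in seen:
            continue
        seen.add(ref)
        try:
            blame = project.files.blame(filename, ref)
        except gitlab.exceptions.GitlabError as e:
            # a 404 is assumed to mean the file does not exist at this ref
            # (e.g. the parent of the commit that created it)
            if e.response_code != 404:
                raise
            continue
        for change in blame:
            commit_id = change['commit']['id']
            if commit_id not in seen:
                # queue the commit and its parents so older history is blamed too
                refs_to_check.append(commit_id)
                refs_to_check.extend(change['commit']['parent_ids'])
                # note: parents are recorded even if they did not touch the file
                for c in change['commit']['parent_ids']:
                    commits.add(c)
            commits.add(commit_id)
    return commits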
# show commits in blame tree for README.md
# only includes commits in the default branch
print(search_blame(project, 'README.md'))
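search_blame returns bare commit ids. One possible follow-up, sketched here under assumptions, is to resolve each id with project.commits.get and order the results by date. The sketch assumes the returned commit objects expose created_at, short_id and title, as in the GitLab commits API response; describe_commits and the stand-in project below are hypothetical and exist only so the example runs without a server.

```python
from datetime import datetime
from types import SimpleNamespace


def _parse(ts):
    # ISO 8601 timestamp; a trailing 'Z' is normalised for older Pythons
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def describe_commits(project, commit_ids):
    """Fetch each commit and return (created_at, short_id, title), oldest first."""
    commits = [project.commits.get(sha) for sha in commit_ids]
    commits.sort(key=lambda c: _parse(c.created_at))
    return [(c.created_at, c.short_id, c.title) for c in commits]


# stand-in for a python-gitlab project, with made-up commits
fake_commits = {
    "a1": SimpleNamespace(created_at="2021-03-02T10:00:00+00:00",
                          short_id="a1", title="Update README"),
    "b2": SimpleNamespace(created_at="2021-01-15T09:30:00+02:00",
                          short_id="b2", title="Add README"),
}
fake_project = SimpleNamespace(
    commits=SimpleNamespace(get=fake_commits.__getitem__)
)

summary = describe_commits(fake_project, {"a1", "b2"})
for created_at, short_id, title in summary:
    print(created_at, short_id, title)
```

Against a real server this would be called as describe_commits(project, search_blame(project, 'README.md')), at the cost of one extra request per commit.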