Automatically remove *.pyc files and otherwise-empty directories when I check out a new branch

So here's an interesting situation that comes up when using git with python, and I'm sure it happens in other situations as well.

Let's say I make a git repo with a folder /foo/. In that folder I put /foo/program.py. I run program.py and program.pyc is created. I have *.pyc in the .gitignore file, so git doesn't track it.

Now let's say I make another branch, dev. In this dev branch, I remove the /foo/ folder entirely.

Now I switch back to the master branch, and /foo/ reappears. I run the program.py and the program.pyc file reappears. All is well.

I switch back to my dev branch. The /foo/ directory should disappear. It only exists in the master branch, not the dev branch. However, it is still there. Why? Because the ignored program.pyc file prevents the folder from being deleted when switching branches.

The solution to this problem is to recursively delete all *.pyc files before switching branches. I can do that easily with this command.

find . -name "*.pyc" -exec rm '{}' ';'
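If you'd rather not depend on `find` (for example on Windows), the same cleanup can be sketched in Python with `pathlib`. The function name `remove_pyc_files` is just an illustrative choice, not part of any tool:

```python
from pathlib import Path

def remove_pyc_files(root="."):
    """Delete every *.pyc file under root, like find . -name "*.pyc" -exec rm '{}' ';'."""
    removed = []
    # Materialize the list first so we don't delete while the glob is still walking.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():  # skip anything that is not a regular file
            path.unlink()
            removed.append(path)
    return removed
```

Calling `remove_pyc_files()` from the repository root behaves like the `find` command above; it returns the deleted paths so you can log them if you want.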

The problem is that it is annoying to have to remember to do this almost every time I change branches. I could make an alias for this command, but then I still have to remember to type it every time I change branches. I could also make an alias for git-branch, but that's no good either. The git branch command does other things besides just change branches, and I don't want to delete all pyc files every time I use it. Heck, I might even use it in a non-python repo, then what?

Is there a way to set a git hook that only executes when I change branches? Or is there some other way to set all *.pyc files to get erased whenever I switch branches?


Solution 1:

There is a post-checkout hook, to be placed in .git/hooks/post-checkout. There may be a sample there, possibly named with a .sample suffix or possibly not executable, depending on your git version. In short, it gets three parameters: the previous HEAD, the new HEAD, and a flag which is 1 if the branch changed and 0 if it was merely a file checkout. See man githooks for more information! You should be able to write a shell script to do what you need and put it there.
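As a rough sketch of what such a hook receives, here is a minimal Python skeleton for .git/hooks/post-checkout. The helper names `parse_hook_args` and `main` are hypothetical; only the three arguments and the meaning of the flag come from githooks:

```python
#!/usr/bin/env python3
"""Hypothetical skeleton for .git/hooks/post-checkout (a sketch, not a file git ships)."""
import sys

def parse_hook_args(argv):
    """Split post-checkout's three arguments into (previous HEAD, new HEAD, branch flag)."""
    prev_head, new_head, flag = argv
    # The flag is "1" for a branch checkout and "0" for a file checkout.
    return prev_head, new_head, flag == "1"

def main(argv):
    prev_head, new_head, is_branch_checkout = parse_hook_args(argv)
    if not is_branch_checkout:
        return  # file checkout: leave the working tree alone
    # Cleanup for a branch switch would go here (e.g. deleting *.pyc files).
    print(f"switched from {prev_head[:7]} to {new_head[:7]}")

# post-checkout is always invoked with exactly three arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1:])
```

Remember to make the file executable (`chmod +x .git/hooks/post-checkout`). Checking the flag keeps the cleanup from running on `git checkout -- somefile`.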

Edit: I realize you're looking to do this pre-checkout, so that the checkout automatically cleans up directories which become empty. There's no pre-checkout hook, though, so you'll have to use your script to remove the directories too.

Another note: Aliases are part of gitconfig, which can be local to a repository (in .git/config, not ~/.gitconfig). If you choose to do this with aliases (for git-checkout, not git-branch) you can easily put them only in python-related repositories. Also in this case, I'd make an alias specifically for this purpose (e.g. cc for checkout clean). You can still use checkout (or another aliased form of it) if you don't want to clean up pyc files.

Solution 2:

Just copying and updating a good solution by Apreche that was buried in the comments:

Save this shell script to the file /path/to/repo/.git/hooks/post-checkout, and make it executable.

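#! /bin/sh

# Start from the repository root.
cd "./$(git rev-parse --show-cdup)" || exit 1

# Delete .pyc files and empty directories, but never touch .git itself:
# removing its empty directories (e.g. refs/ after "git gc" packs all refs)
# can make git stop recognizing the repository.
find . -name "*.pyc" -type f -delete
find . -type d -empty ! -path "./.git/*" -delete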
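The same cleanup can be sketched in Python, which makes the order of operations explicit: directories are visited bottom-up, so a directory that becomes empty once its .pyc files or empty subdirectories are gone is removed too, and anything under `.git` is skipped. The function name is hypothetical:

```python
import os

def clean_pyc_and_empty_dirs(root="."):
    """Delete *.pyc files, then remove directories left empty, skipping .git."""
    removed_dirs = []
    # topdown=False yields the deepest directories first, like find's -depth.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        if rel.split(os.sep)[0] == ".git":
            continue  # never touch Git's own metadata
        for name in filenames:
            if name.endswith(".pyc"):
                os.remove(os.path.join(dirpath, name))
        # Re-list the directory: its subdirectories may already have been removed.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed_dirs.append(rel)
    return removed_dirs
```

Re-listing with `os.listdir` instead of trusting `dirnames` is what lets a chain of nested, now-empty directories disappear in a single pass.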

Solution 3:

Another option is to not solve this as a git problem at all, but as a Python problem. You can use the PYTHONDONTWRITEBYTECODE environment variable to prevent Python from writing .pyc files in the first place. Then you won't have anything to clean up when you switch branches.
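To see the effect, this small sketch starts a child interpreter with and without the variable and reports `sys.dont_write_bytecode`, which Python sets from `PYTHONDONTWRITEBYTECODE` (or the `-B` command-line option). The helper name is hypothetical; in practice you would simply `export PYTHONDONTWRITEBYTECODE=1` in your shell profile:

```python
import os
import subprocess
import sys

def bytecode_disabled(env_overrides):
    """Return True if a child Python started with these env vars won't write .pyc files."""
    env = os.environ.copy()
    env.pop("PYTHONDONTWRITEBYTECODE", None)  # start from a clean baseline
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"
```

Any non-empty value turns the setting on, so `bytecode_disabled({"PYTHONDONTWRITEBYTECODE": "1"})` is true while `bytecode_disabled({})` is false. The trade-off is slightly slower imports, since nothing is cached.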

Solution 4:

My solution is more consistent with how git itself behaves: git removes only those empty directories from which checkout deleted a file. It doesn't search the complete working tree. That is useful for big repositories, or repositories with a very big ignored tree, like the virtual environments that the tox package creates for testing with many different Python versions.

My first implementation shows the principle clearly: only .pyc files belonging to .py files under version control are cleaned. This is for efficiency and to avoid unwanted side effects.

#!/bin/bash
# A hook that removes orphan "*.pyc" files for "*.py" files deleted by checkout.
# It does not clean anything e.g. for .py files deleted manually.
oldrev="$1"
newrev="$2"
# ignored param: branchcheckout="$3"

# --no-renames reports the old path of a renamed file as deleted (D);
# reading line by line keeps paths containing spaces intact.
git diff --name-only --no-renames --diff-filter=D "$oldrev" "$newrev" | grep '\.py$' |
while IFS= read -r x; do
    if [ -e "${x}c" ] && [ ! -e "$x" ]; then
        rm -- "${x}c"
    fi
done

The post-checkout hook receives three parameters that let you find out exactly which files were deleted by git checkout, without searching the complete tree.
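Here is a sketch of that step in Python, using the hook's first two arguments. `-z` makes git separate paths with NUL bytes so unusual file names survive, and `--no-renames` makes the old path of a renamed file show up as a deletion. Both function names are hypothetical:

```python
import subprocess

def parse_name_list(output):
    """Split NUL-terminated `git diff --name-only -z` output into paths."""
    return [p for p in output.split("\0") if p]

def deleted_py_files(oldrev, newrev):
    """Return the .py paths that exist in oldrev but not in newrev."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "-z", "--no-renames",
         "--diff-filter=D", oldrev, newrev],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in parse_name_list(out) if p.endswith(".py")]
```

Only the paths git reports are inspected afterwards, which is what keeps this approach cheap in a repository with a huge ignored tree.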

After reading the question, I rewrote my hook in Python and extended it to meet your requirement about empty directories.

My complete (short) Python source code is at
https://gist.github.com/hynekcer/476a593a3fc584278b87#file-post-checkout-py

The doc string:

"""
A hook to git that removes orphan files "*.pyc" and "*.pyo" for "*.py"
being deleted or renamed by git checkout. It also removes their empty parent
directories.
Nothing is cleaned for .py files deleted manually or by "git rm" etc.
Place it in "my_local_repository/.git/hooks/post-checkout" and make it executable.
"""
  • Stale *.pyc files are less of a problem in Python 3, because a *.pyc file in __pycache__ is never imported without the related *.py file in its parent directory.

  • No change of directory is necessary, because hooks are always run from the root of the working tree.

  • __pycache__ directories of compiled code are removed completely, because they are never important (they are not part of any binary distribution), and for efficiency, since deleting the __pycache__/some_name.*.pyc files one by one could be slow.
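The behavior described in the docstring and the notes above can be sketched roughly as follows. This is not the code from the gist, just an illustration under the same rules; `clean_orphans` and `remove_empty_parents` are hypothetical names:

```python
import os
import shutil

def remove_empty_parents(directory, root):
    """Remove directory and its ancestors while they are empty, stopping at root."""
    root = os.path.abspath(root)
    directory = os.path.abspath(directory)
    while directory.startswith(root + os.sep):
        if os.path.isdir(directory):
            if os.listdir(directory):
                break  # still has content: stop climbing
            os.rmdir(directory)
        directory = os.path.dirname(directory)

def clean_orphans(deleted_py_paths, root="."):
    """Remove compiled leftovers of .py files that checkout deleted."""
    for rel in deleted_py_paths:
        source = os.path.join(root, rel)
        if os.path.exists(source):
            continue  # the .py file is still (or again) present
        for compiled in (source + "c", source + "o"):
            if os.path.isfile(compiled):
                os.remove(compiled)
        parent = os.path.dirname(source)
        # Python 3 caches live in __pycache__; drop the whole directory.
        cache = os.path.join(parent, "__pycache__")
        if os.path.isdir(cache):
            shutil.rmtree(cache)
        remove_empty_parents(parent, root)
```

In a real hook you would pass it the .py paths that `git diff` reports as deleted between the old and new HEAD. Removing a whole __pycache__ also discards caches of modules that still exist, which is harmless because Python regenerates them on the next import.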