A survey of git best practices

How do you know if something you recommend is a best practice?

At work, new college grads are starting and I volunteered to give a talk on git best practices. I have strong views, but I looked for articles I could use to give a more complete picture.

It was weirdly addicting. After looking at three or four, I kept going through page after page of Google search results.

I looked for two things. How often was a topic mentioned? And did people agree or disagree?

Together, the answers to these two questions give a better sense of which practices to recommend.

Strong agreement: readable commits

Nearly every article I found recommended the following two practices:

Single-purpose commits
Meaningful commit messages

These practices support what I call readable commits. Just as we’re advised to write code for future readers, we should structure commits for future readers as well.

In the short term, readable commits allow faster, more effective code review. In the long term, readable commits provide future maintainers with context to understand changes to the codebase.

On single-purpose commits

The advice I’ve read on single-purpose commits talks about two related concepts: small and atomic.

What is small? None of the articles I read gave specific advice, but I look to two sources for guidance. First, this study of code reviews found that defect discovery falls sharply once a review exceeds 200 lines, and it recommends reviewing no more than 400 lines at a time. The idea is summarized well by this tweet:

Second, I look to Miller’s Law. It says that humans can hold about seven items (plus or minus two) in working memory. That suggests an upper bound of about seven diff hunks in a commit. After that, it’s harder for a reader to retain how each hunk relates to the others.
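
As a rough sketch of how to check a commit against both rules of thumb, the script below counts changed lines and diff hunks. The helper name commit_size, the throwaway repository, and the file contents are all made up for illustration. The thresholds to compare against (about 400 lines, about seven hunks) are the guidance above, not limits that git enforces.

```shell
#!/bin/sh
set -e

# commit_size is a hypothetical helper, not a git command.
# It reports how many lines and diff hunks a commit touches.
commit_size() {
    commit=${1:-HEAD}
    # --numstat prints "added<TAB>deleted<TAB>path" per file; sum both columns.
    lines=$(git show --format= --numstat "$commit" |
        awk '{ n += $1 + $2 } END { print n + 0 }')
    # Every hunk header in a unified diff starts with "@@".
    hunks=$(git show --format= "$commit" | grep -c '^@@' || true)
    echo "$lines lines, $hunks hunks"
}

# Demo in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

seq 1 30 > numbers.txt
git add numbers.txt
git commit -q -m "Add numbers file"

# Change two lines that are far apart, which produces two separate hunks.
sed -e 's/^2$/two/' -e 's/^29$/twenty-nine/' numbers.txt > numbers.tmp
mv numbers.tmp numbers.txt
git commit -q -am "Spell out two of the numbers"

size=$(commit_size HEAD)
echo "$size"   # prints "4 lines, 2 hunks"
```

Hunk count is only a proxy for mental load, since it depends on how much context git puts around each change.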

Advice on atomicity is straightforward: each commit should ‘do one thing’ and be complete. It shouldn’t include changes for multiple bugs or other clean-up fixes. If it does, it runs afoul of Miller’s Law, adding to the mental load on a reader.

Small and atomic commits are not only easier to review but also easier to revert cleanly if needed. They are less likely to cause merge conflicts. They also help git blame lead a reader to the rationale behind a line of code.

Can all commits be small? No, but it’s something to strive for. And big changes can often be broken up into smaller, atomic chunks. If a new feature requires refactoring, extending, or adapting some existing code, make the adaptation a separate commit from the new feature that builds on top of it.
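
Here is a minimal sketch of that split, using an invented app.sh script in a throwaway repository. The refactoring lands in one commit and the feature in the next, so the feature can later be reverted without losing the refactoring.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo 'echo "hello"' > app.sh
git add app.sh
git commit -q -m "Add greeting script"

# Commit 1: the adaptation only, with no change in behavior.
cat > app.sh <<'EOF'
greet() { echo "hello"; }
greet
EOF
git commit -q -am "Extract greeting into a function"

# Commit 2: the new feature, built on top of the adaptation.
cat > app.sh <<'EOF'
greet() { echo "hello"; }
farewell() { echo "goodbye"; }
greet
farewell
EOF
git commit -q -am "Add farewell message"

# Reverting only the feature leaves the refactoring in place.
git revert --no-edit HEAD > /dev/null
git log --format=%s
sh app.sh   # prints "hello"
```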

On meaningful commit messages

Many articles I read cite How to Write a Git Commit Message by Chris Beams. His article covers both format and content.

Beams says to write a short subject line of no more than about 50 characters, separate it from the body with a blank line, and wrap body paragraphs at 72 characters. Use bullets or dashes in the body if they aid clarity.

The subject should be imperative, capitalized, and have no period. Most importantly, the subject line should complete this sentence:

If applied, this commit will <your subject line here>

Including a ticket or issue number as a prefix or suffix is also common.
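
The mechanical parts of these subject-line rules are easy to check automatically. The sketch below uses a hypothetical helper named check_subject, of the kind you might call from a commit-msg hook. It covers only length, capitalization, and the trailing period. It cannot judge the imperative mood, and it assumes any ticket number goes at the end rather than the start of the subject.

```shell
#!/bin/sh

# check_subject is a hypothetical helper, not part of git.
# It prints one line per problem and returns non-zero if any were found.
check_subject() {
    subject=$1
    status=0
    if [ "${#subject}" -gt 50 ]; then
        echo "subject is longer than 50 characters"
        status=1
    fi
    case $subject in
        [[:upper:]]*) ;;
        *) echo "subject should start with a capital letter"; status=1 ;;
    esac
    case $subject in
        *.) echo "subject should not end with a period"; status=1 ;;
    esac
    return $status
}

check_subject "Fix crash when config file is missing" && echo "ok"   # prints "ok"
check_subject "fixed some stuff." || echo "rejected"
```

The second call reports the capitalization and period problems before printing "rejected".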

Beams says the body of the message should explain the ‘what’ and ‘why’ rather than the ‘how’, since the diff itself shows how the code changed. Many articles echo or expand on that idea.

A good commit message reestablishes the author’s context for changing the codebase. This means summarizing what changed, how it resolves an issue or need, and why this particular change was made instead of another option.
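
To make the format concrete, here is a sketch of a complete message committed from a heredoc. The change it describes (retrying uploads) is invented for illustration, and --allow-empty is used only so the example needs no real code change.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Subject line, blank line, then body paragraphs wrapped at 72 characters.
git commit -q --allow-empty -F - <<'EOF'
Retry uploads after a network timeout

Uploads failed permanently on the first timeout, so users on flaky
connections had to restart large transfers by hand.

Retry up to three times with exponential backoff. A resumable upload
protocol would also fix this, but it needs server changes that are
out of scope for now.
EOF

git log -1 --format=%s   # the subject line only
git log -1 --format=%b   # the body
```

The first body paragraph gives the context, and the second says what changed and why this option was picked over the alternative.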

This benefits others on the team both during code review and later when looking back at the history of the codebase. And remember Eagleson’s law:

Any code of your own that you haven't looked at for six or more months might as well have been written by someone else.

If you follow the guidelines, your commit message will look like an email to future developers. Write the ‘email’ that you’d want to receive in six months when you have to work on that part of the codebase again.

I saw an insightful meta-observation that relates to this: a habit of good commit messages shows that a developer is a good collaborator. Be a good collaborator!

Strong disagreement: handling frequent commits

Two ideas came up frequently but conflict with each other: (a) commit early, commit often; and (b) don’t commit incomplete work. Articles disagreed on how to reconcile these.

Committing early and often avoids losing work and allows easy backtracking to try different approaches. It can also help deliver readable commits because, as noted above, smaller changes are easier to review.

But committing too often causes other problems. These ‘save game’ style commits of incomplete work aren’t atomic. They might not pass tests, which makes git bisect harder. They make code review harder if they introduce new bugs fixed in later commits.
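
To see why passing tests matter for git bisect, consider this sketch. It builds five commits in a throwaway repository, where an invented check.sh stands in for a test suite and starts failing at the third commit. git bisect run relies on the check giving a trustworthy answer at every commit it visits, which is exactly what ‘save game’ commits break.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

for n in 1 2 3 4 5; do
    echo "release $n" > version.txt
    # check.sh stands in for a test suite; from commit 3 on it fails.
    if [ "$n" -ge 3 ]; then
        echo "exit 1" > check.sh
    else
        echo "exit 0" > check.sh
    fi
    git add version.txt check.sh
    git commit -q -m "Change $n"
done

good=$(git rev-list --max-parents=0 HEAD)   # the first commit
expected=$(git rev-parse HEAD~2)            # the commit "Change 3"

# Mark HEAD as bad and the first commit as good, then let git drive the search.
git bisect start HEAD "$good" > /dev/null
git bisect run sh check.sh > /dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1

git log -1 --format=%s "$found"   # prints "Change 3"
```

If some intermediate commits cannot be tested at all, the check script can exit with status 125 to tell bisect to skip them, at the cost of a less precise answer.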

I saw two great analogies for frequent commits: ‘sausages’ and ‘movies’.

The sausage model says that frequent commits are like a sausage factory – a messy process is okay as long as the result is tasty. The only question is whether to keep the intermediate commits visible or to squash them into a single commit once the work is done.

The movie model says that frequent commits are like shooting a movie – we can work out of order or make mistakes and reshoot, but we must reassemble the pieces into a series of coherent scenes (commits) before showing it in public.

The sausage model leaves us with two unappealing options. Squashing intermediate commits might produce a commit that is no longer small and single-purpose. But merging intermediate commits leaves messy, incomplete work with broken tests in the main branch. (This can be slightly less painful if the work is isolated behind a merge commit with git merge --no-ff.)
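
The two options can be compared side by side. This sketch uses invented branch names (feature, option-squash, option-no-ff) in a throwaway repository and counts what each approach adds on top of main.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"

# Three 'save game' commits on a feature branch.
git checkout -q -b feature
for step in 1 2 3; do
    echo "step $step" >> feature.txt
    git add feature.txt
    git commit -q -m "WIP $step"
done

# Option 1: squash everything into a single commit.
git checkout -q -b option-squash main
git merge --squash feature > /dev/null 2>&1
git commit -q -m "Add feature"
squash_commits=$(git rev-list --count main..option-squash)

# Option 2: keep the intermediate commits, isolated behind a merge commit.
git checkout -q -b option-no-ff main
git merge -q --no-ff -m "Merge branch 'feature'" feature
all_commits=$(git rev-list --count main..option-no-ff)
first_parent_commits=$(git rev-list --count --first-parent main..option-no-ff)

echo "squash: $squash_commits commit"
echo "no-ff:  $all_commits commits, $first_parent_commits on the first-parent line"
```

With --no-ff, tools that follow only first parents (for example git log --first-parent) see a single merge, while the messy commits remain reachable behind it.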

The movie model makes us do extra work to rearrange things after we have working code. In the best case, intermediate commits only need to be reordered or combined with git rebase -i. In the worst case, the intermediate commits need to be pulled apart and recommitted in pieces. Sometimes, the best strategy is to reset the branch while keeping the changes in the working tree (a mixed reset, which is what plain git reset does; a soft reset would leave everything staged) and reassemble the work into new commits with git add -p.
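
The best case can even be scripted. The sketch below records a late fix as a fixup commit and lets git rebase -i --autosquash fold it into the commit it belongs to. Setting GIT_SEQUENCE_EDITOR=true accepts the generated todo list without opening an editor. The file and branch names are invented.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"

git checkout -q -b feature
echo "parser" > parser.txt
git add parser.txt
git commit -q -m "Add parser"

echo "docs" > docs.txt
git add docs.txt
git commit -q -m "Document parser"

# A late fix that really belongs in "Add parser" (HEAD~1 at this point).
echo "parser with fix" > parser.txt
git commit -q -a --fixup=HEAD~1

# Fold the fixup into its target; "true" leaves the todo list as generated.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --format=%s main..feature
```

After the rebase, the branch holds two commits again and the fix lives inside "Add parser".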

Personally, I favor the movie model and try to keep my intermediate work as clean as possible so it’s quick to rearrange later. (Don’t forget to use git rebase --exec to retest your reworked commits.)
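
As a sketch of that retesting step, the script below replays a two-commit branch and runs a check after each commit. The check.sh script is an invented stand-in for a real test command such as a test suite, and the log file exists only to show that the check ran once per commit.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

# check.sh stands in for the project's test command.
echo 'grep -q "ok" status.txt' > check.sh
echo "ok" > status.txt
git add check.sh status.txt
git commit -q -m "Add status check"

git checkout -q -b feature
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b"

# Run the check after every replayed commit and note which ones passed.
# The log lives outside the repository so it cannot dirty the working tree.
log=$(mktemp)
git rebase --exec "sh check.sh && git log -1 --format=%s >> $log" main > /dev/null 2>&1

cat "$log"
```

If the command fails at some commit, the rebase stops there so the commit can be fixed before continuing with git rebase --continue.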

Mentioned often: branch strategy

Branches were discussed about half as often as commits, but the advice was consistent. Three ideas came up repeatedly:

Branch frequently
Agree on a workflow
Don’t alter published branch history

Prefer short-lived branches for each feature or bug fix. Give them short, descriptive names. Rebase them onto the main branch frequently. Push frequently to save your work.
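
A minimal sketch of that routine, with an invented branch name and files: create a branch for one fix, let main move on, then rebase the branch so it sits on top of the latest main.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"

# One short-lived branch per fix, with a short descriptive name.
git checkout -q -b fix/readme-typo
echo "fixed" > README.txt
git add README.txt
git commit -q -m "Fix typo in README"

# Meanwhile, main moves on.
git checkout -q main
echo "other work" > other.txt
git add other.txt
git commit -q -m "Add other work"

# Rebase the branch so it builds on the current main.
git checkout -q fix/readme-typo
git rebase -q main

git log --format=%s
```

Rebasing often keeps each rebase small, so conflicts surface early while the change is still fresh.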

But having many branches causes confusion if a team isn’t using them the same way. Teams need to consider several questions to get everyone on the same page.

Should there be a consistent pattern for branch names? Should the main branch represent stable or unstable development? Should the main branch have a linear history or are merges allowed? If linear, is that by rebasing or squashing? Should there be a structured methodology like git-flow or github-flow?

Only a few articles gave an opinion on the answers. These recommended a linear, commit-squashing model as part of the solution to the sausage model of development.

I think it’s obvious, but many articles made a point of reminding readers not to change published history on main or other long-lived branches. (But rebasing and publishing a feature branch is okay.)
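
Publishing a rebased feature branch means overwriting the copy on the remote. One cautious way to do that, sketched below against a local bare repository standing in for a real remote, is git push --force-with-lease, which refuses to overwrite the remote branch if someone else has pushed to it since your last fetch or push. The branch name add-search is invented.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # stands in for a hosted remote
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/remote.git"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"
git push -q origin main

# Publish a feature branch.
git checkout -q -b add-search
echo "search" > search.txt
git add search.txt
git commit -q -m "Add search"
git push -q -u origin add-search

# main moves on, and the feature branch is rebased onto it.
git checkout -q main
echo "more" > more.txt
git add more.txt
git commit -q -m "Add more"
git push -q origin main
git checkout -q add-search
git rebase -q main

# The rewritten branch no longer fast-forwards, so a plain push would be
# rejected. Overwrite it, but only if the remote is where we last saw it.
git push -q --force-with-lease origin add-search
```

The same command aimed at main or another shared branch is exactly what the advice above warns against.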

No consensus: everything else

Other ‘best’ practices I saw only appeared on a few sites at most. I think they’re usually good practices, so at least consider them:

Use the CLI, not a GUI (develops deeper mastery)
Divide work into separate repos (i.e. avoid monorepos)
Don’t commit generated files (they add churn or else get stale)
Don’t commit large binary files (keeps the repo smaller)
Keep a backup (e.g. on a personal cloud server)
Customize your .gitignore (less noisy git status)
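
As one example from this list, here is a sketch of a small .gitignore and its effect on git status. The patterns are common illustrations (build output, object files, editor and OS clutter), not a recommended set for any particular project.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitignore <<'EOF'
# Build output and generated files
build/
*.o
# Editor and OS clutter
.DS_Store
*.swp
EOF

mkdir build
touch build/app main.o main.c

# Only .gitignore and main.c show up as untracked.
git status --short

# check-ignore reports whether a path matches an ignore rule.
git check-ignore -q main.o && echo "main.o is ignored"   # prints "main.o is ignored"
```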

I also saw recommended commands – techniques, rather than practices. In addition to ones I’ve already mentioned, consider learning:

git cherry-pick
git commit --amend
git log --graph
git log --oneline
git reset
git stash

Learn to create your own git aliases for any complex commands you use frequently.
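
For example, this sketch defines an alias named lg (the name is arbitrary) that combines two of the log options listed above. It sets the alias in a throwaway repository. Adding --global to the git config command would make it available in every repository.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello file"

# Define the alias for this repository only.
git config alias.lg "log --graph --oneline --decorate"

# "git lg" now expands to the full command.
git lg
```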

Personally, I also recommend enabling rerere.
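
With rerere (‘reuse recorded resolution’) enabled, git remembers how a conflict was resolved and replays that resolution when the same conflict comes up again. The sketch below resolves a conflict once, undoes the merge, and repeats it. The file, branch names, and values are invented.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config rerere.enabled true          # add --global to enable it everywhere

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b feature
echo "color = blue" > settings.txt
git commit -q -am "Use blue"

git checkout -q main
echo "color = green" > settings.txt
git commit -q -am "Use green"

# First merge: conflict, resolved by hand. rerere records the resolution.
git checkout -q feature
git merge main > /dev/null 2>&1 || true
echo "color = teal" > settings.txt
git commit -q -am "Merge main into feature"

# Undo the merge and repeat it: the same conflict appears again, and
# rerere rewrites the file with the recorded resolution.
git reset -q --hard HEAD~1
git merge main > /dev/null 2>&1 || true
cat settings.txt   # prints "color = teal"
```

The second merge still stops so the result can be reviewed. The file then needs to be staged and committed as usual.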

Learning more

Many articles mentioned the Pro Git book. If you’re new to git and are a visual learner, I encourage you to watch Michael Schwern explain git with tinkertoys in Git for Ages 4 and up.

Finally, I’ve posted a bibliography of the sources I reviewed for this article.