Convert to git and get more patches?

Today, I converted the ExtUtils::ParseXS repository from Subversion to git, published it on github.com, and announced it on the perl5-porters and module-build Perl mailing lists. Within hours, it was cloned and I got my first pull request.

I don’t know if that’s the start of a trend or not, but it’s food for thought for anyone who questions whether git makes contributing easier. As a publisher, git means no need to centrally administer repository access. As a contributor, it means no need to request repository access (and wait for it to be granted). And for both, it means no need to manage patches outside the VCS. I’m convinced that making it easier for people to contribute means greater odds of getting contributions.

On the technical side, this was not the easiest conversion and I learned a lot about git history rewriting during the cleanup. Some of the highlights:

I used a modified clone of Schwern’s svn2git.pl port of the Ruby svn2git program
I manually fixed up “tag branches” where the history was confused but I could still identify the release point (confirmed by finding a point on the trunk branch that had no diff against the tag branch)
I rewrote historical commits to fix empty commit messages

This last one was a multi-part fix. Some commit messages said “*** empty log message ***”. Others were literally empty. Worse, the very first commit was one of the “*** empty …” ones. git rebase is the normal tool for fixing up history, but rebase couldn’t change the root commit (newer versions of git can, with git rebase -i --root). I worked around it following these instructions:

I created an empty commit on a new branch as a rebase target
I cherry-picked the first commit from the trunk onto the new branch
I interactively rebased the trunk onto the new branch (picking up everything after the cherry-picked commit)
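For anyone wanting to try the same workaround, here is a rough sketch of those three steps in a throwaway repository. The branch names (trunk, newroot), file names and commit messages are made up for illustration. The rebase is non-interactive so that the script runs unattended; adding -i to that last command gives the interactive list of commits.

```shell
#!/bin/sh
# Sketch only: rebuild a history on top of a new, empty root commit.
# All names below (trunk, newroot, a.txt, b.txt) are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Build a tiny repository whose root commit has a useless message.
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/trunk
echo one > a.txt; git add a.txt
git commit -q -m '*** empty log message ***'
echo two > b.txt; git add b.txt
git commit -q -m 'Add b'
first=$(git rev-list --max-parents=0 trunk)   # the root commit of trunk

# 1. Create an empty commit on a new branch as the rebase target.
git checkout -q --orphan newroot
git rm -q -rf .
git commit -q --allow-empty -m 'Empty root commit (rebase target)'

# 2. Cherry-pick the first trunk commit onto it and fix its message.
git cherry-pick "$first" >/dev/null
git commit -q --amend -m 'Initial import'

# 3. Rebase everything after the first commit onto the new branch.
git rebase -q --onto newroot "$first" trunk

# Show the rewritten history, newest first.
history=$(git log --format=%s trunk)
echo "$history"
```

The --onto form tells rebase to replay only the commits after the old root, which is why the root itself has to be carried over by cherry-pick first.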

The interactive rebase gave me a list of commits and message synopses. I changed all the “*** empty …” ones from “pick” to “edit”. The entirely empty ones showed up strangely in the rebase list: the empty commit was run together with the commit after it on a single line. In this example, ce86d39 is the empty commit and the parent of f4a7235, and the log message shown belongs to f4a7235:

pick ce86d39 >f4a7235 blah blah blah

It turns out the way to rebase this is to just break them apart:

edit ce86d39
pick f4a7235 blah blah blah

Then, during the rebase run, when it would stop on “*** empty …” commits, I would amend the commit message with a list of files changed (a lazy fix, I’ll admit) and continue:

$ git diff --name-only HEAD^ | \
  perl -E 'say "Updated " . join(q{, }, map { chomp; $_} <>)' | \
  git commit --amend --file=- && git rebase --continue

Voilà: very lazy empty log editing. In the blank message cases, the commit that rebase attempts fails because of the empty message, but I could do a direct “git commit” (not an amend, because the rebase’s cherry-pick commit never happened), give an appropriate log message, and then “git rebase --continue”.

After that was done, the history was great, but the tags were still on the old messy branch. But that was easy to fix with a little Perl programming:

I wrote a program to take a list of commits and tags and migrate the tags to the new branch where a commit in the new branch matches a previously tagged commit (i.e. in the old branch)
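The original program was Perl and isn’t reproduced here. As a rough sketch of the same idea in shell, the script below builds a small throwaway repository and moves each tag to the commit on the new branch that has the same tree. Matching by tree hash is an assumption, since the matching rule isn’t specified above, and the branch names (old-trunk, trunk), the tag v1.0 and the file names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: migrate tags from an old branch to matching commits on a
# rewritten branch. "Matching" means "same tree hash" (an assumption).
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/old-trunk

# Old, messy history with a release tag on its first commit.
echo one > a.txt; git add a.txt
git commit -q -m '*** empty log message ***'
git tag v1.0
echo two > b.txt; git add b.txt
git commit -q -m 'Add b'

# Stand-in for the cleaned-up branch: same trees, new commit messages.
c1=$(git commit-tree "$(git rev-parse 'old-trunk~1^{tree}')" -m 'Initial import')
c2=$(git commit-tree "$(git rev-parse 'old-trunk^{tree}')" -p "$c1" -m 'Add b')
git branch trunk "$c2"

# Move every tag whose tree also appears somewhere on the new branch.
for tag in $(git tag); do
    tree=$(git rev-parse "$tag^{tree}")
    new=$(git log --format='%H %T' trunk |
          awk -v t="$tree" '$2 == t { print $1; exit }')
    if [ -n "$new" ]; then
        git tag -f "$tag" "$new" >/dev/null
        echo "moved $tag to $new"
    else
        echo "no match for $tag" >&2
    fi
done
```

Two caveats with this sketch: git tag -f replaces an annotated tag with a lightweight one, and if several commits on the new branch share a tree it simply takes the newest, so the results are worth checking by hand.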

This would have been more complex if there had been merges to consider, and I’m glad there weren’t. But the end result was a nice, clean, well-tagged git history.