How I broke the MongoDB Go driver ecosystem

I unintentionally broke the MongoDB Go driver ecosystem one March weekend in 2021. Dependency management systems are hard.

[Image: trees on fire]

In late 2018, I moved all my Go libraries from github.com/xdg to github.com/xdg-go to keep my Go work separate from other projects/languages.

This seemed painless, because GitHub automatically redirects repositories after a rename. For example, in a project that was importing github.com/xdg/stringprep, a dependency resolver would try to pull from that repository, and GitHub would transparently redirect that to github.com/xdg-go/stringprep.

Since then, Go modules have caught on as the standard way to manage dependencies, replacing Glide, dep, and many others. With modules, a Go library includes a go.mod file that defines the module’s namespace and its dependencies, and module versions are defined by tags in the repository. The first line of go.mod looks something like this:

module github.com/xdg-go/stringprep

Now you see the problem. Before go.mod, it didn’t matter if code imported xdg/stringprep or xdg-go/stringprep because the code wasn’t aware of its own namespace. With a go.mod, however, a dependency manager looking for xdg/stringprep would find a go.mod claiming to be xdg-go/stringprep – NOT the same code that it’s looking for.
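
The mismatch can be sketched in a few lines of Go. This is a simplified illustration, not the go command's actual implementation: the helper names declaredModulePath and checkModulePath are hypothetical, and the parser only handles a plain, unquoted module line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// declaredModulePath returns the path named by the module directive in the
// text of a go.mod file, or "" if no directive is found. It is a simplified,
// line-based scan for illustration and ignores quoted paths and comments.
func declaredModulePath(goMod string) string {
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "module" {
			return fields[1]
		}
	}
	return ""
}

// checkModulePath compares the path a consumer asked for with the path the
// go.mod declares. A repository without a go.mod declares nothing, so any
// import path is accepted for it.
func checkModulePath(required, goMod string) error {
	declared := declaredModulePath(goMod)
	if declared != "" && declared != required {
		return fmt.Errorf("module declares its path as %s but was required as %s",
			declared, required)
	}
	return nil
}

func main() {
	goMod := "module github.com/xdg-go/stringprep\n"

	// The old import path reaches the repository via the GitHub redirect,
	// but the go.mod found there claims a different path.
	fmt.Println(checkModulePath("github.com/xdg/stringprep", goMod))

	// The new import path matches the declaration.
	fmt.Println(checkModulePath("github.com/xdg-go/stringprep", goMod))

	// Before go.mod existed, there was nothing to compare against.
	fmt.Println(checkModulePath("github.com/xdg/stringprep", ""))
}
```

Only the first call returns an error, which is the situation the redirect created once go.mod files were added.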

Here’s where things went awry. Last weekend I dusted off some old repositories from before their module days and added go.mod files. Two of them, stringprep and scram, are used by the MongoDB Go driver.

Ah, you say, but Go dependency management is smart: it freezes on a tag or commit, so how could this change be a problem? Existing projects referencing github.com/xdg/* will pull the old tag or commit from before the repository became a module.

In theory, this should work. The MongoDB driver’s go.mod referenced specific, untagged commits:

github.com/xdg/scram v0.0.0-20180814205039-7eeb5667e42c
github.com/xdg/stringprep v0.0.0-20180714160509-73f8eece6fdc

Any tools building up dependencies should reference these particular (non-module) versions.
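
Those version strings are pseudo-versions: a base version, a UTC commit timestamp, and a 12-character commit hash prefix. As a rough sketch of how they pin a commit, the hypothetical parsePseudoVersion helper below splits one apart. It handles only the simple vX.0.0-timestamp-revision form seen here, not the other pseudo-version forms Go supports.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// pseudoVersion holds the three parts of a simple pseudo-version.
type pseudoVersion struct {
	Base string    // for example "v0.0.0"
	Time time.Time // commit time in UTC
	Rev  string    // abbreviated commit hash
}

// parsePseudoVersion splits a pseudo-version of the form
// vX.0.0-yyyymmddhhmmss-abcdefabcdef into its parts. Other forms are
// rejected with an error.
func parsePseudoVersion(v string) (pseudoVersion, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 3 {
		return pseudoVersion{}, fmt.Errorf("%q is not in base-timestamp-revision form", v)
	}
	// The layout string is Go's reference time written as yyyymmddhhmmss.
	t, err := time.Parse("20060102150405", parts[1])
	if err != nil {
		return pseudoVersion{}, fmt.Errorf("bad timestamp in %q: %w", v, err)
	}
	return pseudoVersion{Base: parts[0], Time: t, Rev: parts[2]}, nil
}

func main() {
	for _, v := range []string{
		"v0.0.0-20180814205039-7eeb5667e42c",
		"v0.0.0-20180714160509-73f8eece6fdc",
	} {
		pv, err := parsePseudoVersion(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(pv.Base, pv.Time.Format(time.RFC3339), pv.Rev)
	}
}
```

Both timestamps fall in mid-2018, before the repositories had a go.mod, which is why pinning to them should have been safe.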

But theory is no match for a horde of programmers and their dependency management tools.

It turns out that a lot of the community relies on go get -u foo to download foo along with updated versions of foo’s direct dependencies. Of course, this defeats the whole point of freezing dependency versions, but that’s apparently what people want. Even so, the way go get -u works, it shouldn’t in theory update deep dependencies.

Theory, meet practice.

During the transition to modules, the tools also had to account for libraries that weren’t modularized. For example, the MongoDB Go driver v1.0.0 had no go.mod. So a project with a go.mod that relied on mongo-driver v1.0.0 would wind up with additional “indirect” dependencies to freeze deep dependencies in go.mod like this:

module foo

require (
	...
	go.mongodb.org/mongo-driver v1.0.0
	github.com/xdg/scram v0.0.0-20180814205039-7eeb5667e42c // indirect
	github.com/xdg/stringprep v1.0.0 // indirect
	...
)

In other words, deep dependencies just became shallow – and thus targets of go get -u foo. It doesn’t seem to matter that upgrading mongo-driver to a modularized version (v1.3.0+) should make those indirect dependencies go away. Before the go.mod can be recomputed without indirect dependencies, the attempt to update the xdg/* libraries fails because of the conversion to modules in the new namespace.
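
To check whether a project is exposed in this way, it helps to list the requirements its go.mod marks as indirect. The sketch below is a simplified line-based scan, not a full go.mod parser, and the helper name indirectRequirements is hypothetical. The sample input mirrors the fragment above with the elided lines left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// indirectRequirements returns the module paths of requirement lines that
// end with an "// indirect" comment. It accepts both lines inside a
// require ( ... ) block and single-line "require path version" directives.
func indirectRequirements(goMod string) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(goMod))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasSuffix(line, "// indirect") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "require "))
		if len(fields) >= 2 {
			paths = append(paths, fields[0])
		}
	}
	return paths
}

func main() {
	goMod := `module foo

require (
	go.mongodb.org/mongo-driver v1.0.0
	github.com/xdg/scram v0.0.0-20180814205039-7eeb5667e42c // indirect
	github.com/xdg/stringprep v1.0.0 // indirect
)
`
	// Each path printed here is written directly into the project's own
	// go.mod, even though the project never imports it itself.
	for _, p := range indirectRequirements(goMod) {
		fmt.Println(p)
	}
}
```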

I got reports that other tools besides go get had similar problems from recursively upgrading dependencies. Asking lots of people to fix their toolchains and change their CI systems to stop auto-upgrading wasn’t a quick or kind way to handle the problem. I needed to fix it.

Fixing forward doesn’t preclude backwards compatibility.

To fix the problem going forward, I submitted a pull request to the MongoDB Go driver that updated its go.mod to use the modularized scram and stringprep libraries. This merged on the Monday after the weekend I broke things, and the Go driver team released v1.5.1 with the fix on Tuesday.

To fix the problem for people not upgrading the Go driver, I recreated the xdg/scram and xdg/stringprep repos on GitHub, thus breaking the automatic redirection, and pushed only commits through late 2018, effectively rewinding time for those repository paths to before the breakage over the weekend.

Recreating a historical repository is straightforward with git. First, I created an empty repository on GitHub. Then I added it as a new remote to my local repository. Then it was just a matter of pushing only the commits I wanted. I created a branch at the right point in history and then pushed that as “master” to the repository copy.

$ git remote add legacyrepo git@github.com:xdg/stringprep.git
$ git checkout -b legacymaster 73f8eec
$ git push legacyrepo legacymaster:master

I thought that fixed things. But I was wrong.

What I didn’t realize at the time, and discovered only afterwards, is that the Go module proxy had cached the bad tags on xdg/scram and xdg/stringprep from when those paths were redirected to xdg-go. So fixing the GitHub redirect wasn’t enough for newer versions of Go, which use the proxy automatically.

For example, this go get -u still broke:

go get -u -v github.com/golang-migrate/migrate/v4/cmd/migrate
...
go get: github.com/xdg/scram@v0.0.0-20180814205039-7eeb5667e42c updating to
	github.com/xdg/scram@v1.0.2: parsing go.mod:
	module declares its path as: github.com/xdg-go/scram
	        but was required as: github.com/xdg/scram

There is an environment variable to turn off the proxy, and this worked without error, confirming my diagnosis that the proxy was the problem:

GOPROXY=direct go get -u -v github.com/golang-migrate/migrate/v4/cmd/migrate
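
One way to see which versions a proxy has on record for a module path is the version-list endpoint of the GOPROXY protocol, $base/$module/@v/list. The sketch below only builds that URL and makes no network request. It assumes the default public proxy at proxy.golang.org, and the helper names escapeModulePath and versionListURL are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeModulePath applies the proxy protocol's case encoding: every
// uppercase ASCII letter becomes '!' followed by its lowercase form, so
// paths stay distinct on case-insensitive file systems.
func escapeModulePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// versionListURL returns the URL at which a proxy lists the tagged
// versions it knows for a module path.
func versionListURL(proxy, modulePath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapeModulePath(modulePath) + "/@v/list"
}

func main() {
	const proxy = "https://proxy.golang.org" // assumed default public proxy
	fmt.Println(versionListURL(proxy, "github.com/xdg/scram"))
	fmt.Println(versionListURL(proxy, "github.com/xdg/stringprep"))
}
```

Fetching one of the printed URLs with any HTTP client returns one version per line, which shows what the proxy believes exists for the old path.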

There seems to be no way to remove something from the proxy, so fixing the problem required “tricking” the Go module proxy into ignoring the bad tags. I tagged xdg/scram and xdg/stringprep with new tags higher than the versions that had been cached by the proxy. Then I ran go get on those tags to force the proxy to cache the new versions.

That did the trick. When go get -u looks for a newer version, the proxy reports the latest tags, which are now cached from the new xdg repos instead of the original xdg-go repos with the name conflict.
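
The reason a higher tag wins can be sketched as a "pick the highest release" rule. This is a simplified model, not the go command's real version-selection logic: it understands only plain vMAJOR.MINOR.PATCH tags and skips everything else. In the example, v1.0.2 is the cached tag from the error output above, while v1.0.3 is a hypothetical stand-in for the new, higher tag.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease splits "vMAJOR.MINOR.PATCH" into three integers.
// Pre-release tags and pseudo-versions are rejected.
func parseRelease(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("%q is not vMAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component in %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether release a sorts before release b, comparing the
// components numerically from major to patch.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// highestRelease returns the highest plain release in versions, or "" if
// none can be parsed.
func highestRelease(versions []string) string {
	best := ""
	var bestNum [3]int
	for _, v := range versions {
		num, err := parseRelease(v)
		if err != nil {
			continue // skip pseudo-versions and pre-releases in this sketch
		}
		if best == "" || less(bestNum, num) {
			best, bestNum = v, num
		}
	}
	return best
}

func main() {
	// v1.0.2 was cached via the redirect and declares the xdg-go path.
	// v1.0.3 is hypothetical: a new tag on the recreated xdg repository.
	cached := []string{"v0.0.0-20180814205039-7eeb5667e42c", "v1.0.2", "v1.0.3"}
	fmt.Println(highestRelease(cached))
}
```

Because the bad tag can never be deleted, the only lever left is making sure it is never the highest one.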

With these changes, I believe I’ve fixed the chaos I caused. I offer a sincere apology to anyone who was affected.

P.S. This was worse because I wasn’t getting issue notifications.

Several people opened GitHub issues over the weekend and on Monday, but I didn’t see any of them until someone highlighted my nickname directly. I’d turned off the GitHub setting to automatically watch repositories I’d been added to because I was getting notifications from every new repository added at work. But I think this affected repositories I created myself after the setting change, so I wasn’t getting notifications from them, either. I’ve fixed this and made sure I’m watching all repos I’ve created recently.
