Good, fast or cheap -- pick again

I’m not sure exactly when I started thinking about new infrastructure for CPAN Testers, but it might have been a couple of years ago, around the time I released CPAN::Reporter 1.0. That was when I decided that Net::SMTP needed to be the default “transport” option, to avoid problems people were already having with local report submission via sendmail. It was a necessary change, but only a temporary fix for the bigger problem. At the Oslo QA hackathon in spring 2008, Ricardo, Jonas and I worked up the first draft of a framework for “CPAN Testers 2.0” (CT2.0).

What happened next was the proverb “good, fast, or cheap – pick two” playing out exactly as you’d expect. CT2.0 was designed to be a good replacement for CT1.0, and it was being built with (cheap) all-volunteer labor, so fast was the one that got dropped. Despite some progress toward a proof of concept a year later at the Birmingham QA hackathon, there has been no real end in sight for CT1.0.

That all changed last week. With a firm deadline to hit, it’s time to reconsider the good, fast or cheap tradeoff. Fast is now critical, and I think the design is good enough. What can be done quickly won’t replace the whole CT1.0 ecosystem right away, only the core transport and report repository parts.

I think cheap is what is going to change. It’s still volunteer labor, but I think there’s a way to need less of it. My working plan for hitting the deadline is to implement the Metabase framework on top of Amazon Web Services (AWS). That offloads scalability and reliability concerns, turning technical and administrative challenges into resource challenges: problems we can solve by paying for more capacity rather than by finding more volunteer time.

I’ve already demonstrated a proof of concept showing that the existing Test::Reporter-based testing clients can feed a CT2.0 Metabase. How well the framework scales remains a big unknown, but by building on AWS, we can throw resources at the problem in the short run by deploying more EC2 instances to deal with bottlenecks. If the SQLite databases that drive the CPAN Testers statistics sites have to be regenerated from scratch each night, that’s just a MapReduce job. If the Metabase web app is too slow to handle the test report volume, we just deploy more instances and put a load balancer in front.
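To make the “just a MapReduce job” idea concrete: a nightly rebuild splits into a map step (emit one key per report) and a reduce step (sum the counts per key). What follows is a minimal single-process sketch, assuming a hypothetical report tuple of (distribution, version, grade) and a hypothetical `stats` table. It is not the real CPAN Testers stats schema, and a real job would spread the map step across many workers.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical report records: (distribution, version, grade).
# The real Metabase report format is not shown here.
REPORTS = [
    ("Foo-Bar", "1.00", "pass"),
    ("Foo-Bar", "1.00", "fail"),
    ("Foo-Bar", "1.00", "pass"),
    ("Baz-Qux", "0.02", "na"),
]

def map_report(report):
    """Map step: emit a ((dist, version, grade), 1) pair for one report."""
    dist, version, grade = report
    yield (dist, version, grade), 1

def reduce_counts(pairs):
    """Shuffle + reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals

def rebuild_stats(reports, conn):
    """Regenerate the (hypothetical) stats table from scratch."""
    totals = reduce_counts(chain.from_iterable(map_report(r) for r in reports))
    conn.execute("DROP TABLE IF EXISTS stats")
    conn.execute(
        "CREATE TABLE stats (dist TEXT, version TEXT, grade TEXT, total INTEGER,"
        " PRIMARY KEY (dist, version, grade))"
    )
    conn.executemany(
        "INSERT INTO stats VALUES (?, ?, ?, ?)",
        [(d, v, g, n) for (d, v, g), n in sorted(totals.items())],
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rebuild_stats(REPORTS, conn)
    for row in conn.execute("SELECT * FROM stats ORDER BY dist, grade"):
        print(row)
```

Because the reduce step only adds counts, partial totals from separate workers can be merged in any order. That property is what lets you throw more instances at the job when the report volume grows.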

Having that flexibility reduces the job of getting CT2.0 off the ground to a handful of to-dos:

- Implement a Metabase backend on top of AWS
- Create an AWS virtual machine to accept reports and publish to the CT2.0 Metabase on AWS
- Migrate existing NNTP reports to the CT2.0 Metabase
- Implement a web app to serve up new and legacy reports from the CT2.0 Metabase
- Design a process to update the CPAN Testers stats database with newly uploaded reports (from scratch if necessary)
- Update CPAN Testers websites to link to the new CT2.0 reports archive site
- Get testers to switch to Test::Reporter::Transport::Metabase
I think that gets us most of what we need by March 1, without much new code to write and test, and without demanding much sysadmin or DBA time and attention. It’s a limited implementation, but it will meet the Perl NOC’s need to get CPAN Testers reports off its email infrastructure.
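The stats-update item on the to-do list could be handled incrementally, with a full rebuild only as a fallback. Here is a minimal Python sketch. It assumes, hypothetically, that each report carries a monotonically increasing sequence number, and it uses an invented `stats` schema. Neither assumption comes from the Metabase design.

```python
import sqlite3

# Hypothetical schema: per-(dist, version, grade) counts plus a single-row
# watermark recording the highest report sequence number already applied.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stats (
    dist TEXT, version TEXT, grade TEXT, total INTEGER NOT NULL,
    PRIMARY KEY (dist, version, grade)
);
CREATE TABLE IF NOT EXISTS watermark (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    last_seq INTEGER NOT NULL
);
INSERT OR IGNORE INTO watermark VALUES (1, 0);
"""

def update_stats(conn, reports):
    """Apply reports (seq, dist, version, grade) newer than the watermark.

    Returns the number of reports applied. The sequence number is an
    assumed field, used here only for illustration.
    """
    conn.executescript(SCHEMA)
    (last_seq,) = conn.execute("SELECT last_seq FROM watermark WHERE id = 1").fetchone()
    fresh = sorted(r for r in reports if r[0] > last_seq)
    with conn:  # one transaction: counts and watermark commit together
        for seq, dist, version, grade in fresh:
            key = (dist, version, grade)
            conn.execute("INSERT OR IGNORE INTO stats VALUES (?, ?, ?, 0)", key)
            conn.execute(
                "UPDATE stats SET total = total + 1"
                " WHERE dist = ? AND version = ? AND grade = ?",
                key,
            )
        if fresh:
            conn.execute("UPDATE watermark SET last_seq = ? WHERE id = 1", (fresh[-1][0],))
    return len(fresh)

def reset_stats(conn):
    """The "from scratch" path: drop everything so the next update replays all reports."""
    conn.executescript("DROP TABLE IF EXISTS stats; DROP TABLE IF EXISTS watermark;")

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [(1, "Foo-Bar", "1.00", "pass"), (2, "Foo-Bar", "1.00", "fail")]
    print(update_stats(conn, batch))
    print(update_stats(conn, batch + [(3, "Foo-Bar", "1.00", "pass")]))
```

Because the watermark is written in the same transaction as the counts, a crash partway through a batch rolls both back, so the next run reapplies the batch without double-counting.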

I’ll be writing up more details and plans over the next week.