Project plan for CPAN Testers 2.0 migration

I’ve spent a good deal of time brainstorming ideas for migrating to CPAN Testers 2.0 (CT2.0) and then paring down what I came up with to the barest minimum I think is necessary to migrate CPAN Testers off email/NNTP and still build a foundation for the future. This has been frustrating to spend a lot of time on, but deciding what not to do is probably as important as deciding what we need to do if we’re going to hit the deadline.

Here is my draft of a project plan for launching CPAN Testers 2.0 by March 1. I’ve broken it up into ‘Architecture Tasks’ and ‘Migration Tasks’ and given some indication of the rough timing I think we need to hit. I’ve also added some commentary on work steps and current status. I’ve indicated in bold text who should lead a work block (if known) and where volunteers would be helpful.

Questions and suggestions are appreciated – particularly if something isn’t clear or if there are important tasks I’ve left out.

If you have time, interest and relevant expertise to help, please let me know in the comments or by email to dagolden at cpan dot org.

(Apologies for any crude abbreviations or shorthand throughout.)

1. Architecture Tasks

The CPAN Testers architecture can be broadly divided into three groups:

Clients – the programs that run tests, create reports and submit them to the central CPAN Testers server
Metabase – the server and storage programs to receive and archive reports (and provide very rudimentary search capabilities)
Reporting – the websites and support programs to summarize CPAN Testers report data and syndicate them in various ways (e.g. *.cpantesters.org)

1.1 CT Clients (2nd half Jan)

Existing clients depend on Test::Reporter for email (or other) transport ultimately to the cpan-testers mailing list. Existing clients need to be migrated to use the Test::Reporter::Transport::Metabase plugin.

As I’d like to start beta testing in February (see the Migration task list), these changes should be done in the second half of January after CT Metabase libraries are revised.

1.1.1 Test::Reporter::Transport::Metabase

Test::Reporter::Transport::Metabase and Metabase::Client::Simple (upon which the Metabase transport depends) are “done” but may need tweaks based on changes to how user profiles and authentication credentials are handled (see below).

I will take responsibility for making any necessary changes to these components.

1.1.2 CPAN::Reporter-based clients

CPAN::Reporter based clients already have support for Test::Reporter::Transport::Metabase, though they may need to be altered depending on user profile changes. CPAN::Reporter clients include:

CPAN
CPAN::Reporter::Smoker

I will take the lead to make changes or coordinate volunteer efforts for these clients.

1.1.3 CPANPLUS-based clients

I don’t know whether CPANPLUS-based clients support transport plugins or not. Someone will need to look into this and patch them if they do not. These clients include:

CPANPLUS
CPANPLUS::YACSmoke
minismokebox

I think Chris Williams should probably lead or coordinate this effort.

1.2 CT Metabase

The CPAN::Testers Metabase will replace the email/NNTP archive. The design for the initial launch is to use Amazon Web Services to provide scalability and reliability with minimal administrative overhead.

The first draft of the Metabase framework is mostly complete, but there are several areas needing work to support the CT2.0 launch.

1.2.1 Libraries (1st half Jan)

Enhancements to libraries need to happen quickly to support other components.

Implement search of new entries – only partial search capabilities are needed initially. The CT Reporting servers must be able to query for new reports added to the CT Metabase. Almost no work has been done on search so this may be a major workblock.
Revise user profiles and credentials – the current approach to user profiles is flawed in that it merges public user profile information with user authentication credentials. The Metabase::User::Secret fact needs to be removed from Metabase::User::Profile, but needs to be submitted with all Metabase::Client::Simple requests. Matching changes need to be made in the Metabase::Web and ::Gateway classes. This is another major workblock.
Metabase backend for AWS – Metabase::Archive::S3 and Metabase::Index::SimpleDB need to be written. This should be relatively straightforward.
Utility to map NNTP IDs to/from GUIDs – I plan to write a very simple library to standardize and abstract how this conversion will happen.

Given the intimate knowledge of Metabase needed, I plan to take the lead on these tasks, but may recruit volunteers for portions of it.
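
As one illustration of what a reversible NNTP ID/GUID mapping could look like, here is a minimal sketch in Python (chosen only for brevity). The marker value, the function names and the idea of embedding the article number in the low 64 bits of the GUID are all assumptions made for this example, not the planned design.

```python
import uuid

# Hypothetical marker for the upper 64 bits of every GUID derived from a
# legacy NNTP article number (the bytes spell "NNTP" followed by zeros).
LEGACY_HIGH_BITS = 0x4E4E545000000000
LOW_MASK = (1 << 64) - 1


def nntp_id_to_guid(nntp_id: int) -> str:
    """Embed an NNTP article number in the low 64 bits of a GUID."""
    if not 0 < nntp_id <= LOW_MASK:
        raise ValueError(f"NNTP ID out of range: {nntp_id}")
    return str(uuid.UUID(int=(LEGACY_HIGH_BITS << 64) | nntp_id))


def guid_to_nntp_id(guid: str):
    """Return the NNTP article number for a legacy GUID, or None otherwise."""
    value = uuid.UUID(guid).int
    if value >> 64 != LEGACY_HIGH_BITS:
        return None  # not derived from an NNTP ID (e.g. a native report)
    return value & LOW_MASK
```

A scheme like this needs no lookup table, at the cost of reserving part of the GUID space for legacy reports; a stored mapping table would be the obvious alternative.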

1.2.2 Metabase Web Server (2nd half Jan)

The Metabase::Web server is a very primitive Catalyst server and has only been run in standalone mode. It needs to be updated to run on a recent Catalyst framework. Configuration and deployment decisions need to be made for it to run on an Amazon EC2 instance.

This needs to be done after Metabase components are revised but before February in order to be ready to start beta testing.

In addition to Catalyst deployment, we need to have sufficient logging to do performance analysis and reporting to see how well the system is scaling.

I would like to find some Catalyst experts to volunteer to lead this effort or at least coach other less-experienced volunteers.

1.3 CT Reporting

The CPAN Testers Reporting infrastructure consists primarily of the data and web applications on cpantesters.org that provide reports to users or otherwise feed downstream applications like the CPAN Dependencies site or CPAN Testers Matrix.

The goal for migration is to make the change appear relatively seamless. In the longer-term, downstream apps may be able to use web services rather than relying on a large SQLite database.

Barbie has already started work on several of these and should recruit additional volunteers as necessary.

1.3.1 Interface to CT Metabase (2nd half Jan)

The current CPAN Testers reporting sites depend on a database fed from the NNTP archive. We need to replace this with a feed from the CT Metabase.

In the long run, this would be provided via Metabase::Web services, but for expediency I think the best approach is a direct connection to the Amazon S3/SimpleDB backend via a Metabase::Librarian.

This should be a relatively trivial matter of making sure the right libraries are installed on the cpantesters.org server and that the AWS access keys are in appropriately protected configuration files.

I will work with Barbie to ensure this is done before beta testing in February.

1.3.2 Statistics database (1st half Feb)

During the beta period, we need to test updating the CPAN Testers stats database from the CT Metabase. This probably needs to be done in parallel to the existing sites and databases.

One major change will be converting from NNTP IDs to CT Metabase GUIDs to identify reports.

If possible, we would like to shorten the time lag in processing updates, but this is ‘nice to have’, not ‘must have’ functionality.

1.3.3 Report viewer (1st half Feb)

When the NNTP archive shuts down and reports are sent to the CT Metabase, users will no longer be able to view reports on nntp.perl.org. A new report viewer web application needs to be deployed on cpantesters.org.

This also entails regularly updating a mammoth reports database with a copy of the text report from the CT Metabase (using the interface in 1.3.1). Reports will be indexed and made available for display based on their GUID, not NNTP ID, so existing sites will need to change their URLs to use GUIDs as well as pointing to a new URL endpoint.

1.3.4 Find a tester (2nd half Feb)

The “Find a Tester” service will need to be revised to get contact information via GUID rather than NNTP ID. This seems less critical to have immediately and could slip past February if necessary.

2. Migration Tasks

These tasks describe a sequence of activities to migrate legacy reports and launch CPAN Testers 2.0. Many of them depend on the architecture components described above.

2.1 NNTP Migration (2nd half Jan)

Existing NNTP reports need to be migrated into the CT Metabase. This may actually stretch out over January, but finishing it before beta testing doesn’t seem critical.

Because of the volume of reports and the possibility that we might need to make late changes that require re-migrating the reports, I’d like to design the conversion in a way that can be parallelized on Amazon EC2.

Steps in the conversion should include:

Upload NNTP archive tarballs to S3 (638 files) – these have already been generated by the Perl NOC and uploaded to S3
Generate submitter profiles and add them to the CT Metabase – Metabase requires “profiles”, not just email addresses, to identify users. For legacy reports, profiles need to be generated for all known testers (based on the existing address mappings used for stats.cpantesters.org).
Write NNTP article converter – this will need to extract articles from the archive tarball, filter out non-report articles or badly formatted reports, parse them into Metabase facts, link them to submitter profiles and inject them into the CT Metabase
Create EC2 converter instances – tarballs can and should be processed in parallel. This is just a matter of adding archive tarball IDs to an Amazon SQS queue that custom EC2 instances can use to dequeue tasks, run the converter program and shut down when the queue is empty.
Queue archive tarball IDs and run 1+ instances against the queue – once the converter instances have been created, we then just run enough in parallel against the 600+ tarballs to get a “fast enough” conversion rate
Repeat for new archive tarballs – between now and the end of February, additional tarballs should be provided by Perl NOC to migrate reports arriving since the first batch of tarballs was created.
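
The queue-driven steps above can be sketched locally. In this Python illustration, a `queue.Queue` stands in for the Amazon SQS queue, threads stand in for EC2 converter instances, and `convert_tarball` is a hypothetical placeholder for the real converter program; none of these names come from the actual plan.

```python
import queue
import threading


def convert_tarball(tarball_id):
    """Hypothetical placeholder for the real converter program."""
    return f"converted:{tarball_id}"


def worker(tasks, results, lock):
    """Dequeue tarball IDs until the queue is empty, then exit ('shut down')."""
    while True:
        try:
            tarball_id = tasks.get_nowait()
        except queue.Empty:
            return
        outcome = convert_tarball(tarball_id)
        with lock:
            results.append(outcome)


def run_conversion(tarball_ids, instances=4):
    """Queue every tarball ID, then run `instances` workers against the queue."""
    tasks = queue.Queue()
    for tarball_id in tarball_ids:
        tasks.put(tarball_id)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(instances)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_conversion(range(1, 639))` would process 638 tarball IDs; raising `instances` is the local analogue of launching more converter instances. A real SQS consumer would also have to cope with a task being delivered more than once, which this sketch ignores.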

As the conversion process is just an extension of what already happens in Test::Reporter::Transport::Metabase, I will take responsibility for finishing the conversion.
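
For a rough idea of the extract-and-filter part of the converter, here is a minimal Python sketch. It assumes each tarball member is one raw article and that report subjects begin with a grade followed by the distribution name (for example `PASS Foo-Bar-1.23 ...`); both assumptions, along with the function name, are illustrative, and the step that parses reports into Metabase facts is omitted entirely.

```python
import email
import re
import tarfile

# Assumed subject layout: "<GRADE> <Dist-Version> ..."; anything else is skipped.
SUBJECT_RE = re.compile(r"^(PASS|FAIL|NA|UNKNOWN)\s+(\S+)")


def extract_reports(fileobj):
    """Yield (grade, distribution, sender) for each article that looks like a report."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            raw = archive.extractfile(member).read()
            article = email.message_from_bytes(raw)
            match = SUBJECT_RE.match(str(article.get("Subject", "")))
            if match is None:
                continue  # non-report article (discussion, spam, etc.)
            yield match.group(1).lower(), match.group(2), str(article.get("From", ""))
```

The sender address is what would later be matched against the pre-generated submitter profiles.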

2.2 Deploy CT2.0 Server (2nd half Jan)

The Metabase framework has not been tested “at scale” – meaning processing on the order of 500,000+ reports per month. To ensure we can easily scale, the CT 2.0 Server will be deployed on EC2, so that we can deploy servers in parallel if necessary.
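
To put that volume in perspective, here is a back-of-the-envelope calculation in Python. It assumes a 30-day month and perfectly even arrival; real traffic is likely to be burstier, so peak rates will be higher than these averages.

```python
REPORTS_PER_MONTH = 500_000
DAYS_PER_MONTH = 30  # assumption for a round estimate

per_day = REPORTS_PER_MONTH / DAYS_PER_MONTH
per_minute = per_day / (24 * 60)
per_second = per_minute / 60

print(f"{per_day:,.0f} reports/day")        # about 16,667
print(f"{per_minute:.1f} reports/minute")   # about 11.6
print(f"{per_second:.2f} reports/second")   # about 0.19
```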

Deployment steps will include:

Choose base EC2 machine image – selecting an existing EC2 image to be customized as a CPAN Testers server
Install and configure CT Metabase components – the design for this should have been worked out as part of the architecture work, but the server needs to be deployed on the machine image and properly configured for automatic start when the VM boots
Launch instance – fire it up and start receiving reports!

I’m personally interested in getting my hands dirty with this, but may be overwhelmed with other tasks. If someone has prior EC2 experience and can volunteer as a lead or coach, that would be very helpful.

2.3 Beta test (1st half Feb)

Once we launch the server, we need a test period. This entails getting a small handful of testers to gradually ramp up the volume over a couple weeks to see how well the servers and Metabase perform.

Steps include:

Select beta test group – likely some of the “high volume” testers can gradually convert some of their smokers over; we probably need at least one CPAN::Reporter based smoker and one CPANPLUS based smoker
Email profiles and instructions to beta test group – we will have pre-generated profiles as part of the migration, so we can provide these to testers so their new reports have a consistent identity
Test updating statistics DB from CT Metabase – we want to see the stats database getting regular report updates from beta testers
Test new report viewer and tester-finders – new reports that were only submitted to Metabase (and not to NNTP) should be visible in the report viewer based on their GUIDs
Write NNTP tail daemon to convert new NNTP articles to CT Metabase – until the NNTP archive is shut down, reports will continue to be submitted the old way; with the CT2.0 beta running, we need a daemon to “tail” the NNTP archive and continuously migrate reports
Test throughput and deploy instances and load balancing if necessary – based on the results of the beta test, try deploying additional instances and a load balancer

This is mostly process management (and a little coding of an NNTP tail daemon) and doesn’t require deep Metabase expertise. I’d like to find a volunteer to be the ‘beta test manager’ so I and others can be free to hack on the libraries or server for fixes if we need it.
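
The core of an NNTP tail daemon is small: remember the highest article number already handled, and forward anything newer. The Python sketch below shows only that loop; `fetch_articles` and `submit` are hypothetical callables standing in for the NNTP client and the Metabase submission, and no real NNTP library is used.

```python
import time


def tail_once(articles, last_seen, submit):
    """Forward every article numbered above `last_seen`, oldest first.

    `articles` maps article number -> article text. Returns the new
    high-water mark so the caller can persist it between runs.
    """
    for number in sorted(n for n in articles if n > last_seen):
        submit(number, articles[number])
        last_seen = number
    return last_seen


def tail_forever(fetch_articles, submit, last_seen=0, interval=60):
    """Poll for new articles every `interval` seconds until interrupted."""
    while True:
        last_seen = tail_once(fetch_articles(), last_seen, submit)
        time.sleep(interval)
```

Persisting `last_seen` somewhere durable would let the daemon restart without re-submitting or skipping reports.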

2.4 Launch (2nd half Feb)

We want a buffer to start the launch before the March 1 deadline. If all goes well with the beta test, we should aim to “launch” in mid-February.

Steps include:

Email profiles and instructions to all testers – just like in beta testing, but to all ‘active’ testers (definition of ‘active’ to be determined)
Switch cpantesters.org to new databases – assuming the CT Metabase driven stats database and report viewer worked well in beta, the production *.cpantesters.org applications should switch over to the new databases; at this point, CT Reporting should be entirely independent of the NNTP archive at the Perl NOC
Write NNTP tail daemon to warn email submitters of the shut-down date – hopefully, most testers will switch over quickly, but there will be stragglers who may not get the news; we need another NNTP tailing daemon to gently hassle them. (This can probably be adapted easily from my CPAN Testers nagbot.)
Coordinate with Perl NOC for a graceful sunset period – ideally, we’d like a ‘kind warning’ to testers trying to submit via email as of or after the deadline

Either the beta-test manager or another volunteer should coordinate these efforts.

2.5 Post launch (March)

After the launch, we’ll want to start fixing things we find or didn’t get to before the deadline.

Some early thoughts:

Monitor performance – see how things scale with all tests going through the CT Metabase
Improve syndication (methods, lag time, etc.) – some people used to follow the NNTP archive. While some syndication will happen via the www.cpantesters.org site, it has a lag and we can look into alternate approaches for more real-time syndication of reports
Migrate IRC notifiers to use new syndication – existing IRC bots monitor the NNTP archive and will need to be switched over
Restrict get/search access to CT Metabase server – to manage costs, we don’t want just anyone to be able to get/search from the master server, so we’ll need to tighten up access controls

2.6 Longer term improvement

After the migration, there will be a number of additional opportunities to build on the CT2.0 infrastructure.

On the client side, I’d like to see a move towards more structured data consistently captured from all clients rather than having to do crude parsing of what would otherwise have been an email text.

On the server side, I think there are a number of ways to improve query and search to support more interesting analytics, more targeted syndication of reports, and better visualizations.