Go as glue: JSON + XML + PNG = PDF

tl;dr: How I used Go to mash up card game data from the Internet into a printable PDF of card proxies.

Earlier this year, a friend introduced me to the Lord of the Rings card game. Like Magic the Gathering, you construct your own deck from a pool of cards. Unlike Magic, it’s cooperative (or solo): you compete against a thematic scenario rather than against other players.

Ideally, you need access to a big pool of cards to customize your hero deck to the specific challenges of each scenario. Unfortunately, many of the early expansion sets are out of print and ridiculously expensive on Ebay.

For casual play, proxy cards solve this problem: a proxy is a paper copy of a card placed on top of a card you don’t need inside a plastic sleeve.

Sleeving a proxy

Proxies also solve another annoyance of mine. I like having multiple decks ready for play, but sometimes several decks need all my copies of a certain card. Proxying lets me have enough copies of a card for all my decks – just like I could do if I were playing online.

But where do proxies come from? And if I want to make a whole deck of proxies, how can I make that painless?

Like many games of this type, there are web sites with pictures and descriptions of all the cards. And I’m a programmer. So I decided to write a program in Go to grab the images I need and lay them out in a PDF for easy printing.

Let’s walk through how to do it!

Exploring the input data 🔗︎

Our main source of data is a web site called RingsDB. It has a lot of great feature for players, including the ability to build and save deck lists online. More importantly for our needs, it offers an API.

Deck lists are downloadable as text files. Here’s an excerpt of a deck list in the XML format used by OCTGN, a virtual table top platform:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<deck game="a21af4e8-be4b-4cda-a6b6-534f9717391f">
  <section name="Hero" shared="False">
    <card qty="1" id="3dd1cdc1-8c87-4c8b-89fe-22e7e310bca9">Thranduil</card>
    <card qty="1" id="4c4cccd3-576a-41f1-8b6c-ba11b4cc3d4b">Celeborn</card>
    <card qty="1" id="8d6a15a8-e363-46bb-8e49-0af45a1ea0d1">Galadriel</card>
  </section>
  <section name="Ally" shared="False">
    <card qty="1" id="f9865446-f3a4-4a9f-bb1e-68565720511b">Galadhrim Healer</card>
    <card qty="3" id="d5500d57-03af-4152-9ad0-620aeb4fd50b">Marksman of Lórien</card>
    <card qty="3" id="905dbe88-7974-4b0c-9a82-f96ba4dd866f">Woodland Courier</card>
    ...
  </section>
  ...
</deck>

A deck has sections for different card types (Hero, Ally, etc.). The entries for the cards include a quantity and a unique identifier we can use later to find the image for each card.

The RingsDB API doesn’t let us query directly on the OCTGN ID, but it does let us get metadata on the complete card pool in JSON format. Here’s a portion of the JSON document for the card for Aragorn:

{
   "name" : "Aragorn",
   "code" : "01001",
   "text" : "Sentinel.\n<b>Response:</b> After Aragorn commits to a quest, spend 1 resource from his resource pool to ready him.",
   "imagesrc" : "/bundles/cards/01001.png",
   "octgnid" : "51223bd0-ffd1-11df-a976-0801200c9001",
   "traits" : "Dúnedain. Noble. Ranger.",
   "url" : "https://ringsdb.com/card/01001",
   ...
}

We can see the octgnid and the imagesrc fields are what we need to connect the decklist to image data.

Sketching out a pipeline 🔗︎

Now we know enough to sketch out an approach and start outlining the program. We want a program that will consume a deck list and output a PDF of images for that deck. We can break that down into several steps:

Get metadata for all cards
Parse the deck list
Get the image for each card in the deck
Arrange the images in a PDF

To keep things simple, we’ll plan for the tool to take two required arguments from the command line: the input filename and the output filename. We’ll use a pipeline pattern, so our main function is just this:

type App struct {
    inputFile  string
    outputFile string
    err        error
}

func main() {
    if len(os.Args) != 3 {
        log.Fatalf("usage: %s <input.o8d> <output.pdf>", os.Args[0])
    }

    app := &App{
        inputFile:  os.Args[1],
        outputFile: os.Args[2],
    }

    app.LoadMetadata()
    app.ParseInputFile()
    app.PreloadImages()
    app.CreatePDF()

    if app.err != nil {
        log.Fatalf("error: %v", app.err)
    }
}

You’ll notice that we don’t check for errors after each pipeline step; we check at the end instead. This is a technique adapted from Rob Pike’s article, Errors are values, and that I expanded on in my own article, Schrödinger’s Monad.

Each stage will set the app.err variable for an error, and subsequent stages will no-op if app.err is non-nil. This simplifes the main function and avoids duplicate code for exiting with a fatal error after each pipeline stage.

We also don’t pass data directly between pipeline stages. Instead, we’ll add fields for them later in the app object. This keeps the high-level logic flow of the program immediately apparent, unobscured by the details of passing data between the stages. (For a one-shot program like this, it doesn’t matter if intermediate data stays in app and isn’t garbage collected while the program runs; YMMV.)

Data caching 🔗︎

Before we start writing the LoadMetadata method, we should think about caching. Card metadata changes slowly – generally only after a new expansion pack of cards has been released or if someone corrects an error in the existing metadata. It seems both slow and unfriendly to download metadata and card images from scratch every time we run the program, so we’ll use a local disk cache to cut down on how often we load this data over the Internet.

To do that, we’ll use shibukawa/configdir, which provides operating-system specific directories for user configuration/cache data. We’ll add an object for the cache to our app data structure:

const vendorName = "xdg.me"
const appConfigName = "cardproxypdf"

type App struct {
    ...
    cache      *configdir.Config
}

func main() {
    ...
    app := &App{
        ...
        cache:      configdir.New(vendorName, appConfigName).QueryCacheFolder(),
    }
    ...
}

Later, when we use the cache, we’ll refresh card metadata if it’s more than 24 hours old and we’ll keep images around until cleaned up manually.

Fetching card metadata 🔗︎

We’ve seen an example of the metadata, above, but we don’t actually need all of it. The important part is the mapping from the OCTGN ID to the image source, so we’ll add a field to App to store the desired map.

type App struct {
    ...
    octgnToImgName map[string]string
}

The first part of LoadMetadata is fetching from the cache, if it exists:

func (app *App) LoadMetadata() {
    // Shortcut if app has errored
    if app.err != nil {
        return
    }

    // Try loading from cache
    var err error
    app.octgnToImgName, err = loadFromCache(app.cache)
    // Return if it worked or fall through to refetching from the API
    if err == nil {
        return
    }
    if err != errIgnoreCache {
        log.Printf("warning: failed loading metadata from cache: %v", err)
    }

    ...
}

I’m not going to show loadFromCache here, but I’ll link to a repo with all the source later. The important thing to know is that if the cache doesn’t yet exist or is more than 24 hours old, the function returns errIgnoreCache.

Regardless of any error, if the cache doesn’t have the metadata, we need to download it. We’ll use a helper function for downloading over HTTP:

func httpGetBytes(url string) ([]byte, error) {
    client := &http.Client{Timeout: 5 * time.Second}
    resp, err := client.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }
    return body, nil
}

In this function, we create an http.Client with a timeout because the default client used by http.Get has no timeout (which is generally a bad idea).

Once we have the JSON data from the API, we’ll need to convert it to the map form we want. For that, we’ll first need a data model to decode the JSON. Since we’re throwing away all but a couple fields, our model for a single card is a tiny struct:

type RingsCard struct {
    ID       string `json:"octgnid"`
    ImageSrc string `json:"imagesrc"`
}

The JSON data from the API is an array of these cards, so we just need to unmarshal to a slice of RingsCard and loop over them to populate our map. We’ll trim off the first part of the image path, as having the base path is easier to work with in the local cache and in the construction of our PDF.

const ringsImagePrefix = "/bundles/cards/"

func convertRingsDataToMap(body []byte) (map[string]string, error) {
    var cardList []RingsCard
    err := json.Unmarshal(body, &cardList)
    if err != nil {
        return nil, err
    }

    images := make(map[string]string)
    for _, v := range cardList {
        images[v.ID] = strings.TrimPrefix(v.ImageSrc, ringsImagePrefix)
    }

    return images, nil
}

With these helper functions, we’re ready to finish LoadMetadata:

const ringsURLGetAll = "http://ringsdb.com/api/public/cards/"

func (app *App) LoadMetadata() {
    ...

    // Fetch from the API and cache the result
    var data []byte
    data, app.err = httpGetBytes(ringsURLGetAll)
    if app.err != nil {
        return
    }
    app.octgnToImgName, app.err = convertRingsDataToMap(data)
    if app.err != nil {
        return
    }
    err = saveToCache(app.cache, app.octgnToImgName)
    if err != nil {
        log.Printf("warning: failed saving metadata to cache: %v", err)
    }

}

At the end, we save the data to the cache. If that errors, we warn and continue, because we still have the data we need for this run.

Parsing XML deck contents 🔗︎

The next stage, ParseInputFile, needs to take the OCTGN XML text file and turn it into a data structure we can used to render the deck.

Reading the file is trivial:

func (app *App) ParseInputFile() {
    if app.err != nil {
        return
    }

    var data []byte
    data, app.err = ioutil.ReadFile(app.inputFile)
    if app.err != nil {
        return
    }
    ...
}

Then we have to unmarshal the XML data. With a little bit of forethought, we can reuse part of our XML data model for the deck itself. Remember that the XML structure looks like this:

<deck game="a21af4e8-be4b-4cda-a6b6-534f9717391f">
  <section name="Hero" shared="False">
    <card qty="1" id="3dd1cdc1-8c87-4c8b-89fe-22e7e310bca9">Thranduil</card>
    ...
  </section>
  ...
</deck>

To render the deck as a PDF, we don’t need the intermediate section structure – we can flatten the deck into a list of cards. However, for each card, we don’t just need the OCTGN ID and count, we also need to associate it with the image for the card that we got from the metadata.

We can define an XMLCard struct with fields for qty and id from each XML card entity (note that we don’t actually need the card name; we only want attributes). If we add an extra, non-XML-tagged field to hold the image path, then XML unmarshaling will leave it blank. We can fill in the path from the metadata as we flatten it.

type XMLCard struct {
    Quantity  int    `xml:"qty,attr"`
    OctgnID   string `xml:"id,attr"`
    ImagePath string // filled in later from metadata
}

The types for the XML section and XML deck are just slices of the corresponding types they contain:

type XMLSection struct {
    Cards []XMLCard `xml:"card"`
}

type XMLDeck struct {
    Sections []XMLSection `xml:"section"`
}

Now that we have our data model defined, we can unmarshal the XML, flatten it (adding in the card image metadata), and store the flattened deck in App:

type App struct {
    ...
    deck           []XMLCard
}

func (app *App) ParseInputFile() {
    ...

    var deck XMLDeck
    app.err = xml.Unmarshal(data, &deck)
    if app.err != nil {
        return
    }

    flat := make([]XMLCard, 0)
    for _, section := range deck.Sections {
        for _, card := range section.Cards {
            card.ImagePath = app.octgnToImgName[card.OctgnID]
            flat = append(flat, card)
        }
    }

    app.deck = flat
}

Fetching image data 🔗︎

Given our deck, now we need to ensure that images for all the cards in the deck are cached locally. If a particular image isn’t in the cache, we just need to construct the right URL, grab the bytes, and cache them:

const cacheImageFolder = "images"
const ringsURL = "http://ringsdb.com"

func loadImageToCache(cache *configdir.Config, imageName string) error {
    cachePath := filepath.Join(cacheImageFolder, imageName)
    urlPath := ringsURL + filepath.Join(ringsImagePrefix, imageName)

    imageBytes, err := httpGetBytes(urlPath)
    if err != nil {
        return err
    }

    err = cache.WriteFile(cachePath, imageBytes)
    if err != nil {
        return err
    }

    return nil
}

That’s the work to be done for a single card, but for a whole deck we can parallelize the downloads with goroutines.

The PreloadImages method needs to loop over our deck, launching a goroutine for any image not already in the cache, but we have to be careful here. It’s a common Go mistake to use a loop variable in a goroutine closure like this:

// DON'T DO THIS
for _, card := range app.deck {
    go func() {
        doWork(card)
    }()
}

Instead, we want to pass our loop variable into the goroutine as an argument, like this:

for _, card := range app.deck {
    go func(card XMLCard) {
        doWork(card)
    }(card)
}

When fanning-out goroutines, we need a sync.WaitGroup to ensure all the goroutines finish before we continue processing. We add to the WaitGroup as we launch goroutines and defer marking them done within each goroutine. We can add a sync.Map to collect errors from each goroutine so we can log them all later. Now the skeleton of our fan-out code looks like this:

wg := sync.WaitGroup{}
var errMap sync.Map
for _, card := range app.deck {
    wg.Add(1)
    go func(card XMLCard) {
        defer wg.Done()
        err := doWork(card)
        if err != nil {
            errMap.Store(card.ImagePath, err)
        }
    }(card)
}
wg.Wait()

Instead of the generic doWork placeholder, above, we’ll call our loadImageToCache function and we’ll only launch a goroutine if we don’t already have it in the cache. Putting all that together, here’s how PreloadImages starts out:

func (app *App) PreloadImages() {
    if app.err != nil {
        return
    }

    wg := sync.WaitGroup{}
    var errMap sync.Map
    for _, card := range app.deck {
        if !app.cache.Exists(filepath.Join(cacheImageFolder, card.ImagePath)) {
            wg.Add(1)
            go func(card XMLCard) {
                defer wg.Done()
                err := loadImageToCache(app.cache, card.ImagePath)
                if err != nil {
                    errMap.Store(card.ImagePath, err)
                }
            }(card)
        }
    }
    wg.Wait()
    ...
}

After wg.Wait(), all the goroutines are done, but we don’t know if everything succeeded. To find out, we have to walk the sync.Map to extract any errors:

func (app *App) PreloadImages() {
    ...

    var errs []string
    errMap.Range(func(k, v interface{}) bool {
        errs = append(errs, fmt.Sprintf("%s (%s)", k.(string), v.(error).Error()))
        return true
    })
    if len(errs) > 0 {
        app.err = fmt.Errorf("error(s) fetching images: %s", strings.Join(errs, "; "))
    }
}

Creating the PDF 🔗︎

We’ve done a lot! We’ve got our deck list and we’ve cached all our images. Now it’s time to build a PDF. We’ll use jung-kurt/gofpdf as our PDF builder.

At the highest level of abstraction, CreatePDF only has three things to do – initialize a new PDF, add images to it and render them onto the page and out to a file:

func (app *App) CreatePDF() {
    if app.err != nil {
        return
    }

    // Portrait-orientation, using mm units, Letter size paper
    pdf := gofpdf.New("P", "mm", "Letter", "")

    app.err = addImagesToPdf(pdf, app.cache, app.deck)
    if app.err != nil {
        return
    }

    app.err = renderPDF(pdf, app.deck, app.outputFile)
}

Adding images doesn’t mean laying them out on the page yet (we’ll do that in renderPDF). It means loading the image data into the PDF object keyed by a name that we can use later during layout.

First – and unfortunately – we have to deal with one of the weird curve balls that happen with real, live data from the wild Internet. Consider this card image for Elrond, 04218.png:

But here is the output of the file command for that image:

04128.png: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=0], baseline, precision 8, 424x600, components 3

That’s right. The image name ends in .png, but the actual image data inside is JPEG!

Browsers rely on the file signature, so it all just works when you look at it, but our PDF library defaults to using the filename suffix unless you give it the file type. I found that out the hard way when my first attempts at PDF generation failed due to the mismatch between the suffix-implied-type and the data.

So we’ll need to do file-type detection on our downloaded images as we load them into the PDF. Fortunately, Go provides http.DetectContentType, which gives us a MIME file type for a byte slice. We can use that to populate a gopdf.ImageOptions struct with the image type:

func getImageOptions(bytes []byte) (gofpdf.ImageOptions, error) {
    mimeType := http.DetectContentType(bytes)
    switch mimeType {
    case "image/jpeg":
        return gofpdf.ImageOptions{ImageType: "JPEG"}, nil
    case "image/png":
        return gofpdf.ImageOptions{ImageType: "PNG"}, nil
    default:
        return gofpdf.ImageOptions{}, fmt.Errorf("unsupported image type: %s", mimeType)
    }
}

Then addImagesToPdf is just a loop over the deck, reading in each image’s bytes, detecting a type, and registering the image with the pdf object:

func addImagesToPdf(pdf *gofpdf.Fpdf, cache *configdir.Config, deck []XMLCard) error {
    for _, card := range deck {
        imageBytes, err := cache.ReadFile(filepath.Join(cacheImageFolder, card.ImagePath))
        if err != nil {
            return err
        }
        imageOpts, err := getImageOptions(imageBytes)
        if err != nil {
            return err
        }
        pdf.RegisterImageOptionsReader(card.ImagePath, imageOpts, bytes.NewReader(imageBytes))
    }
    return nil
}

Now that images are loaded, we’ll use the renderPDF function to lay out the images in the PDF.

The first thing we’ll do is convert the decklist to a slice of strings of the image names – the same ones we used when registering the images with the pdf object. Using card.Quantity for each card, we’ll duplicate the string name that many times, which will be handy in a later step when we’re dividing images up into individual pages. Because we don’t know the total number of cards, we can’t pre-allocate the slices, so we’ll start from an empty one and append:

func renderPDF(pdf *gofpdf.Fpdf, deck []XMLCard, outputPath string) error {
    images := make([]string, 0)
    for _, card := range deck {
        for i := 0; i < card.Quantity; i++ {
            images = append(images, card.ImagePath)
        }
    }
    ...
}

Each page can hold up to nine images in a 3x3 grid. Because we duplicated the card name strings for duplicate cards, laying out pages is trivial. We just loop over the images, grabbing up to nine at a time. Each set we grab is a page and we loop until there are no more pages left. A handy splitAt function keeps the main loop readable:

func splitAt(n int, xs []string) ([]string, []string) {
    if len(xs) < n {
        n = len(xs)
    }
    return xs[0:n], xs[n:]
}

func renderPDF(pdf *gofpdf.Fpdf, deck []XMLCard, outputPath string) error {
    ...
    var batch []string
    for len(images) > 0 {
        batch, images = splitAt(9, images)
        err := renderSinglePage(pdf, batch)
        if err != nil {
            return fmt.Errorf("could not assemble PDF: %v", err)
        }
    }
    ...
}

Rendering a single page is fairly straightforward, but we need to account for the possibility of having less than nine cards. We need to loop over up to nine “slots” on a page, so we’ll use two nested loops of three for the rows and columns. For each slot, we’ll put the image in the right location in the 3x3 grid.

I’ve chosen to keep the necessary size and spacing constants within the function itself, because they are so tightly tied to this one function and it made it easier to tweak. Keep in mind that we set units to millimeters and coordinates and sizes need to be floats for the layout, so we’ll convert loop indices to floats and let Go coerce the other numbers.

func renderSinglePage(pdf gofpdf.Pdf, images []string) error {
    if len(images) > 9 {
        return fmt.Errorf("too many images to render (%d > 9)", len(images))
    }

    pdf.AddPage()

    var cardWidth = 63.5
    var cardHeight = 88.0
    var leftPadding = 4.0
    var spacer = 4.0

    for i := 0; i < 3; i++ {
        for j := 0; j < 3; j++ {
            if len(images) == 0 {
                return nil
            }
            fi, fj := float64(i), float64(j)

            pdf.ImageOptions(
                images[0],
                leftPadding+spacer*(fj+1)+cardWidth*fj,
                spacer*(fi+1)+cardHeight*fi,
                cardWidth, cardHeight, false, gofpdf.ImageOptions{}, 0, "",
            )
            images = images[1:]
        }
    }

    return nil
}

Finally, we finish renderPDF by writing the PDF to the output file:

func renderPDF(pdf *gofpdf.Fpdf, deck []XMLCard, outputPath string) error {
    ...
    err := pdf.OutputFileAndClose(outputPath)
    if err != nil {
        return fmt.Errorf("could not render PDF: %v", err)
    }

    return nil
}

The generated PDF will be a series of image pages like this:

Now we just need to print out our PDF, grab some scissors, sleeve up the proxies, and we’re ready for game night!

Conclusion 🔗︎

Historically, glue code or mashups might have been written in a dynamic language like Ruby, Python or Perl, but there are a lot of reasons why we should consider Go, as well. In particular, it has fast compilation for the kind of iterative “write code and try it” style of development common for these sorts of quick, one-off projects.

Like popular dynamic languages, Go has both a robust set of internal packages and a large number of open-source community packages. There are packages for working with a wide variety of web data formats, plus special purpose packages for tasks like PDF generation. Plus Go’s simple, powerful concurrency model makes it trivial to multiprocess as needed.

Give it a try for your next glue project!

The full code for this article may be found at github.com/xdg-go/lotrproxypdf