I often download academic articles as PDFs to read later. I regularly find two really annoying problems:
- Huge margins make the PDF nearly unreadable on my Kindle Fire
- Title and authors are missing from PDF metadata, making them harder to find later via search
Today, I found an answer to the first and wrote an answer to the second.
For fixing huge margins, I found the free briss tool, which shows a composite image of all odd and even pages and lets you set a crop box around them.
To fix the metadata, I did a bit of Googling for tools, then realized I could whip up a tool with Perl and PDF::API2 faster than hunting for an existing one.
This program reads the metadata from a PDF, opens it in your editor as JSON, and then saves the edited metadata to a new PDF.
    #!/usr/bin/env perl
    use v5.10;
    use strict;
    use warnings;
    use JSON::MaybeXS;
    use PDF::API2;
    use Path::Tiny;

    die "Usage: $0 <infile> <outfile>\n" unless @ARGV == 2;
    my ( $infile, $outfile ) = @ARGV;

    # Bail out early if the input file can't be read
    unless ( -r $infile ) {
        die "Input file '$infile' can't be read\n";
    }

    my $pdf  = PDF::API2->open($infile);
    my $json = JSON::MaybeXS->new( utf8 => 1, pretty => 1 );

    # Dump the existing metadata to a temp file as pretty-printed JSON
    my $temp = Path::Tiny->tempfile;
    $temp->spew( $json->encode( { $pdf->info } ) );

    # Hand the temp file to the user's editor
    if ( $ENV{EDITOR} ) {
        system( $ENV{EDITOR}, $temp ) and die "Error editing temp file: $!\n";
    }
    else {
        die "No EDITOR environment variable set.\n";
    }

    # Read the edited JSON back and write the metadata to the new PDF
    $pdf->info( %{ $json->decode( $temp->slurp ) } );
    $pdf->saveas($outfile);
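To make the editing step more concrete, here is a minimal sketch of just the JSON round trip, with no PDF involved. The metadata values are made up for illustration, the "edit" is simulated in code rather than done in an editor, and I've swapped in the core JSON::PP module for JSON::MaybeXS so the snippet runs on a stock Perl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;    # core module, standing in for JSON::MaybeXS in this sketch

# Hypothetical metadata, shaped like the key/value list that $pdf->info returns
my %info = (
    Title   => '',
    Author  => '',
    Creator => 'LaTeX',
);

# canonical() sorts the keys so the output is stable from run to run
my $json = JSON::PP->new->utf8->pretty->canonical;

# This is the text that would land in the temp file for editing
my $text = $json->encode( \%info );
print $text;

# Simulate the changes a user would make in their editor
my $edited = $json->decode($text);
$edited->{Title}  = 'An Example Paper';
$edited->{Author} = 'A. N. Author';

# The real script flattens the decoded hash back into a list for $pdf->info
my %updated = %$edited;
printf "%s: %s\n", $_, $updated{$_} for sort keys %updated;
```

In the full script, the only differences are that the hash comes from the PDF and the changes come from whatever you type before saving and quitting the editor.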