Anyway, to deal with this new format I had to change my approach for getting the data into Excel. With the help of the page "Tutorial: Command-line OCR on a Mac", I was able to build a process that does the following:
- Use pdftk to burst the multi-page PDF into single-page files.
- Use Inkscape to convert each page into a PNG file.
- Use Tesseract to OCR these image files into text.
- Extract and format the text files for import into Excel.
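The four steps above chain together naturally in a small script. Here's a rough Python sketch of that chain. It is not the exact set of commands from the tutorial, and a few things are assumptions: the Inkscape flags are Inkscape 1.x syntax (0.92 used `-z -e out.png -d 300` instead), the 300 DPI setting and the `page_%04d.pdf` file pattern are just illustrative choices, and the helper names (`burst_cmd`, `png_cmd`, `ocr_cmd`, `text_to_csv`, `run_pipeline`) are made up for this example. In particular, the "split on two or more spaces" rule in `text_to_csv` is a hypothetical stand-in for step 4, since the right formatting depends entirely on how your OCR output is laid out.

```python
import csv
import io
import re
import subprocess
from pathlib import Path


def burst_cmd(pdf: Path, outdir: Path) -> list[str]:
    # Step 1: pdftk's "burst" writes one PDF per page; the output name is a
    # printf-style pattern (page_0001.pdf, page_0002.pdf, ...).
    return ["pdftk", str(pdf), "burst", "output", str(outdir / "page_%04d.pdf")]


def png_cmd(page_pdf: Path, dpi: int = 300) -> list[str]:
    # Step 2: render one page to PNG (assumes Inkscape 1.x command-line flags).
    return [
        "inkscape",
        "--export-type=png",
        f"--export-dpi={dpi}",
        f"--export-filename={page_pdf.with_suffix('.png')}",
        str(page_pdf),
    ]


def ocr_cmd(png: Path) -> list[str]:
    # Step 3: Tesseract takes an output *base* name and appends ".txt" itself.
    return ["tesseract", str(png), str(png.with_suffix(""))]


def text_to_csv(text: str) -> str:
    # Step 4 (hypothetical rule): treat runs of 2+ spaces as column breaks and
    # skip blank lines. The csv module quotes values that contain commas.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():
            writer.writerow(re.split(r"\s{2,}", line.strip()))
    return buf.getvalue()


def run_pipeline(pdf: Path, workdir: Path) -> Path:
    # Runs all four steps and concatenates every page into one CSV for Excel.
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(burst_cmd(pdf, workdir), check=True)
    chunks = []
    for page in sorted(workdir.glob("page_*.pdf")):
        subprocess.run(png_cmd(page), check=True)
        png = page.with_suffix(".png")
        subprocess.run(ocr_cmd(png), check=True)
        chunks.append(text_to_csv(png.with_suffix(".txt").read_text()))
    out = workdir / "combined.csv"
    out.write_text("".join(chunks))
    return out
```

Calling `run_pipeline(Path("statement.pdf"), Path("ocr_work"))` would leave a `combined.csv` in `ocr_work` that Excel can open directly. Keeping the command builders separate from `subprocess.run` makes it easy to print the commands and check them by hand before running anything.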