HTML to PDF With PhantomJS

PhantomJS Tricks: HTML to PDF Conversion

PhantomJS is widely known as the brains behind headless JavaScript testing. But as a standalone WebKit executable, it also has screen capture functionality that can render web pages to PNG or PDF. For very simple document conversions, PhantomJS is a fairly straightforward tool. But I warn you, severe headaches will occur with any conversion of substance: repeating headers/footers, images, SVG, fonts, etc. These issues aren’t mentioned in the sparse documentation or example snippets, and can lead to some serious frustration. Hopefully I can help save you a few days with these tricks.

  • Page Structure
  • Page Frame Border
  • Images (Header/Footer)
  • SVG and Assets

Page Structure

A Phantom document’s “canvas” can take a variety of paper sizes (A4, Letter, Legal, etc.). The hints below assume US Letter (8.5 x 11 inches).

A document with repeating headers and footers looks like this:

            margin
 ______________________________
|                              |
|            header            |
|______________________________|
|                              |
|                              |
|             body             |
|                              |
|                              |
|______________________________|
|                              |
|            footer            |
|______________________________|

How exactly PhantomJS (or WebKit) structures its measurements internally is somewhat of a mystery. The easiest way to understand a PhantomJS document is to think of it as using two different pixel scales. These are likely OS-specific measurements (I measured them on an Ubuntu box).

  • Header / Footer: 1 full page = 2010 pixels
  • Body: 1 full page = 990 pixels

Meaning, if you were to have a single document consisting entirely of “header,” it would have a height of 2010 pixels. A document containing only a “body” would be 990 pixels tall.

Any content that goes in the header or footer needs to be converted from real (what you see) pixels to “Phantom pixels” at the 2010/990 ratio. For example, do you have a header with a height of 125 pixels? It will need to be resized to 125 * 2010/990 pixels, or roughly 254. Content that goes in the body doesn’t require a size conversion.
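
The conversion is a one-line calculation. Here is a minimal sketch, assuming the 2010/990 ratio measured above also holds on your machine (it may not). The name `toPhantomPixels` is made up for illustration and is not part of PhantomJS.

```javascript
// Page heights as measured above. Assumption: these values may differ
// on other platforms or PhantomJS builds.
var HEADER_FOOTER_PAGE_PX = 2010;
var BODY_PAGE_PX = 990;

// Convert a real (on-screen) pixel length to header/footer "Phantom pixels".
function toPhantomPixels(realPx) {
  return realPx * HEADER_FOOTER_PAGE_PX / BODY_PAGE_PX;
}

console.log(toPhantomPixels(125).toFixed(1)); // a 125px header becomes about 253.8
```

Body content skips this step entirely; only lengths used inside the header or footer go through the conversion.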

I don’t know why these lengths are what they are. I had to iteratively guess-and-check to find them out.


Page Frame Border

If you want a seamless page frame border, you’ll need to calculate your pixels for the document and edit the borders of each component (header / body / footer) accordingly: the header has no bottom border, the body has only visible side borders, and the footer has no top border.

[Image: PhantomJS full page border]

Take your margins and header/footer height, and subtract them from 990px. The result is the height of your body.

  • Margins: 15px top + 15px bottom = 30px
  • Header = 150px, Footer = 50px
  • Body: 990 - (150 + 50 + 2 * 15) = 760px
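
The same arithmetic as a small helper, in case you want to experiment with different margins. This is only a sketch: `bodyHeight` is a hypothetical name, and 990 is the measured body page height from earlier, not an official constant.

```javascript
// Height of one full page in body pixels (measured value, may vary by system).
var BODY_PAGE_PX = 990;

// Body height left on one page once header, footer, and margins are subtracted.
// marginPx is counted twice: once for the top and once for the bottom.
function bodyHeight(marginPx, headerPx, footerPx) {
  return BODY_PAGE_PX - (headerPx + footerPx + 2 * marginPx);
}

console.log(bodyHeight(15, 150, 50)); // 760
```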

You’ll have to render the PDF twice in PhantomJS. On the first render you can work out the number of pages in the document (a good way is to get this from the footer callback). Then you can extend your content wrapper div to the full height and render the page again to get a repeating page frame border across the entire document.

For example, if your document is 4 pages long, the height of your wrapper div (that which contains all the content of your document) needs to be 4 * 760px = 3040px. If you don’t do this, you’ll see the body border prematurely end, and you’ll get whitespace instead of a frame border.
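
Here is a rough sketch of the two-pass idea. The file names (`report.html`, `pass1.pdf`, `report.pdf`), the `wrapper` element id, the footer height, and the 760px body height are all assumptions for illustration. I’m also assuming the footer callback can write to a variable in the enclosing script, so treat this as a starting point rather than a tested recipe. The PhantomJS part is guarded so the arithmetic can still be run on its own elsewhere.

```javascript
// Total height for the content wrapper so its side borders reach the last page.
function wrapperHeight(pageCount, bodyHeightPx) {
  return pageCount * bodyHeightPx;
}

console.log(wrapperHeight(4, 760)); // 3040

// Everything below only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var totalPages = 0;       // filled in by the footer callback during pass 1
  var bodyHeightPx = 760;   // from the worked example; recompute for your layout

  page.paperSize = {
    format: 'Letter',
    footer: {
      height: '50px',       // hypothetical value
      contents: phantom.callback(function (pageNum, numPages) {
        totalPages = numPages; // remember the page count PhantomJS reports
        return '<div>' + pageNum + ' / ' + numPages + '</div>';
      })
    }
  };

  page.open('report.html', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    // Pass 1: throwaway render, only to learn totalPages.
    page.render('pass1.pdf');

    // Stretch the wrapper to cover every page, then render for real.
    page.evaluate(function (heightPx) {
      document.getElementById('wrapper').style.height = heightPx + 'px';
    }, wrapperHeight(totalPages, bodyHeightPx));

    // Pass 2: final output with the border running the full length.
    page.render('report.pdf');
    phantom.exit();
  });
}
```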

Of course, you’ll need to fudge the heights of your header/footer by a pixel or so either way to really get it spot on.


Images (Header / Footer)

Images in the header need to be passed as base64-encoded text, and the same image also needs to be included in the body, hidden via a ‘display:none’ style attribute. Due to the async nature of PhantomJS, if the image isn’t already ‘cached’/loaded when the headers are rendered, it simply won’t display.
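
To make that concrete, here is a small sketch that builds both copies of the image tag from one data URI: a hidden one for the body and a visible one for the header markup. The helper names are hypothetical, and the base64 string is a short placeholder rather than a real image.

```javascript
// Build a data URI from an image that is already base64-encoded.
function dataUri(mimeType, base64Data) {
  return 'data:' + mimeType + ';base64,' + base64Data;
}

// Hidden copy for the document body, so the image is already loaded
// by the time the header is rendered.
function hiddenPreloadTag(uri) {
  return '<img src="' + uri + '" style="display:none" />';
}

// Visible copy for the HTML string returned by the header callback.
function headerImageTag(uri, heightPx) {
  return '<img src="' + uri + '" style="height:' + heightPx + 'px" />';
}

// Placeholder payload for illustration only; substitute your own encoded image.
var logo = dataUri('image/png', 'iVBORw0KGgo=');
console.log(hiddenPreloadTag(logo));
console.log(headerImageTag(logo, 40));
```

The hidden tag goes anywhere in the page body; the header tag goes into whatever HTML your header callback returns.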

Think of the header and footer as sandboxed from each other and from the rest of the document, so CSS styles don’t spill over.


SVG and Assets

SVG images need to be embedded as base64-encoded text, and this is best done through an image tag. Raw SVG works, but isn’t properly “namespaced” (for lack of a better term that describes what happens), so if you have multiple SVG charts or graphs, it’s likely the styles will bleed over. For example, if you add a second graph, the style settings of the second graph will override the first. Add a third chart, and now the first two are re-styled according to the last SVG.
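
A sketch of wrapping each chart in its own image tag, so its styles stay inside the image. I’m assuming `btoa` is available in your PhantomJS script context; the `Buffer` branch is only a fallback for running the snippet in Node.js. Both function names are made up for illustration.

```javascript
// Base64-encode a UTF-8 string.
// Assumption: btoa exists in browser-like runtimes; Buffer is the Node.js fallback.
function toBase64(text) {
  if (typeof btoa === 'function') {
    // Convert to a byte string first so non-ASCII characters survive btoa.
    return btoa(unescape(encodeURIComponent(text)));
  }
  return Buffer.from(text, 'utf8').toString('base64');
}

// Wrap raw SVG markup in an <img> tag with a base64 data URI.
function svgToImgTag(svgMarkup) {
  return '<img src="data:image/svg+xml;base64,' + toBase64(svgMarkup) + '" />';
}

// A tiny chart stand-in with its own <style> block.
var chart = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">' +
            '<style>rect { fill: red; }</style>' +
            '<rect width="10" height="10"/></svg>';

console.log(svgToImgTag(chart));
```

Run each chart through `svgToImgTag` separately, so every graph ends up in its own image.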

Assets served from the local machine (localhost/127.0.0.1) will not be loaded, but file:// URIs seem to work.
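
If your assets live on disk, a small helper can turn an absolute path into a file:// URI. This is a sketch that assumes POSIX-style paths (as on my Ubuntu box); `toFileUri` is a hypothetical name, and the example path is made up.

```javascript
// Turn an absolute POSIX path into a file:// URI.
// Each path segment is percent-encoded so spaces and similar characters survive.
function toFileUri(absolutePath) {
  var encoded = absolutePath.split('/').map(encodeURIComponent).join('/');
  return 'file://' + encoded;
}

console.log(toFileUri('/home/me/report assets/logo.png'));
// file:///home/me/report%20assets/logo.png
```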

Fonts are funky. Font weights seem to be ignored, so you’ll have to be explicit with naming each font instead of treating them properly like a family.


Hope I saved you some frustration. If anyone has clearer ideas regarding PhantomJS behavior, I’d very much welcome the clarification. Thanks!
