
Serverless PDF Export

Laptop showing a web page about building a dream website. On the desk with the laptop is a phone and coffee mug.

Photo by Carriza Maiquez on Unsplash

When we're in a browser, we can easily save a page as a PDF using Print + Save as PDF. Sometimes, however, we need a programmatic way to export a web page to PDF automatically. From a terminal or during local development, Puppeteer handles this easily because the functionality is built in, but it depends on a web browser being available, which isn't so simple within a serverless function.

The steps involved are:

  • Install the Browser
  • Get a reference to the browser
  • Load the web page
  • Export to PDF
  • Save the PDF to the Storage Bucket
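As a rough sketch of how those five steps fit together, the function below runs them in order against minimal stand-in interfaces rather than the real Puppeteer and Storage types. PageLike, BrowserLike, ExportDeps and exportToPdf are hypothetical names used only for illustration; the try/finally makes sure the browser is closed even if a later step fails.

```typescript
// Hypothetical structural types standing in for Puppeteer's Page/Browser and a
// storage client; they capture only the calls this sketch needs.
interface PageLike {
  goto(url: string): Promise<unknown>
  pdf(): Promise<Uint8Array>
}

interface BrowserLike {
  newPage(): Promise<PageLike>
  close(): Promise<void>
}

interface ExportDeps {
  installBrowser(): Promise<void>
  getBrowser(): Promise<BrowserLike>
  save(destination: string, pdf: Uint8Array): Promise<void>
}

async function exportToPdf(deps: ExportDeps, url: string, destination: string): Promise<void> {
  await deps.installBrowser()              // 1. Install the browser
  const browser = await deps.getBrowser()  // 2. Get a reference to the browser
  try {
    const page = await browser.newPage()
    await page.goto(url)                   // 3. Load the web page
    const pdf = await page.pdf()           // 4. Export to PDF
    await deps.save(destination, pdf)      // 5. Save the PDF to the Storage Bucket
  } finally {
    await browser.close()                  // Always release the browser
  }
}
```

Injecting the dependencies this way also makes the ordering easy to check with fakes, without launching Chrome.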

When trying to get this to work I ran into a few issues. First, there's no browser natively available within the cloud function environment. Second, attempts to install the browser at deploy time either installed it only on the machine performing the deploy (not in the target environment) or added it to the deploy bundle, at over 250 MiB, which is far too large for a typical serverless bundle.

Initially I assumed that installing the browser would be extremely time consuming, but in practice it added only a couple of seconds to the function's cold start time. Because of this, I chose to leave the installation as part of the main function execution.
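One way to pay that install cost only once per function instance is to memoize it at module scope, so warm invocations and concurrent requests all await the same promise. The memoizeAsync helper below is a hypothetical sketch, not part of Puppeteer; it also clears its cache on failure so a transient download error can be retried on the next request.

```typescript
// Wraps an async factory so it runs at most once; concurrent and later callers
// share the same promise. If the factory rejects, the cache is cleared so the
// next call retries instead of replaying the failure forever.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null
  return () => {
    if (!pending) {
      pending = factory().catch((err: unknown) => {
        pending = null
        throw err
      })
    }
    return pending
  }
}

// Hypothetical usage, declared at module scope so it survives warm invocations:
// const ensureBrowser = memoizeAsync(() => install({ /* browser, buildId, platform, cacheDir */ }))
```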

Setup

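Start by installing Puppeteer, along with the @puppeteer/browsers package that the service below imports directly.

npm install puppeteer @puppeteer/browsers --save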

Puppeteer Service

Create a Puppeteer service that installs a browser and retrieves a reference to it.

import { Browser, BrowserPlatform, install, InstalledBrowser } from '@puppeteer/browsers'

import * as puppeteer from 'puppeteer'

export class PuppeteerService {
  // null until installBrowser() has completed on this instance
  private installedBrowser: InstalledBrowser | null = null

  async installBrowser(): Promise<InstalledBrowser> {
    if (!this.installedBrowser) {
      // Download Chrome into /tmp, which is writable in Cloud Functions
      this.installedBrowser = await install({
        browser: Browser.CHROME,
        buildId: '146.0.7680.177',
        platform: BrowserPlatform.LINUX,
        cacheDir: '/tmp/puppeteer-cache'
      })
    }
    return this.installedBrowser
  }

  async getHeadlessBrowser(): Promise<puppeteer.Browser> {
    const installedBrowser = await this.installBrowser()
    return puppeteer.launch({
      executablePath: installedBrowser.executablePath,
      // Sandbox flags commonly needed when Chrome runs inside a container
      args: [ '--no-sandbox', '--disable-setuid-sandbox' ],
      headless: true,
    })
  }
}

The installBrowser() function makes it possible to request browser installation at any time, and the getHeadlessBrowser() function returns a reference to a headless browser instance. To install the browser you must specify an exact build ID that is available for installation, so this is a good candidate for parameterization, and keeping it current becomes an important ongoing maintenance task. The cacheDir I've provided works in Cloud Functions but might need to be customized for your environment.
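A minimal sketch of that parameterization, assuming the build ID is supplied through an environment variable named CHROME_BUILD_ID (a hypothetical name) with the pinned version as a fallback. The format check only accepts four-part version numbers like the one used above, which is a deliberately narrow assumption.

```typescript
// Pinned fallback; use whichever Chrome build you have actually tested against.
const DEFAULT_CHROME_BUILD_ID = '146.0.7680.177'

// Assumed format: four dot-separated numbers, e.g. 146.0.7680.177.
const BUILD_ID_PATTERN = /^\d+\.\d+\.\d+\.\d+$/

// Reads the build ID from an environment-style map (e.g. process.env),
// falling back to the pinned default when the variable is unset or blank.
function resolveChromeBuildId(env: Record<string, string | undefined>): string {
  const candidate = env.CHROME_BUILD_ID?.trim() || DEFAULT_CHROME_BUILD_ID
  if (!BUILD_ID_PATTERN.test(candidate)) {
    throw new Error(`Invalid Chrome build ID: "${candidate}"`)
  }
  return candidate
}
```

In the service, buildId: resolveChromeBuildId(process.env) would then replace the hard-coded string, so a build upgrade becomes a configuration change rather than a code change.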

The full path to the installed browser's executable must be provided to the launch(...) method, along with the necessary launch options (the sandbox flags and headless: true). "Headless" here means "without rendering the UI"; this matters both because it's faster and because many server environments lack the UI libraries that would be needed to render one.

Function

The code below sets up an onCall function called createPDF, which takes a URL parameter and saves the resulting PDF under the requesting user's downloads folder in the Storage bucket, using Puppeteer's page.pdf(...) method to do the heavy lifting. Here I've specified 2 CPUs and 2 GiB of memory for good measure. You may also need to increase the timeout (the timeoutSeconds option), depending on how long the website(s) you're exporting take to load.
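Because the URL comes straight from the caller and the page title becomes part of a file path, it can help to validate both before use. The helpers below are a hypothetical sketch: parseExportUrl accepts only absolute http(s) URLs, and pdfObjectPath replaces path separators and other unusual characters in the title so a title such as "Q3 Report / Draft" can't produce a nested or invalid path. Note that \w only matches ASCII letters, digits and underscore, so accented characters are replaced as well.

```typescript
// Accepts only absolute http(s) URLs; file:, javascript: and malformed
// input are rejected before a browser is ever launched.
function parseExportUrl(input: unknown): URL {
  if (typeof input !== 'string') throw new Error('url must be a string')
  let parsed: URL
  try {
    parsed = new URL(input)
  } catch {
    throw new Error(`Not a valid URL: ${input}`)
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`)
  }
  return parsed
}

// Builds the per-user object path from a page title: characters outside
// [A-Za-z0-9_ .-] become "_", runs of spaces collapse, the name is capped at
// 100 characters, and an empty result falls back to "page".
function pdfObjectPath(uid: string, title: string): string {
  const safe = title
    .replace(/[^\w .-]/g, '_')
    .replace(/\s+/g, ' ')
    .slice(0, 100)
    .trim() || 'page'
  return `downloads/${uid}/${safe}.pdf`
}
```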

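import { onCall, HttpsError } from 'firebase-functions/v2/https'
import { getStorage } from 'firebase-admin/storage'
import { createReadStream } from 'fs'
import { unlink } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'
import { pipeline } from 'stream/promises'
// Assumed module path for the PuppeteerService class above
import { PuppeteerService } from './puppeteer-service'

// Assumes initializeApp() from 'firebase-admin/app' is called elsewhere
export const createPDF = onCall<{ url: string }>(
  // Add timeoutSeconds here if the target pages load slowly
  { cpu: 2, memory: '2GiB' },
  async request => {
    const uid = request.auth?.uid
    const { url } = request.data

    if (!uid) throw new HttpsError('unauthenticated', 'Please log in.')
    if (typeof url !== 'string' || !/^https?:\/\//i.test(url)) {
      throw new HttpsError('invalid-argument', 'An http(s) URL is required.')
    }

    // Set up the browser
    const puppeteerService = new PuppeteerService()
    const browser = await puppeteerService.getHeadlessBrowser()

    try {
      const page = await browser.newPage()

      // Load the page
      await page.goto(url, { waitUntil: 'domcontentloaded' })

      // Export the PDF, named after a filesystem-safe version of the page title
      const title = (await page.title()).replace(/[^\w .-]/g, '_').trim() || 'page'
      const path = join(tmpdir(), `${title}.pdf`)
      await page.pdf({
        path,
        format: 'Letter',
        printBackground: true,
      })

      // Set up the Storage Bucket (default bucket of the initialized app)
      const bucket = getStorage().bucket()
      const file = bucket.file(`downloads/${uid}/${title}.pdf`)
      const [ exists ] = await file.exists()
      if (exists) {
        await file.delete()
      }

      // Copy the PDF to the Storage Bucket
      await pipeline(createReadStream(path), file.createWriteStream())

      // /tmp is backed by memory in Cloud Functions, so remove the local copy
      await unlink(path)

      return { path: file.name }
    } finally {
      // Close the browser even if loading or exporting fails
      await browser.close()
    }
  }
)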

The PDF is first written to a temp directory on the function's local filesystem, and the stream pipeline(read, write) function then pipes that file into the Storage bucket file created for it. Finally, don't forget to clean up by closing the browser.
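To see what pipeline(...) contributes here, the sketch below copies a stream into an in-memory Writable that stands in for file.createWriteStream(). MemorySink and copyToSink are hypothetical names, used so the example runs without Firebase. The promise-based pipeline from stream/promises resolves once the destination has finished and rejects if either stream errors, which is why the function can simply await it before moving on.

```typescript
import { Readable, Writable } from 'stream'
import { pipeline } from 'stream/promises'

// In-memory stand-in for a Storage write stream: collects every chunk written.
class MemorySink extends Writable {
  readonly chunks: Buffer[] = []

  _write(chunk: Buffer, _encoding: BufferEncoding, callback: (error?: Error | null) => void): void {
    this.chunks.push(chunk)
    callback()
  }

  text(): string {
    return Buffer.concat(this.chunks).toString('utf8')
  }
}

// Streams source into sink; resolves after the sink finishes, rejects
// (and destroys both streams) if either side errors.
async function copyToSink(source: Readable, sink: Writable): Promise<void> {
  await pipeline(source, sink)
}
```

In createPDF the source would be createReadStream(path) and the sink the bucket file's write stream; streaming avoids loading the whole PDF into memory at once.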

Summary

The simplicity of the code masks how much trial and error it took to get this working and tested in a live serverless environment. If this helps you solve a problem on your project, please reach out and let me know!

Custom Websites

Something all businesses and products have in common is the need for a website, ideally one that is fast, reliable, accessible, and easy to update. Andromeda delivers this by combining the design expertise our customers have come to expect with our own Ignition™ Content Management System to produce blazing-fast static websites. Hire us to custom-design your new website and get the very best of design and technology to help your business grow with a top-quality online presence.

License: CC BY-NC-ND 4.0 (Creative Commons)