When we're in a browser, we can easily save a page as a PDF using Print + Save as PDF. However, sometimes we need a programmatic way to download a web page as a PDF automatically. From a terminal or during local development, Puppeteer handles this easily because the functionality is built in, but it depends on the availability of a web browser, which isn't so simple within a serverless function.
The steps involved are:
- Install the Browser
- Get a reference to the browser
- Load the web page
- Export to PDF
- Save the PDF to the Storage Bucket
When trying to get this to work I ran into a few issues. First, there's no browser natively available within the cloud function environment. Then, when I tried to install the browser at deploy time, it was either installed only on the system performing the deploy (not in the target environment) or added to the deploy bundle to the tune of over 250 MiB, which is far too large for a typical serverless bundle.
Initially I assumed that installing the browser at runtime would be extremely time consuming, but in practice I found it added only a couple of seconds to the cold start time of the function. Because of this, I chose to leave the installation as part of the main function execution.
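Because the installation runs inside the function, it is worth making sure a warm instance does not start it more than once. The following is a minimal sketch of one way to do that, not part of the original solution: memoizeAsync is a hypothetical helper, and fakeInstall is a stand-in for the real installation step with a placeholder return value.

```typescript
// Hypothetical helper: wraps an async task so that it runs at most once per
// function instance. Concurrent and later callers share the same promise.
function memoizeAsync<T>(task: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (!pending) {
      pending = task().catch(error => {
        // Forget a failed attempt so the next invocation can retry.
        pending = undefined
        throw error
      })
    }
    return pending
  }
}

// Stand-in for the real browser installation (an assumption for illustration).
let installCount = 0
async function fakeInstall(): Promise<string> {
  installCount += 1
  return '/tmp/puppeteer-cache/placeholder-executable'
}

// Declared at module scope so the cached promise survives between invocations.
const ensureBrowserInstalled = memoizeAsync(fakeInstall)
```

Calling ensureBrowserInstalled() at the top of each invocation would then pay the install cost only on a cold start, assuming the wrapper lives at module scope rather than inside the handler.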
Setup
Start by installing Puppeteer.
npm install puppeteer --save
Puppeteer Service
Create a Puppeteer service to install a browser and retrieve a reference to it.
import { Browser, BrowserPlatform, install, InstalledBrowser } from '@puppeteer/browsers'
import * as puppeteer from 'puppeteer'

export class PuppeteerService {
  // Set after the first successful install so repeat calls skip the download
  private installedBrowser: InstalledBrowser | null = null

  async installBrowser(): Promise<InstalledBrowser> {
    if (!this.installedBrowser) {
      this.installedBrowser = await install({
        browser: Browser.CHROME,
        // Must be a build that is actually available for download
        buildId: '146.0.7680.177',
        platform: BrowserPlatform.LINUX,
        cacheDir: '/tmp/puppeteer-cache',
      })
    }
    return this.installedBrowser
  }

  async getHeadlessBrowser(): Promise<puppeteer.Browser> {
    // Make sure the browser exists before trying to launch it
    const { executablePath } = await this.installBrowser()
    return puppeteer.launch({
      executablePath,
      args: [ '--no-sandbox', '--disable-setuid-sandbox' ],
      headless: true,
    })
  }
}
The installBrowser() function makes it possible to request browser installation at any time we want, and the getHeadlessBrowser() function returns a reference to a headless browser instance. To install the browser you must specify a build number that is available for installation, so this is a great thing to parameterize, and keeping it current will become an ongoing maintenance task. The cacheDir I've provided works in cloud functions but might need to be customized for your environment.
The full path to the executable of the installed browser must be provided to the launch(...) method along with the necessary parameters for headless use. The term "headless" here means "without rendering the UI". This matters both because it's faster and because many server environments don't have the UI libraries that would be necessary to render the UI.
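As a rough sketch of what parameterizing these values could look like, the helper below reads the build ID and cache directory from a map of environment variables and falls back to the values used in this article. The variable names CHROME_BUILD_ID and CHROME_CACHE_DIR and the resolveBrowserConfig function are assumptions made for illustration, not part of Puppeteer.

```typescript
// Settings that the install step needs and that are likely to change over time.
interface BrowserConfig {
  buildId: string
  cacheDir: string
}

// Hypothetical helper: the environment variable names are illustrative only.
function resolveBrowserConfig(
  env: Record<string, string | undefined>
): BrowserConfig {
  return {
    // Fall back to the values from this article when nothing is configured.
    buildId: env['CHROME_BUILD_ID'] ?? '146.0.7680.177',
    cacheDir: env['CHROME_CACHE_DIR'] ?? '/tmp/puppeteer-cache',
  }
}

// With no overrides, the article's defaults are returned unchanged.
const defaultConfig = resolveBrowserConfig({})
```

In a function you would pass process.env to the helper and hand the result to the install call, so bumping the browser build becomes a configuration change rather than a code change.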
Function
The code below sets up an onCall function called createPDF, which takes a URL parameter and writes a PDF file to the requesting user's downloads folder in the storage bucket, using the Puppeteer page.pdf(...) function to do the heavy lifting. Here I've specified 2 CPUs and 2 GiB of memory for good measure. You may also need to increase the timeout (the timeoutSeconds option), depending on the load time of the website(s) you're exporting PDFs from.
import * as fs from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { pipeline } from 'node:stream/promises'
import { onCall } from 'firebase-functions/v2/https'
// handleHttpError and getStorageBucket are assumed to be project-specific
// helpers that are not shown here; PuppeteerService is the class defined above.

export const createPDF = onCall<{ url: string }>(
  { cpu: 2, memory: '2GiB' },
  async request => {
    const uid = request.auth?.uid
    const { url } = request.data
    if (!uid) handleHttpError([ 'unauthenticated', 'Please log in.' ])

    // Set up the browser
    const puppeteerService = new PuppeteerService()
    const browser = await puppeteerService.getHeadlessBrowser()
    try {
      const page = await browser.newPage()

      // Load the page
      await page.goto(url, { waitUntil: 'domcontentloaded' })

      // Export the PDF to the function's temp directory
      const title = await page.title()
      const path = join(tmpdir(), `${title}.pdf`)
      await page.pdf({
        path,
        format: 'Letter',
        printBackground: true,
      })

      // Set up the Storage Bucket, removing any earlier export with the same name
      const bucket = (await getStorageBucket()).bucket()
      const file = bucket.file(`downloads/${uid}/${title}.pdf`)
      const [ exists ] = await file.exists()
      if (exists) {
        await file.delete()
      }

      // Copy the PDF to the Storage Bucket
      const inStream = fs.createReadStream(path)
      const outStream = file.createWriteStream()
      await pipeline(inStream, outStream)
    } finally {
      // Close the browser even if one of the steps above fails
      await browser.close()
    }
  })
The PDF is first written to a temp directory on the function's local filesystem, and then the stream pipeline(read, write) function pipes that file into the storage bucket file that's been created. Finally, don't forget to clean up by closing the browser.
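One thing to watch for: the page title is used directly as the file name, both in the temp directory and in the storage bucket, so a title containing a slash or an empty title could produce an unexpected path. Below is a minimal sketch of a sanitizing step, assuming a hypothetical toSafeFileName helper that is not part of the original code. It is deliberately conservative and replaces anything outside basic ASCII, so titles written entirely in other scripts end up as the fallback name.

```typescript
// Hypothetical helper: turns an arbitrary page title into a conservative file name.
function toSafeFileName(title: string, fallback = 'page'): string {
  const cleaned = title
    // Replace slashes and any other character outside a small safe set.
    .replace(/[^A-Za-z0-9._ -]+/g, '-')
    // Keep names comfortably short.
    .slice(0, 100)
    // Trim separators, dots, and spaces from both ends.
    .replace(/^[-. ]+|[-. ]+$/g, '')
  return cleaned.length > 0 ? cleaned : fallback
}

const exampleName = toSafeFileName('Pricing / Plans | Acme')
// exampleName is 'Pricing - Plans - Acme'
```

The sanitized value could then be used in place of the raw title when building both the temp path and the bucket path.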
Summary
The simplicity of the code masks the fact that working out how to do this and testing it in a live serverless environment took some effort. If this helps you solve a problem on your project, please reach out and let me know!

