The Portable Document Format is an efficient way of publishing content for both screen and print. However, it is easy to produce PDFs that are unnecessarily large. In this chapter we describe ways to avoid this.
Is PDF the Correct Format?
Where a document is being created for distribution, consider whether it can be published as HTML or plain text instead of, or in addition to, a PDF. The structural overhead of a PDF file can add significantly to its size, whereas text and HTML are comparatively small, compress well and can be easier for the end user to read and manipulate. Offering documents in text or HTML format also removes the need for PDF viewing software, which may not be installed on the user's computer and can require a further significant download. A List Apart presents a good summary of the reasons why PDF can be an inappropriate format for many documents.
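To get a feel for how well plain text compresses, the sketch below uses Python's standard `gzip` module to compare a text's raw size with its gzip-compressed size, which is roughly what a web server applying HTTP compression (`Content-Encoding: gzip`) would send. The sample text is hypothetical and real savings depend on the content, so treat this as an illustration rather than a benchmark.

```python
import gzip
from typing import Tuple


def gzip_sizes(text: str) -> Tuple[int, int]:
    """Return (raw_bytes, gzip_bytes) for the UTF-8 encoding of text."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # Hypothetical sample: repetitive prose compresses very well.
    sample = "Low bandwidth users benefit from small, plain documents. " * 100
    raw, packed = gzip_sizes(sample)
    print(f"{raw} bytes raw, {packed} bytes gzipped ({packed / raw:.1%})")
```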
Choose the Right Version
When generating PDFs, we recommend using version 1.4 of the PDF standard. Most installed PDF viewers are likely to be compatible with at least version 1.4, and users with a poor Internet connection are unlikely to want to download a newer viewer, since the latest versions tend to be larger than earlier ones (Adobe Reader 8.1 for Windows is 22.3MB). The PDF/A (PDF-Archive) standard is intended for the long-term storage of documents, and its first part, PDF/A-1, is a restricted subset of PDF 1.4, so targeting viewers that support version 1.4 or earlier is consistent with a move toward PDF/A as a standard. Version 1.5 of the PDF standard added object streams and cross-reference streams, which allow the structure of the whole file to be compressed rather than only individual streams such as page content and images. However, earlier PDF viewers cannot read files that use these features, so unless you can be certain that visitors to your website will be able to read such documents, it is best to use version 1.4.
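Before publishing, it can be worth checking which version a PDF declares and whether it uses the PDF 1.5 features that make whole-file compression possible (object streams and cross-reference streams). The following is a minimal Python sketch based on byte-level pattern matching, not a full PDF parser: it reads the `%PDF-x.y` header and looks for `/Type /ObjStm` or `/Type /XRef` objects. It ignores the optional `/Version` entry in the document catalog, which from PDF 1.4 onwards can declare a later version than the header, and the function names are our own.

```python
import re
from typing import Optional, Tuple

# The header looks like "%PDF-1.4"; viewers tolerate some bytes before it,
# so search the first 1024 bytes rather than only offset 0.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")
# Object streams and cross-reference streams were introduced in PDF 1.5.
PDF15_STRUCTURE_RE = re.compile(rb"/Type\s*/(?:ObjStm|XRef)\b")


def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) version from the %PDF-x.y header, if present."""
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_pdf15_viewer(data: bytes) -> bool:
    """Heuristic: True if the header or the file structure implies PDF 1.5+."""
    version = header_version(data)
    if version is not None and version > (1, 4):
        return True
    return PDF15_STRUCTURE_RE.search(data) is not None


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        content = handle.read()
    print("header version:", header_version(content))
    print("needs a PDF 1.5+ viewer:", needs_pdf15_viewer(content))
```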
Minimising Embedded Information
PDF documents can include additional information such as embedded fonts, which help to ensure the highest quality of display and printing. These features consume additional space in the file and therefore must be downloaded by the user.
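Using standard fonts reduces the need to embed fonts in the document. According to the PDF 1.4 reference, the following 14 base fonts (the "standard 14 fonts") are guaranteed to be available to any PDF viewer application, so they never need to be embedded:
- Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
- Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
- Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
- Symbol
- ZapfDingbats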
However, these base fonts do not support languages with non-Latin characters, so for these languages there may be no choice but to embed fonts in the document. If you do embed a font, you can subset it so that only the characters actually used are included rather than the entire character set. Note that with a subset font, anyone who wishes to edit the PDF would need the full font installed on their system, so take this into account where it is relevant.
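The font advice above can be turned into a rough check. This Python sketch is an illustration, not a tool from the PDF reference: it treats a font as not needing embedding only if it is one of the standard 14 and the text can be encoded in Windows code page 1252, which we use as an approximation of the WinAnsiEncoding commonly used with the Latin standard fonts. `subset_characters` shows the idea behind subsetting: only the distinct characters actually used need to be kept. All function names are hypothetical.

```python
# PostScript names of the standard 14 fonts listed in the PDF 1.4 reference.
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def fits_latin_base_fonts(text: str) -> bool:
    """Approximation: can the text be shown via WinAnsiEncoding (code page 1252)?"""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


def needs_embedding(font_name: str, text: str) -> bool:
    """Embed unless the font is a standard 14 font and the text is Western European.

    Symbol and ZapfDingbats use their own built-in encodings, so this check
    is only meaningful for the Times, Helvetica and Courier families.
    """
    return font_name not in STANDARD_14 or not fits_latin_base_fonts(text)


def subset_characters(text: str) -> str:
    """The distinct printable characters a subset font would need to contain."""
    return "".join(sorted({ch for ch in text if ch.isprintable()}))


if __name__ == "__main__":
    print(needs_embedding("Helvetica", "Annual report 2007"))  # False
    print(needs_embedding("Helvetica", "Годовой отчёт"))       # True
    print(repr(subset_characters("banana split")))             # ' abilnpst'
```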
If your PDF is being produced from a scanned paper document, the pages will be stored as images, which will take up more space than text. If possible, use Optical Character Recognition (OCR) software to produce text from the scanned images, or type in the text by hand.
Use vector-based graphics to represent line drawings such as diagrams, or images that might usually be represented as GIFs. These will compress well and will not suffer from loss of quality when scaled.
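To show why vector graphics are so economical, this sketch builds a PDF content stream for a simple hypothetical diagram (two boxes joined by a line) from standard PDF path operators (`re` for a rectangle, `m` and `l` for move-to and line-to, `S` to stroke), and compares its size with an uncompressed 8-bit greyscale bitmap covering the same 5 × 1 inch area at 300 dpi. A GIF or PNG of the drawing would be much smaller than the raw bitmap, so the comparison exaggerates the gap, but the vector version also stays sharp at any zoom level.

```python
def diagram_stream() -> bytes:
    """PDF content stream: two 2 x 1 inch boxes joined by a line (units are points)."""
    operators = [
        "1 w",                    # line width: 1 point
        "72 72 144 72 re S",      # box at x=72, y=72, 144 pt wide, 72 pt high; stroke it
        "288 72 144 72 re S",     # second box, leaving a 72 pt (1 inch) gap
        "216 108 m 288 108 l S",  # line from the first box's right edge to the second
    ]
    return "\n".join(operators).encode("ascii")


def raster_bytes(width_in: float, height_in: float, dpi: int, bytes_per_pixel: int = 1) -> int:
    """Size of an uncompressed bitmap covering the given area at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


if __name__ == "__main__":
    # The diagram spans x = 72..432 pt and y = 72..144 pt, i.e. 5 x 1 inches.
    print(len(diagram_stream()), "bytes as vectors")
    print(raster_bytes(5, 1, 300), "bytes as an 8-bit 300 dpi bitmap")
```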
The compression techniques described in the Images section can be applied to images that will be used in PDFs. However, some PDF creation tools have their own image compression options; in that case it may be more effective to add the highest-quality images available at the PDF output resolution and then let the PDF creation tool compress them to the desired level. Images that are compressed before being added to a PDF may lose significant quality when they are recompressed by the PDF creation tool.
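The PDF output resolution translates directly into pixel dimensions. PDF measures an image's placed size in points (1/72 inch by default), so an image needs about `width_pt / 72 * dpi` pixels across; pixels beyond that are downloaded but never shown at the intended output size. The helper name and the 150 dpi value below are examples only, not recommendations from this guide.

```python
import math
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user space unit is 1/72 inch


def pixels_needed(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions matching an image's placed size at the output resolution."""
    return (
        math.ceil(width_pt / POINTS_PER_INCH * dpi),
        math.ceil(height_pt / POINTS_PER_INCH * dpi),
    )


if __name__ == "__main__":
    # A photo placed at 6 x 4 inches (432 x 288 pt), for 150 dpi output:
    print(pixels_needed(432, 288, 150))  # (900, 600)
```

A larger source image can then be scaled down to these dimensions before the PDF tool applies its own compression.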
Consider providing multiple versions of the same PDF with different levels of image compression, or no images at all.
Splitting Up Files
Lengthy documents can be split into separate files so that the whole document doesn't need to be downloaded if the user only wants to access a particular chapter.
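One way to split a long PDF by chapter is sketched below in Python. The page-range calculation uses only the standard library; the splitting itself assumes the third-party pypdf library (`PdfReader`, `PdfWriter.add_page`, `PdfWriter.write`), which is our choice of tool rather than something this guide prescribes. The chapter start pages and the output file name pattern are hypothetical inputs.

```python
from typing import List, Tuple


def chapter_ranges(start_pages: List[int], total_pages: int) -> List[Tuple[int, int]]:
    """Convert 1-based chapter start pages into inclusive (first, last) ranges."""
    if not start_pages or start_pages[0] < 1 or start_pages[-1] > total_pages:
        raise ValueError("start pages must lie between 1 and total_pages")
    if any(later <= earlier for earlier, later in zip(start_pages, start_pages[1:])):
        raise ValueError("start pages must be strictly increasing")
    ends = [start - 1 for start in start_pages[1:]] + [total_pages]
    return list(zip(start_pages, ends))


def split_pdf(src: str, start_pages: List[int], out_pattern: str = "chapter-{:02d}.pdf") -> None:
    """Write one PDF per chapter (requires: pip install pypdf)."""
    # Imported here so that chapter_ranges can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    ranges = chapter_ranges(start_pages, len(reader.pages))
    for number, (first, last) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(first - 1, last):  # pypdf page indices are 0-based
            writer.add_page(reader.pages[index])
        with open(out_pattern.format(number), "wb") as handle:
            writer.write(handle)
```

For example, `chapter_ranges([1, 5, 12], 20)` gives `[(1, 4), (5, 11), (12, 20)]`. Any pages before the first start page are left out, so include page 1 if the front matter should be kept.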
A linearised PDF is one that has been optimised so that a PDF viewer embedded in a web browser can display the first pages before the entire document has loaded. Adobe calls this Fast Web View; when the web server also supports byte serving (returning only the requested byte ranges of a file), the viewer can fetch individual pages on demand. This is the PDF equivalent of incremental rendering on web pages: it gives the user a faster response while the document is loading, so they can evaluate a potential download sooner, although the resulting file is slightly larger than a non-linearised version.
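A linearised file announces itself near the start: its first object is a linearisation parameter dictionary containing the key `/Linearized`, which the PDF reference requires to lie within the first 1024 bytes of the file. The Python sketch below uses that as a quick heuristic (it does not validate the linearisation hint tables) and shows how a client can request just those bytes with an HTTP `Range` header, the mechanism byte serving relies on. The function names are our own and the URL is whatever you pass in.

```python
import urllib.request
from typing import Tuple


def is_linearised(head: bytes) -> bool:
    """Heuristic: look for the /Linearized key in the first 1024 bytes."""
    return b"/Linearized" in head[:1024]


def fetch_head(url: str, length: int = 1024) -> Tuple[int, bytes]:
    """Request only the first bytes of a remote PDF with an HTTP Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{length - 1}"})
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range (byte serving);
        # 200 means it ignored the header, so read() is capped at length bytes.
        return response.status, response.read(length)
```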
In Acrobat 8, saving a document with the Save As... command linearises it by default and also removes unused objects, further reducing the size of the resulting PDF. For more information, read Adobe's document Optimizing Adobe PDF Files for the Web.
Websiteoptimization.com provides a detailed discussion of further methods for optimising PDFs automatically in Acrobat 8 and other tools.
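Outside Acrobat, the free command-line tool qpdf is one option for linearising existing files; it is our choice for this example, not a tool named in the sources above. The Python sketch builds and runs a `qpdf --linearize` command, optionally adding `--object-streams=disable` so that qpdf does not write object streams, the PDF 1.5 compression feature that older viewers cannot read. It does not try to set the version number in the file's header, so check that separately if you need a strict 1.4 file. The function names and file names are placeholders.

```python
import shutil
import subprocess
from typing import List


def qpdf_linearise_command(src: str, dst: str, avoid_object_streams: bool = True) -> List[str]:
    """Build a qpdf command line that writes a linearised copy of src to dst."""
    command = ["qpdf", "--linearize"]
    if avoid_object_streams:
        # Object streams are a PDF 1.5 feature that older viewers cannot read.
        command.append("--object-streams=disable")
    return command + [src, dst]


def linearise(src: str, dst: str) -> None:
    """Run qpdf, raising if it is not installed or exits with an error."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf was not found on PATH")
    subprocess.run(qpdf_linearise_command(src, dst), check=True)
```

For example, `linearise("report.pdf", "report-web.pdf")` writes a linearised copy alongside the original, leaving the source file untouched.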
Summary
For optimising PDFs we recommend using:
- HTML or plain text instead of a PDF, whenever possible
- Version 1.4 of the PDF standard when saving new PDFs
- Standard cross-platform fonts, like Helvetica, rather than embedded fonts
- OCR text from scanned documents
- Vector-based graphics
Further reading on PDF accessibility and downloading can be found in Aptivate's PDF Usage Guidelines.
[#1] Facts and Opinions About PDF Accessibility, Joe Clark, A List Apart, 2005 http://www.alistapart.com/articles/pdf_accessibility/
[#2] PDF Reference, Third Edition, Adobe Portable Document Format Version 1.4, Adobe Systems Inc, 2001 http://www.adobe.com/devnet/pdf/pdfs/PDFReference.pdf (8.95MB)
[#3] Adobe Acrobat Optimizing Adobe PDF files for the Web, Adobe Systems Inc, 2001 http://www.adobe.com/products/acrobat/pdfs/c01acrotip.pdf (76kB)
[#4] Optimize PDF Files, websiteoptimization.com http://www.websiteoptimization.com/speed/tweak/pdf/
[#5] PDF Usage Guidelines, Aptivate, 2007 http://www.aptivate.org/Projects.PDFUsageGuidelines.html