Compression

Introduction

HTTP compression is a technique supported by most web browsers and web servers. When it is enabled on both sides, it can automatically reduce the size of text downloads (including HTML, CSS and JavaScript) by 50-90%. It is enabled by default in all modern browsers, but most web servers ship with it disabled, so it has to be switched on explicitly. A surprising number of sites still do not have compression enabled, despite the benefits to users and the potential savings in server-side bandwidth costs.

Turning on compression takes relatively little time compared with the bandwidth savings it can offer. However, because of various historical browser and server issues it is not quite as simple as flipping a switch, and in some cases investing in third-party tools may be the best option. To make the process as simple as possible, instructions for the most popular web servers are provided below. For those who are interested, or who are using a different web server, a discussion of the compatibility issues these instructions address follows.

Checking for Compression

There are a number of ways to check whether a site has compression enabled. The Firefox web browser with the Web Developer Toolbar extension[1] can be used to analyse page sizes and show the benefits of compression on servers that have it enabled.

Alternatively, online compression analysis tools can be found on pages such as http://www.port80software.com/ and http://pipeboost.com/, or more comprehensive page analysis is available at http://www.websiteoptimization.com/services/analyze/.
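A quick check can also be scripted. The sketch below uses Python's standard urllib to request a page while advertising gzip support (the Accept-Encoding request header described under Browser Bugs below) and reports whether the server replied with Content-Encoding: gzip. The function names and the 10-second timeout are illustrative choices. Because urllib does not decompress responses itself, the byte count it reports is what actually crossed the network.

```python
import gzip
import urllib.request


def response_is_gzipped(headers) -> bool:
    """True if the response headers declare gzip Content-Encoding."""
    encoding = headers.get("Content-Encoding") or ""
    return encoding.strip().lower() == "gzip"


def check_compression(url: str, timeout: float = 10.0) -> None:
    """Fetch url while advertising gzip support and report what was sent."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # urllib leaves the body compressed, so this is the transferred size.
        body = response.read()
        if not response_is_gzipped(response.headers):
            print(f"{url}: not compressed ({len(body)} bytes)")
            return
        original = gzip.decompress(body)
        saving = 1 - len(body) / len(original) if original else 0.0
        print(f"{url}: gzip, {len(body)} bytes sent, "
              f"{len(original)} bytes uncompressed ({saving:.0%} saving)")
```

For example, check_compression("http://www.example.com/") prints a one-line summary for that page.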

The Effects of Compression

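As an example of the benefits of compression, the table below analyses a range of popular sites. The download times assume an average bandwidth of 20 kbit/s (2.5 kB/s), and the relative saving is the reduction in size as a percentage of the uncompressed size.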

Site                                         | Size uncompressed (kB) | Size compressed (kB) | Time uncompressed (s) | Time compressed (s) | Relative saving
---------------------------------------------+------------------------+----------------------+-----------------------+---------------------+----------------
Blackwell Publishing home page               | 170                    | 156                  | 68                    | 62                  | 8%
Development Gateway home page                | 78                     | 70                   | 31                    | 28                  | 10%
CNN home page                                | 796                    | 695                  | 318                   | 278                 | 13%
SpringerLink computer science subject page   | 450                    | 173                  | 180                   | 69                  | 62%
Google search results page for 'development' | 27                     | 12                   | 11                    | 5                   | 56%
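The derived columns in the table can be recomputed from the two size columns. This sketch assumes 1 kB = 8 kbit, so the assumed 20 kbit/s link moves 2.5 kB per second, and defines the relative saving as the size reduction divided by the uncompressed size. The function names are illustrative.

```python
LINK_KBIT_PER_S = 20  # average bandwidth assumed for the table


def download_time_s(size_kb: float, kbit_per_s: float = LINK_KBIT_PER_S) -> float:
    """Seconds to transfer size_kb kilobytes (1 kB = 8 kbit)."""
    return size_kb * 8 / kbit_per_s


def relative_saving(uncompressed_kb: float, compressed_kb: float) -> float:
    """Fraction of the original size removed by compression."""
    return (uncompressed_kb - compressed_kb) / uncompressed_kb


# (uncompressed kB, compressed kB) taken from the size columns of the table.
SITES = {
    "Blackwell Publishing home page": (170, 156),
    "Development Gateway home page": (78, 70),
    "CNN home page": (796, 695),
    "SpringerLink computer science subject page": (450, 173),
    "Google search results page for 'development'": (27, 12),
}

for site, (plain, gz) in SITES.items():
    print(f"{site}: {download_time_s(plain):.0f} s -> "
          f"{download_time_s(gz):.0f} s, saving {relative_saving(plain, gz):.0%}")
```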

Enabling Compression

Select the web server that you are using for the relevant guide to enabling compression:

If your web server is not listed, check its documentation for instructions on enabling compression. You may also want to read the detailed discussion of compatibility issues below to identify any problems you need to address on your server.

After Compression is Enabled

There are a couple of things worth checking after enabling compression on your web server:

Browser Compatibility
As with any change made to a site, it is good practice to test it against a number of browsers afterwards. Given the known compression problems in a minority of older browsers, this is especially worth doing after enabling compression. See the Browser Bugs section below for more details.

CPU load
Serving compressed content can increase CPU load on the server. This is less likely to be a problem on newer servers, but as sites grow larger and more popular, older machines may see high CPU usage at peak times. It is important that CPU usage is not regularly at or near 100%; otherwise the CPU becomes a bottleneck for serving content and can actually increase the time taken to download pages. Monitor CPU usage on your server after enabling compression, and if it regularly approaches 100%, consider the following measures, in order of preference:
  • compress static content in advance
  • turn down the compression level, if this option is available
  • turn off compression for dynamic pages
  • reduce the list of file types for which compression is enabled
  • turn off compression for one or more sites on the server
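To get a feel for the trade-off behind turning down the compression level, the sketch below uses Python's standard zlib module, which implements the same deflate algorithm that gzip uses, to compress a sample payload at several levels and time each one. The sample data and the choice of levels 1, 6 and 9 are assumptions for illustration; each web server exposes its own setting, if any.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(1, 6, 9)) -> dict:
    """Compress data at each zlib level; return {level: (size_bytes, ms)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Deflate is lossless, so the round trip must reproduce the input.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed_ms)
    return results


# Hypothetical sample page: repetitive markup, as real HTML tends to be.
sample = b"<li><a href='/item'>Item</a></li>\n" * 5000
for level, (size, ms) in compare_levels(sample).items():
    print(f"level {level}: {len(sample)} -> {size} bytes in {ms:.2f} ms")
```

Higher levels usually produce slightly smaller output at a higher CPU cost, which is why lowering the level is one way to relieve a busy server.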

Pre-compression of Static Content

Apache and IIS both support pre-compression of static content, where the file is compressed just once and cached for future use. While dynamic files must be compressed for each request, pre-compression avoids repeatedly compressing the same static file, saving CPU overhead and improving response time.

One consideration is that if your server does not automatically pre-compress and cache static files, as with earlier versions of Apache, you must manually update the compressed file whenever the source file changes. Another is that server-side includes are not supported for pre-compressed files.

For IIS, pre-compression and caching are completely automatic and require no intervention by the administrator.

For Apache 1.3 with mod_gzip versions prior to 1.3.26.1a, the administrator must manually create compressed .gz versions of static files with the gzip program and place them in the file structure next to the original uncompressed versions; for example, a directory containing home.html should also contain home.html.gz. The config file should contain the line mod_gzip_can_negotiate Yes, as in our sample config, which tells Apache to look for a pre-compressed file with the same name plus a .gz extension and serve it where possible.
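Creating the .gz copies by hand is easy to automate. The sketch below walks a directory tree with Python's pathlib and gzip modules and writes name.gz next to each matching file, skipping files whose .gz copy is already newer than the source, so it can be rerun whenever content changes. The set of file extensions and the maximum compression level are assumptions to adjust for your site; server-side includes are not handled, since they are not supported for pre-compressed files anyway.

```python
import gzip
import shutil
from pathlib import Path

# Assumed list of text file types worth pre-compressing; adjust as needed.
TEXT_SUFFIXES = {".html", ".htm", ".css", ".js"}


def precompress_tree(root: str) -> list:
    """Write name.gz next to each text file under root when it is stale.

    Returns the list of .gz files that were created or refreshed.
    """
    updated = []
    # sorted() materialises the listing before any new .gz files are written.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        gz_path = path.with_name(path.name + ".gz")
        # Skip files whose compressed copy is already up to date.
        if gz_path.exists() and gz_path.stat().st_mtime >= path.stat().st_mtime:
            continue
        with path.open("rb") as src, gzip.open(gz_path, "wb", compresslevel=9) as dst:
            shutil.copyfileobj(src, dst)
        updated.append(gz_path)
    return updated
```

For example, precompress_tree("/var/www/html") (a hypothetical document root) would produce home.html.gz alongside home.html.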

With mod_gzip versions 1.3.26.1a onward, Apache will automatically create pre-compressed static content if mod_gzip_update_static Yes is added to the Apache config.

For Apache 2.x using mod_deflate, the configuration is rather complex: it involves content negotiation through the MultiViews option (provided by mod_negotiation) and renaming your uncompressed files. For a guide to this, read http://www.everything2.com/index.pl?node_id=1690029 or http://www.webmasterworld.com/forum92/5880.htm. Although mod_deflate is slightly more user-friendly, it is also possible to use mod_gzip with Apache 2.x, which allows automatic pre-compression of static content.


Further Notes on Compression

The following sections discuss in depth various issues relating to compression. They are intended as a reference, to help you understand the instructions described above or to make your own changes to a web server. You do not need to understand them to follow the instructions above to enable compression on your server.

Compression Compatibility

Browser Bugs

Because of bugs, many browsers did not properly support compression for some time. The latest versions of all major browsers handle compression correctly.

Unfortunately, we cannot assume that all users have the latest version of a decent browser or will be able to upgrade. We can only support these users if we take steps to disable compression for them.

The browser reports to the server that it supports compression by sending a header in each request:

Accept-Encoding: gzip

Browsers that don't even know about compression will not send this header. If the server supports compression as well, it will send back a compressed reply, and the following reply header tells the browser that it must decompress the content before use:

Content-Encoding: gzip
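The negotiation described above can be sketched from the server's side. The Python below is a simplified illustration, not any particular server's implementation: it compresses a response body only when the Accept-Encoding header lists gzip (treating an explicit q=0 as a refusal, and ignoring the * wildcard for simplicity), and sets Content-Encoding to match. It also adds a Vary: Accept-Encoding header, standard HTTP practice so that shared caches keep compressed and uncompressed copies apart. The function names are hypothetical.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """True if an Accept-Encoding value lists gzip with a non-zero q-value."""
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value) > 0
                except ValueError:
                    return False
        return True
    return False


def encode_response(body: bytes, request_headers: dict) -> tuple:
    """Return (body, response_headers), gzipping when the client allows it."""
    # Tell caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers
```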

Some servers (e.g. Apache) can control whether to enable or disable compression based on the browser's User-Agent request header, which indicates the browser name and version. With these servers, it is possible to detect most browsers that do not properly support compression and disable it for them, so that they will be able to use the site.

The sections below list the problematic browsers and content types; the instructions for popular web servers referred to earlier are intended to address them.

Netscape Navigator versions 4.06 to 4.08 claim to support compression, but in fact do not correctly process compressed data at all. In order to support these browsers, it is necessary to detect them and disable compression completely.

These browsers can be identified by a user agent string beginning with one of the following:

Mozilla/4.06
Mozilla/4.07
Mozilla/4.08

In addition, all other versions of Netscape 4 support compression only for the text/html content type (HTML pages), so it should be disabled for all other types. You can identify them by user agent strings starting with Mozilla/4, but this is complicated by the fact that several other major browsers, including Internet Explorer and Opera, also send this prefix, and compression should be enabled for some of these browsers.
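The Netscape 4 rules above can be expressed as a small user-agent classifier. This Python sketch returns one of three hypothetical policy names: none for Netscape 4.06 to 4.08, html-only for other Mozilla/4 browsers, and all otherwise. It assumes, as a heuristic, that Internet Explorer and Opera can be told apart from Netscape by the MSIE or Opera token in their user agent strings; the Apache mod_deflate documentation uses a similar set of BrowserMatch rules.

```python
import re

# Netscape 4.06 to 4.08 advertise gzip but cannot decompress it at all.
_BROKEN_NETSCAPE = re.compile(r"^Mozilla/4\.0[678]")
# Any other Mozilla/4 browser may be Netscape 4 (HTML-only support).
_MOZILLA_4 = re.compile(r"^Mozilla/4")


def netscape4_policy(user_agent: str) -> str:
    """Return 'none', 'html-only' or 'all' for a User-Agent string."""
    # Heuristic: IE and Opera also send Mozilla/4 but include these tokens.
    if "MSIE" in user_agent or "Opera" in user_agent:
        return "all"  # not Netscape 4; apply the IE rules separately
    if _BROKEN_NETSCAPE.match(user_agent):
        return "none"
    if _MOZILLA_4.match(user_agent):
        return "html-only"
    return "all"
```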

A further trap is that some versions of Internet Explorer 5.5 and 6.0 had a bug[2] that corrupted compressed files if the user had a download manager or other HTTP handler installed. This bug was fixed by a hotfix that was later rolled into Windows XP Service Pack 2. It is not possible to detect whether the hotfix is installed, but it is possible to detect IE6 with Service Pack 2 by looking for a user agent string beginning:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1

We recommend that you disable compression entirely for user agents that identify as Internet Explorer 4 to 6 (user agent strings beginning Mozilla/4.0 (compatible; MSIE followed by version 4, 5 or 6) but do not match the SP2 string above. This will disable compression for all IE4, IE5, IE5.5 and IE6 users except IE6 with SP2, but supporting these browsers without compression is better than not supporting them at all. The biggest risk is collateral damage, that is, disabling compression for browsers that do actually support it, but it is not feasible to test every single browser to determine whether it works properly with compression.
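The Internet Explorer recommendation can be sketched in the same way. This Python function assumes that IE 4, 5, 5.5 and 6 all send user agent strings beginning Mozilla/4.0 (compatible; MSIE followed by a major version of 4, 5 or 6, and allows compression for them only when the string begins with the SP2 prefix shown above. The function and constant names are hypothetical.

```python
import re

# IE 4.x, 5.x (including 5.5) and 6.x: assumed to share this prefix.
_OLD_IE = re.compile(r"^Mozilla/4\.0 \(compatible; MSIE [456]\.")
# IE6 on Windows XP SP2, which includes the fix for the corruption bug.
_IE6_SP2 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1"


def ie_allows_compression(user_agent: str) -> bool:
    """False for IE 4 to 6, except IE6 with Windows XP SP2."""
    if user_agent.startswith(_IE6_SP2):
        return True
    return not _OLD_IE.match(user_agent)
```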

Versions of Norton Internet Security from 2005 and earlier are reported to disable compression when running on the client machine. Port80 Software's httpZip product fixes this problem for all versions of IIS. Norton Internet Security 2006 is reported to cause a new problem[3] with compressed content by breaking the HTTP protocol. This problem particularly affects ColdFusion Server[4], and its nature makes it difficult to fix in a third-party product. There is currently no known workaround.

There is more information on browser support for compression at http://schroepl.net/projekte/mod_gzip/browser.htm.

Content Type Compatibility

Some browsers have problems handling certain content types, particularly binary files, when they are compressed.

PDF

PDF files can become unreadable if they are compressed and then viewed with Acrobat Reader inside Internet Explorer. It is advisable to exclude them from the list of file types to be compressed.

Image formats

There is little point in compressing images: common web image formats such as JPEG, GIF and PNG are already compressed, and further compression may actually increase the file size slightly.
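The effect is easy to demonstrate. In this Python sketch, random bytes stand in for an already-compressed image (an assumption: like JPEG or PNG data, they contain almost no redundancy left to remove), and gzip makes them slightly larger, while repetitive markup shrinks substantially.

```python
import gzip
import os


def gzip_gain(data: bytes) -> int:
    """Bytes saved by gzipping data; negative means the output grew."""
    return len(data) - len(gzip.compress(data))


# Assumption: random bytes behave like already-compressed image data.
already_compressed = os.urandom(64 * 1024)
# Hypothetical repetitive markup, standing in for an HTML page.
markup = b"<tr><td class='cell'>value</td></tr>\n" * 2000

print("image-like data:", gzip_gain(already_compressed), "bytes saved")
print("markup:", gzip_gain(markup), "bytes saved")
```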

JavaScript

When reading compressed JavaScript files, Internet Explorer 5 can fire an onLoad event before the JavaScript has been decompressed, which can cause problems with some scripts. If you follow our recommendations then JavaScript compression will be disabled anyway for this browser. Otherwise, you might want to test your site with IE 5 if it contains JavaScript.

CSS

A minority of older browsers do not reliably handle compressed CSS files, although this appears to result from a combination of factors, which makes it difficult to specify exactly when the behaviour will occur. It can lead to a page being presented as if there were no CSS file. If you have designed your page to be at least basically usable without any CSS, as recommended in the Stylesheets chapter, then we suggest that you compress CSS files, as this will benefit the great majority of users.
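The content-type advice in this section can be summarised as an allow-list. The Python sketch below is one possible encoding of it: the exact set of MIME types, including the several JavaScript types seen in practice, is an assumption to adjust for your own site, and PDF and image types are refused explicitly.

```python
# Assumed allow-list: HTML, CSS and the JavaScript MIME types seen in practice.
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/css",
    "text/javascript",
    "application/javascript",
    "application/x-javascript",
}


def should_compress(content_type: str) -> bool:
    """Decide from a Content-Type header value whether to gzip a response."""
    # Ignore parameters such as "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower()
    # PDFs and images are excluded, as discussed above.
    if mime == "application/pdf" or mime.startswith("image/"):
        return False
    return mime in COMPRESSIBLE_TYPES
```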

Summary

  • Enable file compression for text files (HTML, CSS and JavaScript) on your server
  • Try to implement pre-compression for static content
  • Check browser compatibility for each compressed content type

Further Resources about Compression

See http://www.webperformance.org/compression/

Footnotes

[1] https://addons.mozilla.org/en-US/firefox/addon/60

[2] http://support.microsoft.com/default.aspx?scid=kb;en-us;Q313712

[3] http://www.port80software.com/200ok/archive/2006/01/04/901.aspx

[4] http://www.port80software.com/200ok/archive/2006/01/06/909.aspx