aware.hwg.org: accessible web authoring resources and education center

How Server Statistics Undercount Text Browsers

On the HTML Writers Guild's stylesheet mailing list, I once made the following comment:

...often time the statistics that people are provided with are deceptive and VASTLY UNDERCOUNT people with disabilities and other folks.

Ann Navarro then asked me to explain:

Perhaps you could expand on *how* such filtered statistics can undercount those individuals? The mechanism that returns such a result isn't necessarily apparent to some developers.

The following essay resulted from that email exchange:

How Browser Stats Undercount The Disabled And Others

Do you trust your browser stats, and those reported by other sites? You know, the ones that say such-and-such a percent of people use Netscape, or MSIE, or Opera?

You should take all statistics with a grain of salt, especially web statistics. Browser stats -- generated from agent logs -- are particularly faulty because, nearly all the time, they give inaccurately low counts of non-graphical users.

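Agent logs vastly undercount the disabled and other people who might not be using images because of how the typical user agent log is implemented. On nearly every system, the default operation is as follows: each time a browser requests a file, the server writes one line to the agent log, containing only that browser's identification string.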

The problem comes from the fact that one line is recorded per FILE, and it is NOT stored with the name of the file being accessed. Let's look at the implications of this.
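As a rough illustration of that default behavior, here is a minimal Python sketch of an agent-only log. The function name log_agent and the file names are hypothetical (the essay names no files), and this is not the code of any real server; it only models the "one line per file, no file name" rule described above.

```python
# Hypothetical model of an agent-only log, not real server code.
def log_agent(agent_log, path, user_agent):
    """Append one line per requested file; the path itself is discarded."""
    agent_log.append(user_agent)

IE = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

agent_log = []
# Invented file names: one page, one frame, one image, one stylesheet.
for path in ["/index.html", "/nav.html", "/logo.gif", "/nav.css"]:
    log_agent(agent_log, path, IE)

# Four requests leave four identical lines, with nothing to tell a page
# request apart from an image or stylesheet request.
print(len(agent_log), "lines,", len(set(agent_log)), "distinct")
```

Once the path has been thrown away, no later analysis can recover which lines were page views.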

Let's say you have a web site with 3 frames. In the left frame there are 6 images (a navigation bar); in the upper right frame there is one image (a banner); and in the lower right, big frame you have your content, which includes 4 other pictures. You have one stylesheet for the navigation bar and a separate one for the content window, but not one for the simple banner window.

I come to your site using Internet Explorer 4.0. My browser requests the following files: the frameset page, the three frame pages, the 11 images (6 + 1 + 4), and the two stylesheets.

So your webserver dutifully marks down, in the browser log file:


Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)
Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)

(That's 17 times, one for each file used.)

Now let's say that instead, I came to look at your site using Opera 3.2, with the images turned off. (This is my usual mode of surfing; I only load images if I think there's something worth seeing.) You can turn frames off in Opera, but let's assume I have them on. Also, Opera doesn't do CSS in version 3.2, only 3.5 onward.

Therefore, my browser requests the following files: the frameset page and the three frame pages, with no images and no stylesheets.

The server writes the following in the agent log file:


Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4) 3.2
Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4) 3.2
Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4) 3.2
Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4) 3.2

If I click on my toggle bar to load images, the browser will fetch the 11 image files and add an extra 11 lines to the agent log file; if I don't do this, those lines won't be added. Let's pretend I don't do this (because you designed your website properly, using ALT text and whatnot).

Now, finally, I go and look at it in Lynx, which requests a single file: the frameset page.

Hopefully, you had a useable NOFRAMES section. The NOFRAMES tag is used to provide a functional equivalent to your frameset, for browsers that don't support frames, or when the user has them disabled -- in Opera, I can disable frames if I like. Sometimes you'll see the NOFRAMES element used for a message such as "Your browser does not support frames! Get a better browser!" I always find that slightly insulting, especially when the problem is not my browser, it's someone else's faulty website design!

But, anyway, back to the log file -- only one file was downloaded using Lynx, and so the agent log contains:


Lynx/2.6 libwww-FM/2.14

So now you run your log analysis program. You have 17 hits from MSIE, 4 hits from Opera, and 1 hit from Lynx. This comes out to the following usage stats:

77% Internet Explorer
18% Opera
4% Lynx

Therefore, you conclude that only 4% of your users are using Lynx.
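The tally behind those figures can be sketched in a few lines of Python. The browser_name helper is a hypothetical, deliberately crude classifier that only needs to cope with the three agent strings shown above; the percentages are rounded down to whole numbers, which reproduces the figures in the list.

```python
from collections import Counter

IE = "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
OPERA = "Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4) 3.2"
LYNX = "Lynx/2.6 libwww-FM/2.14"

# The agent log after the three visits: 17 + 4 + 1 lines.
agent_log = [IE] * 17 + [OPERA] * 4 + [LYNX] * 1

def browser_name(agent):
    """Crude classification, sufficient only for these three strings."""
    if "Opera" in agent:
        return "Opera"
    if "MSIE" in agent:
        return "Internet Explorer"
    if agent.startswith("Lynx"):
        return "Lynx"
    return "Other"

hits = Counter(browser_name(line) for line in agent_log)
total = sum(hits.values())

# Whole-number percentages of HITS (not people), rounded down.
shares = {name: count * 100 // total for name, count in hits.items()}
for name, pct in shares.items():
    print(f"{pct}% {name}")
```

Note that the denominator is 22 log lines, not 3 visitors; the program has no way to know how many people produced those lines.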

What's wrong here?

Statistics lie. Or at least, misapplied statistics lie. That 4% figure actually represents one third (33%) of your sample. As many people in the example above used Lynx (1 person) as used MSIE (1 person) or Opera (1 person)! But as you can see, the Lynx and Opera users were vastly undercounted in the user agent stats!
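To make the contrast concrete, here is a hedged Python sketch that counts people rather than hits. It assumes something a plain agent log does not provide: a client address recorded with each request. The addresses below are invented for illustration, and the sketch further assumes that each visitor arrives from a distinct address, which real traffic (proxies, shared machines) does not guarantee.

```python
from collections import defaultdict

# Hypothetical (client address, browser) records; the addresses are
# invented and are NOT part of a plain agent log.
requests = (
    [("192.0.2.1", "Internet Explorer")] * 17
    + [("192.0.2.2", "Opera")] * 4
    + [("192.0.2.3", "Lynx")] * 1
)

# Collect the distinct addresses seen for each browser.
visitors = defaultdict(set)
for address, browser in requests:
    visitors[browser].add(address)

people = {browser: len(addresses) for browser, addresses in visitors.items()}
total_people = sum(people.values())

for browser, count in people.items():
    print(f"{browser}: {count} of {total_people} visitors")
```

Counted this way, each browser accounts for one visitor in three, whatever the number of files it happened to fetch.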

The situation described above is very common. This is (or was, if they've changed it recently) the default way in which Apache and other major servers are shipped. This is the way my webserver is configured.

Now, you can reconfigure how your server handles log files, if you're willing to (a) write your own processing scripts for them, and (b) mess with the webserver configuration. Neither of these is particularly easy, but both are doable. The most obvious thing to do is to log the name of the file with the agent string; then you could filter out those hits that are not to HTML pages. (Note that this will still undercount browsers that don't use frames!)
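A minimal Python sketch of that filtering step follows, assuming the server has been reconfigured to record the requested file next to the browser. Every file name here is invented, and treating ".html" and ".htm" as the only page types is a simplifying assumption.

```python
from collections import Counter

# Hypothetical file names for the example site: a frameset page plus
# three frame pages, 11 images, and 2 stylesheets.
pages = ["/index.html", "/nav.html", "/banner.html", "/content.html"]
images = [f"/img{i}.gif" for i in range(1, 12)]
styles = ["/nav.css", "/content.css"]

# (requested file, browser) pairs for the three visits described above.
log = (
    [(path, "Internet Explorer") for path in pages + images + styles]
    + [(path, "Opera") for path in pages]   # frames on, images off, no CSS
    + [("/index.html", "Lynx")]             # frameset page only
)

def is_html_page(path):
    """Treat only .html/.htm requests as page views (an assumption)."""
    return path.lower().endswith((".html", ".htm"))

# Count only the hits that are requests for HTML pages.
page_hits = Counter(browser for path, browser in log if is_html_page(path))
for browser, count in page_hits.items():
    print(f"{browser}: {count} page hits")
```

Filtering brings Internet Explorer and Opera level at four page hits each, but Lynx still shows only one hit out of nine because it never fetches the three frame pages. That is the remaining undercount for browsers that don't use frames.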

So let this serve as a cautionary tale regarding statistics -- it's very easy for someone doing a "sensible" log analysis to come up with numbers that are badly skewed from reality. Look again at those figures showing that nearly twenty times as many people use IE as use Lynx, based on a sample in which the number of users for both browsers was equal!

Copyright © 1999 by Kynn Bartlett, <kynn@idyllmtn.com>; reprinted here with permission.