I get an awful lot of spam, and most people don't seem to realize exactly how much of it there is. Until September 2002, I was like most people: I angrily deleted any piece of spam I got. Now I am just as infuriated, but I store every single piece. I am on a mission: to figure out the exact quantity of the crap being thrown at me, and to identify the trends.

A month and a half after beginning my spam collection, I wrote a bunch of tools to gather statistics on the spam, mostly for my own amusement (since I like to hack PostScript code; all graphs below are custom PostScript), but also to find out a few facts about what I was getting. As it turns out, this stuff is easiest to view through a web interface, and since I had already gone to all that work, I decided to make it public.

There are all sorts of spam analysis pages and gizmos out there, mostly because people want to prevent spam. But this page isn't about that. I'm no expert on that kind of stuff.

Raw numbers

This table shows some general numbers on the amount of spam I get. The Total is the total number of spam messages in my collection, and the Average number of messages per day is next to it. Underneath is the number of days in the sample.

Maximum and Minimum are for daily spam count figures, and the dates appear below the counts.
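For anyone who wants to compute figures like these from their own collection, here is a minimal Python sketch. It is not the tooling behind this page; the function name `raw_numbers` and the assumption that you already have one arrival date per message are made up for illustration. Days inside the sample with no spam are counted as zero.

```python
from collections import Counter
from datetime import date, timedelta

def raw_numbers(arrival_dates):
    """Summarize daily spam counts from a list of dates, one per message."""
    per_day = Counter(arrival_dates)
    first, last = min(per_day), max(per_day)
    days = (last - first).days + 1  # days in the sample, including empty ones
    span = [first + timedelta(days=i) for i in range(days)]
    # Counter returns 0 for days with no messages, so quiet days are included.
    busiest = max(span, key=lambda d: per_day[d])
    quietest = min(span, key=lambda d: per_day[d])
    return {
        "total": len(arrival_dates),
        "days": days,
        "average_per_day": len(arrival_dates) / days,
        "maximum": (per_day[busiest], busiest),
        "minimum": (per_day[quietest], quietest),
    }
```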

Autodetect shows the number of messages that my spam filter caught; the percentage is below. This number is not to be taken too seriously. It could be better, for one thing, but this page is not about automatic detection of spam or anything particularly geared to that purpose; it's about what spam looks like.

A simple graph of the daily number of spam messages I get appears above. Each point is the average of the daily spam counts over the days leading up to the date on the x-axis, including that day. The y-axis is the number of messages. Using the average smooths out irregularities.
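The smoothing described here is a trailing average. A small Python sketch of the idea follows; the function name and the `window` parameter are assumptions for illustration, and the window length is left to you to choose.

```python
def trailing_average(daily_counts, window):
    """Average each day's count with the days before it, `window` days in total."""
    averages = []
    for i in range(len(daily_counts)):
        start = max(0, i - window + 1)  # the window is shorter at the very start
        chunk = daily_counts[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages
```

For example, `trailing_average([10, 20, 30, 40], 3)` gives `[10.0, 15.0, 20.0, 30.0]`. A larger window gives a smoother curve but reacts more slowly to real changes.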

Weekday Average Counts

These are the average numbers of spam messages for each day of the week in the collection. I could perhaps plot this one day.
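One way to compute a per-weekday average is to divide the messages received on, say, Mondays by the number of Mondays in the sample. The sketch below does that in Python; `weekday_averages` is a hypothetical name and the input is assumed to be one arrival date per message.

```python
from collections import Counter
from datetime import date, timedelta

NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_averages(arrival_dates):
    """Average messages per weekday over the span covered by the dates."""
    first, last = min(arrival_dates), max(arrival_dates)
    days = (last - first).days + 1
    # How many Mondays, Tuesdays, ... fall inside the sample span.
    occurrences = Counter(
        (first + timedelta(days=i)).weekday() for i in range(days)
    )
    messages = Counter(d.weekday() for d in arrival_dates)
    return {NAMES[w]: messages[w] / occurrences[w] for w in sorted(occurrences)}
```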

Spam Message Size Analysis


This section of numbers is based on the sizes of the spam messages in the collection (at the moment, this includes the header). This information is mildly useful in determining just how much of our bandwidth and storage space these very inconsiderate spammers consume.

The table above includes the Median, Maximum, and Minimum sizes.

The graph below shows the distribution of the message sizes. The y-axis is the number of messages; each bar represents a range of sizes. Note that the ranges are larger on the right edge of the chart; this causes slight inflation for those ranges.
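Here is a rough Python sketch of a size summary and a histogram with uneven ranges. The bin edges in `EDGES` are invented for illustration and are not the ranges used in the chart; the function names are hypothetical too.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical bin edges in bytes; the ranges get wider toward the right.
EDGES = [1000, 2000, 4000, 8000, 16000, 32000]

def size_summary(sizes):
    """Median, maximum, and minimum message size."""
    return {"median": median(sizes), "maximum": max(sizes), "minimum": min(sizes)}

def size_histogram(sizes, edges=EDGES):
    """Count messages per size range; the last slot holds sizes >= edges[-1]."""
    counts = [0] * (len(edges) + 1)
    for size in sizes:
        # bisect_right puts a size equal to an edge into the range starting there.
        counts[bisect_right(edges, size)] += 1
    return counts
```

Because a wider range naturally catches more messages, dividing each count by the width of its range would take out the inflation, at the cost of bars that no longer read directly as message counts.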

(All times Central time)

Do spammers ever sleep? This graph attempts to answer that question. My spam analyzer categorizes the entire collection by the hour in which the spam arrived. The y-axis is the number of messages falling into the hour slot.

Keep in mind that spam comes from all over the world; this levels stuff out a little.
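Bucketing messages by hour can be sketched in Python as below. This assumes you have a timestamp string for each message in the usual email date format, and that you pass in the time zone to report in (for Central time, `zoneinfo.ZoneInfo("America/Chicago")` on Python 3.9 or later). The function name is hypothetical, and timestamps that cannot be parsed or carry no usable zone are simply skipped.

```python
from collections import Counter
from email.utils import parsedate_to_datetime

def hourly_counts(timestamps, tz):
    """Return 24 counts, one per hour of the day, in the given time zone."""
    counts = Counter()
    for stamp in timestamps:
        try:
            when = parsedate_to_datetime(stamp)
        except (TypeError, ValueError):
            continue  # unparseable timestamp; common in spam
        if when.tzinfo is None:
            continue  # no usable zone information, so the hour is ambiguous
        counts[when.astimezone(tz).hour] += 1
    return [counts[hour] for hour in range(24)]
```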

Account/Mail Server Distribution

One of the reasons I get so much spam is that I have a number of email addresses. This table breaks up the spam collection by address. I've decided not to show the addresses, but I have provided some comments.
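A breakdown like this, with comments standing in for the real addresses, could be sketched in Python as follows. The function name, the `comments` mapping, and the example addresses are all made up for illustration.

```python
from collections import Counter

def breakdown_by_address(recipients, comments):
    """Return (comment, count, percent) rows, busiest address first."""
    counts = Counter(addr.lower() for addr in recipients)
    total = sum(counts.values())
    rows = []
    for addr, n in counts.most_common():
        # Show the comment instead of the address itself.
        label = comments.get(addr, "unlabeled")
        rows.append((label, n, round(100 * n / total, 1)))
    return rows
```

For example, three messages split between `a@example.com` (twice, once in capitals) and `b@example.org`, with only the first address commented as "old work address", come out as `[("old work address", 2, 66.7), ("unlabeled", 1, 33.3)]`.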

