Spam Statistics
What is this stuff?
I get an awful lot of spam, and most people don't seem to realize exactly
how much of it there is. Until September 2002, I was like most people: I
angrily deleted any piece of spam I got. Now, I am just as infuriated, but
I store every single piece. I am on a mission: to figure out the exact
quantity of the crap being thrown at me, and to identify the trends.
A month and a half after beginning my spam collection, I wrote a bunch of
tools to gather statistics on the spam, mostly for my own
amusement (since I like to hack PostScript code; all graphs below are custom
PostScript), but also to find out a few facts on what I was getting. A
funny thing is that it's easier to view this stuff with a web interface,
and because I had already gone to all that work, I decided to make this
public.
There are all sorts of spam analysis pages and gizmos out there, mostly
because people want to prevent spam. But this page isn't about that. I'm no
expert on that kind of stuff.
Raw numbers
Total | Average | Maximum | Minimum | Autodetect |
131139 | 132.06 | 345 | 3 | 99785 |
(993 days) | 2005.05.10 | 2005.06.03 | (76.09%) |
This table shows some general numbers on the amount of spam I get. The
Total is the total number of spam messages in my collection, and the
Average number of messages per day is next to it. Underneath is the
number of days in the sample.
Maximum and Minimum are for daily spam count figures, and the
dates appear below the counts.
Autodetect shows the number of messages that my spam filter caught;
the percentage is below. This number is not to be taken too seriously;
it could be better, for one thing, but this page is not about automatic
detection of spam or anything particularly geared to that purpose; it's
about what spam looks like.
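The figures in the table can be reproduced from a per-day message count. What follows is only a minimal Python sketch, not the tools behind this page; the function name `raw_numbers` and the shape of its input (a dict of date to count) are assumptions made for illustration.

```python
from datetime import date

def raw_numbers(daily_counts, autodetected):
    """Summarize a collection.

    daily_counts: dict mapping a date to the number of spam messages that day
                  (hypothetical input format).
    autodetected: number of messages the spam filter caught.
    """
    total = sum(daily_counts.values())
    days = len(daily_counts)
    max_day = max(daily_counts, key=daily_counts.get)
    min_day = min(daily_counts, key=daily_counts.get)
    return {
        "total": total,
        "days": days,
        "average": round(total / days, 2),
        "maximum": (daily_counts[max_day], max_day),
        "minimum": (daily_counts[min_day], min_day),
        "autodetect_pct": round(100 * autodetected / total, 2),
    }

# The published figures are consistent with each other:
print(round(131139 / 993, 2))          # 132.06 messages per day
print(round(100 * 99785 / 131139, 2))  # 76.09 percent autodetected
```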
A simple graph of the daily number of spam messages I get appears above.
It plots the average of the daily number of spam messages over the
days leading up to the date on the x-axis, including that day. The
number of messages is on the y-axis. Using the average smooths out
irregularities.
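This kind of smoothing is a trailing moving average. Here is a small Python sketch of the idea; the window length is left as a parameter because the text does not say how many days are averaged, and letting the first few days use a shorter window is also an assumption.

```python
def trailing_average(counts, window):
    """Average each day's count with the days before it.

    counts: list of daily message counts, oldest first.
    window: number of days to average, including the current day
            (hypothetical parameter; the page does not state the value).
    """
    smoothed = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)   # early days get a shorter window
        chunk = counts[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

print(trailing_average([3, 9, 6, 12], 2))  # [3.0, 6.0, 7.5, 9.0]
```

A larger window gives a flatter curve but reacts more slowly to real changes in volume.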
Weekday Average Counts
Mon | Tue | Wed | Thu | Fri | Sat | Sun |
138.32 | 141.50 | 139.34 | 140.98 | 129.24 | 115.68 | 119.44 |
This table shows the average number of spam messages for each day of the
week in the collection. I could perhaps plot this one day.
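Per-weekday averages come from grouping the daily counts by day of the week and dividing each group's total by the number of such days in the sample. A hedged Python sketch, with `weekday_averages` and its dict input being names invented for this example:

```python
from collections import defaultdict
from datetime import date

NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_averages(daily_counts):
    """daily_counts: dict mapping a date to that day's message count."""
    totals = defaultdict(int)
    days_seen = defaultdict(int)
    for day, count in daily_counts.items():
        w = day.weekday()          # Monday is 0, Sunday is 6
        totals[w] += count
        days_seen[w] += 1
    return {NAMES[w]: round(totals[w] / days_seen[w], 2) for w in sorted(totals)}

sample = {
    date(2005, 5, 9): 100,   # a Monday
    date(2005, 5, 16): 140,  # the following Monday
    date(2005, 5, 10): 90,   # a Tuesday
}
print(weekday_averages(sample))  # {'Mon': 120.0, 'Tue': 90.0}
```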
Spam Message Size Analysis
Median | Maximum | Minimum |
3913 | 8627473 | 511 |
This section of numbers is based on the sizes of the spam messages in the
collection (at the moment, this includes the header). This information is
mildly useful in determining just how much of our bandwidth and storage space
these very inconsiderate spammers consume.
The table above includes the Median, Maximum, and Minimum
sizes.
The graph below shows the distribution of the message sizes. The y-axis is
the number of messages; each bar represents a range of sizes. Note that
the ranges are larger on the right edge of the chart; this slightly
inflates the counts for those ranges.
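The summary and the histogram can both be computed from a plain list of message sizes. The Python sketch below is illustrative only: the bin edges in `EDGES` are made up (the real ranges are not listed here), and it assumes sizes are non-negative numbers in a single unit such as bytes.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical bin edges; the bins get wider toward the right, which is why
# the rightmost bars collect more messages than an evenly spaced chart would.
EDGES = [0, 1000, 2000, 4000, 8000, 16000, 64000]

def size_summary(sizes):
    return {"median": median(sizes), "maximum": max(sizes), "minimum": min(sizes)}

def size_histogram(sizes, edges=EDGES):
    """Count messages per bin.

    Bin i covers edges[i] <= size < edges[i + 1]; the last bin is open-ended.
    """
    counts = [0] * len(edges)
    for size in sizes:
        counts[bisect_right(edges, size) - 1] += 1
    return counts

sizes = [511, 3913, 8627473, 1000, 3999]
print(size_summary(sizes))    # {'median': 3913, 'maximum': 8627473, 'minimum': 511}
print(size_histogram(sizes))  # [1, 1, 2, 0, 0, 0, 1]
```

Dividing each count by the width of its bin would remove the inflation, at the cost of a less intuitive y-axis.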
(All times Central time)
Do spammers ever sleep? This graph attempts to answer that question. My
spam analyzer categorizes the entire collection by the hour in which the
spam arrived. The y-axis is the number of messages falling into the hour
slot.
Keep in mind that spam comes from all over the world; this
levels stuff out a little.
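Sorting messages into hour slots takes very little code. This Python sketch assumes each message's arrival time is already available as a `datetime` in Central time; how the analyzer actually obtains that timestamp is not described here.

```python
from datetime import datetime

def hourly_counts(arrivals):
    """Return a list of 24 counts, one per hour of the day (0 through 23).

    arrivals: iterable of datetime objects, assumed to be local arrival times.
    """
    counts = [0] * 24
    for ts in arrivals:
        counts[ts.hour] += 1
    return counts

arrivals = [
    datetime(2005, 5, 10, 3, 15),
    datetime(2005, 5, 10, 3, 59),
    datetime(2005, 5, 11, 22, 0),
]
counts = hourly_counts(arrivals)
print(counts[3], counts[22])  # 2 1
```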
Account/Mail Server Distribution
Server | Count | Pct | Comment |
#1 | 94618 | 72.15% | Kernel-HOWTO (recent) |
#2 | 6066 | 4.63% | old, not widely published |
#3 | 30455 | 23.22% | Kernel-HOWTO (old) |
One of the reasons I get so much spam is that I have a number of email
addresses. This table breaks up the spam collection by address. I've
decided not to show the addresses, but I have provided some comments.
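The percentages are each address's share of the whole collection. As a quick check, this Python sketch recomputes them from the counts in the table; the function name `distribution` is invented for the example.

```python
def distribution(counts):
    """Map each label to (count, percentage of the total, rounded to 2 places)."""
    total = sum(counts.values())
    return {name: (n, round(100 * n / total, 2)) for name, n in counts.items()}

published = {"#1": 94618, "#2": 6066, "#3": 30455}
for name, (n, pct) in distribution(published).items():
    print(f"{name} | {n} | {pct:.2f}%")
# #1 | 94618 | 72.15%
# #2 | 6066 | 4.63%
# #3 | 30455 | 23.22%
```

The three counts add up to 131139, the total given under Raw numbers.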
Brian Ward -