Tangled up in Spam

A very interesting, thorough and clueful discourse on spam from James Gleick in today's New York Times (via Slashdot). It's an in-depth examination of spam, its history and its mechanisms, and it's well worth reading. Although I'd recommend you read it all, and not skip to the conclusion, the conclusion is pretty damn good:

[...] two simple measures might be enough to stem the tide:

Forging Internet headers should be made illegal. The system depends on accurate information about senders and servers and relays; no one needs a right to falsify this information.
Unsolicited bulk mail should carry a mandatory tag. That alone would put consumers back in control; all the complex technological challenge of identifying the spam would vanish.
We need to be able to say no. No, I'm not looking for a good time. No, I don't want to "e-mail millions of PayPal members." No, I don't want an anatomy-enlargement kit. No, I don't want my share of the Nigerian $25 million. I just want my in-box. It belongs to me, and I want it back.

One thing of note: it mentions SpamSieve, a Bayesian filtering tool that works with pretty much any Mac OS X email client. SpamSieve is a trainable statistical filter rather than a set of hand-written rules, and you use it in the following way:

First you find a whole bunch of spam, and you tell it that these messages are spam
Then you find a bunch of legitimate emails, and you tell it that these are OK
Whenever you receive email, it analyses it, and if it thinks it's spam it marks it as such. You tell it whenever it gets anything wrong, and in time it gets better.
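The train, classify and correct loop above can be sketched in a few lines of Python. This is only a rough illustration of a naive Bayes filter in general, not SpamSieve's actual algorithm (which I haven't seen); the class name, the tokenizer and the sample messages are all made up for the example.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Hypothetical toy filter: naive Bayes over the words in a message."""

    def __init__(self):
        # How many training messages of each kind contained each word.
        self.counts = {"spam": Counter(), "ok": Counter()}
        # How many training messages of each kind we have seen.
        self.messages = {"spam": 0, "ok": 0}

    @staticmethod
    def tokens(text):
        # Lower-cased set of words; each word counts once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        # label is "spam" or "ok"; correcting a mistake is just more training.
        self.messages[label] += 1
        self.counts[label].update(self.tokens(text))

    def spam_score(self, text):
        # Log-odds that the message is spam; positive means "probably spam".
        score = math.log((self.messages["spam"] + 1) / (self.messages["ok"] + 1))
        for tok in self.tokens(text):
            # Add-one smoothing so unseen words never give a zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ok = (self.counts["ok"][tok] + 1) / (self.messages["ok"] + 2)
            score += math.log(p_spam / p_ok)
        return score

    def is_spam(self, text):
        return self.spam_score(text) > 0


# Steps 1 and 2: show it some spam and some legitimate mail.
spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills enlargement kit", "spam")
spam_filter.train("email millions of paypal members", "spam")
spam_filter.train("lunch on friday with the team", "ok")
spam_filter.train("minutes from the project meeting", "ok")

# Step 3: classify new mail, and feed back any corrections.
print(spam_filter.is_spam("enlargement kit for cheap"))
print(spam_filter.is_spam("project lunch on friday"))
```

Telling the filter it got something wrong is just another call to `train` with the right label, which is why accuracy drifts as the mix of your mail changes.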

All this works very well, up to a point. For a while I was getting 97.2% accuracy, as SpamSieve became steadily better trained on my email. Then I subscribed to a new mailing list - and it marked nearly all of the messages as spam. I had to tell it that all these messages were in fact legit - and the accuracy rate plummeted to 96%. (It has since recovered to 96.3%.)

Spammers are now, according to James Gleick, misspelling words like penis or viagra to confuse such programs. I haven't seen any of these misspellings yet - according to SpamSieve - but I don't doubt they'll come. And this is the weakness of such spam-filtering software: it bases its filtering entirely on knowledge of what was bad in the past. It's fighting the last war.
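To see why a misspelling slips past a word-counting filter, here is a small sketch. The function name and the counts are invented for illustration, and a real filter such as SpamSieve may well tokenize and weigh words differently.

```python
import math
from collections import Counter


def token_log_odds(token, spam_counts, ok_counts, n_spam, n_ok):
    """Evidence one word contributes: positive leans spam, zero is neutral."""
    # Add-one smoothing, so a never-seen word is treated as equally likely
    # in spam and in legitimate mail.
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ok = (ok_counts[token] + 1) / (n_ok + 2)
    return math.log(p_spam / p_ok)


# Made-up training history: 50 spams and 50 good messages,
# with "viagra" appearing in 40 of the spams and none of the good mail.
spam_counts = Counter({"viagra": 40})
ok_counts = Counter()
n_spam, n_ok = 50, 50

known = token_log_odds("viagra", spam_counts, ok_counts, n_spam, n_ok)
disguised = token_log_odds("v1agra", spam_counts, ok_counts, n_spam, n_ok)
print(known, disguised)  # the disguised spelling carries no evidence at all

# Once you flag a message containing the new spelling as spam,
# the filter starts to count it too.
spam_counts["v1agra"] += 1
n_spam += 1
retrained = token_log_odds("v1agra", spam_counts, ok_counts, n_spam, n_ok)
print(retrained)
```

The filter has no opinion about a spelling it has never seen, so the disguised word passes as neutral until somebody marks a message containing it as spam.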

Happily, trainable filters like SpamSieve should be able to adapt more easily than rule-based systems, since every correction you make becomes new training data. But still, this is something to bear in mind: there is an arms race between spammers and anti-spammers, and there is no guarantee that the anti-spammers will win.

3 Comments

Helen | February 11, 2003 12:57 PM

Why do people bother with spam? Why? Why? It's not as if anyone, anywhere, is ever going to give spammers money for their products. I just don't understand it...

sam | February 11, 2003 2:34 PM

The costs of sending spam are infinitesimal, so even if you get a response rate of 1 in a million, you're still going to make money. See for instance this Washington Times story about the effectiveness of Nigerian spams.

Helen | February 11, 2003 4:38 PM

Gah, people who fall for spam are too stupid to own computers.
