Spam plagiarism

5 Oct 2005

Like a lot of people, I use a Bayesian spam filter, trained to sift the crap from the mail I'm interested in. It's pretty good; I can't remember the last time I got a false positive, and most spam goes straight into the junk folder. One or two still get through, though. The spammers' trick of appending random poetry to the end of the message worked for a while, but training makes that less effective. However, today I received a message with the following tacked on the end:

Watching anything from a Michael Moore documentary to a CBC investigative report, I know I like reality. So why do I hate (and I mean that with the full intensity intended by someone who rarely uses that word) reality television shows? Because reality TV isn't insightful commentary. Voyeuristic melodrama that is anything but real has no chance of being more than annoying and boring. I used to think blogs were to e-zines what reality television shows were to dramas. Now, I think the comparison would be more effective if blogs were perceived more like independent film. And reality television scheduled in between soap operas.

My first thought was that random poetry had become more sophisticated, and vaguely topical. However, as I read it I became less and less convinced. Turns out that this was in fact written by a human; a quick Google search reveals the original post on someone's Blogspot blog. This may have been going on for a while, but it's the first time I've seen it. I think it's an interesting development; the blogosphere is providing spammers with a near-infinite supply of chatty, lucid (more or less) prose that's far more realistic than machine-generated text could hope to be. I'm not sure how effective Bayesian filtering will be against such techniques, but my guess is that it more or less neutralises the positive score (the evidence that a message is legitimate) as an indicator, meaning that the negative score (the evidence that it's spam) becomes more important. I'm curious to see what happens.
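To see why borrowed prose might help a message slip through, here's a minimal sketch of a toy naive Bayes scorer. Everything in it is an assumption for illustration: the training messages are made up, the function names `train` and `spam_log_odds` are hypothetical, and real filters tokenise and combine word probabilities in more careful ways than this.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences across a list of messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts

def spam_log_odds(text, spam_counts, ham_counts):
    """Sum per-word log-likelihood ratios: above 0 leans spam, below 0 leans ham."""
    vocab = set(spam_counts) | set(ham_counts)
    # Add-one smoothing, so a word seen in only one class doesn't give a zero probability.
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    score = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training carry no evidence either way
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score

# Made-up training data (an assumption, not taken from any real mailbox).
SPAM = ["cheap pills buy now", "buy cheap watches now", "free pills offer now"]
HAM = [
    "i think blogs are like independent film",
    "reality television is boring i think",
    "the report on television was insightful",
]

spam_counts = train(SPAM)
ham_counts = train(HAM)

bare = "buy cheap pills now"
# The same pitch with chatty, human-written prose tacked on the end.
padded = bare + " i think reality television is boring and blogs are like independent film"

bare_score = spam_log_odds(bare, spam_counts, ham_counts)
padded_score = spam_log_odds(padded, spam_counts, ham_counts)

print(f"bare:   {bare_score:+.2f}")
print(f"padded: {padded_score:+.2f}")
```

With this toy data the bare pitch scores above zero (spam) while the padded one drops below it, because each legitimate-looking word pulls the total towards ham. A filter that gave little weight to ham-leaning words would be harder to fool this way, at the cost of being more suspicious of ordinary mail.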

This site is maintained by me, Rob Hague. The opinions here are my own, and not those of my employer or anyone else. You can mail me at, and I'm robhague on Twitter. The site has a full-text RSS feed if you're so inclined.

All content © Rob Hague 2002-2015, except where otherwise noted.