Curiosity is bliss

Julien Couvreur's programming blog and more

Comment spam


I got hit by large waves of comment spam over the last couple of weeks. I apologize to any reader who was offended. The spamming got worse after I added the "last comments" section to the front page.

At first I just activated MovableType's "Email new comments" feature and deleted the spam manually. That quickly became painful.
I then tried adding a "delete comment" link to the email, but for some reason that link wouldn't display correctly in Outlook.

MT-blacklist
I'm not sure why I didn't use Jay Allen's MT-blacklist plugin earlier, even though I knew about it. It turned out to be super easy to install, and it's been tremendously helpful to me so far.
It also includes a delete link in the email notification for new comments, and that one works fine in Outlook.

One problem is that I've had to blacklist some very short keywords like "mom", "son" and "rape", which started triggering false positives (for example, a comment containing "scraped" got flagged). A way to exempt a comment from the filtering would be good.
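To illustrate the false positive, here is a minimal Python sketch. It assumes the keywords are matched as plain substrings, which would explain the "scraped" case; I have not checked how MT-blacklist actually matches its entries. The function names, the exemption set and the comment IDs are all hypothetical, not part of the plugin.

```python
import re

# Short keywords from the post; the list itself is only an example.
BLACKLIST = ["mom", "son", "rape"]


def substring_hits(text, keywords=BLACKLIST):
    """Return keywords found anywhere in the text, even inside longer words."""
    lowered = text.lower()
    return [k for k in keywords if k in lowered]


def whole_word_hits(text, keywords=BLACKLIST):
    """Return keywords that appear as standalone words only."""
    hits = []
    for k in keywords:
        # \b is a word boundary, so "rape" no longer matches inside "scraped".
        if re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE):
            hits.append(k)
    return hits


def is_flagged(comment_id, text, exempt_ids):
    """Flag a comment unless its (hypothetical) ID was manually exempted."""
    if comment_id in exempt_ids:
        return False
    return bool(whole_word_hits(text))


print(substring_hits("I scraped the page"))   # substring match catches "rape"
print(whole_word_hits("I scraped the page"))  # whole-word match does not
```

Whole-word matching trades one problem for another: a spammer can glue a keyword to other characters to slip through, so the manual exemption list is probably still worth having.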

What's next?
Sifry also wonders about blog comment spam solutions and the coming arms race.

I think most distributed moderation systems will come down to two parts: some form of identification and some form of reputation.

The identification could use your home URL, a PGP key, a TypeKey or Passport account...

The reputation could be handled centrally by Technorati, in the case that Sifry discusses (where Technorati binds the comment posts to the original post).
But you could also have a binary reputation using LOAF and blogrolls: you publish a size-efficient list of IDs (whatever they are) that you specifically trust and a list of IDs you specifically don't, and you combine these with the lists published by the other blog owners you trust (your blogroll).
The remaining problem is how to handle comments and commenters that don't fall into either category because they are unknown to the system. These probably still need to be manually approved, or at least continue to be filtered.
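The blogroll idea above can be sketched roughly as follows. This is only an illustration under my own assumptions: plain Python sets stand in for the size-efficient published lists, the IDs are made-up URLs, and the rules (my own list wins, conflicting blogroll opinions go to moderation) are one possible policy, not something LOAF or Technorati defines.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationList:
    """One blog owner's published lists (hypothetical structure)."""
    trusted: set = field(default_factory=set)
    distrusted: set = field(default_factory=set)


def classify(commenter_id, own, blogroll_lists):
    """Return "accept", "reject" or "moderate" for a commenter ID."""
    # My own explicit opinion always takes precedence.
    if commenter_id in own.distrusted:
        return "reject"
    if commenter_id in own.trusted:
        return "accept"

    # Otherwise fall back on the lists published by blogs I trust.
    liked = any(commenter_id in r.trusted for r in blogroll_lists)
    disliked = any(commenter_id in r.distrusted for r in blogroll_lists)
    if liked and not disliked:
        return "accept"
    if disliked and not liked:
        return "reject"

    # Unknown to everyone, or the blogroll disagrees: a human decides.
    return "moderate"


mine = ReputationList(trusted={"http://friend.example/"})
blogroll = [
    ReputationList(trusted={"http://reader.example/"}),
    ReputationList(distrusted={"http://pills.example/"}),
]

for who in ("http://friend.example/", "http://reader.example/",
            "http://pills.example/", "http://stranger.example/"):
    print(who, classify(who, mine, blogroll))
```

The "moderate" branch is exactly the leftover case: IDs nobody has an opinion about still end up in the manual approval queue or in front of the keyword filter.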

Update: Get the latest MT-blacklist version, which fixes a flaw regarding escaped characters.


Comment spammers have not found me. I guess that my pages do not rank high enough on Google...

Posted by: David Kaspar (June 7, 2004 04:59 PM)