Archive for the ‘Tech-geekery: spam’ Category

In which CAPTCHAs are not so much a cure that's worse than the disease as a disease in their own right.

August 29th, 2007 by Reinder

While I was on a Moorcock essay tip, I went to Michael Moorcock's website to see if his short essay Epic Pooh actually did have some sort of sequel, as promised*). Multiverse.org is largely built on forum software, which is a less than ideal way to manage a website to start with, but I was still more than a bit surprised to find that I had to fill in a CAPTCHA before that software would show me search results.

You read that right. I had to prove that I wasn't a bot before I could search. What the fuck? I know from bitter experience that spambots can be a cancer on even a well-protected website and that spam can take down a server. And yes, spammers will post into any text box in any web form. But as long as you don't show search terms to anyone other than the searcher (on their own results page), and there's no reason why you should, I don't see how bots carrying out searches are the sort of problem that can be solved by harassing legitimate users with CAPTCHAs. Not that there is any problem for which CAPTCHAs are the solution, but this particular use of them takes the bakery.

*) Answer: Yes. "Continued" didn't look clickable but it was, and clicking it caused the next page to show. The printer-friendly version is probably more convenient to read.

Comments on the comic under threat – new measures in place

July 16th, 2007 by Reinder

If you've tried and failed to get a comment published in the Rogues of Clwyd-Rhan comic archives, let me know. The comic has been under a sustained spam attack for over a day now, with almost half of all IPs and pageviews hitting the node that gets served when a comment is blocked. To deal with this more efficiently, Mithandir has installed an upgrade to the comment system that allows me to quarantine comment spam. Most of the new batch of spam doesn't get blocked until it reaches the content-based filters, which are processor-intensive. With the quarantine, I should be able to see which IP addresses are used to send the spam and block those, which is more efficient.
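The reasoning above (use the quarantine to learn which IPs send spam, then reject those IPs before the expensive content filter ever runs) can be sketched roughly as follows. This is not the actual Willow comment code; the threshold, the function names and the toy content filter are all hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical threshold: block an IP once it has this many quarantined spams.
SPAM_THRESHOLD = 3


def build_blocklist(quarantined_ips, threshold=SPAM_THRESHOLD):
    """Return the set of IPs that sent at least `threshold` quarantined comments."""
    counts = Counter(quarantined_ips)
    return {ip for ip, n in counts.items() if n >= threshold}


def expensive_content_filter(text):
    """Hypothetical stand-in for a processor-intensive content-based filter.

    Returns True if the text looks like spam.
    """
    lowered = text.lower()
    return "http://" in lowered or "[url" in lowered


def accept_comment(ip, text, blocklist):
    """Cheap set lookup first; only run the costly content filter if the IP is not blocked."""
    if ip in blocklist:
        return False
    return not expensive_content_filter(text)


if __name__ == "__main__":
    quarantine = ["203.0.113.5", "203.0.113.5", "203.0.113.5", "198.51.100.7"]
    blocked = build_blocklist(quarantine)
    print(accept_comment("203.0.113.5", "Nice comic!", blocked))  # blocked IP
    print(accept_comment("192.0.2.1", "Nice comic!", blocked))    # clean comment
```

The point of the ordering is that a set lookup costs next to nothing, so every spam stopped by IP is one fewer trip through the content filters.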

However, there's always the possibility that something has been broken in the upgrade and legitimate comments get blocked. I already know the upgrade didn't go smoothly, so I'm keeping an eye out for both legitimate comments getting blocked and oodles of spam passing through.

The above does not affect the weblog, where comments will remain closed for the time being.

Comments on Waffle now moderated thanks to spammers and Movable Type’s general uselessness at spam prevention

June 12th, 2007 by Reinder

After an overnight spam attack in which hundreds of spams got posted to the blog, including many that would have been blocked if the regular expression filter actually worked, I have set the comment options in Movable Type to moderated. I was going to switch off commenting entirely, until I realised that moderation would work for the small number of real comments I get here.

There may be delays in getting your comment posted. I haven't switched on email notification for new comments because I'm not that masochistic. The ratio of real comments to spam is very low, and the last thing I want is to have hundreds of spam comments in my incoming email as well as my Movable Type backend.

This message only affects comments on the weblog, not on the webcomic, which has a superior commenting and comment filtering system written by one guy in his spare time.

I could go on forever about how bad Movable Type's spam prevention is. Where to begin? How about with cleanup? I could cook dinner in the time it takes to rebuild a hundred entries - and then let it get cold checking whether the entries have actually rebuilt. At least one batch of twenty rebuilds timed out during today's cleanup, which means that the spam posts on those entries may or may not have disappeared from the archives.

Why twenty? Why not do a hundred at a time? Partly because of the timeout problem, but today I'd actually have been willing to do the cleanup in batches of seventy-five or a hundred, just to get it over with. All the spams that got posted were gibberish (which I can't filter because there's no regular pattern in it) with links in BBCode (which I can filter using a regex, but as I said, the regex filter doesn't work). But another problem with MT's commenting system is the very poorly written AJAX(-ish) programming in the backend, which causes common interface elements to behave differently from how they should. You can see that in the category selector: unlike with regular dropdowns, you can't actually scroll to the category you need unless you keep the mouse button pressed down the whole time. If you don't, the dropdown resets itself to its initial position. The same happens within the AJAX(-ish) widget that governs the display options in the commenting backend, so when, brainwashed as I am by more than a decade of using standard dropdown boxes, I thought I'd chosen to display 75 rows of comments in my backend, I'd actually chosen to display twenty. So I ended up cleaning them out twenty at a time.

Another example of terrible backend scripting is the checkboxes in each individual entry's backend that you can use to close the entry for comments or trackbacks. You have to click them very decisively and firmly, looking straight at them and mumbling incantations along the lines of "obey, motherfucker". And. Don't. Blink. Otherwise, they will revert to the state they were in before you clicked them. I've observed this in both Opera and Safari, by the way. It's unbelievable that something like this got past quality control.
If you don't give the act of clicking a check box your full and undivided attention, you'll move your mouse to "Save Changes" and click that thinking you've closed the entry whereas in fact you've left it wide open. It's Movable Type's Christmas gift to spammers.

What else? Oh yeah. The Spamlookup Plugin's word and regular expression filter works only about half of the time. I don't know what causes it to fail, but fail it does. Also lose and suck.

But all this bitching about the superficial design and implementation flaws only serves to conceal Movable Type's fundamental design and implementation flaws. These aren't unique to Movable Type - I could easily write a similarly long and ranty screed about how bad, say, PHPBB is in this regard.

Movable Type and many other content management/commenting/forum posting/yadda yadda yadda systems have this fundamental design problem: There is no single interface for dealing with spam, and far too many of the tools are included as plugins. Bundled plugins, as far as SpamLookup is concerned, but still plugins.

Systems that publish user-contributed material to the web should be written from the ground up to detect and prevent spam. The SpamLookup code, as well as additional code like Akismet and Bad Behavior that users now have to hunt down and install, should be part of the core functionality of every installed version of the system, so that the user running the install doesn't have to think about it and spam can be dealt with as quickly and quietly as possible. Spam prevention is as important as content creation itself, for the simple reason that spam will eventually be posted in such numbers that it will bury and defeat the content creation (see A quick reminder of why there are no comments on this blog from 2005) and, in forums, bury and defeat all other aspects of the forum (see any PHPBB forum that hasn't got a team of rabid, fascist moderators purging the member lists, blocking posts by non-members, blocking fake account creation, blocking whole IP ranges from posting messages or creating accounts, blocking, blocking, blocking).

Over time, the utility of a content creation system that lets spam in drops to zero. For that reason, it's worth compromising other aspects of the system, such as ease of use, to keep spam from getting a foothold. In Movable Type (and PHPBB, and, and, and), we get poor usability anyway, especially in dealing with spam. To close old posts, we need to go to one place, or rather, several places: the posts themselves (there are, of course, plugins for that, but see the previous paragraph). To clean up spam, we need to go to another - the comments backend. To filter our messages, we need to go to yet another - the SpamLookup plugin, and if we have three different kinds of changes to make, we need to open three different boxes to make them. Then there are the general settings, where we decide how to handle comments globally, and for those we need to go somewhere else again.

Simplifying this isn't a trivial task; in fact, now that I think of it, it's rather daunting. However, adding "Delete and Close" and/or "Delete and Blacklist" buttons or checkboxes to the comments backend would shave quite a bit of time off the daily despamming chores. And those would be easier to add if blacklists weren't governed by plugins to start with.

See also: Six Apart Picked Apart.

Well, so much for Movable Type’s nifty new spam prevention

April 12th, 2007 by Reinder

The keyword blacklisting in Movable Type has one little drawback: it doesn't work. I've added several variants of "Good Site! Thanks!" including "ood site! Thank" to the blacklist but spams containing those phrases continue to get posted. Update: I've boned up on regular expression syntax and the rules for whole-word blacklisting, and it works well now.
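For anyone in the same boat, here is a minimal sketch of the kind of pattern that helps: a whole-word phrase match that tolerates punctuation, plus a pattern for the BBCode links mentioned elsewhere on this page. These are Python regular expressions run through Python's `re` module, not SpamLookup's own configuration syntax; the pattern list and the `is_blacklisted` helper are hypothetical examples.

```python
import re

# Hypothetical blacklist, written as Python regexes (not SpamLookup syntax).
BLACKLIST = [
    # "Good Site! Thanks!", "good site, thanks" etc.; \b anchors the match at
    # word boundaries, \W* tolerates any punctuation/spaces between the words.
    re.compile(r"\bgood\s+site\W*\s*thanks\b", re.IGNORECASE),
    # BBCode links such as [url=http://example.com]text[/url] or [URL]...[/URL].
    re.compile(r"\[url(=[^\]]*)?\].*?\[/url\]", re.IGNORECASE | re.DOTALL),
]


def is_blacklisted(comment):
    """Return True if any blacklist pattern occurs anywhere in the comment."""
    return any(pattern.search(comment) for pattern in BLACKLIST)


if __name__ == "__main__":
    print(is_blacklisted("Good Site! Thanks!"))
    print(is_blacklisted("lorem [url=http://spam.example]cheap[/url] ipsum"))
    print(is_blacklisted("Great strip, thanks!"))
```

Note that `!` has no special meaning in a regex, but characters like `.`, `?` and `[` do, which is one way a plain-looking phrase can silently fail to match.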

Worse than that, because of Movable Type's insane resource consumption, forced mass rebuilds after a spam cleanup sometimes hit Xepher's resource limits, causing them to time out and the rebuild to fail, meaning that the spams don't get deleted from the published entries (though they do get deleted from the database). This is Not Acceptable.

What's more, the filter's performance seems to be deteriorating. Spams that automagically get junked still outnumber spams that don't, but not by nearly as much as they did a month ago. I've had bad experiences with learning filters (Opera's, for instance, tends to learn the wrong lessons even though I'm pretty damned diligent about catching any spam the filters miss and marking it as such before deleting it); I don't know which part of the setup is failing to learn about spam, but something is. Maybe it's not updating its blackhole list.

This weekend, I'm going to beef up the anti-spam defenses, installing Akismet and everything else that I can find that might block it. Until then, don't be surprised if you suddenly find comments closed across the blog. I'm leaving them open on this one in case someone wants to suggest a neat anti-spam trick or plugin, though.

BTW Trackbacks have already been shut off again, probably for good this time. I've switched off sending trackbacks as well, except possibly to the aggregators that Movable Type auto-pings.

Why Web BBSes suck

March 9th, 2007 by Reinder

First, a question I've been meaning to ask: does anyone reading this know of a web bbs that
1) runs on PHPBB; and
2) has some version of Bad Behavior, such as this mod, as its only defense against spam? In other words, no CAPTCHAs, no other mods or plugins aimed at keeping the board from being overrun with spam?

If so, I very much want to hear from you. Bad Behavior has done so well at stopping the endless flood of spam on Talk About Comics that I've been wondering whether the time has come to stop making new members jump through hoops to get activated, or even to open the forum to guest posters again. You know, make it a more inviting place. I'm not the guy who gets to decide this, by the way, but if there's evidence that Bad Behavior can do the job on its own, I can put in a word. Let me know by email or in the comments under this post.

I was prompted to bring this up by reading Matt Skala's recent post Why Web BBSes Suck. It's a great post that really opened my eyes to how much bad functionality I was taking for granted, for no other reason than that Web BBSes have always been designed that way. I could quibble about some things, but I think the general thrust of his argument, that Web BBSes have terrible usability and don't serve the needs of their users well, is correct.

There is good news on some issues. Project Wonderful Talk, whose CAPTCHA I've finally been able to defeat, allows the use of Livejournal accounts for identification, which I hope many more boards will adopt (as well as other, similar, multi-site identification methods); PHPBB isn't as ubiquitous as it was a year ago even if it's still very dominant, and BBCode is more standardised than Matt claims it is. I also think the dominance of PHPBB could end very quickly if something truly better came along. Five years ago, when Ultimate Bulletin Board was as ubiquitous as PHPBB is now, it was quickly superseded by PHPBB because PHPBB was less crash-prone and easier to set up. The spambots have since made PHPBB at least as big a nightmare to work with as UBB was then.

So what I'd like to see is a project in which skilled designers and coders who have read Matt's rant build a new Web BBS from the ground up so it has the features the users actually need instead of the ones that Ultimate Bulletin Board happened to have in 1998 and which all other Web BBS systems have copied. And integrated spambot protection that actually works. Those two ingredients together would, I think, make most forum owners drop PHPBB like a hot potato.

Joey Manley could use some help keeping the TAC forum spam-free

September 2nd, 2006 by Reinder

The Talk About Comics forums are once again being overwhelmed on a regular basis, by spambots hiding behind Telefonica's lack of real anti-spam policies. Telefonica de España does have an Acceptable Use Policy but to my knowledge, its enforcement is still a joke.
What Joey wants to know:

There's a flood of fake phpBB user sessions, coming from numerous different IP addresses, crashing the whole server every few hours.

Probably spambots.

Fellow admins: any thoughts on solving this?

Note that I tried my best to install bad behavior, but its header-pushing ways conflicted with sessions.php and page_header.php no matter what I tried.

A large number of the spambots seem to have IP addresses that resolved to:

red.telefonica-wholesale.net

I know that Reinder has banned an entire ISP or two before, but I don't know how to do this. Any help?

So if anyone can help him make Bad Behavior work on PHPBB and/or keep the varmints out through PHPBB's regular banning system, please drop him a line.
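For what it's worth, the two usual ways of banning a whole ISP are by IP range or by reverse-DNS hostname. Here is a rough sketch of both in Python, not PHPBB code and not PHPBB's ban syntax. The banned suffix comes from the hostname Joey quotes; the network range below is a documentation placeholder (192.0.2.0/24), not Telefonica's real allocation, which you'd have to look up yourself. Bear in mind that reverse DNS is controlled by whoever owns the IP block, so it can be faked or missing.

```python
import ipaddress
import socket

# Hostname suffix taken from the report above.
BANNED_SUFFIXES = ("telefonica-wholesale.net",)
# Placeholder range only (RFC 5737 documentation block); substitute real ranges.
BANNED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def reverse_dns(ip):
    """Return the PTR hostname for `ip`, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None


def hostname_is_banned(hostname, suffixes=BANNED_SUFFIXES):
    """True if `hostname` equals, or is a subdomain of, a banned domain."""
    if not hostname:
        return False
    hostname = hostname.lower().rstrip(".")
    return any(hostname == s or hostname.endswith("." + s) for s in suffixes)


def ip_is_banned(ip, networks=BANNED_NETWORKS):
    """True if `ip` falls inside one of the banned networks."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)


def should_block(ip):
    """Cheap range check first, then the slower reverse-DNS lookup."""
    return ip_is_banned(ip) or hostname_is_banned(reverse_dns(ip))
```

Checking the range first avoids a DNS round trip on every request; in practice the range ban is the more reliable of the two.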

And I could use some fact-checking: Am I right in supposing that Telefonica de Espana are still as bad as ever when it comes to dealing with spam, or have they cleaned up their act in the past few years? I'll be doing my own research, but if you have ready knowledge, please contact me.

Bad Behavior

July 19th, 2006 by Reinder

Via Branko, I hear of Bad Behavior, a

fingerprinting method for HTTP requests, [which] has proven, as one user called it, "shockingly effective" at identifying and blocking malicious activity, including blog/wiki spam, e-mail address harvesting, automated cracking attempts, and more. It does all of this looking only at the HTTP request headers; for POST data, the content of the spam is not analyzed at all.

If you have a WordPress blog, you probably need this, but it is designed to be easy to integrate into other PHP-based content management systems. If I read the documentation correctly, I could install it now and have it do basic spam-blocking work in Willow, but I prefer to wait until Mithandir has given me his opinion and maybe made whatever tweaks are necessary to get all the functionality to cooperate with Willow.
(Mithandir's own motivation for doing this is probably a bit low right now, though - he's vacationing in Norway, and he reports that the amount of spam on his own website has dropped spontaneously over the past few days. So I may change my mind and muck about with the plugin myself. I should be able to stick in an "Include Once" call....)

Looks like the gmail honeymoon is over

December 24th, 2005 by Reinder

I'm now getting oodles of spam to my gmail address. Most of it's in Chinese, and the amount that passes gmail's spam filters is greater right now than the amount that gets caught in them.

I hope this is temporary and that they upgrade their spam filtering in the next few days - or maybe someone should disconnect China from the Internet for a couple more years until it has a government that enables dissent and stifles organised crime instead of the other way around. If not, I may be forced to switch email addresses again. I need to have an email address that I can publish on the internet without subterfuge, and without being swamped with crap. Maybe despammed.com is working properly again?

Ooh, this is a good one

August 7th, 2005 by Reinder

You all know the fake PayPal spams that arrive in your email boxes. They usually have alarming messages about your account being screened or suspended because of an "incident". Just now I got a version that's a little more devious:

(more…)

Spam prevention link clearinghouse

February 3rd, 2005 by Reinder

Via Pete Ashton, I found an article on server-level solutions to the comment/trackback spam problem, which everyone who has a weblog that is open to comments or trackback should read themselves or forward to their web server admin.

(more…)