Spamvolution
Sterling Camden
Comment spammers are getting even more shrewd:

If either of these comments had made it past Akismet, I would have thought they were completely legit. In case you can’t make out the image above, the comments contain no links except for the URL of the commenter, and they read:
Problem with RSS Feed in WordPress.
I have a subdomain that I installed wordpress for another blog site, but the subdomain site’s rss feed points to my parent site.
Can anyone come up with any suggestions?
How did Akismet know that this was spam? I’m guessing it’s because they’ve seen a number of these same messages already marked as spam by other users.
So now we know what the primary use will be for an AI that passes the Turing Test: generating comment spam containing germane insights within each dialog. At that point, will it still be spam?
Posted in Get Outta Here | 22 Comments

You must realize, of course, that this is just further evidence that legislation is exactly the wrong way to go for trying to deal with spam. Right? You’d just be standing in the way of scientific progress.
Just as the online porn industry and filetrading have driven innovation in the realm of digital content delivery, so too will spamming efforts probably be a major force in driving development of artificial intelligence. I guess the new market for a Lisper’s skills is in advertisement, working on AI code to generate spam that doesn’t look like spam and can actually contribute to discussions.
At which point humans begin to have trouble keeping up in the conversation, and Google loses all interest in links.
Right — Even though I hate spam and am a CAUCE member, I still do not favor anti-spam legislation. That would only make things worse.
That’s a natural part of the innovation process — a part that is similarly obstructed when legislation starts getting involved in “fixing” things. As innovation increases the effectiveness of targeted advertising and pagerank advancement, systems like Google’s innovate to counter the gaming-the-system factor. We get a better Google, and we get advancement in the state of the art in AI development for marketing.
By the way, membership in CAUCE supports anti-spam legislation efforts. You might want to rethink your membership if you really oppose the legislation.
I also got these comments, pointing to the same URL shown in your image. They got past Akismet here. I checked that page out and found that it is a link farm to several other pages. Seems to be a PR-pushing page for his customers?
apotheon: you’re right. I took a look at their web site (the first time in years) and legislation is their main focus. I just sent them a membership revocation request.
Quix0r: it’s all about Google-juice, but it’s not a link farm. It turns out that the page contains a bunch of on-line games. By putting their URL on a lot of blogs, they raise their search rank. So when somebody searches for “Tetris”, they hope to be on the first page. Looks like they make money, oddly enough, via Adsense.
I wonder which will achieve consciousness first, spammers or anti-spam software?
Hey, a visit from Mr. AutoMATTic, himself! Thanks for stopping by and commenting.
Good question. I suppose it depends on which one needs it most to survive in their respective markets. It seems to me that the ability to successfully lie requires more intelligence and ‘theory of mind’ than the ability to identify a lie. But I could be wrong.
Of course, you could answer that one for us by jumping ahead of the game, Matt.
Sterling said:
I always find it disappointing when an organization that is nominally in agreement with me on very important issues is, in practice, actually more a part of the problem than part of the solution. I would dearly love for organizations like CAUCE and the ACLU to actually be on the right side of not just some, or even most, but basically all issues so that I could feel justified in supporting them.
Matt said:
Surprisingly, that hadn’t crossed my mind. It really should have.
There aren’t many people actually working on spam filtering in a way that might lead to intelligent software, and part of the problem is that there aren’t many people actually working on spam filtering intelligently. The most popular spam filtering “solutions” are blacklist-driven, for instance, which is just a good way to end up making it impossible for anyone on a major ISP and/or a Windows computer to engage in email correspondence with anyone else. Twice now, I’ve had to go to absurd lengths to get myself unblacklisted because I happened to be emailing from an IP address that was part of a block that had a couple of spam zombie systems.
The single most important task of spam filtering is, and should be considered, avoiding false positives. It seems to me that the best way to do it is to ensure that your spam filtering software doesn’t filter something out unless there is really no way it could possibly be anything but spam. Even heuristic systems, which run the best chances of achieving intelligence eventually, don’t tend to be designed in this manner, which I think could be a real problem. If we’re going to develop AI, we should aim toward developing ethical AI, and it’s difficult to do that without instilling in them such concepts as the notion that a hundred guilty men should go free before a single innocent man is wrongly convicted and sentenced for a crime he didn’t commit.
In fact, along the lines of that analogy, I really don’t think it’s going to be very long before people lose sight of the positive purpose of spam filtering (protecting users from ill effects of spam) the same way the criminal justice system has long since lost sight of the positive purpose of criminal justice (protecting the innocent from the ill effects of crime). Instead, spam filtering software will (I think) increasingly become a means of “punishing” spammers, just as the criminal justice system has been twisted toward ends of “punishing” criminals. The result of both is that we, the people these systems are supposed to protect, will actually be harmed by the system without effectively controlling the spammers/criminals.
On the other hand, Paul Graham is (according to his own words) working on developing a Lisp dialect and, with it, a heuristic spam filtering system, in parallel. His statements on his philosophy of design of the spam filtering system make it clear that avoiding false positives is a primary concern for him, which is a statement that warms the cockles of my hackish little heart. That, combined with the fact that the language he’s using is a new, cleaner Lisp dialect that doesn’t suffer some of the shortfalls of Scheme (assuming Arc ever leaves vaporware status), makes me think we might be on our way toward developing a spam filtering AI I’d like to meet after all, since Lisp is often the language of choice of AI researchers anyway (for obvious reasons).
Sterling said:
If it’s done right, I think spam filtering has a greater need of AI. Not only would it need the ability to detect a lie, but it would also need the ability to adjust to lie detection to increase efficiency (since spammers will mostly appear to advance by virtue of the fact that many, many spammers will get “smarter” by a little bit, but will seem smarter in the aggregate as an emergent property of spam as a whole) and, in addition to that, will need to be able to make value judgments (to help ensure that false positives aren’t a problem). After all, spammers only need to mimic stuff that gets through the filter. Anti-spam systems need to make fine distinctions between legitimate content and the mimics. In fact, spammers don’t need their software to lie at all — just to work marketing-effective content into a message that otherwise looks just like everything else in the target venue. Only if spammers actually put effort into allowing their software to converse interactively will they be on the right track toward AI.
Most interesting, apotheon. I read Paul Graham’s essay, and his approach seems to involve a simple Bayesian filter based on word probabilities — including header content and markup, which automagically builds black/white lists as well. Do you know of anything more recent (since 2002) that he’s written on this?
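For anyone who wants to see the shape of the idea, here is a rough Python sketch of a filter based on per-word probabilities. It is a simplified illustration and not Paul Graham's actual code: the class name, the tokenizer, the 0.4 score for unseen words, the 0.01/0.99 clamps, and the choice of the 15 most telling words are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into crude word-like tokens.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class WordProbabilityFilter:
    """Hypothetical word-probability spam scorer (illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_messages += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_messages += 1

    def token_probability(self, token):
        s = self.spam_counts[token]
        h = self.ham_counts[token]
        if s + h == 0:
            return 0.4  # assumed: unseen tokens lean slightly innocent
        spam_rate = s / max(self.spam_messages, 1)
        ham_rate = h / max(self.ham_messages, 1)
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))  # never fully certain either way

    def spam_probability(self, text, top_n=15):
        probs = [self.token_probability(t) for t in set(tokenize(text))]
        # Keep only the tokens whose scores are furthest from neutral.
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probs[:top_n]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)
```

To keep false positives rare, a comment would be treated as spam only when the combined score is very high (say, above 0.9, which is again an assumed cutoff), and everything else would be let through or held for a human to look at.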
Akismet works very well for me, but one simple additional feature could eliminate the few false positives I’ve experienced: a white list. Let me specify commenters whose posts should never be considered spam. That would be a nice feature for WordPress moderation also.
Not off the top of my head. I was going entirely from (admittedly fuzzy) memory, and remembered him going on about false positives, et cetera. Hopefully I’m not misremembering.
Actually . . . WordPress moderation does allow that. From the dashboard, go to Options > Discussion, and click the “Comment author must have a previously approved comment” checkbox. That causes all comments to go into moderation until approved, unless that person’s post was previously approved — thus giving you a “whitelist”.
Oops, darnit, I meant to put your quotes in blockquotes, but accidentally used markdown syntax. Um. Yeah.
There, I fixed it up for you.
But won’t that option just prevent new commenters from showing up right away? Are you certain that it thereafter never filters a comment from someone who had been previously approved?
Well . . . if they have a user account on the site, yeah. That’s why you’re able to post comments without going into moderation at SOB.
Obviously, it’s not so easy to whitelist anonymous posters. I don’t think anyone who has registered at SOB and posted a legitimate comment more than once has gone into moderation after the first time, though.
I don’t allow anonymous posters (they have to at least provide an e-mail address, not that it can’t be bogus) but I hate requiring people to register to leave comments. Your site is the only one at which I have registered for that purpose, and at other sites that require registration I just bailed. But if I could just tell Akismet which commenters I knew to be OK, that would complete my spam strategy, at least for now.
I’m not requiring registration to post any longer, as I’m sure you’re aware. It was a temporary condition of SOB, ultimately.
I think email addresses can get whitelisted, too, but I’m not sure. People tend to register at SOB if they post more than once, so it’s difficult to judge whether it works for email addresses as well as registered users. I’ll try to pay more attention to that sort of thing in the future.
If I go over to your site and leave a comment that just reads “phentermine” with a link, then it won’t go into moderation?
Nope — it should just post it directly. Of course, that hasn’t been tested, but I don’t see why it would.
I’m not actually using any filters for moderation except the “every post goes into moderation unless the poster has been whitelisted” filter. With that system, it doesn’t really matter what terms someone is using — people only get approved if they post something useful, then they don’t have to be approved again.
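The "moderate everything unless the poster is whitelisted" rule described above can be sketched in a few lines of Python. This is not how WordPress implements it. The ModerationQueue class and its method names are hypothetical, and keying the whitelist on the email address is an assumption (and, as noted earlier in the thread, an email address can be bogus).

```python
from dataclasses import dataclass, field


@dataclass
class ModerationQueue:
    """Hypothetical whitelist-only moderation (illustrative only)."""

    approved_authors: set = field(default_factory=set)
    pending: list = field(default_factory=list)
    published: list = field(default_factory=list)

    def submit(self, author_email, text):
        # Assumed identity key: the normalized email address.
        key = author_email.strip().lower()
        if key in self.approved_authors:
            # Previously approved author: publish without any content check.
            self.published.append((key, text))
            return "published"
        self.pending.append((key, text))
        return "pending"

    def approve(self, index):
        # Approving one comment whitelists its author from then on.
        key, text = self.pending.pop(index)
        self.approved_authors.add(key)
        self.published.append((key, text))

    def reject(self, index):
        # Rejected comments are dropped; the author stays unapproved.
        self.pending.pop(index)
```

Note that under this scheme a whitelisted author's later comment is published regardless of what it says, so the only protection is the human judgment applied to the first comment.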
[...] Randy’s State of the Splogosphere, Part IV: “Comment spam is completely out of control”. Agreed. It took more than five months for Akismet to catch my first 10,000 spam comments, but now it has captured another 10,000 in just over a month. Comment spammers can be a lot more sneaky, because unlike email spam, they don’t care if you click on the links. The links are just designed to acquire Google juice. [...]
That’s pretty simple. Could you handle several hundred in moderation per day, though?
I’d have to either way, since with a spam filter I’d have to check everything by eye to see if it’s really spam. Either way, I’m basically just doing a visual check of stuff filtered out of automatic public posting.