Spam

Started by Nick on 3-Mar-2011/21:56:24-8:00
Let's see if this works to stop the recent spam attack. If anyone has any other ideas for algorithms to try, go ahead and post please :)
I hope it doesn't keep people from posting...
If you are referring to a change in the captcha, I doubt it will work for long, as you can still easily parse the word out to type it. If you can parse it in REBOL, the spammer can too. Some other ideas: 1. For thread starters, if the title and the author name are the same, it's likely spam, as it was in this case. 2. Keep the last 1000 posts in memory and compare them to the incoming post. This automatically removes accidental double posts and repeat spammers. There were many double posts in this case. 3. Some form of throttling of thread starts: only 1 new thread per 20 seconds or similar. Spambots usually post in bursts and they like to post threads to fill up the front page. That was also the case here. Each method is only a few lines of code. :-)
4. Another throttle method could be to measure the time between showing a form and receiving a good captcha, i.e. when the form is submitted correctly. If that time is less than 5 seconds, the poster is likely not human. But that is host dependent, so it requires a bit more to monitor.
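A minimal sketch of ideas 1 to 4 above, written in Python for illustration rather than in the forum's REBOL. It is not the script Nick actually runs. The class name, the method name and its parameters are hypothetical; the limits (1000 posts, 20 seconds, 5 seconds) are the ones suggested in the posts. Duplicate detection here only normalizes whitespace and case, which is an assumption about what "compare" should mean.

```python
import time
from collections import deque

RECENT_LIMIT = 1000       # idea 2: how many recent posts to remember
THREAD_INTERVAL = 20.0    # idea 3: minimum seconds between new threads
MIN_FORM_SECONDS = 5.0    # idea 4: fastest plausible human submission


class SpamFilter:
    """Hypothetical in-memory filter; state is lost when the process restarts."""

    def __init__(self):
        self.recent = deque(maxlen=RECENT_LIMIT)  # oldest entries drop off
        self.last_thread_at = None

    def check(self, author, title, body, is_new_thread, form_shown_at, now=None):
        """Return a rejection reason, or None if the post looks acceptable."""
        now = time.time() if now is None else now

        # Idea 1: thread starters whose title equals the author name.
        if is_new_thread and title.strip().lower() == author.strip().lower():
            return "title equals author"

        # Idea 4: form came back faster than a human could fill it in.
        if now - form_shown_at < MIN_FORM_SECONDS:
            return "submitted too fast"

        # Idea 2: same text as one of the last RECENT_LIMIT accepted posts.
        key = " ".join(body.split()).lower()
        if key in self.recent:
            return "duplicate post"

        # Idea 3: at most one new thread per THREAD_INTERVAL seconds.
        if is_new_thread:
            if (self.last_thread_at is not None
                    and now - self.last_thread_at < THREAD_INTERVAL):
                return "thread throttle"
            self.last_thread_at = now

        self.recent.append(key)
        return None
```

A caller would record the time when the form is rendered, pass it back as form_shown_at on submission, and reject the post whenever check returns a reason. The thread throttle in this sketch is global, not per poster, which matches the "fill up the front page" scenario but may be too strict on a busy forum.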
Thanks Henrik!
Ehm, the CAPTCHA I have to type for this is: Type the reverse of this captcha text: "d"
What about using a math expression like this, with an instruction to ignore the order of operations: 3 + 5 * 2 (16 is the correct answer). Keep the + and * symbols and change the numbers. Use simple numbers from 2 to 9. If the answer is incorrect, highlight the "ignore order" message. More examples: 2 + 2 * 5, 3 + 5 * 8, etc. It is easier than typing some long words in reverse order.
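One way the math captcha suggested above could look, as a rough Python sketch. The function names are made up for illustration. The question always has the form a + b * c with numbers from 2 to 9, and the expected answer is computed strictly left to right, so 3 + 5 * 2 gives 16.

```python
import random


def left_to_right(a, b, c):
    """Evaluate a + b * c while ignoring operator precedence."""
    return (a + b) * c


def make_challenge(rng=random):
    """Return (question text, expected answer) with numbers from 2 to 9."""
    a, b, c = (rng.randint(2, 9) for _ in range(3))
    question = f"Ignoring the order of operations, what is {a} + {b} * {c}?"
    return question, left_to_right(a, b, c)


def check_answer(expected, submitted):
    """Compare the typed answer with the expected one; non-numbers fail."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

If check_answer fails, the form can be shown again with the "ignore the order of operations" hint highlighted, as proposed in the post.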
Just make users register .. and then you only have to check once. Making people do the Leonardo thing is ... taxing.
I read a very appropriate post on Smashing Magazine yesterday: http://www.smashingmagazine.com/2011/03/04/in-search-of-the-perfect-captcha/ Generally, I believe if somebody is posting more than one reply per minute, they're most probably spamming and should be suspended for 15 minutes. Also, type the reverse of this captcha text: "= ="
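The rule proposed above (more than one reply per minute means a 15 minute suspension) might be sketched like this in Python. The class and its names are hypothetical, and "poster" stands for whatever key the forum can use to identify a sender, for example an IP address or a user name; the thread does not say which.

```python
import time

MIN_GAP = 60.0           # at most one reply per minute
SUSPENSION = 15 * 60.0   # suspension length in seconds


class ReplyThrottle:
    """Hypothetical per-poster throttle kept in memory."""

    def __init__(self):
        self.last_post = {}        # poster -> time of last attempt
        self.suspended_until = {}  # poster -> time the suspension ends

    def allow(self, poster, now=None):
        """Return True if the poster may reply right now."""
        now = time.time() if now is None else now

        # Still inside a suspension window: reject without further checks.
        if now < self.suspended_until.get(poster, 0.0):
            return False

        last = self.last_post.get(poster)
        self.last_post[poster] = now

        # Second reply within a minute: reject it and start a suspension.
        if last is not None and now - last < MIN_GAP:
            self.suspended_until[poster] = now + SUSPENSION
            return False
        return True
```

Attempts made during a suspension do not extend it in this sketch; a stricter variant could restart the 15 minutes on every rejected attempt.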
I have a belief that you can put as much work as you want into a captcha, but it will be broken anyway by increasingly complex AI engines. Even user registration is a complex form of a captcha, which can also be broken. Best is to work server-side on post throttling and extra checks on what is posted.
I agree with Henrik. One of the forums I read got filled with spam on the same day this forum got spammed, and that forum has user registration too. Somehow there are hundreds of fake users. The topic names and the posts were different. There should be more than one check: posts from the same IP in a short period, content of the post, captcha, registration.
Nick, you may need a combination of checks. I'm just throwing in a few ideas: 1) user registration and login; 1(a) disallow multiple registrations from the same IP or range of IPs (using the subnet mask); 2) intelligent quiz, whether a mathematical or a general question; 3) detect the IP and disallow more than x posts in a day, or even blacklist some IPs; 4) user rating: trusted vs. non-trusted user IDs, where people who have posted for a long time have a higher rating and new people nobody knows about have a lower rating.
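Points 1(a) and 3 above could be sketched roughly as follows, using Python's standard ipaddress module. Everything named here is an assumption for illustration: the /24 subnet size, the limit of 20 posts per day (the post only says "x"), and the IpPolicy class itself. The sketch is IPv4 only.

```python
import ipaddress
from collections import Counter

PREFIX_LEN = 24          # assumed subnet size (IPv4)
MAX_POSTS_PER_DAY = 20   # assumed value for "x"


def subnet_of(ip, prefix_len=PREFIX_LEN):
    """Return the network that contains ip, e.g. 10.0.0.5 -> 10.0.0.0/24."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


class IpPolicy:
    """Hypothetical registration and posting limits keyed on IP address."""

    def __init__(self, blacklist=()):
        self.blacklist = [ipaddress.ip_network(n) for n in blacklist]
        self.registered_subnets = set()
        self.posts_today = Counter()

    def may_register(self, ip):
        """Allow only one registration per subnet (point 1(a))."""
        net = subnet_of(ip)
        if net in self.registered_subnets:
            return False
        self.registered_subnets.add(net)
        return True

    def may_post(self, ip):
        """Reject blacklisted addresses and cap posts per day (point 3)."""
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.blacklist):
            return False
        if self.posts_today[ip] >= MAX_POSTS_PER_DAY:
            return False
        self.posts_today[ip] += 1
        return True

    def reset_daily_counts(self):
        """Call once a day, for example from a scheduled job."""
        self.posts_today.clear()
```

One registration per subnet will also lock out legitimate users who share a network, such as an office or a mobile carrier, so a real deployment would probably want a higher limit than one.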
Thanks for all the great info and ideas. I implemented just a couple, and it seems to be working so far. I'll add more as needed.
I wasn't saying that registration is foolproof .. but it relieves the burden of having to type in a captcha each post. Also my test site never received any spam :) http://rebol.thruhere.net:8000/registration.rsp But it could be no one knows it's there!
I also use the typepad anti-spam api .. it validates the message against their database. http://antispam.typepad.com/info/get-api-key.html
Nice work Graham, and thanks for the typepad link, I was searching for such a free service. I intended to use Akismet first, but it's not free for my use-case.
I've been tempted to use a service, but really would prefer to keep this script as simple and self contained as possible.
