A Very Simple And Effective Captcha
I have a very simple anti bot-spam technique that works extremely well. It requires no JavaScript, no cookies, no hidden fields, no complicated server weirdness, has negligible overhead, is fully accessible, has a high usability factor and is trivial to implement. Sound good? It's dead simple too, so read on.
In order for spam to work, it has to be cost effective. In order for it to be cost effective, it has to be automated. In order for something to be automated, it requires a predictable pattern. The fundamental approach of this technique is to deny the bot a usable pattern, yet keep the form easy for even the most inexperienced user to use. Since spam bots rely on the structure of our forms and our predictable form naming conventions (the pattern), we simply add a key that only a human can turn: we require the user to fill out one field with a number that is displayed in plain text. This makes it effective against automated solutions. That's captcha at its most fundamental. Note: It will not, however, prevent humans from manually submitting spam to your form.
In the three years that I've used this across some 20-odd sites, I've gotten a 100% anti spam-bot effectiveness rate. Sites that were being bombarded with hundreds of spams daily suddenly became quiet, and good emails get through. I'm a ColdFusion coder, so my example is ColdFusion, but the technique is cross language. I hope it works as well for you.
Outline: The whole technique in a nutshell.
- Generate a random number
- Set it to a session variable
- Display the random number (session.variable) next to a field in the form
- Get the user to copy it over into the field.
- On submission, verify that the form.value equals the session.random number
- On pass you allow the submission
- On fail you exit, abort, show a message, do as you wish.
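Since the technique is cross language, the outline above can also be sketched in Python. This is a minimal illustration, not the ColdFusion sample: the session is assumed to be a plain dict-like object supplied by your framework, and the function names are hypothetical. It also discards the number after a successful check as well as a failed one, which is an extra assumption that stops a passed number from being replayed.

```python
import secrets

SESSION_KEY = "chk_rand"  # same name as the session variable in the ColdFusion sample


def issue_challenge(session: dict) -> str:
    """Store a fresh zero-padded 4-digit number in the session and return it for display."""
    session.pop(SESSION_KEY, None)  # no re-use of a cached number
    code = f"{secrets.randbelow(10000):04d}"  # "0000" to "9999"
    session[SESSION_KEY] = code
    return code


def verify_submission(session: dict, submitted: str) -> bool:
    """Compare the typed value with the session number, discarding the number either way."""
    expected = session.pop(SESSION_KEY, None)
    if expected is None:
        return False  # no challenge was issued, or it was already used
    return submitted.strip() == expected
```

Call issue_challenge() when rendering the form and print the returned string next to the field; call verify_submission() with the posted value before processing anything else.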
Sample code: The very bare bones version.
// Delete previous random number variables if they exist (no re-use of a cached number)
if (isDefined("session.chk_rand")) { StructDelete(session, "chk_rand"); }
// Assign a new random number
session.chk_rand = NumberFormat(RandRange(0, 9999), '0000');
#session.chk_rand# enter this number here ->
...Some human readable validation message (or other validation function) "Sorry, you need to fill in the following fields..."
StructDelete(session, "chk_rand");
Explanation
If the form.value is equal to the session.random number, then it passes and you process normally. If not, the session.value with the random number is deleted (and thus the next attempt gives you a new random number), then the template is exited. That's it. All of it.
Most bots (any?) won't be able to recognize the system because the random number is plain old page text. The user has only a very simple task to perform, and the number changes every time the form is accessed, so even if the spammer submits manually, it's labour intensive to keep re-entering a random number. It would be even more so when combined with techniques such as permitting only one submission every 30 seconds. Those with JavaScript enabled will never see the server-side response, and those without it (such as Braille readers) will still get a meaningful message.
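The "one submission every 30 seconds" idea might look something like the Python sketch below. The session key name and the function are hypothetical, and the session is again assumed to be a dict-like object; the optional now argument exists only so the logic can be exercised without waiting.

```python
import time
from typing import Optional

MIN_INTERVAL_SECONDS = 30    # the 30-second spacing mentioned above
LAST_KEY = "last_submit_at"  # hypothetical session key


def allow_submission(session: dict, now: Optional[float] = None) -> bool:
    """Return True if enough time has passed since the last accepted submission."""
    current = time.time() if now is None else now
    last = session.get(LAST_KEY)
    if last is not None and current - last < MIN_INTERVAL_SECONDS:
        return False  # too soon; the stored timestamp is left unchanged
    session[LAST_KEY] = current
    return True
```

Rejected attempts do not reset the clock here, so a patient human is never locked out for longer than 30 seconds.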
Variations and additions
One could, if one were so inclined, add logging, honey pot links or emails, or use some sort of function to count how many submissions were made by the IP in a small time period to figure out if you want to ban them for 24 hours or not. Additions to this technique are limited only by your creativity.
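As one possible shape for the per-IP counting idea, here is a small in-memory Python sketch. The window length and threshold are made-up values, the class name is hypothetical, and a real site would need to persist this somewhere shared between requests; only the 24-hour ban comes from the text.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60         # hypothetical "small time period"
MAX_PER_WINDOW = 5          # hypothetical threshold
BAN_SECONDS = 24 * 60 * 60  # the 24-hour ban mentioned above


class SubmissionTracker:
    def __init__(self) -> None:
        self._recent = defaultdict(deque)  # ip -> timestamps of recent submissions
        self._banned_until = {}            # ip -> time at which the ban expires

    def record(self, ip: str, now: float) -> bool:
        """Record a submission; return False if the IP is (or just became) banned."""
        if self._banned_until.get(ip, 0.0) > now:
            return False
        stamps = self._recent[ip]
        stamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= WINDOW_SECONDS:
            stamps.popleft()
        if len(stamps) > MAX_PER_WINDOW:
            self._banned_until[ip] = now + BAN_SECONDS
            stamps.clear()
            return False
        return True
```

With these values, a sixth submission from the same address inside one minute triggers the ban, and other addresses are unaffected.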
When the spammers catch up with this notion, simple responses such as randomising the "put this number here" message, randomising the location of the message (before, after, above or below the field, in a randomly selected P, SPAN or DIV, radio button or select menu) or adding alpha characters will, across many websites, create too much randomness to define a pattern. If you wanted to get fancy about it, you could even randomise the name of the field and keep it as a session.variable. The increase in complexity makes finding a pattern a lot of work, and therefore makes it less cost effective for a spammer to come up with a parser that can handle all the variations. The key is to get the human to do the thing that humans do easily and naturally, in a way that evades the predictability a bot programmer needs and produces patterns that s/he simply cannot foresee.
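Two of these variations, alpha characters and a randomised field name kept in the session, could be combined roughly as follows. This is a Python sketch under assumptions: the function names, the "chk_field" session key, the "f_" prefix and the choice of alphabet are all hypothetical, and form is assumed to be a dict of posted values.

```python
import secrets
from typing import Tuple

# Hypothetical alphabet: letters and digits minus I, O, 0 and 1, which are easy to confuse.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def issue_named_challenge(session: dict) -> Tuple[str, str]:
    """Create a random code and a random field name; remember both in the session."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(5))
    field_name = "f_" + secrets.token_hex(4)  # e.g. "f_" followed by 8 hex characters
    session["chk_rand"] = code
    session["chk_field"] = field_name
    return field_name, code


def verify_named_submission(session: dict, form: dict) -> bool:
    """Look up the posted value under the remembered field name and compare it."""
    expected = session.pop("chk_rand", None)
    field_name = session.pop("chk_field", None)
    if expected is None or field_name is None:
        return False
    # Case-insensitive comparison so the human does not have to match capitals.
    return form.get(field_name, "").strip().upper() == expected
```

Render the form input using the returned field name; a bot that posts to a fixed field name simply never supplies the value under the name the session expects.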
Bonus trick
I sometimes make a page with a submittable form redirect to the index page of the site if it cannot REFind() my full domain name (or a specific list of pages) in the referrer. The user can still access the form, but they must follow a link from the site. This technique makes that particular document slightly less indexable, but improves the overall security. The tradeoff is up to you to determine.
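A rough Python equivalent of the referrer check is sketched below. The domain and index path are placeholders, and the function name is hypothetical. Note that it compares the referrer's host name exactly rather than searching the whole string the way REFind() would, so a foreign URL that merely contains the domain in its query string does not pass.

```python
from typing import Optional
from urllib.parse import urlsplit

SITE_HOST = "www.example.com"  # placeholder for your full domain name
INDEX_URL = "/index.html"      # placeholder for the site's index page


def redirect_target(referrer: Optional[str]) -> Optional[str]:
    """Return the URL to redirect to, or None if the form may be shown."""
    host = urlsplit(referrer or "").hostname  # None when missing or unparsable
    if host is not None and host == SITE_HOST:
        return None
    return INDEX_URL
```

Pass in the request's Referer header value; a None result means the visitor followed a link from the site and the form can be displayed.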
Conclusion
You can get as sophisticated or remain as simple about these approaches as you wish. This is basically a low-tech, high-effectiveness version of captcha that is as simple a concept as it gets, and that works well. I hope that the simplicity of this technique and its variations will allow many of us to implement it across the net. If we are lucky, it may have the same kind of impact that Bayesian filtering had on spam, and in so doing bring one of Evolt's guidelines to the net: Keep the signal high.
Let me know how it works for you, or additional variations that you might come up with.