I have now discovered a few tips and tricks to fend off those bad bots, crawlers, harvesters, you name it – all at the application layer.
First of all, you need to create yourself a simple blocklist of IPs, one per line, containing the IPs you don’t want visiting your site. (Hopefully ours will be published for public use soon.)
123.567.678.9
657.387.93.2
18.104.22.168
etc.
The PHP needs to be something simple: a script that checks whether the user’s IP is in the text file and, if it is, displays a “No Entry” error. Originally my script only displayed an HTML page with text along the lines of “Your IP address has been blocked from our network due to suspicious activity, you are being monitored. If you feel you should not be blocked simply email firstname.lastname@example.org”. This did the job, just not very well: the harvesters would pick up that email address and spam it like hell, and the bots just kept coming back every hour (each time the error message was displayed, it was logged in our admin area).
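A minimal sketch of that kind of check might look like the following. It is only an illustration, not the exact script described here: the function name `is_blocked` and the file name `blocklist.txt` are made up, and it assumes the blocklist sits next to the script with one IP per line.

```php
<?php
// Hypothetical helper: returns true if $ip appears in the blocklist file
// (a plain text file with one IP per line).
function is_blocked(string $ip, string $blocklistPath): bool
{
    // file() returns false if the file cannot be read.
    $lines = @file($blocklistPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false; // assumption: let visitors in if the list is missing
    }
    // Trim stray whitespace so a line like "192.0.2.1 " still matches.
    $blocked = array_map('trim', $lines);
    return in_array($ip, $blocked, true);
}

// Only runs for real web requests, where REMOTE_ADDR is set.
if (isset($_SERVER['REMOTE_ADDR'])
    && is_blocked($_SERVER['REMOTE_ADDR'], __DIR__ . '/blocklist.txt')) {
    echo '<p>No entry: your IP address has been blocked.</p>';
    exit;
}
```

Note that this version still answers with an ordinary 200 page, which is exactly the weakness described above.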
So, I did some research on how to tackle this issue. To Google!
I came across the HTTP error code 403.6 (http://en.wikipedia.org/wiki/HTTP_403), which, simply put, is a proper error issued by the server (not a normal HTML page, i.e. a 200) and means “Your IP has been blocked from this website”. Perfect. However, I had never come across HTTP error codes with decimals before, so I went to the forums to see how to go about this. I had a suspicion that if we hit the bots with an “official” error they would return to the site less often.
I originally set out asking questions on webdesignerforum.co.uk, to no avail, so off I went to my recently discovered Stack Overflow (which is an incredible website with an amazing idea which works bloody well!). As I expected, I got the answer, and fast: the best thing is just to send out a plain 403 rather than a 403.6, as the decimal version is very uncommon.
So I changed the script to dish out a 403 header and, lo and behold, the log entries for blocked IPs in the admin area slowly started decreasing. Once the bots had been given a 403 once or twice, they rarely came back!
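For illustration, the 403 version could be sketched roughly like this. Again, this is an assumption-laden example rather than the actual script: `status_for` and `blocklist.txt` are hypothetical names, and it uses `http_response_code()`, which needs PHP 5.4 or later (on older versions, `header('HTTP/1.1 403 Forbidden')` does the same job).

```php
<?php
// Hypothetical helper: decide which HTTP status a visitor should get.
// $blockedIps is the blocklist already loaded into an array.
function status_for(string $ip, array $blockedIps): int
{
    return in_array($ip, $blockedIps, true) ? 403 : 200;
}

// Only runs for real web requests, where REMOTE_ADDR is set.
if (isset($_SERVER['REMOTE_ADDR'])) {
    // Assumed file name; one IP per line. Fall back to an empty list
    // if the file is missing or empty.
    $lines = @file(__DIR__ . '/blocklist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    $blockedIps = array_map('trim', $lines);

    if (status_for($_SERVER['REMOTE_ADDR'], $blockedIps) === 403) {
        http_response_code(403); // a real 403 status instead of a 200 page
        // No contact email in the body, so harvesters have nothing to scrape.
        exit('403 Forbidden');
    }
}
```

The status has to be set before any output is sent, so a check like this belongs at the very top of the page.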
P.S. If you google 403.6, I have the top two links (atm) 😀 My first post was on WDF (http://www.webdesignerforu…) and my second on Stack Overflow (amazing website, with a really addictive badges thing) (http://stackoverflow.com/questions/569708…)