Auto Ban Rogue Web Robots

txt3rob

Hi,

I've made my first public script, which might be handy to a few of you for stopping rogue robots from stealing your bandwidth.

Can people please give me a review of the script? It's basic at best, but it's not bad for someone who's only just learnt PHP and a few other tricks!

http://www.xcellweb.co.uk/
http://www.xcellweb.co.uk/ban.phps

It works like this: any robot that ignores robots.txt and follows a hidden link triggers the script, which automatically bans that robot's IP address.
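For anyone who wants to see the idea without opening the script, here is a rough sketch of the trap logic. It is written in Python purely for illustration and is not the PHP in ban.phps. The trap path, the function name and the in-memory ban set are all made up for the example.

```python
# Hypothetical hidden URL. It is linked invisibly from pages and
# disallowed in robots.txt, so well-behaved crawlers never request it.
TRAP_PATH = "/trap/"

ROBOTS_TXT = """User-agent: *
Disallow: /trap/
"""


def handle_request(path, ip, banned):
    """Return an HTTP status code, banning any client that hits the trap."""
    if ip in banned:
        return 403  # already banned
    if path.startswith(TRAP_PATH):
        banned.add(ip)  # the client ignored robots.txt, so ban its IP
        return 403
    return 200


banned = set()
print(handle_request("/index.html", "203.0.113.7", banned))  # 200
print(handle_request("/trap/", "203.0.113.7", banned))       # 403
print(handle_request("/index.html", "203.0.113.7", banned))  # 403
```

A real version would need to store the bans somewhere persistent (file, db or firewall rule) rather than in a set.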

all feedback welcome :)
 
A perma ban seems kind of harsh. What if you banned Googlebot and it never indexed you again? Maybe set up the script so it writes the time of each ban to a file or db, and then have another script on a cron job go through and delete any of those bans that are over 30 min old.

Also, to avoid having to enable exec and deal with sudo issues, you can write the IP of the bot in question to a db or file and add an include at the top of all your pages. If the visitor's IP is in the db or file, that script redirects them to a "You have been temporarily banned." page. A ban at the server level, by contrast, just makes it look like your site no longer exists.
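A rough sketch of the expiring-ban idea, in Python for illustration only. The dict of ban times stands in for the file or db, and the function names are invented for the example. The purge function is what the cron job would call.

```python
import time

BAN_SECONDS = 30 * 60  # the 30 minutes suggested above


def add_ban(bans, ip, now=None):
    """Record the time at which an IP was banned."""
    bans[ip] = time.time() if now is None else now


def purge_expired(bans, now=None):
    """Delete bans older than BAN_SECONDS and return the IPs removed."""
    now = time.time() if now is None else now
    expired = [ip for ip, banned_at in bans.items()
               if now - banned_at > BAN_SECONDS]
    for ip in expired:
        del bans[ip]
    return expired
```

The optional `now` argument is only there so the expiry can be tried out without waiting half an hour.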
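Something along these lines, sketched in Python rather than PHP. The ban file is assumed to hold one IP per line, and all the names here are hypothetical. The check is what would sit in the include at the top of every page.

```python
from pathlib import Path


def is_banned(ip, ban_file):
    """Return True if the IP is listed in the ban file (one IP per line)."""
    path = Path(ban_file)
    if not path.exists():
        return False
    return ip in {line.strip() for line in path.read_text().splitlines()}


def ban(ip, ban_file):
    """Append the IP to the ban file unless it is already listed."""
    if not is_banned(ip, ban_file):
        with open(ban_file, "a") as fh:
            fh.write(ip + "\n")


def page_for(ip, ban_file):
    """Return (status, body): the ban notice for banned IPs, else the page."""
    if is_banned(ip, ban_file):
        return 403, "You have been temporarily banned."
    return 200, "normal page content"
```

No exec or sudo is needed this way, since the web server only reads and appends to a plain file it already has access to.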

Just a few suggestions :)
 
Googlebot should in theory read robots.txt, so the script should never ban Google.

As long as the web crawler pays attention to robots.txt, all should be good. I'm sure Google and Yahoo both adhere to it.
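To show what "pays attention to robots.txt" means in practice, here is a small sketch using Python's standard robots.txt parser. The /trap/ path is a made-up stand-in for the hidden link, and this only models how a compliant crawler is supposed to behave, not what any particular bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules that put the hidden trap link off limits.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /trap/",
])

# A compliant crawler asks before fetching and skips disallowed paths.
print(rules.can_fetch("Googlebot", "/trap/"))       # False
print(rules.can_fetch("Googlebot", "/index.html"))  # True
```

A crawler that runs this kind of check never requests the trap URL, so only robots that skip the check end up banned.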

The reason I decided on a perm ban is that a user on my site went through 1.5 GB of bandwidth stealing my content with HTTrack.

The suggestions you've made are already in another script, which I could upload if you wish to use it.

Thank you for your feedback though :)
 
Hello,
What if Google has already indexed a page and you then disallow that page in robots.txt?
I don't think Google checks robots.txt every time before it reads a page ...
 