[solved] Htaccess ip block not working

Richard G

I was trying to block a spider from crawling a certain customer domain, because it was not respecting the customer's robots.txt.

Now this is what is happening:
Code:
46.229.168.73 - - [23/Dec/2016:12:11:27 +0100] "GET /forum/viewforum.php?f=116&sid=298276cb5b15e9836b12577db2c9f5f6 HTTP/1.1" 200 9086 "-" "Mozilla/5.0 (compatible; SemrushBot/1.1~bl; +http://www.semrush.com/bot.html)"
But it's using multiple IPs in this range.

So I tried blocking with 46.229.168, then 46.229.168.0/24, and now 46.229.160.0/20 (which covers 46.229.160.0–46.229.175.255, so it includes 46.229.168.73), but the bot still keeps coming.

This is my customer's .htaccess file:
Code:
<Files *>
order deny,allow
deny from 51.254.0.0/15
deny from 62.0.0.0/16
deny from 149.202.0.0/16
deny from 136.243.0.0/16
deny from 46.246.0.0/17
deny from 23.20.0.0/14
deny from 5.9.0.0/16
deny from 149.202.157.216
deny from 149.202.157.218
deny from 46.229.160.0/20
allow from all
</Files>

RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?user-domain\.nl$
RewriteRule ^(/)?$ forum [L]

I also tried it without the <Files> directive, but the bot keeps appearing in the logs, so it can keep on visiting. How is this possible?
Am I doing something wrong?
 
Is a .htaccess in a sub (domain/directory) overruling this one?

Or is it the wrong .htaccess (private/public)?

Both you can check with your own IP, of course; then you should see whether the .htaccess rule is working there.
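
For example, a minimal test (a sketch; replace 203.0.113.5 with your own IP):

Code:
order deny,allow
# Temporary test rule: with your own IP here, requesting the site
# should give a 403 if this .htaccess is actually being read.
deny from 203.0.113.5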
 
Hi, have you tried just banning the bot? It may work for you.

Code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^semrush
RewriteRule ^.* - [F,L]

Bots matching the condition above will receive a 403 Forbidden error when trying to view your site.
 

Of course the user-agent ban above only works if the right .htaccess is in the right place; when on nginx you have to check more...!

Also, a newer manual/instruction for Apache >= 2.4.x would be nice. See this part of https://httpd.apache.org/docs/current/rewrite/access.html:

Discussion:

Rather than using mod_rewrite for this, you can accomplish the same end using alternate means, as illustrated here:
Code:
SetEnvIfNoCase User-Agent "^NameOfBadRobot" goaway
<Location "/secret/files">
    <RequireAll>
        Require all granted
        Require not env goaway
    </RequireAll>
</Location>
 
@Ikkeben:
Overruling htaccess in sub ( domain/ directory) ?
There is a .htaccess in the public_html directory (which is the one I placed this in) and one in the /public_html/forum directory. However, the one in the forum directory does not contain any deny/allow statements.

Now I got it fixed a minute ago, but it's very odd: it was fixed by removing the "allow from all" line.
However, most tutorials say this line should be added last. I went and checked, and on 3 different servers it only works when leaving this "allow from all" line out.
Seems odd to me, given the tutorials I read everywhere.

@PSD: Thank you for the tip about banning the bot the other way. I'm going to try that now!
 
@Ikkeben:

Tried this:
Code:
SetEnvIfNoCase User-Agent "^SemrushBot" goaway
<Location "/secret/files">
    <RequireAll>
        Require all granted
        Require not env goaway
    </RequireAll>
</Location>

But it seems that is not allowed in a .htaccess file; I got this error notice:
Code:
/home/useraccount/domains/userdomain.nl/public_html/.htaccess: <Location not allowed here
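
That makes sense: <Location> is only valid in the main server config or a virtual host, not in a .htaccess. A sketch of the same idea without the wrapper (assuming Apache 2.4 and an AllowOverride that permits it; the anchor is dropped because the logged user agent starts with "Mozilla/5.0"):

Code:
SetEnvIfNoCase User-Agent "SemrushBot" goaway
# <RequireAll> works at directory level, so it is allowed in .htaccess
<RequireAll>
    Require all granted
    Require not env goaway
</RequireAll>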

@PSD: the code:
Code:
RewriteCond %{HTTP_USER_AGENT} ^semrush
RewriteRule ^.* - [F,L]
did not work in any .htaccess, not even when using SemrushBot as the name.
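
A likely reason, going by the log line in the first post: the pattern is anchored with ^, but the user-agent string begins with "Mozilla/5.0 (compatible; SemrushBot...", so ^semrush (or ^SemrushBot) can never match. A sketch without the anchor, matching case-insensitively:

Code:
RewriteEngine On
# Match "SemrushBot" anywhere in the User-Agent header, case-insensitively
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule .* - [F,L]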

However, I luckily fixed it now via the .htaccess. But it will indeed show 403s in the logs.
 
Apache 2.4? See Require in place of Order/Deny/Allow:
https://httpd.apache.org/docs/2.4/howto/access.html

The Allow, Deny, and Order directives, provided by mod_access_compat, are deprecated and will go away in a future version. You should avoid using them, and avoid outdated tutorials recommending their use.

That is what I mean about that tutorial needing an update. ;)
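
For example, the deny list from the first post could look like this in 2.4 syntax (a sketch, showing only two of the ranges):

Code:
<RequireAll>
    Require all granted
    Require not ip 46.229.160.0/20
    Require not ip 51.254.0.0/15
    # ... remaining ranges the same way
</RequireAll>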
 
Now I got it fixed a minute ago, but it's very odd: it was fixed by removing the "allow from all" line.

This is because of the order you gave: with "Order deny,allow", Deny is evaluated first and Allow last, so your final "allow from all" matched every request and overrode the whole deny list. ;)

Read my Apache 2.4 part; if you have that version, it's better not to use deprecated parts if possible. ;)
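
A minimal sketch of the same block done the compat way, without the harmful line:

Code:
order deny,allow
# With Order deny,allow, clients matching no Deny are allowed by
# default, so no "allow from all" is needed; adding it would
# re-allow the denied ranges, since Allow is evaluated after Deny.
deny from 46.229.160.0/20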
 
This is because of the order you gave
The order was as in the tuts: deny,allow, with deny first and allow last.
I knew about the new "Require" syntax, but the .htaccess still works and it's something the customer made, so I wanted to leave it like that and inform him that he must not wait too long to change it to the new "Require" directives.

Thank you for the quick help!
 
All "BAD BOTS" ( not listen to robots.txt) should be firewalled maybe a wishlist to have bot lists in the CSF / or ip tables DA you can choose but difficult though the real bad ones changes IP / headers and so on so makes less/almost none sense.

If you gave these bad boys extra work to do if possible crawling endless other bad known sites.

Redirect ;)

Code:
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Redirect bad IPs elsewhere
    RewriteCond %{REMOTE_ADDR} badipnr-range-here$
    RewriteRule (.*) http://nicebadsite-url-here$1 [R=301,L]
</IfModule>

Sarcasm
 
I agree they should all be firewalled. However, since they use complete ranges, that can fill up your iptables list and use resources too. And as you say, they also change IPs and such... so no sense.

So maybe it's better to only block them where they try to do wrong; I'm not sure.

Whahahaha, I like that redirect. :D :D
 