Webmaster's guide: How to stop spam

Posted: Fri Dec 30, 2011 4:55 am
by abcd99
(Per Lady Wu's request)
Most spammers use "ready-made" hacking programs. These programs have one weakness in common: they submit HTTP/1.0 requests. So, to get rid of the annoyance, we block HTTP/1.0 requests. Is this deleterious? No. Virtually all modern browsers on both PCs and mobile phones send HTTP/1.1 requests, so nothing of value is lost, except perhaps some hermits with their antiquated browsers.

How? Use Apache's mod_rewrite. Most web hosts already have it installed, so all you need to do is enable it. You should be able to adjust the settings in the .htaccess file. This is how to turn it on:
Code:
<IfModule mod_rewrite.c>
  RewriteEngine on
</IfModule>


The IfModule block applies its contents only if mod_rewrite is available. If it is, "RewriteEngine on" turns rewriting on.

mod_rewrite works by "rewriting" request URLs with RewriteRule statements, and a rule applies only when the RewriteCond statements in front of it match. Google the topic for the details, but what I want to highlight is this:
Code:
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block HTTP/1.0 requests (the dot is escaped so it matches a literal dot)
  RewriteCond %{THE_REQUEST} HTTP/1\.0$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^.*Indy\ Library.*$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^.*larbin2\.6\.3\@unspecified.*$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^.*Mail\.Ru.*$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control.*$ [NC,OR]

  # This one used to be the user agent for anonymizer, which was fine with me, but
  # recently it has been left as the user agent for what appear to be malicious bots
  # based on their behavior, so I've decided to block it for now.
  RewriteCond %{HTTP_USER_AGENT} ^.*TuringOS.*$ [NC,OR]

  # These lines block bots that use your bandwidth for their own commercial reasons.
  RewriteCond %{HTTP_USER_AGENT} ^abot.*$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^aipbot.*$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^Linkwalker$ [NC,OR]
  RewriteCond %{HTTP_USER_AGENT} ^.*nameprotect.*$ [NC,OR]
  # The last condition has no OR flag, which closes the chain.
  RewriteCond %{HTTP_USER_AGENT} ^.*TurnitinBot.*$

  # End the conditions, forbid them all (F) and this is the last rule (L)
  RewriteRule .* - [F,L]
</IfModule>


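The first RewriteCond line, the one that tests %{THE_REQUEST}, blocks all HTTP/1.0 requests. The other RewriteCond lines match the spam bots by their user agent. Because the conditions are chained with OR, a request that matches any one of them is caught. At the very end, the RewriteRule says, "Forbid them all". That's it.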
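
To see the HTTP/1.0 part in isolation, here is a minimal sketch with only that one condition. %{THE_REQUEST} holds the full request line the client sent, for example "GET /index.php HTTP/1.0" (the path is made up for illustration), so a pattern anchored with $ on the protocol version catches it. In this sketch the dot is escaped so it matches a literal dot, and the rule uses only [F], since F already implies L. Treat it as an illustration under those assumptions, not a replacement for the full list above:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # THE_REQUEST is the raw request line, e.g. "GET /index.php HTTP/1.0"
  # (hypothetical path). The $ anchors the match to the end of that line.
  RewriteCond %{THE_REQUEST} HTTP/1\.0$

  # "-" means no substitution; F answers with 403 Forbidden.
  RewriteRule ^ - [F]
</IfModule>
```

A client that hits this rule gets a 403 Forbidden response instead of the page, while an HTTP/1.1 request passes through untouched.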

Re: Webmaster's guide: How to stop spam

Posted: Tue Feb 28, 2012 6:21 pm
by Erdrick
Question- what type of spam does this stop- email address harvesting spam, message board spam, etc...?

Re: Webmaster's guide: How to stop spam

Posted: Tue Feb 28, 2012 6:25 pm
by James
Hmm... I missed this previously.

I've considered blocking HTTP/1.0 requests after some research in the past, but today I cannot remember why I didn't follow through with it. Maybe I just saw something shiny and got distracted. It might be worth a shot, but members should be prepared to report any problem that prevents them from accessing the website, and I also need to make sure this doesn't mess with Google.

Erdrick wrote: Question- what type of spam does this stop- email address harvesting spam, message board spam, etc...?

Well, the part that blocks HTTP/1.0 blocks all requests which use that version of the protocol. The idea in this case is to block automated signup methods which depend on it while leaving presumably appropriate traffic (e.g. your web browser) untouched. The other rules are designed to block access from specific programs (web or otherwise) through their user agent, a text string a client sends when it visits a website. For example, when you visit the website in IE8 on Vista it might send 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)'.
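
On the Google worry, one possible approach (a sketch only, not something tested on this site) is to add a second condition that leaves a request alone when its user agent contains "Googlebot". RewriteCond lines without an OR flag are ANDed together, and a leading ! negates the pattern. The choice of "Googlebot" as the string to exempt is an assumption for illustration, and user agents are trivial to fake, so a spam bot could claim to be Googlebot and slip through:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Condition 1: the request line ends in HTTP/1.0 ...
  RewriteCond %{THE_REQUEST} HTTP/1\.0$
  # Condition 2: ... AND the user agent does NOT contain "Googlebot"
  # (hypothetical exemption; the string is an assumption).
  RewriteCond %{HTTP_USER_AGENT} !Googlebot [NC]

  # Both conditions hold: answer with 403 Forbidden.
  RewriteRule ^ - [F]
</IfModule>
```

Watching the access log for 403 responses after turning this on would show whether anything legitimate is being caught.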