Archive for the ‘Apache’ Category

Htaccess Tools

Post under Apache | By LGR | On December 8th, 2009

If you want to save time editing your .htaccess file, check out Htaccess Tools. It has some great .htaccess generators, including:

  • Htpasswd Generator
  • Htaccess Authentication
  • Hotlink protection of images
  • Block IPs with .htaccess
  • Block hitbots with .htaccess
  • Error Document
  • Redirection by Language

While you can do all of these things without an online generator, I have found that generators like these let some people manage more of their own websites. Perhaps the most useful generator on the site is the hotlink image protection. Hotlink protection saves bandwidth by preventing other websites from embedding your images directly from your server.
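For reference, a hand-written hotlink rule usually looks something like the sketch below. This is not the generator's actual output; example.com is a placeholder for your own domain, and it assumes mod_rewrite is enabled on the server.

```apache
RewriteEngine On
# Let through requests that send no referrer at all (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by your own site (example.com is a placeholder)
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Everything else asking for an image gets 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Allowing the empty referrer is a judgment call: it keeps images working for visitors whose browsers or proxies strip the header, at the cost of letting some hotlinks through.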

Trackback Spam on the Rise

Post under Apache | By LGR | On December 20th, 2007

Anyone else notice a rise in trackback spam recently or is it just me they feel like picking on? The last few days I have been getting upwards of 50 trackback spams. Thanks to Akismet I have not seen any of them go through, but I decided that I was tired of deleting it and letting the spammers get access to my server resources. A quick look in my logs showed that the spam was not coming from the same IP so banning the IP or IP range would be pretty much useless.

Here are some entries from my log file:

Host: 216.104.34.250
/2007/03/text-link-ads.html/trackback
Http Code: 200 Date: Dec 18 20:24:03 Http Version: HTTP/1.0 Size in Bytes: 78
Referer: -
Agent: TrackBack/1.6

Host: 91.186.21.51
/2007/02/blogger-label-list-for-ftp-published.html/trackback
Http Code: 200 Date: Dec 18 20:22:38 Http Version: HTTP/1.0 Size in Bytes: 78
Referer: -
Agent: TrackBack/1.6

Host: 66.90.104.22
/2007/02/has-digg-jumped-the-shark.html/trackback
Http Code: 200 Date: Dec 18 20:20:28 Http Version: HTTP/1.0 Size in Bytes: 615
Referer: -
Agent: TrackBack/1.6

Notice anything in common? The User Agent strings are all the same: Agent: TrackBack/1.6.

A quick Yahoo search turned up this post, Spiders and Bots .htaccess Ban List, which looked like just what I needed. There are tons of bad bots and user agents out there, and I am sure this list covers only a small number of them. I really only want to block the TrackBack user agent and the libwww-perl user agent, since I have been getting several hacking attempts from a libwww-perl user agent.

There are several ways I could have done this but I thought I would try adding this first and see how it goes.


#block bad bots including trackback bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^TrackBack" bad_bot

<Limit GET POST>
order allow,deny
allow from all
deny from env=bad_bot
</Limit>

I may have to edit the Trackback bot line since I did not include the version number, but I will leave it like that for a day and see what shows up in my log files. I will update this post if/when I do edit the Trackback bot line.
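Because the pattern ^TrackBack is anchored only at the start of the string, it should already match TrackBack/1.6 and any other version number. Note also that order/allow/deny is the Apache 2.2 access-control syntax. On Apache 2.4 and later, the same block can be sketched with Require directives, assuming mod_setenvif and mod_authz_core are loaded:

```apache
# Flag requests whose User-Agent starts with either string (case-insensitive)
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^TrackBack" bad_bot

# Apache 2.4+ syntax: allow everyone except flagged requests
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Unlike the Limit block above, this version is not tied to GET and POST, so it applies to every request method.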

Thanks to Brontobytes Blog for the .htaccess code. It saved me lots of time.

Hope this helps someone that is having problems with automated trackback spam.

Use .htaccess to Block a Country

Post under Apache | By LGR | On October 30th, 2007

There are occasions when you need to do some serious blocking on your website, and you have to block an entire country. I have helped people in the past block countries like China from accessing their website. While there can be many reasons to block an entire country from accessing your website, it used to be a bit of a chore to create the .htaccess file to do it. Not anymore: check out Block a Country, and with a couple of clicks you can generate an .htaccess file that blocks the countries of your choice.

I have been playing with some screencasting software so I took a short screencast of how to use the site. Watch closely or you might miss it. If you feel like blocking off all of us friendly Canadians it only takes you a few seconds now.

After you either copy the information or download the generated .htaccess file, all that is left to do is either upload it to your website or integrate it into your existing .htaccess file. It makes blocking a whole country very easy to do. I will definitely use this tool the next time I get a call or email asking me to block a country from accessing a website.
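A file like this is essentially a long list of deny lines, one per network range assigned to the chosen country. The sketch below shows the general shape only: the three ranges are reserved documentation networks used as placeholders, not a real country list, and the syntax is the Apache 2.2 style used in the other posts here.

```apache
# Placeholder ranges only (reserved documentation networks, not a real country list)
order allow,deny
allow from all
deny from 192.0.2.0/24
deny from 198.51.100.0/24
deny from 203.0.113.0/24
```

With order allow,deny, a visitor matching both an allow and a deny line is denied, so the deny lines win over allow from all.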

Using .htaccess to Block Comment Spam

Post under Apache | By LGR | On October 14th, 2007

When I checked my blog on Saturday I had a large amount of comment spam that had been caught by Akismet, larger than usual for my little place on the web. Browsing through it briefly, I quickly noticed a common thread: they were all from the same IP address. I have better things to do on a Saturday (and actually most days) than wade through a bunch of comment spam, so I quickly went and added another line to my .htaccess file.

deny from 195.225.177.48

I then deleted all of the comment spam and went on my merry way, not thinking much about it until I went and checked my error log today.


[Sun Oct 14 13:40:17 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 13:40:17 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php
[Sun Oct 14 12:55:59 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 12:55:59 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php
[Sun Oct 14 12:45:11 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/services/order_form.php
[Sun Oct 14 12:13:27 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/index.php
[Sun Oct 14 12:13:27 2007] [error] [client 195.225.177.48] client denied by server configuration: /home/*****/public_html/blog/wp-comments-post.php

It goes on and on actually. I had an average of 14 hits an hour from this IP address. Imagine how much comment spam I would have had if I had not blocked the IP address. I was also curious as to who might be so interested in spamming the daylights out of my blog, so I ran a quick IPWHOIS on DNSStuff.com. You can take a look at the IPWHOIS information yourself, but what I found most interesting is that they have a complete IP address range, 195.225.176.0 – 195.225.179.255. For now I have only blocked a single IP address, and I hope that it is just one bad user on their network, but the minute I see another 195.225.*.* address in my comment spam the whole IP address range will be blocked using:

deny from 195.225
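Note that deny from 195.225 matches every address beginning with 195.225, which is a much larger block than the range listed for this host. A narrower sketch, assuming the range reported by the IPWHOIS lookup is accurate, uses CIDR notation:

```apache
# 195.225.176.0/22 covers 195.225.176.0 through 195.225.179.255 (1,024 addresses)
deny from 195.225.176.0/22
```

This keeps the block limited to the one network rather than shutting out unrelated hosts that happen to share the first two octets.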

I sent an email to the email address on record for the host, but in my experience it will either never be read, simply be ignored, or disappear into a black hole.

Custom 404 Page using .htaccess

Post under Apache | By LGR | On October 12th, 2007

Mike posted a good question on my earlier post on how to Disable Indexes using .htaccess.

Is there a way to specify what page to redirect to if there is a 404? Currently it’s displaying one created by my web hosting company, which I would prefer to get rid of.

If you want to display your own 404 error page instead of the standard 404 Not Found page returned by your hosting company, all you need to do is add one line to the .htaccess file in the root of your website.

ErrorDocument 404 /404.html

Now when someone types the wrong filename, the page returned will be your custom 404.html page. (Browsing a folder with indexes turned off normally produces a 403 Forbidden error rather than a 404, so that case needs its own ErrorDocument 403 line.) It does not have to be just an HTML page either. You could make it return a PHP page with code that emails you when someone triggers a 404 error and tells you what page they were looking for, or one that shows other information to help visitors find what they might be looking for, or maybe something that is just fun. You can do any number of things with your own 404 page.
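As a rough sketch of the options just described (404.php is a hypothetical script you would write yourself, not something from this blog):

```apache
# Local path to a static page: the visitor still receives a 404 status
ErrorDocument 404 /404.html

# Or point at a script instead (404.php is a hypothetical file you would write)
# ErrorDocument 404 /404.php
```

Keep the target a local path starting with a slash. If you give a full http:// URL, Apache redirects the browser to it and the original 404 status is lost. For a script, Apache normally exposes the originally requested path in the REDIRECT_URL environment variable.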

If you are a blogger and are using WordPress, you should take a look at this great page, “Creating an Error 404 Page”, on WordPress.org.

There are other options as well. The one I have on this blog right now is not very exciting, since the home page is simply returned as the 404 page.

RSS Feed Scraper

Post under Apache, RSS | By LGR | On October 8th, 2007

It appears that I have a fan. OK, maybe not a fan. I have a website scraper that is not smart enough to actually read the content it is scraping, so it is picking up the additional content I add to my RSS feed and posting it on its site. They have many of my posts, and the majority of them have this at the bottom:

Copyright © LGR Webmaster Blog. This feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at is guilty of copyright infringement.

Visit the LGR Webmaster Blog for more great content.

You would think that would make it pretty obvious that they stole the content from somewhere. I have sent an email to both the email address on the whois record for the domain and the email address I could find for the web host of their IP address, in hopes of having the content removed from the site. Considering the email I sent to the address on the domain's whois record bounced, I don’t know if I will have much luck.

After the email to the address on the whois bounced I thought I would have a little fun with this scraper site. If I can’t get the content removed I can at least make sure people know that the content is stolen, just in case they don’t read the copyright notice at the bottom of the post. I post the odd image into my posts, but from now on I will make sure there is always an image in the post, even if it is just a blank image that you can’t see in the post itself. This image is important. It is placed in the WordPress uploads folder, but I suppose it could be placed anywhere on your website. Inside of the WordPress uploads folder I have added another .htaccess file with the following:

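# Serve this image as the body of any 403 response from this folder
ErrorDocument 403 /images/403.gif

RewriteEngine on
# Match any request whose referrer contains the scraper's domain
RewriteCond %{HTTP_REFERER} websiteIwantBlocked\.com
# Refuse the request with 403 Forbidden
RewriteRule .* - [F]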

I changed the website name obviously, but you should get the idea. This stops the server from sending images from the WordPress uploads folder to any request whose referrer is websiteIwantBlocked.com, and returns the 403 error document instead. Because everything served from this folder should be an image, I created a custom error document that is itself an image and placed it in another folder (images). Now when an image is requested from websiteIwantBlocked.com, instead of sending the image from the post the server returns a 403 error with my custom error image, which by the way looks like this:

[Image: the custom 403 error image]

Now when someone visits a website I have listed that scraped my feed, they get a nice warning that the site has stolen bandwidth, content or both. It only does this for the sites I have listed, so feed readers should not be affected.
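If more than one site needs to be listed, the conditions can be chained with the OR flag. A minimal sketch, where both domain names are made-up placeholders:

```apache
RewriteEngine on
# Match a referrer containing either domain (case-insensitive); names are placeholders
RewriteCond %{HTTP_REFERER} firstScraper\.example [NC,OR]
RewriteCond %{HTTP_REFERER} secondScraper\.example [NC]
# Refuse matching requests with 403 Forbidden
RewriteRule .* - [F]
```

The last condition in the chain must not carry the OR flag, otherwise the rule would not behave as intended.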

There are other things I have done as well. I have added the website's IP address to the blog's root .htaccess file and denied access, in case the website was scraping the feed directly. It looks like this if you are wondering:

deny from IP ADDRESS YOU WANT BLOCKED

I use FeedBurner for my feeds, and usually they list uncommon uses of feeds, but there has been no mention of this one. I did notice that one of the bots is WordPress so it is possible that the site is scraping the FeedBurner feed and not directly from the site. One of the features I wish FeedBurner had was the ability to block individual IP addresses from accessing a feed. That would make it so much easier since every website has an IP address.

I guess we will see if I get an email back from the web host. I am not holding my breath. I think I might have to make do with this, or move the feed away from FeedBurner so I can block individual IP addresses.

How do other people handle very persistent RSS feed scrapers?

Disable Indexes using .htaccess

Post under Apache | By LGR | On October 5th, 2007

I have several personal websites on a shared server where indexes are turned on by default in Apache. That is simply annoying, because I hate having stray empty index.html files sitting all over the place. I suppose I could just leave the indexes on but I dislike the idea of anyone in the world being able to just peek into folders, even if it is unlikely they will find anything very interesting, you just never know. They might have some hole into the system. Anyways, if you are like me the easiest way of getting rid of indexes is using one line in an .htaccess file in the root folder:

Options -Indexes

Now if a folder does not have an index.html file, the server will refuse to list its contents and respond with a 403 Forbidden error (or your own error page, if you have set an ErrorDocument for 403). Amazing how that one simple line can save time and keep you from having to go and create index.html files in all those folders you don’t want people poking around in.
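Put together with a custom error page, the root .htaccess might look like the sketch below. It assumes the host allows Options and ErrorDocument overrides in .htaccess, and 403.html is a hypothetical page you would create yourself.

```apache
# Turn off automatic directory listings for this folder and everything below it
Options -Indexes

# Optional: show your own page for the resulting 403 (403.html is a hypothetical file)
ErrorDocument 403 /403.html
```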

If you are still wondering why you would want to do this take a read through this post titled “Find almost any kind of Ebook or File Online” over at Earners Blog. One line in the .htaccess file stops that from happening.

Whitehat SEO Tips for Bloggers

Post under Apache, SEO | By LGR | On August 13th, 2007

This video has Matt Cutts doing a presentation at WordCamp 2007 with search engine optimization tips for bloggers.

It is a long video, just over 1 hour in length so you might want to just put the headphones on and let it play while you are working on something else.

There is a lot of good, basic information in the video that will help anyone who runs a blog or website. Aside from the basic information about SEO, Matt encourages people to be creative in finding ways to get links. He also has a great security tip using an .htaccess file to protect the WordPress admin folder. Make sure you change the IP addresses to those of your home computer and your work computer.

Put this .htaccess in /wp-admin/ (not in your root directory!):

AuthUserFile /dev/null
AuthGroupFile /dev/null
AuthName "Access Control"
AuthType Basic

order deny,allow
deny from all
# whitelist home IP address
allow from 123.45.67.89
# whitelist work IP address
allow from 89.67.45.123
Read more at: http://www.reubenyau.com/protecting-the-wordpress-wp-admin-folder/
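The order/deny/allow lines above are Apache 2.2 syntax. On Apache 2.4 and later, the same IP whitelist can be sketched with a single Require line, assuming mod_authz_core and mod_authz_host are loaded:

```apache
# Apache 2.4+ equivalent of the whitelist above
# Anyone not listed is refused with 403 Forbidden
# Placeholder addresses from the post: replace with your own home and work IPs
Require ip 123.45.67.89 89.67.45.123
```

If your home connection has a dynamic IP address, expect to update this line whenever the address changes.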

WWW or Not: That is the Question?

Post under Apache, SEO | By LGR | On February 20th, 2007

Most people don’t realize that their website can be found using both www.mywebsite.com and mywebsite.com. Why is this important? The search engines will index both the www site and the non-www site, giving them two identical sets of pages, which could lead to a penalty. While most search engines will eventually sort this out, why wait for them to do it? Don’t give them a choice in what they index.

Create an .htaccess file and place it in the web root folder. Include the following lines of code:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mywebsite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mywebsite.com/$1 [R=301,L]

This will redirect all requests for pages at mywebsite.com to www.mywebsite.com giving only one copy of your website for the search engines to index. Give it a try on this blog. All requests to blog.lgr.ca are redirected to www.blog.lgr.ca.
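If you would rather standardize on the address without www, the same idea works in reverse. A minimal sketch, again using mywebsite.com as a placeholder and assuming mod_rewrite is enabled:

```apache
RewriteEngine On
# Match only requests that arrive on the www host name
RewriteCond %{HTTP_HOST} ^www\.mywebsite\.com$ [NC]
# Permanently redirect to the same path on the bare domain
RewriteRule ^(.*)$ http://mywebsite.com/$1 [R=301,L]
```

Use one direction or the other, not both, or the two rules will bounce requests back and forth.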

Process .html as PHP

Post under Apache, PHP | By LGR | On February 7th, 2007

Hanging out at webmaster forums, I am amazed at the number of times I come across posts asking how to process .html files as PHP. Often the question is asked because people want to start using PHP to include parts of their template instead of creating static .html files. Well, here is a little piece of code that has saved me countless hours, especially when moving a site of static .html files to a dynamic PHP site.

To set up a Linux server running the Apache web server to process .html (.htm) files as PHP, create an .htaccess file in the root folder of your website and add these lines to it:

AddType application/x-httpd-php .htm
AddType application/x-httpd-php .html

As long as your server has been told to process .htaccess files this will tell the Apache web server to process all .htm and .html files now as php files. This now allows you to include other php files, use variables and all kinds of PHP programming on your old static .html files.
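The AddType lines do not work on every host. Another form that is often used when PHP runs as an Apache module (mod_php) is sketched below. Treat it as an assumption about your setup: on hosts that run PHP through CGI, FastCGI or PHP-FPM the handler name is different, so check your host's documentation first.

```apache
# Hand any file ending in .htm or .html to the PHP handler (assumes mod_php)
<FilesMatch "\.html?$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

Either way, every .html request now goes through PHP, which costs a little extra processing compared with serving plain static files.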