Tips, Tricks, Tools & Techniques

for Internet Business, Life, the Universe and Everything




Month: August, 2007

WordPress behind a firewall

30 August, 2007 (16:21) | WordPress | By: Nick Dalton

Later versions of WordPress have a tendency to talk to themselves. This is not a sign of loneliness, but rather an efficient design for trackbacks, pings and other asynchronous events. After you publish a post, or at a predetermined time for future posts, WordPress sends a URL request to itself. Normally this works as designed and is totally transparent to the blog owner. But with some firewall configurations this mechanism fails.
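
If you are curious what these self-requests look like, you can often spot them in the web server's access log. This is just a quick way to peek at the mechanism; the log path below is an example for Apache, and wp-cron.php is the usual endpoint WordPress uses for these scheduled tasks in recent versions.
$ grep wp-cron.php /var/log/httpd/access_log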

In a simple web server setup your domain name resolves to the IP address of the web server.
$ nslookup www.yourdomain.com
Name: www.yourdomain.com
Address: 217.68.70.69
$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:30:1B:43:85:7F
inet addr:217.68.70.69 Bcast:217.68.70.255 Mask:255.255.255.0

But with a firewall or a load balancer in front of the web server, the domain name resolves to the IP of the firewall (or load balancer), which then forwards requests to the web server.
$ nslookup www.otherdomain.com
Name: www.otherdomain.com
Address: 64.27.14.2
$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:30:1B:43:85:7F
inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0

With a firewall you need to explicitly permit traffic for each source and destination. The default configuration for a Cisco PIX firewall at one major hosting company allows web traffic from the outside world through to the web server, but the default rules block any traffic between internal servers. This is generally a good setup. However, it creates a problem when WordPress tries to access a URL on its own domain. The domain name resolves to the firewall, so the request is sent from the web server to the firewall. The firewall sees that the request is coming from an internal server and blocks it.
$ wget www.otherdomain.com/index.php
--21:43:11-- http://www.otherdomain.com/index.php
=> `index.php'
Resolving www.otherdomain.com... 64.27.14.2
Connecting to www.otherdomain.com|64.27.14.2|:80...
^C

There are at least two ways to resolve this problem:

1. Create a new rule in the firewall that allows traffic from the IP address of the web server to itself.

If you don’t have access to the firewall configuration or you don’t want to mess with the rules, then another option is:

2. Add a line to /etc/hosts for each domain served by the web server:
10.0.0.1 www.otherdomain.com

There is a slight performance benefit to this latter approach since requests don’t have to go through the firewall. The drawback is that you have to remember to add a line to /etc/hosts for each new domain.
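
To verify the fix, keep in mind that nslookup queries DNS directly and ignores /etc/hosts, while getent uses the normal resolver and should now return the internal address. A quick check, using the example addresses above:
$ getent hosts www.otherdomain.com
10.0.0.1        www.otherdomain.com
$ wget -O /dev/null http://www.otherdomain.com/index.php

If wget now fetches the page instead of hanging, WordPress's requests to itself should get through as well.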

As far as I can determine WordPress does not show any error messages for the trackbacks and pings that fail. My blog was broken in this regard until I figured this out. And I just thought that other bloggers were ignoring or deleting my trackbacks. :-(

It also turns out that WordPress queues all the failed pings and trackbacks. So if you suddenly received a trackback to your blog from an old post here, that’s the reason why.

Happeneur Call

29 August, 2007 (16:06) | Security | By: Nick Dalton

I had a great teleconference today with Mike Jay and his Happeneur coaching students. The topic was web site security. It was a very interactive call with great questions from the participants. Before we were done we had covered topics from backups and protecting sensitive data to redundant systems.

I managed to squeeze in a sentence or two about my Digital Security Report. But this was a no-pitch, no-fluff, just-pure-information type of call. Just the way I like it.

If you want me to do a similar call with your customers let me know. My availability is limited, but this is something I enjoy doing so I do my best to accommodate requests.

Browser toolbars reveal more than you think

27 August, 2007 (06:34) | Search Engines, Security | By: Nick Dalton

All the major search engines provide toolbars that you can download and install in your browser. Each toolbar has some nifty features that are not commonly found in browsers, which makes them compelling enough to download and install. One feature common to all toolbars is the ability to search the web using the search engine that made the toolbar. This is of course the reason for the toolbar’s existence: to funnel more searches to the search engine.

Another common “feature” of search engine toolbars is to report home about each web page that you visit. Even though you can in most cases turn this feature off, the toolbar offers some compelling extra benefit in return, so most users keep it enabled. (Or they are simply unaware of the “call home” feature.)

If we for the moment disregard the privacy aspects of reporting every web page that you visit, there is another implication that most web site owners are not aware of: The web pages reported by toolbars are fed into the search engine’s web crawler. (I don’t have proof that this is the case for all toolbars, but I know it’s true in at least one case. And that’s enough to cause trouble for web masters.)

What’s the problem with that, you say? One example could be that you’re working on a new web site that is not quite ready to be public yet, and you haven’t bothered to password protect it during development. Who is going to guess your new domain name anyway? As you’re busy developing your site, the toolbar sends the URL of every page – finished or not – to the search engine.
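
If a development site has to be reachable over the public internet, even a simple HTTP basic auth gate means that any URL a toolbar reports leads only to a login prompt. Here is a minimal sketch for Apache; the paths and the user name are examples, and it assumes your host lets you use .htaccess files.
$ htpasswd -c /var/www/.htpasswd devuser

Then add these lines to the development site’s .htaccess (or its virtual host configuration):

AuthType Basic
AuthName "Development site"
AuthUserFile /var/www/.htpasswd
Require valid-user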

Another, perhaps more serious, example is the thank you page of web sites that sell digital products. When you – or any one of your customers – visits the thank you page, the toolbar reports the URL to the search engine. If you don’t have any additional protection on the thank you page it will be included in the search engine index. Then when a potential customer uses that search engine it’s possible that your thank you page shows up in the search results. And it’s very likely that the person searching was looking to buy your product. But now, with direct access to the thank you page, the potential customer can download your product for free. You just lost a sale.

If you have good web analytics it may be possible to see these direct accesses and calculate how much money you’re losing. But it’s also very likely that the search engine has cached your page, and possibly even the product download itself. In that case you will never even know that your product was downloaded without payment.
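
One simple partial safeguard (it only goes so far) is a robots meta tag in the head of the thank you page, which asks well-behaved search engines not to index the page or follow its links:

<meta name="robots" content="noindex, nofollow">

This only helps against compliant crawlers; it does nothing to stop someone who already has the URL, so it is not a substitute for properly protecting the download itself.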

My Digital Security Report has advice on how to protect your digital products from overzealous search engine toolbars.

Can anyone view your WordPress plugins?

20 August, 2007 (06:24) | Security | By: Nick Dalton

If you are running WordPress go to www.yourdomain.com/wp-content/plugins. If you see a directory listing of all your installed plugins you may want to follow the steps described by Shoemoney here.

This is not a major security hole and you are not alone in exposing your plugins. Google has indexed over 500,000 plugin directory listing pages.

It appears that this will be fixed in the 2.3 release of WordPress.
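
In the meantime, one simple stopgap (an example approach, not necessarily the exact steps from the Shoemoney post) is to drop an empty index file into the plugins directory, or to turn off directory listings in Apache. The paths are examples, and the .htaccess line assumes your host allows Options overrides:
$ touch /var/www/wordpress/wp-content/plugins/index.html
$ echo "Options -Indexes" >> /var/www/wordpress/.htaccess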

robots.txt

13 August, 2007 (22:02) | Search Engines, Security | By: Nick Dalton

Back in the days around 3 B.G. (Before Google), AltaVista was the new search engine on the block. In an effort to show off the power of their minicomputers, the AltaVista team at Digital decided to crawl and index the entire web. This was a new concept at the time. Many web masters didn’t relish the idea of a “robot” program accessing every page on their web site, as this would add more load to their web servers and increase their bandwidth costs. So in 1996 the Robots Exclusion Standard was created to address these concerns.

Using a simple text file called robots.txt you can instruct web crawlers (a.k.a. robots) to stay out of certain directories. Here is a very simple robots.txt which disallows all robots (User-agents) access to the /images directory.

User-agent: *
Disallow: /images

By disallowing /images you are also implicitly disallowing all subdirectories under /images, such as /images/logos, as well as any file whose path begins with /images, such as /images.html.

Curiously there was no “Allow” directive in the first draft of the standard. It was added later, but it’s not guaranteed to be supported by all robots. So anything that is not specifically disallowed should be considered fair game for web crawlers.

To disallow access to your entire web site use a robots.txt like this:

User-agent: *
Disallow: /

If User-agent is * then the following lines apply to all search engine robots. By specifying the signature of a web crawler as the User-agent you can give specific instructions to that robot.

User-agent: Googlebot
Disallow: /google-secrets

Since the original spec was published several search engines have extended the protocol. One popular extension is to allow wildcards.

User-agent: Slurp
Disallow: /*.gif$

This prevents Yahoo! (whose web crawler is called Slurp) from indexing any files on your site that end with “.gif”. Keep in mind that wildcard matches are not supported by all search engines so you have to preface these lines with the appropriate User-agent line.

You can combine several of the above techniques in one robots.txt file. Here’s a theoretical example.

User-agent: *
Disallow: /bar

User-agent: Googlebot
Allow: /foo
Disallow: /bar
Disallow: /*.gif$
Disallow: /

This would result in the following access results for a few URLs:

URL                          Googlebot   Other robots
example.com/foo.html         Allowed     Allowed
example.com/food.html        Allowed     Allowed
example.com/foo/             Allowed     Allowed
example.com/foo/index.html   Allowed     Allowed
example.com/foo.gif          Allowed     Allowed
example.com/fu.html          Blocked     Allowed
example.com/bar.html         Blocked     Blocked
example.com/bar/index.html   Blocked     Blocked
example.com/img.gif          Blocked     Allowed

Computer programs are pretty good at following instructions like these. But for a human brain it can quickly get overwhelming, so I highly encourage you to keep it simple. One of the longer robots.txt files I’ve encountered is from www.seobook.com – it’s over 300 lines long. The site owner Aaron Wall is the author of the excellent SEO Book; he knows what he’s doing.

For us mortals there is a robots.txt analysis tool in Google’s webmaster tools. Highly recommended. Another good resource for more information on the Robots Exclusion Standard is www.robotstxt.org
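
If you just want a quick look at what a site is currently telling robots, you can also fetch the file directly from the command line (www.example.com is a placeholder):
$ wget -q -O - http://www.example.com/robots.txt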

Today when companies are spending a lot of money to be included in search engine listings, the idea of excluding your content may seem quaint. But from a security perspective there are many valid reasons for limiting what a search engine indexes on your site. See my Digital Security Report for more information.

Update to WordPress 2.2.2

6 August, 2007 (21:55) | Security, WordPress | By: Nick Dalton

If you are using WordPress 2.2.1 you should immediately get the 2.2.2 security update.
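
If you are not sure which version a blog is running, the version string lives in wp-includes/version.php. Here is a quick check from the shell (the installation path is an example); a 2.2.1 install should show something like this:
$ grep wp_version /var/www/wordpress/wp-includes/version.php
$wp_version = '2.2.1';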

The discovered bug is a Cross-Site Scripting vulnerability. See http://trac.wordpress.org/ticket/4689 for more details.

The WordPress developers assigned this bug a priority of “highest omg bbq” :-)