Avoiding broken links

When Written: April 2013

I have an announcement… the Internet is broken! Well, not quite, but I thought it might make you read further! What I am talking about is the frustrating number of broken links out there pointing to sites like knowledge bases.

Take the scenario where you are trying to find the answer to a particular problem, this being the second main use of the Internet, the first being looking at pictures of cats. After trawling various forums and reading past the personal attacks on various individuals, you find a link to what might be the answer to your particular problem. The link looks good. It is not a link to someone’s software utility site which claims to fix your computer but often ‘fixes’ it in ways you don’t want, and which will cost you time and possibly money to ‘un-fix’. No, this link points to a manufacturer’s knowledge base; let’s say, for the sake of this example, Microsoft’s, although most are guilty of the problem I am about to describe. You click on this link and you are directed not to the page you require but to a generic page that helpfully informs you that ‘this page has moved’ but gives no clue as to where it might have moved to.

Often the reason for this is that most knowledge base content is fed from a database, each article is identified by a unique reference, and it is this reference that has been changed. In theory, even if the knowledge base has been moved to a different domain or a different web page, the site should still be querying the database, and it should still be possible to point to the new location or, better still, to redirect the user’s browser to the new location. However, in many cases the article number has simply been changed, and so you are left with no alternative but to search the new site in the desperate hope of finding the article that the person on the forum had found and that held the answer to your problem. The reason for the change in article IDs is simply lazy coding, and all too often new sites and products appear with little or no thought given to the previous versions.

One of the main strengths, but also a weakness, of the web is that links to a web site from other sites stay there forever, and those sites have no way of knowing that a link target has changed. There is currently no way for these links to update automatically, so it is up to the webmaster of any web site to make every effort to ensure that visitors arriving via older links are presented with a relevant page. By relevant page I do not mean a cutesy ‘404’ error page with a kitten on it. These have their charm and I have used them myself (http://www.ecatsbridge.com/404.asp), but they should be considered a final ‘fall back’ for links that have become so broken that they cannot be mapped onto the new site.

Beware of ‘cutesy’ 404 error pages, here is one I made earlier.

Another case could be where a previously free facility starts to require a login. In this case the older link will now take the user to a login page or a ‘create account’ page. After completing this, the user should then be taken to the page they originally asked for. I have come across sites where, after you create an account, your original destination is lost and you are directed to a jolly welcome page. You then have to retrace your steps and click the original link again which, as you are now logged in to the site, takes you straight to the page. All this is certainly a ‘user experience’, but not a good one.

So why do the unique IDs for these articles change? The most common reason is that the ID is the key field of the database table, and its value is often auto-generated by the database so that it is guaranteed to be unique. When you move your data to another database this key value can change, because the new database generates its own. The answer here is to copy the old ID into another field in the new database and imaginatively call it something like ‘oldID’. Then in your web page you can test for the referring URL that the visitor has come from, and for the path of the page being requested, with something like:

Request.UrlReferrer.AbsoluteUri

Request.ServerVariables("SCRIPT_NAME")

If the visitor is coming from an old link then your code can query the ‘oldID’ field to find the correct article rather than just presenting a generic ‘tough, we have moved this article’ page. If your company has to deal with the ‘database administrator from hell’ who will not let you modify their shiny new database, then you could implement a lookup table that takes the old ID and returns the new reference. Obviously all this requires a little work but it is not rocket science, hence my initial claim of ‘lazy coding’. With ASP.NET the best place to put your code to detect the URL that a visitor is coming from is on each of your new pages. This is easily done by putting your code in the MasterPage for the site so that it runs on every page that uses this MasterPage. With the older ASP you could put this code on your custom error page, but things are a little different in .NET.
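As a rough sketch of the lookup-table idea, and only as an alternative to a database table rather than the approach described above: if the site runs on IIS 7 or later with the optional URL Rewrite module installed (an assumption, and not every hosting company offers it), a rewrite map in web.config can act as a small old-to-new lookup. The map name OldArticleLinks, the rule name and the new ID 5678 are all made up for illustration.

```xml
<!-- Sketch only: needs the IIS URL Rewrite module; names and IDs are hypothetical -->
<system.webServer>
  <rewrite>
    <rewriteMaps>
      <!-- key = old path and querystring, value = where that article lives now -->
      <rewriteMap name="OldArticleLinks">
        <add key="/article.aspx?id=1234" value="/article.aspx?id=5678" />
      </rewriteMap>
    </rewriteMaps>
    <rules>
      <rule name="Redirect old article links" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <!-- Look up the requested path and querystring; this only matches if the map has an entry -->
          <add input="{OldArticleLinks:{REQUEST_URI}}" pattern="(.+)" />
        </conditions>
        <!-- {C:1} is the value found in the map; answer with a permanent (301) redirect to it -->
        <action type="Redirect" url="{C:1}" appendQueryString="false" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

A map like this is only practical for a modest number of articles; for a large knowledge base the ‘oldID’ field or a lookup table in the database is likely to be easier to maintain.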

Let us just go back a bit and look at the web server itself. If you are developing with ASP.NET then you will be using IIS, so I will concentrate on the techniques for this group of technologies, but Apache and other web servers support similar capabilities. When a web server receives a request from a client, it attempts to resolve the URL by examining its file system to see if the web page exists and can be rendered. If there is an error in this process for whatever reason, the web server generates an error page. These pages are predefined by the manufacturer of the web server but they are editable, or, as a better option, you can instruct the web server to use a different page that you have written in order to give friendlier feedback to your user. This is, of course, where you should resist pictures of kittens!

For ASP and HTML pages, setting up this custom page is done within the management screens of IIS. Things are different in ASP.NET and in many ways easier, particularly if, as in many cases, your site is with a hosting company and you do not have full access to the settings of the IIS web server. In ASP.NET, to enable custom error pages you simply need to edit the web.config file of your site. A default entry will look something like this:

<customErrors mode="Off"/>

With this setting, if any error occurs the user will be presented with the default error page that you may have seen on some sites, which looks a bit like this:

With custom error pages you can give your users a better feedback as to what might have gone wrong.

To allow custom error pages you need to change this to:

<customErrors mode="RemoteOnly" />

There are three settings for mode: ‘Off’, which is the setting in the entry shown earlier; ‘On’, which serves your custom error pages to every visitor; and ‘RemoteOnly’, which we are using here. With ‘RemoteOnly’, should you browse locally on the server, perhaps to try to debug a problem, the web server will not give you your custom error pages but will serve you the default page with its fuller description of the error, which might help you to discover what is going wrong. Remote visitors still see your custom pages. This can be very handy for debugging your live site should it be necessary, and it is the setting I would recommend you use.

The next stage is to write the custom error pages and then define them in the web.config file with something like:

<customErrors mode="RemoteOnly" redirectMode="ResponseRedirect" defaultRedirect="GenericErrorPage.htm">
  <error statusCode="404" redirect="404.aspx" />
</customErrors>
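For instance, a version of this fragment with one error element per status code might look like the sketch below. The pages 403.aspx and 500.aspx are hypothetical names for pages you would write yourself; anything not listed falls through to the defaultRedirect page.

```xml
<!-- Sketch only: 403.aspx and 500.aspx are hypothetical page names -->
<system.web>
  <customErrors mode="RemoteOnly" redirectMode="ResponseRedirect" defaultRedirect="GenericErrorPage.htm">
    <!-- One error element per HTTP status code you want to handle separately -->
    <error statusCode="403" redirect="403.aspx" />
    <error statusCode="404" redirect="404.aspx" />
    <error statusCode="500" redirect="500.aspx" />
  </customErrors>
</system.web>
```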

The line with the ‘error’ tag in it can be repeated to handle any of the HTTP error codes, giving you considerable control over the messages your users will see should things go wrong. So everything seems to be working like a dream.

The trouble comes when you want to detect the link that the user was trying to follow when they got an error on your site. You need this information so that you can redirect them to the correct page on the new site. But with custom error pages in ASP.NET, instead of getting the full URL of the missing page, including any querystring parameters, you get the URL without the all-important querystring which tells the dynamic web page which record to query in your database. For example, say that the user has clicked on a link on another web site that links to your masterpiece, and this link is:

http://www.youroldsite.com/article.aspx?id=1234

Normally you would simply use the Request object to return this URL. However, in this case, if our custom error page is 404.aspx and we examine Request.Url.PathAndQuery there, we get

/404.aspx?aspxerrorpath=/article.aspx

Actually in the real world this would be URL encoded to make it machine readable, but I have removed this to make it more human readable. As you can see, the all-important ‘?id=1234’, which identifies the actual article that the user is interested in, is missing. So if your custom error page code is redirecting the user to the new correct page, it will have no idea exactly which article the user was originally interested in, and this is the problem. I wish I could tell you about a definitive solution to this, but there isn’t one. I use a work-around which works well in most scenarios.

If you no longer own the old domain then obviously there is nothing you can do about the broken links that users will encounter.

If you still own the old domain, there are three main ways of moving a web site that would cause this problem of broken links, and we are assuming that the various article IDs have also changed:

The pages are the same but the site is now hosted on a different domain.
The domain is the same but the page names and/or the article IDs have changed.
The pages and/or the article IDs and the domain have all changed.

You could maintain some redirection pages on the old domain, but for the sake of maintenance it is probably better to point the old domain to the web server hosting the new domain, so that both domains point to the same web site and web pages. This could confuse users who are expecting the previous, familiar web site but are suddenly directed to a new-looking one. It is therefore a good idea to put some code into the Master Page for all your web pages that detects the full URL the user followed to get there; if they have come from a link pointing to your old domain, you can present them with an explanation page before displaying the new page. You can do this by simply using the Request object.

However, for this to work your new site must contain the same page names as your old web site for all dynamic pages that rely on a querystring for passing parameters; it doesn’t matter if some of these new pages are blank with just the redirect code in them. If you don’t have the same page names, the user will get a ‘404’ error and be presented with your custom error page but, as I explained earlier, you will have lost the full URL the user was following and so will be unable to know the ID of the article that they wanted. It is also important for the sake of your search engine ranking that any broken links followed by the search engine are met with a ‘301’ HTTP response. A 301 tells the search engine that the page has moved permanently and is not just unavailable for some reason.

This is done in the configuration of the web server. For IIS 7 see http://www.iis.net/configreference/system.webserver/httpredirect and for IIS 6 see http://www.hosting.com/support/iis6/301-redirect-in-iis-6. With Apache you can modify the .htaccess file to achieve the same effect. By doing this, when the search engine tries to index a page that used to be there and gets a redirect, it will be told that this is a permanent situation and it will amend its index accordingly. Obviously, if you are not bothered about your previous SEO rankings then you can ignore permanent redirects.
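For IIS 7, a minimal sketch of such a permanent redirect in web.config might look like the following. It assumes the HTTP Redirection feature is installed in IIS and that the old domain is still served by its own IIS site with its own web.config; www.yournewsite.com is a placeholder for the new domain.

```xml
<!-- Sketch only: www.yournewsite.com is a placeholder for the new domain -->
<system.webServer>
  <!-- httpResponseStatus="Permanent" makes IIS answer with a 301 rather than a temporary redirect -->
  <httpRedirect enabled="true"
                destination="http://www.yournewsite.com"
                httpResponseStatus="Permanent" />
</system.webServer>
```

Note that this fragment belongs only in a site that answers for the old domain alone. If both domains point at the same web site and web pages, as suggested earlier, putting it in the shared web.config would redirect visitors round in a loop, and the redirect would have to be issued conditionally instead.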

Article by: Mark Newton
Published in: Mark Newton
