When Written: Sept 2006
It’s a busy time at the moment, with several major projects on the go as well as amendments to existing sites. That, plus the inevitable cruises that must be attended in the Mustang during the summer, means that the pressure has been on.
One new site that we are building looks like it could be a particularly interesting project, as it posed a different sort of problem for us. The brief stated that the client wanted to show their varying stock of products on the web, and for users to be able to find those items by searching for part numbers via web search engines, rather than searching for the client’s company by name, which they are unlikely to have heard of as they are not yet customers.
Obviously the web site would have its own search page as well, which would give greater functionality. The best way of displaying such information is to store the products in a database and display them via dynamic web pages; that is how one would normally build a site to display stock items. But what about the search engines, could they index such a site? There are several views on the question, and years ago search engines like Yahoo would only look at .htm and .html files, ignoring .asp pages and the like.
One solution could be to generate static HTML web pages every time the database is updated, as a batch process, but this is messy and if the database gets large it could take some time. Perhaps we could write a static web page automatically when an update event fires on the database? These were the solutions being considered until I talked to Anna in the office about the problem. She said that she often gets email enquiries about information that a user has found on one of our web sites via Google, and that information is only held in a database and displayed via a dynamic ASP page. So Google at least must be able to index dynamic pages, but how?
Stock Search – Version for web search engines to see.
Stock Search – Version that users’ browsers are redirected to.
Let us consider the two main ways of writing a dynamic web page to display information from a database. One way is to present the user with a text box: they type in some information such as a part number, hit the ‘submit’ button, and the web page returns the data that matches. You might use this approach if the user knows the item name or part number; it certainly can be a fast way of getting to the right information. However, one problem with a dynamic web page such as this is that a web search engine cannot index the data, as it has no idea what to enter into the text box.
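As a rough illustration, the text-box version might look something like this in classic ASP. This is a minimal sketch only; the table name Spares, the columns PartNo and Description, the page name findpart.asp and the connection string held in Application("ConnectionString") are all assumptions made for the example.
<%
' findpart.asp - minimal sketch of the text-box search approach.
' Table and column names (Spares, PartNo, Description) are illustrative only.
Dim conn, rs, partno
partno = Request.Form("txtPartNo")
If partno <> "" Then
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open Application("ConnectionString")
    ' Escape single quotes so the SQL statement stays valid.
    Set rs = conn.Execute("SELECT PartNo, Description FROM Spares " & _
                          "WHERE PartNo = '" & Replace(partno, "'", "''") & "'")
    If Not rs.EOF Then
        Response.Write Server.HTMLEncode(rs("PartNo") & " - " & rs("Description"))
    Else
        Response.Write "No matching part found."
    End If
    rs.Close
    conn.Close
End If
%>
<form method="post" action="findpart.asp">
    <input type="text" name="txtPartNo">
    <input type="submit" value="Search">
</form>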
However, consider a second way of displaying such data, where you have a web page showing, say, the first twenty records with ‘next’ and ‘previous’ buttons to let the user navigate through the entire dataset. Each of these rows would contain a link which may be of the format:
http://www.mywebsite.com/showproducts.asp?partno=6736
This link would take the user to a web page showing the details of a single product, in this example the product with part number 6736. By stepping through the dataset with the ‘next’ and ‘previous’ links, the search engine can reach a page for every item held in the database, and thus index all of them. Building such a page is very easy, and products like Dreamweaver will almost do it for you.
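A sketch of that listing page in classic ASP might run along these lines. Again this is only an outline: the page name listproducts.asp, the Spares table and its columns, and the Application("ConnectionString") setting are assumptions for the example, while showproducts.asp is the detail page described above.
<%
' listproducts.asp - minimal sketch of the browsable listing page.
Const PAGE_SIZE = 20
Dim conn, rs, pageNo, i

If IsNumeric(Request.QueryString("page")) Then
    pageNo = CInt(Request.QueryString("page"))
Else
    pageNo = 1
End If
If pageNo < 1 Then pageNo = 1

Set conn = Server.CreateObject("ADODB.Connection")
conn.Open Application("ConnectionString")

Set rs = Server.CreateObject("ADODB.Recordset")
rs.CursorLocation = 3   ' adUseClient, so the paging properties are available
rs.Open "SELECT PartNo, Description FROM Spares ORDER BY PartNo", conn
rs.PageSize = PAGE_SIZE
If Not rs.EOF Then
    If pageNo > rs.PageCount Then pageNo = rs.PageCount
    rs.AbsolutePage = pageNo
End If

' Each row links to the single-product page that the crawler can index.
i = 0
Do While Not rs.EOF And i < PAGE_SIZE
    Response.Write "<a href=""showproducts.asp?partno=" & rs("PartNo") & """>" & _
                   Server.HTMLEncode(rs("Description") & "") & "</a><br>"
    rs.MoveNext
    i = i + 1
Loop

' 'Previous' and 'Next' links let the crawler walk the whole dataset.
If pageNo > 1 Then Response.Write "<a href=""listproducts.asp?page=" & (pageNo - 1) & """>Previous</a> "
If pageNo < rs.PageCount Then Response.Write "<a href=""listproducts.asp?page=" & (pageNo + 1) & """>Next</a>"

rs.Close
conn.Close
%>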
However, in this case we didn’t want the user to land on this simple display page when they clicked on a link from a search engine; what we wanted was for them to be redirected to a small web application that would let them add the item to a ‘basket’ and easily search for other items. What we needed was for the web server to redirect the user to our web application. Redirecting everyone and everything is easy to achieve in a variety of ways, using code, meta-tags or configuration of the web server, but we didn’t want the search engines to be redirected, as then they might not be able to index things fully. So we need to detect whether a search engine’s crawler bot is accessing the page, in which case we allow it to explore all the links without any redirection. Thankfully most (but not all) search engine bots identify themselves with the word ‘crawler’ in the User-Agent header that is sent with every request to a web server, which ASP exposes as the HTTP_USER_AGENT server variable. So in ASP we would write something like:
<%
FromSearchEngine = false
After setting the FromSearchEngine flag to false, we need to examine the HTTP header variables for the identity of the user agent (browser or web crawler). We do this with the following code:
If InStr(LCase(Request.ServerVariables("HTTP_USER_AGENT")), "crawler") > 0 Then FromSearchEngine = true
If it’s not a search engine crawler visiting our site but a web browser following a link from a search engine’s index, we need to redirect the browser to our ‘user friendly’ web application rather than the pages we created for the search engine to index. If it is a search engine’s crawler bot, we just let it access this page and don’t redirect it. So we would write something like:
If Not FromSearchEngine Then
    If Request.QueryString("txtid") = "" Then
        Response.Redirect("Search.aspx?txtid=" & rsitems.Fields.Item("SparesId").Value)
    Else
        Response.Redirect("Search.aspx?txtid=" & Request.QueryString("txtid"))
    End If
End If
%>
When a search engine’s crawler bot hits such a page, the web site displays the product information in full without any redirecting; when a user clicks on a link to the same page, their browser will not announce itself to the web server as a ‘crawler’ and so will be redirected to the alternative display page, which in this case is the web application’s search page.
This way we should be able to make sure that the web site is fully indexed while the user is still presented with a modern web experience. As soon as the data in the database changes, the indexable pages change automatically, ready for the next time the search engine crawler visits. If the index is out of date and a product no longer exists, rather than letting the web server reply with a standard ‘404’ error message, configure your web server with a custom 404 error page that redirects the user to the web site’s default page or similar. That way the user will never land on a broken link.
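For example, on IIS the custom 404 error can be pointed at a tiny ASP page that simply bounces the visitor to the home page. This is a minimal sketch; the file names 404.asp and default.asp are illustrative only.
<%
' 404.asp - minimal sketch of a custom 404 handler.
' Configure the web server (e.g. IIS custom errors) to serve this page
' for 404s; default.asp is just an illustrative target.
Response.Redirect "default.asp"
%>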
Article by: Mark Newton