When Written: June 2009
The very first article I wrote for PcPro way back was a joint piece with Jon Honeyball about Microsoft Index server. This was a service that would run in the background and index all the files in selected folders. This index could then be queried via a web based language; I wrote a commercial web site using this technology, the search was fast and it would fully index office documents as well as HTML and pdf files. The web pages to query this index had to be written in a custom query language called idq and displayed with .htx pages, but thankfully this capability was later added to ASP. Fast forward to today and index server is now capable of indexing data not only in files but also stored in SQL Server database, where it is called ‘ Full Text Indexing’ .
The other day I had need to reuse this technology but the path was not as easy as it first looked so I thought you might be interested in what I found. Like most Microsoft technology, getting it to work can involved jumping through some hoops before things work the way they are supposed to. Why bother with Index server if your data is stored in a database, surly finding information is what databases are about? Not always the case if you need to search less structured data such as a database of blog entries. So say you have your database with large amounts of text in some of the fields, if you used the normal search with a SELECT * FROM myTable WHERE text field LIKE ‘%searchtext%’ this could be very slow with large amounts of text. However by using FreeText indexing you simply tell the content indexing service which fields you what to index in this way and then you can query it with :
SELECT * FROM myTable WHERE CONTAINS ‘searchtext’
The results will be returned very quickly and if the ‘searchtext’ contains more than one word it will do a ‘fuzzy’ search where the words are not necessarily next to each other.
There are other ways of querying this index but let’s first look at how you set it up. The FreeText search is part of SQL Server , it is included in the advanced version of the product but is not part of the default install, and it also works with SQL Server Express edition which is free. If you already have SQL Server installed, first check if the FullText index service is also installed with the query:
SELECT FULLTEXTSERVICEPROPERTY('isfulltextinstalled')
If the service is installed this will return 1 for true. If not then you will need to download the Advanced pack version of your SQL Server and reinstall. Don’t worry because all your exsisting settings and databases will remain intact, just make sure that you know what the instance of SQL Server is called so you can update the correct one, this is often called SQLEXPRESS with the free version but a quick look with either SQL Management Studio or SQL Server Configuration Manager should reveal which one you are using. When you run the install, after a couple of initial check screens you will get a screen whch has a checkbox called ‘hide advanced options’ and this is by default checked. Make sure that you uncheck this otherwise you will not see the next screen, in this next screen you can expand out a tree and select what bits you want installed. Feel free to go on a click fest if this is your development box, but for our purposes we only need to add the Free Text Indexing service. Once this has been done let the install run. If all goes well you should have everything running. However, if you are like me and several others, judging by the forums, then it will not work. To test if all is ok run the query:
SELECT * FROM sys.fulltext_languages
You should see a list of all the languges support by index server, if not then the chances are that you, like me, have the common permissions problem that I came across on the two boxes I tried. The problem is caused by the SQL Server and the Full Text Services running logged as the NetworkService account, which apprently does not have the right level of permission, and you need to change this to the LocalSystem Account. You can do this in SQL Server Configuration Manager. When this is done the services will be restarted, run the query again then the previous query should return a list of some seventeen laguages. If all is well, the next stage is to set up your FeeText index. The first thing to do is to build a Catalog which will hold this, you do this with a query like :
CREATE FULLTEXT CATALOG MyCatalog
You now need to tell the indexing service which fields you what to index in this way, the query for this would be something like:
CREATE FULLTEXT INDEX ON dbo.Blogs (title ,body) KEY INDEX PK_Blogs
Where Blogs is the name of the table, title and body are the two fields containing the text we want to index and PK_Blog is the primary key index on the table. Index server needs this key, but you would have one on your table wouldn’t you? The full syntax for creating indexes also has the ability to select the language you wish to index by, but by not specifying this, the system will take the default language of the SQL server as the required one to index by. If the query completes successfully then try querying your newly created index with the query:
SELECT * FROM Blogs WHERE CONTAINS(*,’yoursearchtext’)
If you don’t get the result you expect then just bear in mind that the indexing service works in the background and may take a little time to index your data if it is large. If nothing is happening then you may need to start the indexing with the command :
ALTER FULLTEXT INDEX ON Blogs START FULL POPULATION
Although I found this was not necessary, it does give you the ablity to start and stop the indexing process which could be handy if doing a large update of data. By stopping the FreeText indexing you can take some of the load off the server and perhaps run it during a quiet period, but it really depends on your application.
The query syntax allows for very complex queries where you can, for instance, apply a weighting value to each word in the query so that the words with the highest weighting will rate higher in the returned result set. Index server will also index and allow you to query binary files like office documents stored in the database. You can specify inflectional forms of your query words so for example if you search for drive, the system would return results with drives, drove, driving and driven as well as drive in them, all very clever stuff.
Index server does not index every word, so words like ‘and’ ‘to’ etc are ignored to stop the index getting bloated, this is achieved by using ‘noise’ files which are simple text files with a list of the words to ignore when indexing and these are stored in
$SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\
You can customise these to suit your own needs. So there you have it, a relatively easy way to allow a flexible way to query text across your database, perhaps you can now go off and build the next Google, or do I mean the next Bing ?
Article by: Mark Newton
Published in: Mark Newton