When Written: Aug 2009
In a previous article I wrote about how to enable full text indexing on a SQL Server which would allow you to query large blocks of text stored in a database in a fast and very flexible way. A use for this would be if you were building a blogging type of web application. Often you will want to store the text as HTML to allow formatting; the drawback to this is that the indexing system will also index all the HTML tags.
Now you could add all these tags to the ‘noise list’ of words that the indexer will ignore, but this is a clumsy and flawed way of going about things as tags like ‘image’ would have to be on the list and of course that word may well be a valid part of your text. Adding ‘image’ to the ‘noise list’ would mean a user searching for the phrase ‘personal image’ for example would be returned no results. Thankfully there is a better and easier way to stop the html tags being indexed. SQL Server free text indexer has an HTML filter. If you store your HTML text in a text field such as varchar then the indexer will assume, quite reasonably that it is plain text and index everything except words in the ‘noise list’.
If however your text is stored in a binary format of a particular type ( we will come to this later ) then the indexing service will apply the HTML iFilter and ignore all the HTML tags. Unfortunately storing your HTML text this way means that the controls on your web page will not be able to display the text without some help from your code. The easiest way to achieve all this is to leave your content stored in your database as text and create an extra field of type binary to store the binary version of the text you want to be indexed. This can be several fields combined into this single field if you wish. The HTML content has to be Unicode, this is important otherwise this technique will not work, but with a standard SQL server setup it should be.
You need to add two fields to your database table, one field will be of type varbinary(max) and this will hold the content to be indexed, the other field is of type Char(3) and holds text that describes the type of data held in the binary field so that the indexing service knows what sort of filter to apply, in our case this will be ‘htm’. For our example we will call these fields ‘ DATAFIELD’ and ‘DATATYPE’ respectively. Once these fields are setup all you need to do is copy the text you want to index into the binary field and add the text ‘htm’ to the other field. The query you use to do this is a little different and will look something like this:
UPDATE MYTABLE SET DATAFIELD = (0xFFFE+convert(varbinary(max),(isnull(TEXTFIELD1,'')) ,0 )), DATATYPE = 'htm'
The strange bit with 0xFFEE is necessary when you are storing Unicode in a binary field, if you leave this out the indexer will not index correctly and it will return zero results when queried. Obviously you need to tell the indexing service to index this column using the command:
EXEC sp_fulltext_column 'MYTABLE', 'DATAFIELD', 'add', 'DATATYPE'
Then enable change tracking so that the indexing is updated when data is changed:
EXEC sp_fulltext_table ‘MYTABLE’,'start_change_tracking'
And finally turn the indexing service on with:
EXEC sp_fulltext_table ‘MYTABLE’,'activate'
Nothing too difficult there to get it all working but hardly intuitive and it took a little finding, perhaps a wizard from Microsoft might be in order?
Whilst thinking about useful functionality that Microsoft like to leave out of certain versions for their products it seems that they have a penchant for leaving out any way to backup their basic offerings. This is the case with Vista and although a way to backup SQL Server 2005 Express has been provided, not being able to setup an automated backup of a database is a major weakness, particularly on a web server where regular backups are so important. So I thought I would have a look to see if a way could be found to do this without having to upgrade the program.
The problem is that SQL Agent is not included with the Express version so any way of automating processes needs to be done via the task scheduler. There is also a mechanism for running SQL commands via the command line using a program called SQLCMD so hopefully by combining these two items we should be able to build our own scheduled backup system. SQLCMD is not installed by default so you may need to go back and do a reinstall and select it from the advanced options. If everything is installed correctly then first we need to create a script for running the backup command via SQLCMD, which is easily done via the GUI interface of SQL Server Management Studio Express. This is the free tool that enables you to create databases and tables and most other objects. You can also run a manual backup from this tool by right clicking on your database and selecting ‘all tasks’. The sneaky bit here is that once you have used the twin GUI screens to set your options for the backup you can then ask it to script that selection. Save this script to a text file, then test the script by opening a command window and typing:
“C:\Program Files\Microsoft SQL Server\90\Tools\Binn\SQLCMD.EXE” -S [SERVER] -U backupadmin -P [PASSWORD] -i C:\sqlscripts\myfullbackup.sql
Where [SERVER] is the name of the server with SQL on it, backupadmin is the user with permissions to create a backup and [PASSWORD] is their password. The file ‘myfullbackup.sql’ is the one you created earlier. If the command runs fine just open task scheduler and add a task, when prompted for a program then put in the command line you just tested, set the times for the task to run and there you have it, a scheduler automated backup in SQL server Express 2005. This missing component from the Express version of SQL Server is called SQLAGENT which can also send emails out based on various status, for example the success or otherwise of the backup, it is a great pity that this is missing but then I guess if you want this and other functionality then you are just going to have to pay for it!
Article by: Mark Newton
Published in: Mark Newton