When Written: Aug 2008
Those of you who have to manage web sites are probably aware of the increase in the number of attacks of the type called Cross-Site Scripting (XSS). Many high-profile sites and commercial products have been found to be vulnerable to them. See http://www.xssed.com if you don’t want to sleep at night! I thought it would be worthwhile to look at this problem, and how to solve it, in detail.
XSS attacks come in a variety of forms, but let’s look at a very simple type of attack. The examples here are shown in ASP, but pages written in other web languages are just as vulnerable, because this is not a weakness of the language or operating system but of your code. Yes, this one is your fault! In this type of attack someone changes the URL used to call your web page so that it carries a script, and your page then writes that script back out as part of its HTML. For example, imagine a web page that displays items from a database. Say this page is called ‘showitems.asp’ and needs to know which item to display, so it takes a parameter called ‘id’ holding the item number. We would then have a URL like:
http://www.yoursite.com/showitems.asp?id=1234
To keep things simple, assume that somewhere on the page we display this item value using code like:
<% response.write(request.querystring("id")) %>
On our web page we would see the value ‘1234’ displayed. Now consider what happens if someone modifies the calling URL to:
http://www.yoursite.com/showitems.asp?id=<script>alert("Hello World")</script>
Our code will now insert
<script>alert("Hello World")</script>
into the HTML source of our page. The browser will run it, and a pop-up dialog box containing “Hello World” will appear. If this happens, that web page is susceptible to an XSS attack. You may be thinking that making a pop-up appear is no big threat, but consider what is actually happening: an outsider is able to write unauthorised code into your web pages! That code runs as if it were part of your page, so it could, for example, read the cookies your site has set for that visitor. You can see that the possibilities for attacks are considerable. I have no intention of showing you how to launch such an attack, but trust me, it happens. If you have access to your web server’s log files, an examination of them will reveal these attempts. Don’t assume you are exempt just because you don’t use a database: any page that displays information passed from a previous page could be vulnerable.
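To see exactly why the page is exposed, here is a minimal sketch in Python (the article’s examples are ASP, but the mechanism is the same in any language). It mimics what response.write does by pasting the ‘id’ parameter straight into the page; render_item_page is a hypothetical name used only for illustration, not part of any framework.

```python
from urllib.parse import urlsplit, parse_qs


def render_item_page(url: str) -> str:
    """Hypothetical page handler that echoes the 'id' query parameter
    unchanged, like <% response.write(request.querystring("id")) %>."""
    query = parse_qs(urlsplit(url).query)
    item_id = query.get("id", [""])[0]  # parse_qs decodes %XX escapes
    # VULNERABLE: the value goes into the HTML without any checking
    return "<html><body>Item: " + item_id + "</body></html>"


normal = render_item_page("http://www.yoursite.com/showitems.asp?id=1234")
# The attack URL as a browser would actually send it (percent-encoded)
attack = render_item_page(
    "http://www.yoursite.com/showitems.asp"
    "?id=%3Cscript%3Ealert(%22Hello%20World%22)%3C/script%3E"
)
print(normal)  # <html><body>Item: 1234</body></html>
print(attack)  # <html><body>Item: <script>alert("Hello World")</script></body></html>
```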
The way to stop these attacks is to always test the data that you read from the previous page. That means not just the data contained in the URL but also cookies, since a cookie’s value is under the visitor’s control and could have been altered, for instance by an earlier XSS attack. So if your code is only expecting a numeric value, only let numeric values through. If you are expecting a string, examine it: check its length and reject anything that doesn’t fit, or remove any characters that should not be in it. For example, the characters ‘<’ and ‘>’ could be removed, as could ‘;’ and the single quote ('), both of which are often used in SQL injection attacks.
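A minimal sketch, in Python, of “only let numeric values through” for the ‘id’ parameter. The function name safe_item_id and the limit of 10 digits are assumptions for illustration; the idea is to accept the value only if it is exactly what you expect and to reject everything else, rather than try to repair it.

```python
import re
from typing import Optional

# Assumption: item numbers are 1 to 10 ASCII digits.
# [0-9] is used instead of \d because \d also matches non-ASCII digits.
_ID_PATTERN = re.compile(r"[0-9]{1,10}")


def safe_item_id(raw: Optional[str]) -> Optional[int]:
    """Return the id as an int if it is purely numeric, otherwise None."""
    if raw is None or not _ID_PATTERN.fullmatch(raw):
        return None  # reject anything that is not exactly what we expect
    return int(raw)


print(safe_item_id("1234"))                                   # 1234
print(safe_item_id('<script>alert("Hello World")</script>'))  # None
```

The caller then treats None as “no valid item” and shows an error or a default page instead of echoing the input.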
One does wonder why, when such characters are so potentially dangerous and can be removed with no effect on the running of a web site, the programming language or the web server doesn’t block them by default. I wondered that as well and have no answer. For users with IIS6, I have found a very good add-in that stops SQL injection attacks, which saves you having to go through a web site checking for vulnerable code. It is available at http://www.codeplex.com/IIS6SQLInjection and works as an ISAPI filter. The source code is available, so you can modify it if you want. I used it on an old client site that was being attacked and it solved the problem quickly, giving the programmers time to fix their code properly as well.
The easiest way of correcting your web site if it appears to be vulnerable to XSS attacks is to write a function that removes the selected characters. To keep things easy to manage, the best approach is probably to put this function in a separate file that your pages can call. Set your web pages to ‘include’ this file using the code:
<!--#include file="Useful.asp" -->
Then if you need to add extra routines in the future to protect your site you only have to edit this file rather than every page. You can also re-use it in further web sites.
The code in this ‘Useful.asp’ file needs to contain something like:
<%
' Removes characters commonly used in XSS and SQL injection attacks
Function stripTags(HTMLstring)
    HTMLstring = Replace(HTMLstring, "<", "")
    HTMLstring = Replace(HTMLstring, ">", "")
    ' vbTextCompare makes this also match OnMouseOver, ONMOUSEOVER, etc.
    HTMLstring = Replace(HTMLstring, "onmouseover", "", 1, -1, vbTextCompare)
    HTMLstring = Replace(HTMLstring, "'", "")
    HTMLstring = Replace(HTMLstring, ";", "")
    stripTags = HTMLstring
End Function
%>
Each line of this function looks for a particular character (or word) and replaces it with an empty string. The code is not particularly elegant, but it does work. A more compact version uses the VBScript RegExp object, which is also available in Classic ASP, and would look like:
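<%
Function stripTags(HTMLstring)
    Dim RegularExpressionObject
    Set RegularExpressionObject = New RegExp
    With RegularExpressionObject
        ' Any one of the characters < > ; ' or the word onmouseover
        .Pattern = "[<>;']|onmouseover"
        .IgnoreCase = True
        .Global = True   ' replace every match, not just the first
    End With
    stripTags = RegularExpressionObject.Replace(HTMLstring, "")
    Set RegularExpressionObject = Nothing
End Function
%>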
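For comparison, here is the same character-stripping idea sketched in Python with the standard re module. strip_tags is a hypothetical helper named after the ASP function (it is unrelated to PHP’s built-in strip_tags); the pattern removes ‘<’, ‘>’, ‘;’, the single quote and the word onmouseover in any letter case.

```python
import re

# Characters the article suggests removing, plus one event-handler name.
# re.IGNORECASE also catches variants such as "OnMouseOver".
_UNWANTED = re.compile(r"[<>;']|onmouseover", re.IGNORECASE)


def strip_tags(html_string: str) -> str:
    """Remove every match of the unwanted pattern from the input."""
    return _UNWANTED.sub("", html_string)


print(strip_tags('<script>alert("Hello World")</script>'))
# scriptalert("Hello World")/script
```

re.sub replaces every match by default, which corresponds to setting .Global = True on the VBScript RegExp object.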
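PHP has a built-in function for this, strip_tags(), although it removes whole HTML and PHP tags rather than individual characters. Its signature is:
string strip_tags ( string $str [, string $allowable_tags ] )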
So, back to our ASP example: on the web page that we found to be vulnerable to XSS attacks, we just need to change our code to:
<% response.write(stripTags(request.querystring("id"))) %>
Now when the attack is tried, the text
scriptalert("Hello World")/script
is inserted into our web page instead. It doesn’t look very pretty, but without the ‘<’ and ‘>’ the browser does not recognise it as a script tag, so here it can do no harm.
The examples I have given are deliberately very simple, because I remember that when I first looked into this problem it was difficult to see exactly how to stop these attacks. Lots of articles talked about validating the data coming from a previous page, but it was not clear how much, or preferably how little, code one had to write to block this exploit. However, just checking for ‘<’ and similar characters is sometimes not enough, as browsers will decode other sequences of characters into these. For example:
‘<’ is %3C in URL-encoded hex, &lt; as a named HTML entity, &#60 as a decimal character reference (note: no semicolon is needed) and PA== in base64. So stripping ‘&’, ‘#’ and ‘=’ as well would also be a good move.
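The equivalences listed above can be checked with Python’s standard library. This sketch only demonstrates the encodings: each form turns back into ‘<’ once the matching decoder is applied, which is why a filter that looks only for a literal ‘<’ can miss them if the value is decoded again somewhere later on.

```python
import base64
import html
from urllib.parse import quote, unquote

# Alternative spellings of "<" mentioned in the text
forms = {
    "url": quote("<"),                                 # '%3C'
    "html_named": html.escape("<"),                    # '&lt;'
    "html_decimal": "&#60",                            # no trailing semicolon
    "base64": base64.b64encode(b"<").decode("ascii"),  # 'PA=='
}

# Decode each form with its matching decoder
decoded = {
    "url": unquote(forms["url"]),
    "html_named": html.unescape(forms["html_named"]),
    # html.unescape follows HTML5 rules, which accept a missing semicolon
    "html_decimal": html.unescape(forms["html_decimal"]),
    "base64": base64.b64decode(forms["base64"]).decode("ascii"),
}

for name, value in forms.items():
    print(name, value, "->", decoded[name])
```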
Alternatively, you could make sure that you HTML-encode any values before you display them. Using our previous example, the code would be:
<% response.write(server.HTMLEncode(request.querystring("id"))) %>
That way no data is lost, but there is a danger that if the ‘id’ querystring value is used elsewhere in your code without being encoded, any rogue code embedded in it might still execute. I feel that it is better to strip out the unwanted characters if you can.
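HTML-encoding on output can be sketched in Python with the standard html.escape function, which plays roughly the same role as Server.HTMLEncode (the two may not encode exactly the same set of characters). show_item is a hypothetical helper used only for illustration:

```python
import html


def show_item(raw_id: str) -> str:
    # Encode on output: &, <, >, " and ' become entities, so the browser
    # displays them as text instead of interpreting them as markup.
    return "Item: " + html.escape(raw_id, quote=True)


print(show_item('<script>alert("Hello World")</script>'))
# Item: &lt;script&gt;alert(&quot;Hello World&quot;)&lt;/script&gt;
```

Because html.unescape reverses the encoding, the original value is still recoverable, which is the sense in which no data is lost.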
Article by: Mark Newton
Published in: Mark Newton