Adding Cross-Site Scripting Protection to ASP.NET 1.0

 

Scott Hanselman
Chief Architect
Corillian Corporation

November 2003

Summary: ASP.NET 1.1 added the ValidateRequest attribute to protect your site from cross-site scripting. What do you do, however, if your Web site is still running ASP.NET 1.0? Scott Hanselman shows how you can add similar functionality to your ASP.NET 1.0 Web sites. (12 printed pages)

Contents

The Problem
C#-Eye for the IL Guy
HttpModule
Programmer Intent
Installation and Configuration
The Results
Conclusion

The Problem

I've got a customer that has deployed a site on Microsoft® ASP.NET and the Microsoft® .NET Framework 1.0. It's a large site, and they are a large customer, and as a large customer they tend to move, well, slow. We were in the middle of a large deployment when ASP.NET/Framework 1.1 came out. The team felt that it was too risky to move everything over to ASP.NET/Framework 1.1 so close to the finish line. So we decided to move to ASP.NET/Framework 1.1 later in the year. However, since we build complex e-banking Web sites that cross many lines of business and deal with folks' money, security is job #1 (or job #0 if you're zero based). The client has a requirement that we deal with cross-site scripting (often called "XSS") attacks aggressively.

XSS is a particularly sinister kind of hacking, where an l33t hx0r (elite hacker) or a "script kiddie" tries to retrieve personal information or fool a site into doing something it shouldn't do by entering JavaScript into a Web Form, or by encoding the script into a parameter in the URL. A simple example is a Web Form that has a single text box and a single button. The user enters their name into the text box and submits the form. The page then prints out "Hello firstname", whether by string concatenation, String.Format, Response.Write, or a server-side label.


Figure 1. Entering text; seems safe enough

Since the page takes the user's input and directly "regurgitates" it, if I entered a swear word, I'd get a different kind of greeting! But what happens if, instead of entering their name, the user enters a script fragment like <script>alert('bad stuff happens');</script>? The code-behind looks like this:

if (this.IsPostBack) Response.Write("Hello " + this.TextBox1.Text);

You can see that the contents of the text box will be written directly out to the response stream, and the JavaScript will be executed in the user's browser! This is a trivial example, but imagine what could happen if the malicious JavaScript read the user's cookies or redirected a form post to another site. In a real attack, the script usually arrives through a crafted link or query-string parameter, so it runs in a victim's browser rather than in the attacker's own.


Figure 2. Entering JavaScript where text is expected


Figure 3. JavaScript executes on response

For simplicity's sake, we'd rather not build extra complexity into our Web tier or business logic to deal with someone entering JavaScript into a form field or some other chicanery. We'd like to deal with XSS in some central way, perhaps as a filter earlier in the HTTP request pipeline, and certainly before the actual page executes. Well, ASP.NET 1.1 does just this! Request validation is turned on by default and can be controlled with the new ValidateRequest attribute of the @ Page directive.

<%@ Page language="c#" Codebehind="WebForm1.aspx.cs" 
    ValidateRequest="true" AutoEventWireup="false" 
    Inherits="Junk.WebForm1" %>

ASP.NET 1.1 request validation catches malicious scripting code in the Cookies collection, the query string, and form posts. It checks all input data against a list of potentially dangerous values. In case you're worried that this kind of validation will impair functionality for your users in some way, let me assure you that if your users are entering JavaScript into your form fields, they're not the kind of users you want. For most sites, ValidateRequest="true" won't hamper your users' experience at all; a page that legitimately needs to accept HTML markup can opt out with ValidateRequest="false". If malicious script is detected in some input data, an HttpRequestValidationException is thrown. You can certainly catch this error in Global.asax and replace the default error page with your own personal threats if you'd like.

It's great that ASP.NET 1.1 includes this powerful filter for free, but it doesn't help with my client's pending ASP.NET 1.0 site launch. How can I protect against cross-site scripting with ASP.NET 1.0 while I wait for my client to upgrade? We kicked around a few ideas, like writing some regular expressions and searching the HTTP headers in Application_BeginRequest, but none of our ideas felt good. I also reminded myself that I work for an e-finance company, not a company that makes components to prevent cross-site scripting attacks. No need for me to reinvent the wheel.

Then I realized that the solution was sitting right in front of me: ASP.NET 1.1 had already solved this problem, and I just needed to work backwards. So I decided to back-port the existing 1.1 validation code to ASP.NET 1.0.

C#-Eye for the IL Guy

In order to explore what was going on inside ASP.NET 1.1, I needed a tool that was a little higher level than ILDASM.EXE, the .NET disassembler included with the .NET Framework SDK. Were I a smarter person, perhaps I could take System.Web apart with only ILDASM, but reading IL is non-trivial and I had a schedule. I found that tool in Lutz Roeder's Reflector. Reflector is an object browser that gives you a great tree view of all the namespaces and classes that the Base Class Library (BCL) provides.


Figure 4. Looking at the CrossSiteScriptingValidation class in Reflector


Figure 5. Exporting source code for the CrossSiteScriptingValidation class

However, where Reflector really shines is in its ability to decompile .NET assemblies and present the results, not as IL, but equivalent C# or Microsoft® Visual Basic® .NET code. Of course, some obvious fidelity is lost in the process, such as local variable names, but that's life (and code).

So, I ran around in System.Web until I found an internal class called CrossSiteScriptingValidation. Sounded promising. This is where the tough questions are answered, in methods such as IsDangerousString and IsDangerousScriptString. All the methods in CrossSiteScriptingValidation return Booleans, and for most of them true means the input qualifies as dangerous. But what strings are being evaluated, and who calls this utility class? It seemed to me that the answer would lie in HttpRequest, since we are attempting to validate all requests.
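The sketch below shows the kind of check a method like IsDangerousString performs. It is a simplified approximation and not the decompiled ASP.NET 1.1 logic. It assumes that the sequences worth rejecting are "<" followed by a letter, "!" or "/", and the "&#" character-reference prefix. It leaves out the on-handler, script and expression checks that the other helper names suggest. The class name XssHeuristics is hypothetical.

```csharp
using System;

// Hypothetical, simplified heuristic in the spirit of
// CrossSiteScriptingValidation.IsDangerousString; not the real 1.1 rules.
public static class XssHeuristics
{
    // ASCII letters only, the same role the IsAtoZ helper plays.
    static bool IsAtoZ(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns true if s looks like it contains markup. matchIndex is the
    // start of the first suspicious sequence, or -1 if none was found.
    public static bool IsDangerousString(string s, out int matchIndex)
    {
        matchIndex = -1;
        if (s == null) return false;

        // The last character cannot begin a two-character sequence.
        for (int i = 0; i < s.Length - 1; i++)
        {
            char next = s[i + 1];
            bool dangerous =
                (s[i] == '<' && (IsAtoZ(next) || next == '!' || next == '/')) ||
                (s[i] == '&' && next == '#');
            if (dangerous)
            {
                matchIndex = i;
                return true;
            }
        }
        return false;
    }
}
```

Because the check looks at the character after "<" instead of rejecting every "<", ordinary text such as "a < b" still gets through.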

HttpRequest contains collections for form variables, cookies, and the query string. These collections are of type NameValueCollection (Cookies is actually an HttpCookieCollection, which has some trivial extra stuff). So if your URL is https://localhost/junk/test.aspx?id=3, the QueryString collection contains an entry named id with the value 3. HttpRequest exposes each collection through a public get property, so when you code Request.QueryString, you're calling that property's getter. Here's where it all happens: the first time the collection is accessed, it's checked for dangerous strings through ValidateNameValueCollection. If an HttpRequestValidationException isn't thrown, the now-validated QueryString is returned, and a flag is cleared to avoid the overhead of checking the collection again.

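// From the ASP.NET 1.1 HttpRequest.QueryString getter, as decompiled.
// _flags holds per-collection bits; this one means "not validated yet".
if (this._flags[1])
{
    this._flags[1] = false;
    this.ValidateNameValueCollection(this._queryString,
               "Request.QueryString");
}
return this._queryString;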

Validation code like this is found throughout the HttpRequest collection properties in ASP.NET 1.1. Of course, I want a solution that runs on ASP.NET 1.0, and I can't override the behavior of the Form, QueryString, and Cookies collections. So I'll need to find another opportunity within the call stack to validate the collections.
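The validate-on-first-access pattern in that getter is easy to reproduce outside System.Web. The sketch below is hypothetical: LazyValidatedCollection and its validator delegate are not ASP.NET types. It wraps a NameValueCollection so that the validator runs the first time the collection is read and not on later reads.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical wrapper illustrating "validate once, on first access".
public class LazyValidatedCollection
{
    readonly NameValueCollection values;
    readonly Action<NameValueCollection> validate;
    bool needsValidation = true;   // plays the role of the _flags bit

    public LazyValidatedCollection(NameValueCollection values,
                                   Action<NameValueCollection> validate)
    {
        this.values = values;
        this.validate = validate;
    }

    public NameValueCollection Values
    {
        get
        {
            if (needsValidation)
            {
                validate(values);          // expected to throw on dangerous input
                needsValidation = false;   // reached only if validation passed
            }
            return values;
        }
    }
}
```

This sketch differs deliberately from the decompiled getter in one respect. It clears its flag only after validation succeeds. A caller that catches the exception and reads the property again therefore gets another check rather than unchecked data.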

HttpModule

An HttpModule seemed the perfect choice. An HttpModule is just a public class that implements IHttpModule, an interface that consists of only two methods, Init() and Dispose(). ASP.NET calls Init() for each HttpApplication instance it creates and passes that HttpApplication as the only parameter. This is my opportunity to hook up event handlers to the application. For performance reasons, I wanted to make sure that my cross-site scripting validation code ran only once per request. I also wanted it to run before, and independently of, the page and its associated business logic.

The HttpApplication has these events that fire in the order shown:

  1. BeginRequest
  2. AuthenticateRequest
  3. AuthorizeRequest
  4. ResolveRequestCache
  5. [A handler (a page corresponding to the request URL) is created at this point.]
  6. AcquireRequestState
  7. PreRequestHandlerExecute
  8. [The handler is executed; in our case, the Page.]
  9. PostRequestHandlerExecute
  10. ReleaseRequestState
  11. [Response filters, if any, filter the output.]
  12. UpdateRequestCache
  13. EndRequest

It looks like the time to run the validator is during the PreRequestHandlerExecute event handler, just before the page itself. If I find something potentially dangerous and throw an exception, the page will never run. This is the desired behavior.

So, I created a class called ValidateInput that implements IHttpModule. In Init(), it hooks up an EventHandler for PreRequestHandlerExecute that calls my custom method, ValidateRequest. Inside ValidateRequest, I'll call the functions I bring over from ASP.NET 1.1.

I'll also add a quick version check to make sure no one tries to use this module on ASP.NET 1.1. I'd hate to have someone forget to remove this module when we upgrade to 1.1.

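using System;
using System.Web;

namespace Corillian.Web
{
    public class ValidateInput : IHttpModule
    {
        HttpContext context;
        HttpApplication application;
        public ValidateInput() {}
        public void Init(HttpApplication app)
        {
            // ASP.NET 1.1 validates requests itself, so refuse to load
            // on anything other than the 1.0 runtime (1.0.3705).
            Version v = System.Environment.Version;
            if (v.Major != 1 || v.Minor != 0)
                throw new NotSupportedException(
                    "The ValidateInput HttpModule is not supported on this " +
                    "version of ASP.NET. Remove it from your Web.config file!");
            app.PreRequestHandlerExecute += new EventHandler(this.ValidateRequest);
        }
        public void Dispose() {}   // required by IHttpModule; nothing to release
        // ValidateRequest and the helper methods copied from 1.1 also live here.
    }
}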
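The guard must reject every runtime except 1.0. It can help to pull the comparison out into a pure function and check it against known versions: 1.0.3705 for the .NET Framework 1.0 and 1.1.4322 for 1.1. RuntimeGuard is a hypothetical name. Under this arrangement, Init() would throw NotSupportedException whenever IsAspNet10(Environment.Version) returns false.

```csharp
using System;

// Hypothetical helper isolating the version test used by Init().
public static class RuntimeGuard
{
    // True only for major version 1, minor version 0.
    // The negation is (v.Major != 1 || v.Minor != 0); writing that throw
    // condition with && instead of || would let 1.1 (1.1.4322) through,
    // because its major version is also 1.
    public static bool IsAspNet10(Version v)
    {
        return v.Major == 1 && v.Minor == 0;
    }
}
```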

I hooked up PreRequestHandlerExecute to my class's ValidateRequest method. Since I can't hook into the Form, QueryString, and Cookies collections, I'll need to do all the request validation here to make sure that only validated requests are passed to my Page handler.

public void ValidateRequest(Object src, EventArgs e)
{
   //Store away what may be useful during this Request...
   application = (HttpApplication)src;
   context = application.Context;
   this.ValidateNameValueCollection(context.Request.Form, "Request.Form"); 
   this.ValidateNameValueCollection(context.Request.QueryString, 
          "Request.QueryString"); 
   this.ValidateCookieCollection(context.Request.Cookies);
}

In ValidateRequest, I call my own implementations of ValidateNameValueCollection and ValidateCookieCollection. Each of them walks through the already-parsed collections: the form POST data, the query string, and the cookies.
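A ValidateNameValueCollection-style walker might look like the following sketch. It is a stand-in built on several assumptions, not the module's actual code. The isDangerous predicate takes the place of the helpers copied from ASP.NET 1.1, and InvalidOperationException stands in for the module's own HttpRequestValidationException. CollectionValidator is a hypothetical name. Cookies are left out because HttpCookieCollection lives in System.Web; they would be walked the same way over each cookie's Value.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical walker over one parsed request collection.
public static class CollectionValidator
{
    public static void ValidateNameValueCollection(
        NameValueCollection values, string collectionName,
        Predicate<string> isDangerous)
    {
        for (int i = 0; i < values.Count; i++)
        {
            // GetValues(i) returns every value sent under this key,
            // not a single comma-joined string.
            string[] entries = values.GetValues(i);
            if (entries == null) continue;

            foreach (string value in entries)
            {
                if (value == null || !isDangerous(value)) continue;

                // Name the collection and key, but do not echo the value.
                string key = values.GetKey(i) ?? "(unnamed)";
                // Stand-in for the module's HttpRequestValidationException.
                throw new InvalidOperationException(
                    "A potentially dangerous " + collectionName +
                    " value was detected (key: " + key + ").");
            }
        }
    }
}
```

Iterating over GetValues(i) instead of reading values[i] checks each value posted under a repeated key separately, rather than as one comma-joined string.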

It's important to know that parsing this HTTP request data and organizing it into NameValueCollections is safe, as any potentially malicious data from the request hasn't reached the Page handler or the browser yet. Additionally, if I had chosen the BeginRequest application event instead of PreRequestHandlerExecute, I'd have had to parse the raw HTTP request myself. So I get the best of both worlds. The tedious parsing has been done for me (in well-tested code), and the page hasn't executed yet, which gives me time to throw an exception and stop execution of the request.

Next, I pulled all the other helper functions into my new class from Reflector, including IsDangerousExpressionString, IsDangerousOnString, IsDangerousScriptString, IsDangerousString, and IsAtoZ. It's worth mentioning that the decompiled C# code that Reflector shows is actually a new C# representation of the IL contained in the assembly. The local variable names have been changed, and what was once a loop may now be a series of goto and if statements. Don't judge the writer of the code from the decompiled representation! Remember that the compiler takes liberties when generating the final IL, and what's more important is the concept of programmer intent. I'll talk about this a little more below.


Figure 6. Looking at the IsAtoZ method
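IsAtoZ is the small building block that the other helpers share. The sketch below is a hypothetical illustration of how it can combine with the letter-skipping loop discussed in the next section to flag event-handler attributes such as onerror=. Its rules are assumptions, not the decompiled IsDangerousOnString. The "on" must start a word and be followed by at least one ASCII letter. After optional whitespace, the next character must be an equals sign.

```csharp
using System;

// Hypothetical "on<event>=" detector; the real ASP.NET 1.1 rules may differ.
public static class OnHandlerCheck
{
    static bool IsAtoZ(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // True if the "on" at position start begins an attribute like onclick=.
    public static bool IsDangerousOnString(string s, int start)
    {
        int len = s.Length;
        if (start + 2 >= len) return false;   // need "on" plus more text
        if (char.ToLowerInvariant(s[start]) != 'o' ||
            char.ToLowerInvariant(s[start + 1]) != 'n')
            return false;
        // "on" inside a word (for example "cotton") is not an attribute.
        if (start > 0 && IsAtoZ(s[start - 1])) return false;

        int index = start + 2;
        // Skip the event name: the same loop Reflector shows with gotos.
        while (index < len && IsAtoZ(s[index])) index++;
        if (index == start + 2) return false; // no event name after "on"

        // Allow whitespace before the equals sign.
        while (index < len && char.IsWhiteSpace(s[index])) index++;
        return index < len && s[index] == '=';
    }
}
```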

Now, we'll need a custom exception class, derived from ApplicationException, that shall be aptly named HttpRequestValidationException. This is coincidentally the same name that ASP.NET 1.1 uses, but in a different namespace. This exception will be thrown if any potentially dangerous-looking script appears in the HttpRequest. Whether you choose to show the exception page or log the exception is up to you. Some might feel that a potential script attack is a significant event and may choose to handle this exception differently. Either way, be sure to have an exception-handling strategy in place.
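A minimal version of that exception class might look like this. The three constructors follow the usual .NET exception pattern; they are an assumption and not the module's actual code. The namespace is omitted here. In the module, the class would sit beside ValidateInput so that it never collides with System.Web.HttpRequestValidationException on ASP.NET 1.1.

```csharp
using System;

// Hypothetical sketch of the module's own validation exception.
public class HttpRequestValidationException : ApplicationException
{
    public HttpRequestValidationException() {}

    public HttpRequestValidationException(string message)
        : base(message) {}

    // Keeps the original cause when wrapping another exception.
    public HttpRequestValidationException(string message, Exception inner)
        : base(message, inner) {}
}
```

Deriving from ApplicationException means an existing catch (ApplicationException) handler in Global.asax or a logging layer will also see these failures.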

Programmer Intent

I wanted to mention a little something about the intent of the programmer. What has really been decompiled here is the programmer's intent. We're not actually looking at the C# source code as the original writer wrote it. When source is compiled to IL and that IL is then converted back into C#, things change. For example, a bit of code from IsDangerousOnString looks like this in Reflector:

goto L_0045;
L_0040:
    index = (index + 1);
L_0045:
    if (index >= len)
    {
        goto L_005E;
    }
    if (CrossSiteScriptingValidation.IsAtoZ(s[index]))
    {
        goto L_0040;
    }

This is hard to read for the average programmer, but it correctly conveys the programmer's intent. But just what was that intent? We can "fold" the code back up only so far; the original might have been a while loop, a for loop, or something else entirely. However, we can rewrite it like this (or a half dozen other ways) so that we might better understand it:

//Programmer intent: look for non-alphas...
while (index < len)
{
    if (!CrossSiteScriptingValidation.IsAtoZ(s[index]))
        break;
    index++;
}

Remember, "Gotos considered harmful" applies only to YOU, not the compiler! Note also that this code could have been expressed as a "for" loop or some other looping construct, and the intent would still be correctly expressed.
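To confirm that the two forms express the same intent, you can write both side by side, since C# still supports labeled goto statements. The class and method names below are hypothetical, and IsAtoZ is re-declared so that the block compiles on its own.

```csharp
using System;

// Hypothetical side-by-side versions of the same letter-skipping loop.
public static class LoopForms
{
    static bool IsAtoZ(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Decompiler style: labels and gotos, shaped like the Reflector output.
    public static int SkipLettersGoto(string s, int index)
    {
        int len = s.Length;
        goto check;
    advance:
        index = index + 1;
    check:
        if (index >= len) goto done;
        if (IsAtoZ(s[index])) goto advance;
    done:
        return index;
    }

    // Programmer intent: advance past ASCII letters, stop at a non-letter.
    public static int SkipLettersWhile(string s, int index)
    {
        while (index < s.Length && IsAtoZ(s[index]))
            index++;
        return index;
    }
}
```

For any starting index, both methods return the position of the first non-letter, or the string length if they run off the end.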

Installation and Configuration

To install ValidateInputASPNET10 on the Web server, we need only add it to the list of httpModules configured in our web.config. The assembly, in this case ValidateInputASPNET10.dll, needs to reside in the \bin folder of our site and of any other sites on our box that we wish to protect.

<configuration>
    <system.web>
        <httpModules>
            <add name="ValidateInput" 
               type="Corillian.Web.ValidateInput,ValidateInputASPNET10" />
        </httpModules>
    </system.web>
</configuration>

The Results

When I add the HttpModule to the web.config, I can launch the same ASP.NET application without recompiling, since the HttpModule is its own assembly and Microsoft® Visual Studio® .NET project. On startup, ASP.NET calls Init() on the new ValidateInputASPNET10 HttpModule, which hooks the PreRequestHandlerExecute event. If I try to enter JavaScript into the form (or the QueryString or Cookies collection) as before, I'm presented with the error message in Figure 7, declaring an HttpRequestValidationException. Notice that part of the JavaScript is shown, but only part; we don't want the error message to output and execute the same JavaScript we are trying to protect ourselves from.
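You can sketch the partial echo in the error message as a helper that keeps only a short window around the suspicious character. The window sizes (10 characters before the match, 20 from it onward) and the name ErrorExcerpt are assumptions for illustration, not the values ASP.NET uses. Truncation limits how much of the payload is repeated, but an error page should still HTML-encode whatever excerpt it displays.

```csharp
using System;

// Hypothetical helper that trims a rejected value to a short excerpt.
public static class ErrorExcerpt
{
    const int Before = 10;   // characters kept before the match (assumed)
    const int After = 20;    // characters kept from the match onward (assumed)

    public static string Around(string value, int matchIndex)
    {
        int start = Math.Max(0, matchIndex - Before);
        int end = Math.Min(value.Length, matchIndex + After);
        string excerpt = value.Substring(start, end - start);
        // Mark truncation on either side so readers know text was dropped.
        return (start > 0 ? "..." : "") + excerpt + (end < value.Length ? "..." : "");
    }
}
```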


Figure 7. Protecting your Web site from script input

Note: Remember, decompiling should be used primarily for debugging and your personal education. Be sure to be aware of intellectual property rules. Unobfuscated assemblies are easier to decompile than C++ applications, but that doesn't give us carte blanche to swipe code. If you're concerned about protecting your code and intellectual property, take a look at the Dotfuscator Community Edition that ships with Visual Studio .NET 2003.

Conclusion

Cross-site scripting is one of the many types of hacks you need to worry about when creating ASP.NET Web sites. Attackers can use this technique to run script in your users' browsers, possibly leading to loss of data or, worse, theft of customer information. Defensive programming demands that you protect yourself from these attacks. Adding validation to input, as done in this article, is a first step toward protecting your Web site.

About the Author

Scott Hanselman is Chief Architect at the Corillian Corporation, an e-finance enabler. He has over a decade of experience developing software in C, C++, Visual Basic, COM, and currently Visual Basic .NET and C#. Scott is proud to have been appointed the MSDN Regional Director for Portland, Oregon for the last three years, developing content for, and speaking at Developer Days and the Visual Studio .NET launch in both Portland and Seattle. Scott also spoke at the Microsoft® Windows Server™ 2003 and Visual Studio .NET 2003 launches in 4 cities. He's spoken internationally on Microsoft technologies, and has co-authored two books from Wrox Press. In 2001, Scott spoke on a 15-city national tour with Microsoft, Compaq, and Intel featuring Microsoft technologies and evangelizing good design practices. This year Scott spoke at the Windows Server 2003 launch event in 4 PacWest cities, at TechEd in the U.S. and in Malaysia, and at ASPLive in Orlando. His thoughts on the Zen of .NET, Programming and Web services can be found at http://www.computerzen.com.

© Microsoft Corporation. All rights reserved.