An ASP.NET Framework for Human Interactive Proofs

 

Stephen Toub
Microsoft Corporation

September 2004

Summary: Stephen Toub introduces concepts involved in Human Interactive Proofs and creates a framework for their incorporation into your ASP.NET sites. (26 printed pages)

Download the MSDNHip.msi sample file.

Contents

Introduction
Reverse Turing Test
Human Interactive Proofs
Bringing HIP to ASP.NET
HipChallenge
ImageHipChallenge
HipValidator
AudioHipChallenge
Does It Work?
Related Books

Introduction

The Web is a dangerous place. Attackers come at your sites constantly from all sides, wielding powerful weapons in attempts to degrade, defile, compromise, shut down, or simply take advantage of your Web presence. You fight back with security patches, input validation checks, encryption, running with least privilege, reducing possible attack surface, and a myriad of other secure coding techniques such as those outlined in Michael Howard and David LeBlanc's excellent treatise on the subject, Writing Secure Code (Microsoft Press, 2003). But there are always new attacks and new angles, so the defenses must evolve as well.

One of the greatest tools in an attacker's arsenal is computing power. That may sound trite, but it's true. The speed at which a computer can send hundreds and thousands of requests to hundreds and thousands of sites is staggering. Often, this is used for attacks that in moderation might not be considered an attack at all. For example, Hotmail provides free e-mail accounts to the general public, something they're happy to do for any user who wants one (as evidenced by the hundreds of millions of registered Hotmail users). However, they're not happy to provide thousands of accounts to an attacker looking to use them to send unsuspecting users boatloads of spam. The problem for Hotmail and other sites in the same predicament is that it can be very hard, if not impossible, to distinguish a browser controlled by a grandmother looking to open an account to correspond with her grandson from an attacker with a custom application looking to automatically open multiple accounts to send that grandson mail of a different sort. So what's the solution? How does a Web site differentiate human-initiated requests from requests being scripted by a program?

Reverse Turing Test

In 1936, Alan Turing published his now famous paper On Computable Numbers, with an Application to the Entscheidungsproblem in which he presented his idea for Turing machines (http://mathworld.wolfram.com/TuringMachine.html). In the early forties, along with Gordon Welchman, he developed a machine that could break the Enigma codes of the Luftwaffe. Truly an amazing guy. But in 1950, Alan Turing fundamentally changed the world of artificial intelligence by proposing what is now known as the Turing Test.

Turing believed that manmade computers would one day have abilities and intelligence rivaling that of humans. After all, the human brain is a form of a computer, albeit one based in biology rather than in silicon. He believed that asking whether computers can think is meaningless, that the true test of intelligence is whether a human could differentiate an artificial intelligence developed by man from man himself. To further his argument, Turing proposed an "imitation game." Put a computer in one room and a human in another, both of which are separated from a third party, a human interrogator, who doesn't know which room contains which contestant. This examiner poses questions to both rooms through a text-based interface, and when satisfied he guesses which room contains the computer and which room contains the human. If he is unable to come to the correct answer more than half the time, the computer passes the test and must be considered as intelligent as its human counterpart.

The Turing Test is based on the problem of a human differentiating between a computer and a human. To solve our problem, in a sense we need the reverse. A computer needs to differentiate between a computer and a human, a task many researchers would agree is significantly more complex. This scenario has formally been named the Reverse Turing Test (RTT), although the term has also been used in some circumstances to describe a similar scenario but where both contestants attempt to be recognized as a computer.

So how does a computer differentiate a human from a computer?

Human Interactive Proofs

In 2000, Yahoo was searching for an answer to this very problem. Their chief scientist, Udi Manber, recruited the help of Carnegie Mellon professor Manuel Blum, and with the aid of one of Blum's Ph.D. students, Luis von Ahn, they created CAPTCHA. CAPTCHAs, or more specifically "Completely Automated Public Turing Tests to Tell Computers and Humans Apart," are based on the RTT scenario and are a kind of Human Interactive Proof (HIP), presenting to a user a puzzle that should be easily solvable by a human but difficult for a computer. In essence, they take advantage of the intelligence gap that exists between humans and computers. CAPTCHAs differ from the standard Turing Test in that they're "completely automated," meaning that the questions or puzzles posed to the user come from a computer and thus must be generated automatically. CAPTCHA-based systems are now in use all over the Web, from large Internet portals like Yahoo and MSN to smaller, individually run personal sites.

In their paper, von Ahn and Blum proposed a few different types of CAPTCHA puzzles, including one that renders words in a distorted form (you've probably seen these more often than other kinds), asking the user to enter the presented word, and one that displays one or more pictures of a familiar entity (such as a monkey), also distorted, asking the user to name the contents of the pictures. Examples of these types of puzzles are shown in Figure 1 and Figure 2, respectively. Note that in both cases the distortion is important. In the former case it's necessary to thwart optical character recognition (OCR) software. In the latter case it's necessary to prevent an attacker from cataloguing images known to be used by the server.


Figure 1. What does this say?


Figure 2. What kind of animal is this?

Microsoft Research has done a fair amount of investigation into these types of systems. For example, Yong Rui and Zicheng Liu have proposed a set of HIP design guidelines that ensure that a HIP puzzle is both secure and usable. They've also designed HIP puzzles based on the recognition of human faces and on the detection of human facial features. An example of such a puzzle is shown in Figure 3. Here, users must be able to locate a certain number of specific facial features within the image, such as four eye corners and two mouth corners. You can read Rui and Liu's paper at http://www.research.microsoft.com/~yongrui/ps/MMSJ04HIP.pdf.


Figure 3. Find the facial features

There are many scenarios in which these puzzles can be put to good use. As mentioned, they can be an effective way of limiting automatic creation of user accounts. AltaVista successfully used similar systems to block more than 95 percent of automated attempts to add URLs to their search engine. Benny Pinkas and Tomas Sander of Hewlett Packard wrote a paper detailing how these systems could be effectively used on login pages to prevent online dictionary attacks (http://www.pinkas.net/PAPERS/pwdweb.pdf). CAPTCHAs have been used effectively to prevent comment spam on blogs and to prevent automated voting for online polls. A few companies now provide e-mail spam prevention applications and plug-ins based on this type of technology (for example, the mail server might force the sender of an e-mail to solve a puzzle before allowing the mail to be delivered). With a slight variation on the system developed in this article, puzzles could even be used to ensure that requests to a Web service are initiated by a user and not by an automated attacker.

No HIP implementation exists in ASP.NET out-of-the-box. However, creating such a system requires surprisingly little code.

Bringing HIP to ASP.NET

Good puzzles are central to the success of any CAPTCHA system, and yet the types of puzzles I present in this article (those based on image distortion) are relatively easy to break by someone determined to do so. Suffice it to say, however, that all puzzles will eventually be broken. Recognition research is progressing at such a rate that puzzles currently sufficient for HIP purposes may not be so in the near future, and many in widespread use have already been broken (though even "broken" puzzles can still be very useful on sites that aren't incredibly interesting to attackers). As such, the actual challenge presented to the user, while absolutely important, should initially be secondary to the framework used to deploy and render these puzzles. When the puzzles are later broken, new puzzles can be written and integrated into the existing framework without significant developer effort spent refactoring the protected sites. Thus, this article focuses on designing a good framework for implementing HIP in ASP.NET.

When breaking down a HIP system into its core elements, one finds three main parts. The first part is the actual challenge, the puzzle presented to the user. Of course, the user must have a way to respond to the challenge, and thus the second part is a method of input whereby a user can answer with his solution. Third, the system must have a way of informing the user of her success or failure in solving the puzzle. Breaking down the system in this fashion makes it easy to craft a good framework for an ASP.NET solution. I've chosen to base mine on the ASP.NET control framework.

The download associated with this article includes two sample puzzle implementations, one based on a visual puzzle and one based on an aural puzzle (note that these puzzles have not been tested against the current state of image and audio recognition systems and exist purely as samples; more on this later). Both of these implementations are actually custom ASP.NET controls, derived from an abstract base class I've created named HipChallenge.

HipChallenge maps to the first of the three parts. All concrete puzzles derive from HipChallenge and can be plugged into an existing system with almost no work from a developer. ASP.NET provides validation controls that make short work of notifying a user when provided input is invalid, so the validation and user notification portion of the system is implemented as a custom ASP.NET validation control, HipValidator. This validator, like all other ASP.NET validators (RegularExpressionValidator, RequiredFieldValidator, and so on) is configured to interact with other controls on the page. In the case of HipValidator, it interacts with two. First, it can be configured to work with any HipChallenge-derived control on the page, validating user input against the puzzle presented by that control. It's also configured to monitor a developer-specified input control, this being the remaining piece of the system. That can be any control into which a user can supply textual input that could be validated against the challenge.

Thus, a page that implements CAPTCHA will have at least three controls: a HipChallenge-derived control that presents the challenge to the user, an input control such as a TextBox that accepts user input, and a HipValidator that coordinates the other two and validates whether the user successfully solved the puzzle.
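To make the division of labor concrete outside of a page, here is a minimal sketch, assuming no System.Web at all. ChallengeSketch, IdOnlyChallengeSketch, and ValidatorSketch are hypothetical names standing in for HipChallenge, a concrete puzzle, and HipValidator; a static dictionary stands in for server-side storage, a plain string plays the part of the TextBox, and the word is passed in rather than chosen.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the three roles, with no dependency on System.Web.
// ChallengeSketch ~ HipChallenge, IdOnlyChallengeSketch ~ a concrete puzzle,
// ValidatorSketch ~ HipValidator; the answer string plays the input control.
public abstract class ChallengeSketch
{
    // Stand-in for server-side storage: challenge ID -> expected answer.
    private static readonly Dictionary<Guid, string> Answers =
        new Dictionary<Guid, string>();

    public Guid Id { get; private set; }

    // The base class records the answer server-side, then hands both the ID
    // and the content to the derived class, which decides what to expose.
    public string Issue(string word)
    {
        Id = Guid.NewGuid();
        lock (Answers) { Answers[Id] = word; }
        return RenderChallenge(Id, word);
    }

    // Derived puzzles decide only how the challenge is presented.
    protected abstract string RenderChallenge(Guid id, string content);

    public bool Authenticate(string answer)
    {
        lock (Answers)
        {
            string expected;
            return Answers.TryGetValue(Id, out expected) &&
                string.Equals(expected, answer, StringComparison.OrdinalIgnoreCase);
        }
    }
}

// A puzzle that, like ImageHipChallenge, emits a reference containing only the ID.
public sealed class IdOnlyChallengeSketch : ChallengeSketch
{
    protected override string RenderChallenge(Guid id, string content)
    {
        return "[image for challenge " + id.ToString("N") + "]";
    }
}

public static class ValidatorSketch
{
    // Reject blank input; otherwise let the associated challenge decide.
    public static bool Validate(ChallengeSketch challenge, string input)
    {
        if (input == null || input.Trim().Length == 0) return false;
        return challenge.Authenticate(input.Trim());
    }
}
```

With this split, supporting a new kind of puzzle means writing only another RenderChallenge override; the storage and validation code stays the same.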

HipChallenge

HipChallenge is the base class for any control that renders puzzles to the user. A derived class is only responsible for generating the output form of the puzzle, not the challenge content, with everything else handled by HipChallenge or by the related HipValidator control (to be discussed later). Thus, the core functionality of HipChallenge lies in choosing the content (usually a word or phrase) with which to puzzle the user, storing information about that content in a hidden control on the page, and providing a method to check the user's input on postback against the data stored in that hidden control. All of this functionality is implemented in three core methods supplemented by several helpers. The first and the simplest of the three is an override of Control.CreateChildControls. All this method does is create a hidden control on the page into which we can store data later in the page cycle.

protected override void CreateChildControls()
{
    _hiddenData = new HtmlInputHidden();
    _hiddenData.EnableViewState = false;
    Controls.Add(_hiddenData); 
    base.CreateChildControls();
}

The next method, and the most important one for the generation of the challenge, is an override of OnPreRender, and it deserves a bit of explanation. It's the responsibility of OnPreRender to choose the challenge text to be displayed to the user, to store information about that word in the _hiddenData control created in CreateChildControls, and then to pass along the responsibility of generating the puzzle to the derived class. But there's a problem with the second step. I need to send information about the selected word to the client so that on subsequent requests I can validate the client's response, but I can't simply send the selected word itself in plaintext. Why? Because that would make it very easy for an automated application to obtain the word in question, simply by parsing the supplied HTML and searching for the word. Of course, to avoid sending the word to the user as part of the HTML, I could use server-side resources to maintain information for each client, but that could become expensive.

To solve this predicament, one approach is to encrypt the selected word. The encrypted information could then be stored into the hidden field created by CreateChildControls rather than storing the plaintext. As the .NET Framework provides the System.Security.Cryptography namespace complete with a wide range of supported encryption algorithms, adding this layer of protection is straightforward. However, whenever attempting to use encryption to protect secrets, one really needs to think about and examine the types of attacks that could be mounted against the system. As a prime example, I need to ask myself: does encryption really help here? Yes and no. Encrypting the selected word does make it extremely difficult, if not impossible, to determine the chosen word. But does that really prevent all attacks? Of course not. One possible attack against a CAPTCHA solution is to make a bunch of requests, creating a database of the puzzles presented, and allowing one or more people to then later iterate through and solve all of the puzzles in the database. With the puzzles solved, the automated application can resume its attack. It doesn't matter that the text is encrypted; the puzzle itself can still be solved by a human. To get around this attack, I need to ensure that the CAPTCHA is solved within a reasonable amount of time. So, rather than just encrypting the challenge text, an expiration time can be included in the encrypted content. When a user presents his solution to the server, not only is the data decrypted but this expiration time is checked against the current time. An expired solution is no better than an incorrect answer.

This might sound familiar to those of you who have used ASP.NET Forms Authentication. With forms authentication, information about an authenticated user is sent back and forth between the client and the server, usually in cookies. This information is encrypted and can include an expiration date that forces a user to re-authenticate after a predetermined length of time. Fortunately for me, this functionality is exposed through the FormsAuthentication class, specifically its static Encrypt and Decrypt methods, and I can take advantage of this functionality rather than reinventing the wheel. Encrypt and Decrypt helpers built on these methods might look like the following.

internal static string Encrypt(string content, DateTime expiration)
{
    // Package the content as the ticket's UserData, with an expiration.
    FormsAuthenticationTicket ticket = new FormsAuthenticationTicket(
        1, HttpContext.Current.Request.UserHostAddress, DateTime.Now,
        expiration, false, content);
    return FormsAuthentication.Encrypt(ticket);
}

internal static string Decrypt(string encryptedContent)
{
    try
    {
        FormsAuthenticationTicket ticket =
            FormsAuthentication.Decrypt(encryptedContent);
        // Guard against a null ticket; an expired ticket counts as invalid.
        if (ticket != null && !ticket.Expired) return ticket.UserData;
    }
    catch (ArgumentException) { } // null, empty, or malformed input
    return null;
}

The Encrypt method creates a new FormsAuthenticationTicket containing the content to be encrypted along with an expiration date and time. It then encrypts this ticket using FormsAuthentication.Encrypt and returns the resulting string, which can be embedded directly into the hidden field sent to the client. The Decrypt method decrypts the encrypted content (most likely obtained either from the hidden field on postback or from a query string generated by a derived class). If the ticket decrypts successfully and has not yet expired, the plaintext content is returned.
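FormsAuthentication is only available inside ASP.NET. For comparison, here is a sketch of the same idea (encrypted content that carries its own expiration) built directly on System.Security.Cryptography. This is a swapped-in technique and is not claimed to match FormsAuthentication's internals: ExpiringToken is a hypothetical name, AES-CBC provides secrecy, an HMAC-SHA256 over the IV and ciphertext detects tampering (encrypt-then-MAC), the keys are generated per process (which assumes a single server), and CryptographicOperations.FixedTimeEquals assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for the FormsAuthentication-based helpers.
// Token layout: IV (16 bytes) || AES-CBC ciphertext || HMAC-SHA256 (32 bytes).
// Keys are random per process, so tokens don't survive a restart or cross servers.
public static class ExpiringToken
{
    private const int IvSize = 16, MacSize = 32;
    private static readonly byte[] EncKey = RandomBytes(32);
    private static readonly byte[] MacKey = RandomBytes(32);

    private static byte[] RandomBytes(int count)
    {
        byte[] bytes = new byte[count];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create()) rng.GetBytes(bytes);
        return bytes;
    }

    public static string Encrypt(string content, DateTime expirationUtc)
    {
        // The expiration travels inside the ciphertext, so a client can't extend it.
        byte[] plain = Encoding.UTF8.GetBytes(
            expirationUtc.Ticks.ToString(CultureInfo.InvariantCulture) + "|" + content);
        using (Aes aes = Aes.Create())
        {
            aes.Key = EncKey;
            aes.GenerateIV();
            byte[] cipher;
            using (ICryptoTransform enc = aes.CreateEncryptor())
                cipher = enc.TransformFinalBlock(plain, 0, plain.Length);
            byte[] token = new byte[IvSize + cipher.Length + MacSize];
            Buffer.BlockCopy(aes.IV, 0, token, 0, IvSize);
            Buffer.BlockCopy(cipher, 0, token, IvSize, cipher.Length);
            using (HMACSHA256 hmac = new HMACSHA256(MacKey))
            {
                byte[] mac = hmac.ComputeHash(token, 0, IvSize + cipher.Length);
                Buffer.BlockCopy(mac, 0, token, IvSize + cipher.Length, MacSize);
            }
            return Convert.ToBase64String(token);
        }
    }

    // Returns the content, or null if the token is malformed, tampered with, or expired.
    public static string Decrypt(string token)
    {
        if (token == null) return null;
        try
        {
            byte[] data = Convert.FromBase64String(token);
            int bodyLength = data.Length - MacSize;
            if (bodyLength < IvSize + 16) return null; // need the IV plus one AES block
            byte[] expectedMac;
            using (HMACSHA256 hmac = new HMACSHA256(MacKey))
                expectedMac = hmac.ComputeHash(data, 0, bodyLength);
            byte[] actualMac = new byte[MacSize];
            Buffer.BlockCopy(data, bodyLength, actualMac, 0, MacSize);
            if (!CryptographicOperations.FixedTimeEquals(expectedMac, actualMac)) return null;

            using (Aes aes = Aes.Create())
            {
                aes.Key = EncKey;
                byte[] iv = new byte[IvSize];
                Buffer.BlockCopy(data, 0, iv, 0, IvSize);
                aes.IV = iv;
                byte[] plain;
                using (ICryptoTransform dec = aes.CreateDecryptor())
                    plain = dec.TransformFinalBlock(data, IvSize, bodyLength - IvSize);
                string text = Encoding.UTF8.GetString(plain);
                int separator = text.IndexOf('|');
                long ticks = long.Parse(text.Substring(0, separator), CultureInfo.InvariantCulture);
                return DateTime.UtcNow.Ticks < ticks ? text.Substring(separator + 1) : null;
            }
        }
        catch (FormatException) { return null; }        // not valid Base64
        catch (CryptographicException) { return null; } // e.g. bad padding
    }
}
```

As the following paragraphs explain, an expiring token alone still allows a solved puzzle to be replayed until it expires.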

That's all fine and dandy. Encryption allows me to send the puzzle to the client, eliminating the burden of maintaining server-side resources. And it also allows me to limit how long a puzzle is valid. But there's a much more damaging attack that this doesn't prevent. What if the attacker solves the puzzle manually and then feeds the solution to his application? His automated attack can then use the solution over and over until the puzzle expires, which, depending on how the control is configured, might not happen for several seconds or even minutes, and such an amount of time could permit a plethora of requests. In fact, to get around this problem, server-side resources are required. It's the only known way to prevent multiple submissions of the same solution. Have you ever bought something on the Web where the checkout page warned you not to click the submit button twice so that you didn't place your order twice? That's usually the result of the site not tracking which orders have already been submitted. A solution used by some of the more robust Web stores is to send to the client a unique identifier associated with the current shopping cart. When the shopping cart is submitted, that identifier is stored server-side indicating that it has already been submitted, and any future submissions including that ID are ignored. We can use that same solution to prevent an attacker from using a given puzzle and solution more than once.

The simplest way given the steps I've already discussed would be to store the encrypted text server-side in addition to sending it to the client. When authentication is performed, that value can be marked as used on the server, and any future authentication requests using that same encrypted text would fail. Of course, if we're going to store data server-side, there's little reason to also send the encrypted text to the client, given that the encrypted data is relatively large and there's no reason to provide an attacker with more information than she needs, even if that information is encrypted (for example, could she determine from the length of the encrypted data how long the puzzle word is?). So, if preventing this attack is important to you (which it probably should be given the scenarios in which HIP is usually used), an easy solution is as follows, and is the one implemented in the associated sample code.

In OnPreRender, I select the challenge text and generate a unique identifier. I then use that identifier to store the selected text server-side (the storage mechanism doesn't matter for the purposes of this explanation; I'm using the ASP.NET Cache, but if you'll be deploying HIP in a Web farm environment, you'll most likely need some form of state shared between the servers, such as a SQL Server database). Instead of encrypting the text and storing it in the _hiddenData control, only the ID is stored in _hiddenData. An attacker gains no information about the selected text from this randomly generated ID. The text and the ID are then passed to the derived class so that the puzzle can be generated.

protected sealed override void OnPreRender(EventArgs e)
{
    string content = ChooseWord();
    Guid id = Guid.NewGuid();

    SetChallengeText(id, content, DateTime.Now.AddSeconds(Expiration));
    _hiddenData.Value = id.ToString("N");
    RenderChallenge(id, content);

    base.OnPreRender(e);
}

SetChallengeText simply stores the content in the ASP.NET Cache, using the ID as the key (notice that the expiration concept is still employed here: the puzzle text is removed from the cache after the specified expiration delay). Its counterpart, GetChallengeText, takes an ID and returns the associated challenge text if any can be found.
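The listings for these two methods aren't shown here. As a minimal sketch, assuming an in-memory dictionary in place of the ASP.NET Cache (so it only works within a single process, not across a Web farm), the pair might behave like this. ChallengeStore is a hypothetical name, times are UTC, and expiration is checked lazily on lookup rather than by the cache evicting entries on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the ASP.NET Cache as used by
// SetChallengeText/GetChallengeText. Per-process only: a Web farm would
// need shared storage instead. Times are UTC.
public static class ChallengeStore
{
    private sealed class Entry
    {
        public string Text;
        public DateTime ExpiresUtc;
    }

    private static readonly Dictionary<Guid, Entry> Entries = new Dictionary<Guid, Entry>();
    private static readonly object Sync = new object();

    // Store content under the ID; null content removes the entry, which is
    // how a solved puzzle is retired.
    public static void SetChallengeText(Guid id, string content, DateTime expiresUtc)
    {
        lock (Sync)
        {
            if (content == null) Entries.Remove(id);
            else Entries[id] = new Entry { Text = content, ExpiresUtc = expiresUtc };
        }
    }

    // Return the content, or null if the ID is unknown or the entry has expired.
    public static string GetChallengeText(Guid id)
    {
        lock (Sync)
        {
            Entry entry;
            if (!Entries.TryGetValue(id, out entry)) return null;
            if (DateTime.UtcNow >= entry.ExpiresUtc)
            {
                Entries.Remove(id); // evict lazily on the first lookup after expiry
                return null;
            }
            return entry.Text;
        }
    }
}
```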

The last important method on HipChallenge is Authenticate.

internal bool Authenticate(string userData)
{
    // A successful result is remembered for the rest of this request.
    if (_authenticated) return true;

    if (userData != null && userData.Length > 0 &&
        _hiddenData.Value != null && _hiddenData.Value.Length > 0)
    {
        try
        {
            Guid id = new Guid(_hiddenData.Value);
            string text = GetChallengeText(id);
            // Case-insensitive comparison against the stored challenge text.
            if (text != null && string.Compare(userData, text, true) == 0)
            {
                _authenticated = true;
                // Retire the challenge so the same solution can't be replayed.
                SetChallengeText(id, null, DateTime.MinValue);
                return true;
            }
        }
        catch (FormatException) { } // hidden field didn't hold a valid GUID
    }
    return false;
}

This method is called during validation of user input to determine whether the user-entered word matches the word used to generate the puzzle. It obtains the challenge ID from the hidden field and passes it to GetChallengeText in order to get the text for the user's puzzle. If the text is found and matches the user-supplied solution, authentication succeeds. To prevent the same solution from being used for the same ID multiple times, a successful authentication also removes the ID and its associated text from the cache. Of course, this also means that a second call to Authenticate in the same request would fail. To fix that, the result of Authenticate is cached in the _authenticated private member variable (which is initially false). After the correct userData has been supplied to Authenticate, any additional calls to Authenticate in the same HTTP request will return true. Since _authenticated is not static, future HTTP requests (resulting in a new instance of HipChallenge) will still have to authenticate.
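The same single-use logic can be sketched without ASP.NET. In this sketch, OneTimeChallenge and Issue are hypothetical names, a shared dictionary stands in for the server-side store, and one instance corresponds to one HTTP request. One small addition not in the listing above: the lookup and the removal happen under a single lock, so two concurrent requests can't both redeem the same solution in the window between reading the text and removing it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of single-use authentication. The static dictionary
// stands in for server-side storage; each instance models one HTTP request.
public sealed class OneTimeChallenge
{
    private static readonly Dictionary<Guid, string> Store = new Dictionary<Guid, string>();
    private readonly Guid _id;
    private bool _authenticated; // per instance, i.e. per request

    public OneTimeChallenge(Guid id) { _id = id; }

    // Record a new challenge and return the ID that would go in the hidden field.
    public static Guid Issue(string word)
    {
        Guid id = Guid.NewGuid();
        lock (Store) { Store[id] = word; }
        return id;
    }

    public bool Authenticate(string userData)
    {
        if (_authenticated) return true;           // repeated check, same request
        if (string.IsNullOrEmpty(userData)) return false;
        lock (Store)
        {
            string text;
            // Compare and consume atomically; wrong answers don't consume.
            if (Store.TryGetValue(_id, out text) &&
                string.Equals(userData, text, StringComparison.OrdinalIgnoreCase))
            {
                Store.Remove(_id);
                _authenticated = true;
            }
        }
        return _authenticated;
    }
}
```

A second instance constructed with the same ID models a later request replaying the solution, and it fails because the entry is gone.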

HipChallenge also exposes a few additional methods and properties. The Expiration property allows a developer to configure the number of seconds until a puzzle expires (the default is 120, or two minutes). The Words property exposes a StringCollection that should be populated with the domain of words from which puzzles can be rendered. Alternatively, a derived control can override the ChooseWord method in order to further customize how the base class selects the next word for a puzzle. HipChallenge also implements a few protected random number generation methods for retrieving random integers and doubles. All of these methods are wrappers around my RandomNumbers class, which in turn wraps System.Security.Cryptography.RNGCryptoServiceProvider, providing Next and NextDouble methods similar to those exposed by System.Random. As you can probably tell from the namespace, RNGCryptoServiceProvider is a cryptographically strong pseudo-random number generator, whereas Random is not.
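The RandomNumbers class itself isn't listed. A sketch of something with the same shape follows; CryptoRandom is a hypothetical name. It obtains its generator from RandomNumberGenerator.Create() (which on the .NET Framework typically returns an RNGCryptoServiceProvider; the RNGCryptoServiceProvider type itself is marked obsolete in recent .NET versions), uses rejection sampling so that Next has no modulo bias, and includes one plausible ChooseWord that picks uniformly from a word list.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical sketch in the spirit of RandomNumbers: Random-like methods
// backed by a cryptographically strong generator.
public static class CryptoRandom
{
    private static readonly RandomNumberGenerator Rng = RandomNumberGenerator.Create();

    private static byte[] GetBytes(int count)
    {
        byte[] buffer = new byte[count];
        lock (Rng) { Rng.GetBytes(buffer); }
        return buffer;
    }

    // Uniform integer in [minValue, maxValue), like Random.Next(int, int).
    public static int Next(int minValue, int maxValue)
    {
        if (minValue >= maxValue) throw new ArgumentOutOfRangeException("maxValue");
        uint range = (uint)((long)maxValue - minValue);
        // Samples at or above the largest multiple of range are rejected,
        // so every result in the range is equally likely.
        ulong limit = (1UL << 32) - ((1UL << 32) % range);
        uint sample;
        do { sample = BitConverter.ToUInt32(GetBytes(4), 0); } while (sample >= limit);
        return (int)(minValue + (long)(sample % range));
    }

    public static int Next(int maxValue) { return Next(0, maxValue); }

    // Uniform double in [0, 1), built from 53 random bits (a double's precision).
    public static double NextDouble()
    {
        ulong bits = BitConverter.ToUInt64(GetBytes(8), 0) >> 11;
        return bits / (double)(1UL << 53);
    }

    // One plausible default for selecting a puzzle word from the Words list.
    public static string ChooseWord(IList<string> words)
    {
        if (words == null || words.Count == 0)
            throw new InvalidOperationException("No words configured.");
        return words[Next(words.Count)];
    }
}
```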

ImageHipChallenge

The ImageHipChallenge control presents a visual puzzle to the user. In its current implementation, this is simply distorted text over a gradient background. The control derives from HipChallenge and is declared as follows:

[ToolboxBitmap(typeof(ImageHipChallenge), "msdn.bmp")]
[ToolboxData("<{0}:ImageHipChallenge Runat=\"server\" " +
    "Height=\"100px\" Width=\"300px\" />")]
public class ImageHipChallenge : HipChallenge
{
   ...
}

The ToolboxBitmapAttribute tells Visual Studio .NET which image I'd like to use in the toolbox for the control ("msdn.bmp", which is compiled into the assembly as an embedded resource), and the ToolboxDataAttribute tells the designer what markup to generate for the control when it is initially added to a page.

When rendered on a page, ImageHipChallenge doesn't draw the puzzle itself. Instead, it generates an image link to ImageHipChallenge.aspx (or to whatever endpoint URL has been configured using the control's RenderUrl property), and the image is produced later, when the browser requests that URL. Two methods are involved in this step. First, the control overrides Control.CreateChildControls in order to add an Image control. This is what will end up rendering the img tag when the control's Render method is called. Second, it overrides HipChallenge.RenderChallenge in order to properly configure the ImageUrl property of the Image control created in CreateChildControls.

protected sealed override void CreateChildControls()
{
    base.CreateChildControls();

    // Make sure that the size of this control has been properly defined.
    if (this.Width.IsEmpty || this.Width.Type != UnitType.Pixel ||
        this.Height.IsEmpty || this.Height.Type != UnitType.Pixel)
    {
        throw new InvalidOperationException(
            "Must specify size of control in pixels.");
    }

    // Create and configure the dynamic image.
    _image = new System.Web.UI.WebControls.Image();
    _image.BorderColor = this.BorderColor;
    _image.BorderStyle = this.BorderStyle;
    _image.BorderWidth = this.BorderWidth;
    _image.ToolTip = this.ToolTip;
    _image.EnableViewState = false;
    Controls.Add(_image);
}

protected sealed override void RenderChallenge(Guid id, string content)
{
    // Generate the link to the image generation handler
    _image.Width = this.Width;
    _image.Height = this.Height;
    _image.ImageUrl = _renderUrl + "?" + 
        WIDTH_KEY + "=" + (int)Width.Value + "&" + 
        HEIGHT_KEY + "=" + (int)Height.Value + "&" +
        ID_KEY + "=" + id.ToString("N");
}

The base HipChallenge control passes to the RenderChallenge method both the plaintext word and the ID of the challenge. ImageHipChallenge only uses the latter because of its delayed image generation mechanism, but another implementation might use the former or even both. The width and height are appended to the RenderUrl URL along with the challenge ID. This URL is then set as the URL for the Image control, and that's all that is required for this request.

When the browser receives the page's rendering, it'll find an img tag with a src attribute that points back to ImageHipChallenge.aspx. As such, my solution needs to handle requests for this endpoint. To do so, I've created ImageHipChallengeHandler, an IHttpHandler that can generate CAPTCHA images based on the width, height, and challenge ID parameters provided on the query string. To configure this handler, all that is required on the part of the developer is to tell ASP.NET that any requests for the specified endpoint should be handled by an instance of ImageHipChallengeHandler, which she can do by adding the following to the system.web section of Web.config:

<httpHandlers>
    <add verb="*" path="ImageHipChallenge.aspx"
        type="Msdn.Web.UI.WebControls.ImageHipChallengeHandler, Hip"/>
</httpHandlers>

With that in place, any requests for ImageHipChallenge.aspx will be routed to an instance of ImageHipChallengeHandler and handled by its ProcessRequest method, shown here:

public void ProcessRequest(HttpContext context)
{
    // Retrieve query parameters and the challenge text
    NameValueCollection queryString = context.Request.QueryString;
    int width = 
        Convert.ToInt32(queryString[ImageHipChallenge.WIDTH_KEY]);
    if (width <= 0 || width > MAX_IMAGE_WIDTH) throw new
        ArgumentOutOfRangeException(ImageHipChallenge.WIDTH_KEY);
    int height =
        Convert.ToInt32(queryString[ImageHipChallenge.HEIGHT_KEY]); 
    if (height <= 0 || height > MAX_IMAGE_HEIGHT) throw new
        ArgumentOutOfRangeException(ImageHipChallenge.HEIGHT_KEY);
    string text = HipChallenge.GetChallengeText(
        new Guid(queryString[ImageHipChallenge.ID_KEY]));

    if (text != null)
    {
        // We successfully retrieved the information, so generate 
        // the image and send it to the client.
        HttpResponse resp = context.Response;
        resp.Clear();
        resp.ContentType = "image/jpeg";
        using(Bitmap bmp = GenerateImage(
            text, new Size(width, height)))
        {
            bmp.Save(resp.OutputStream, ImageFormat.Jpeg);
        }
    }
}

Upon receiving the request, the method obtains the width, height, and challenge ID from the query string. It then uses the ID to retrieve the challenge text, which will succeed only if the ID is valid and if the content hasn't expired from the cache. Assuming text is retrieved, the current HttpResponse is cleared and its ContentType set to "image/jpeg", informing the browser that the content being sent is a JPEG image. A new image is then dynamically generated and saved to the HttpResponse's OutputStream, sending the image to the client. The particular ContentType and the ImageFormat passed to Image.Save don't matter, as long as they both refer to the same file format. Thus, instead of "image/jpeg" and ImageFormat.Jpeg, I could have used "image/gif" and ImageFormat.Gif.
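As written, a malformed query string (a missing or non-numeric size, or an ID that isn't a GUID) surfaces as an exception. A sketch of a non-throwing alternative follows. ImageRequestParser, the key names "w", "h", and "id", and the size limits are all assumptions, since the actual values of WIDTH_KEY, HEIGHT_KEY, ID_KEY, and the handler's maximums aren't shown; a dictionary stands in for the request's query string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: validate image-request parameters without exceptions.
// Key names and limits are assumptions, not the handler's real constants.
public static class ImageRequestParser
{
    public const int MaxWidth = 800;  // assumed limit
    public const int MaxHeight = 400; // assumed limit

    // Returns true only if all three parameters are present and valid.
    public static bool TryParse(IDictionary<string, string> query,
        out int width, out int height, out Guid id)
    {
        width = height = 0;
        id = Guid.Empty;
        string w, h, rawId;
        return query.TryGetValue("w", out w) && int.TryParse(w, out width) &&
               width > 0 && width <= MaxWidth &&
               query.TryGetValue("h", out h) && int.TryParse(h, out height) &&
               height > 0 && height <= MaxHeight &&
               // IDs are rendered with ToString("N"): 32 hex digits, no dashes.
               query.TryGetValue("id", out rawId) && Guid.TryParseExact(rawId, "N", out id);
    }
}
```

A handler could use a false result to return an error status code rather than letting an exception propagate.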

RNGCryptoServiceProvider is used as a source of randomness when generating the images. First, a new Bitmap of the specified width and height is created, along with a Graphics surface around that Bitmap. Two 24-bit colors are randomly generated and used as the two endpoint colors for a LinearGradientBrush, which is used to fill the bitmap. The brush is also configured with a random gradient angle from 0 to 360 degrees. With the background drawn, a FontFamily is chosen at random from a short, hardcoded list. I could have picked one at random from the families installed on my system using FontFamily.Families, but a hardcoded list was easier to implement and, more importantly, ensures that I won't end up choosing a symbol font that would make the text practically impossible for a user to decipher. A font size is chosen based on the space available, and the text is drawn into the center of the image using a new LinearGradientBrush randomized in the same way as the one used for the background. After the text has been drawn, distortion is added to the image by moving the pixels around in a simple wave-like fashion:

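// b is the bitmap being distorted and copy is an unmodified clone of it,
// so every read sees the original pixels. distortion is the randomly
// chosen wave amplitude, in pixels.
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        // Sample from a sine/cosine-displaced source position; positions
        // outside the image fall back to row or column 0.
        int newX = (int)(x + (distortion * Math.Sin(Math.PI * y / 64.0)));
        int newY = (int)(y + (distortion * Math.Cos(Math.PI * x / 64.0)));
        if (newX < 0 || newX >= width) newX = 0;
        if (newY < 0 || newY >= height) newY = 0;
        b.SetPixel(x, y, copy.GetPixel(newX, newY));
    }
}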

The distortion amount is also chosen randomly. With all of this randomness, the same word will look different every time it's rendered. For example, Figure 4 shows two different renderings of the word "word."


Figure 4. Two random renderings of "word" by ImageHipChallengeHandler

And Figure 5 shows two different renderings of the word "excel."


Figure 5. Two random renderings of "excel" by ImageHipChallengeHandler
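The wave distortion loop shown earlier doesn't depend on anything specific to System.Drawing. Here it is as a sketch over a two-dimensional array of packed pixel values, indexed [row, column], instead of a Bitmap, which also avoids per-pixel GetPixel and SetPixel calls. WaveDistortion is a hypothetical name; the displacement formula and the fall-back-to-0 rule for out-of-range positions are the same as in the loop.

```csharp
using System;

// Hypothetical, System.Drawing-free version of the wave distortion.
// Pixels are stored as int[height, width] (row-major: [y, x]).
public static class WaveDistortion
{
    public static int[,] Apply(int[,] source, double distortion)
    {
        int height = source.GetLength(0);
        int width = source.GetLength(1);
        // Write into a new array so every read sees the original pixels.
        int[,] result = new int[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int newX = (int)(x + distortion * Math.Sin(Math.PI * y / 64.0));
                int newY = (int)(y + distortion * Math.Cos(Math.PI * x / 64.0));
                // Out-of-range source positions fall back to 0, as in the loop.
                if (newX < 0 || newX >= width) newX = 0;
                if (newY < 0 || newY >= height) newY = 0;
                result[y, x] = source[newY, newX];
            }
        }
        return result;
    }
}
```

With a distortion of 0, every pixel maps to itself, which makes a convenient sanity check.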

HipValidator

HipValidator is the simplest of all of the controls in my solution. It derives from BaseValidator, overriding one method from the base class, EvaluateIsValid, and adding one property that lets the developer specify which HipChallenge the validator is associated with (note that it inherits ControlToValidate from BaseValidator, which allows it to be hooked up to an input control).

// ID of the HipChallenge control this validator is associated with
private string _hipChallenge;

[TypeConverter(typeof(HipChallengeControlConverter))]
[Category("Behavior")]
public string HipChallenge
{
    get { return _hipChallenge; }
    set { _hipChallenge = value; }
}

// Looks up the HipChallenge control by ID in this validator's
// naming container, failing loudly if it is unset or missing
private HipChallenge AssociatedChallenge
{
    get
    {
        if (HipChallenge == null || HipChallenge.Trim().Length == 0) 
            throw new InvalidOperationException(
                "No challenge control specified.");
        HipChallenge hip = 
            NamingContainer.FindControl(HipChallenge) as HipChallenge;
        if (hip == null) throw new InvalidOperationException(
            "Could not find challenge control.");
        return hip;
    }
}

protected override bool EvaluateIsValid()
{
    // Get the validated control and its value. If we can get a value, 
    // see if it authenticates with the associated HipChallenge.
    string controlName = base.ControlToValidate;
    if (controlName != null)
    {
        string controlValue = base.GetControlValidationValue(controlName);
        if (controlValue != null && 
          ((controlValue = controlValue.Trim()).Length > 0))
        {
            return AssociatedChallenge.Authenticate(controlValue);
        }
    }
    return false;
}

EvaluateIsValid gets the ID of the ControlToValidate from the base class and retrieves that control's validation value. The validation value is the value of the property named by the ValidationProperty attribute on the control's class; for a TextBox, that is the Text property. If the trimmed value is non-empty, it is passed to the associated HipChallenge control's Authenticate method and the result is returned. Otherwise, validation fails.

The only other interesting thing to note is the TypeConverter attribute applied to the HipChallenge property. TypeConverters have two related uses. The first is converting values from one type to another at runtime, for example converting a System.Drawing.Point to and from a string. The second is aiding design-time property configuration. For example, System.ComponentModel.EnumConverter is used automatically for any Enum-typed property shown in a PropertyGrid, which lets the user select the enumeration value from a drop-down in the grid. To help developers configure a HipValidator at design time, I've created a TypeConverter-derived class, HipChallengeControlConverter, that makes it easy to select an existing HipChallenge-derived instance on the page. The developer doesn't have to type the control's ID into the PropertyGrid. Instead, a drop-down list shows the IDs of all HipChallenge-derived controls on the page, and the developer simply selects one. Implementing a custom TypeConverter for this purpose requires little code.

private class HipChallengeControlConverter : ValidatedControlConverter
{
    private object[] GetControls(IContainer container)
    {
        ArrayList list = new ArrayList();
        foreach(IComponent comp in container.Components)
        {
            HipChallenge hip = comp as HipChallenge;
            if (hip != null)
            {
                if (hip.ID != null && hip.ID.Trim().Length > 0)
                {
                    list.Add(hip.ID);
                }
            }
        }
        list.Sort(Comparer.Default);
        return list.ToArray();
    }

    public override StandardValuesCollection GetStandardValues(
        ITypeDescriptorContext context)
    {
        if (context == null || context.Container == null) return null;
        object [] controls = GetControls(context.Container);
        if (controls != null) 
            return new StandardValuesCollection(controls);
        return null;
    }
}

It derives from ValidatedControlConverter (the TypeConverter used for the ControlToValidate property) and overrides the GetStandardValues method. This method must return a StandardValuesCollection containing the string IDs to display in the drop-down list. So all I do is loop through the IComponent instances in the context.Container.Components collection, collect the IDs of the HipChallenge controls, sort them, and wrap them in a StandardValuesCollection. This scenario is even simpler in ASP.NET 2.0 thanks to the new ControlIDConverter class, which performs the enumeration itself and calls a virtual FilterControl method to decide which controls to include. By deriving from ControlIDConverter instead of ValidatedControlConverter, my implementation of HipChallengeControlConverter becomes simply:

private class HipChallengeControlConverter : ControlIDConverter
{
    // Only offer HipChallenge-derived controls in the drop-down
    protected override bool FilterControl(Control control)
    {
        return control is HipChallenge;
    }
}

While this solution was written for ASP.NET 1.1, it can be used in ASP.NET 2.0 without any changes. Figure 6 shows the ImageHipChallenge and HipValidator controls incorporated into the Personal Web Site Starter Kit that is included with the Visual Web Developer 2005 Express Edition Beta.


Figure 6. Login modified to incorporate ImageHipChallenge and HipValidator

In fact, when this solution runs under ASP.NET 2.0, the controls developed here automatically gain new ASP.NET 2.0 functionality that improves them. For example, one problem with validation controls in ASP.NET 1.x is that they have page-wide scope. Any control on the page that causes validation on postback triggers every validator, even validators unrelated to that control. ASP.NET 2.0 introduces validation groups to solve this problem, allowing a control to specify which validators it triggers. This functionality is exposed through the ValidationGroup property on BaseValidator, as well as on controls that can submit a form, such as Button. Because HipValidator derives from BaseValidator, it gains this functionality automatically when running under ASP.NET 2.0.

AudioHipChallenge

AudioHipChallenge uses the text-to-speech (TTS) engine distributed with the Windows operating system to generate a WAV file that speaks the challenge to the user. The user must then type in the spoken word to pass the test. For an automated attacker to solve the puzzle, it would need to extract the spoken word from the WAV file using some form of speech recognition software. Ideally, the WAV would be generated in a way that makes this difficult for the attacker while still allowing a human listener to understand the word.

As with ImageHipChallenge, the AudioHipChallenge class works in conjunction with an IHttpHandler, in this case AudioHipChallengeHandler. AudioHipChallenge is used as a control in the page that renders the HTML challenge to the client, whereas AudioHipChallengeHandler generates WAV audio files based on the query string information rendered by AudioHipChallenge.

RenderChallenge is the core method of the AudioHipChallenge control, taking the challenge ID and rendering it to the browser. As mentioned previously, the base HipChallenge class handles the generation of the hidden field to store the encrypted content, so RenderChallenge only has to generate the controls specific to this challenge display.

protected override void RenderChallenge(Guid id, string content)
{
    // Get the base url to the audio handler
    string url = null;
    try
    {
        // If it's a valid absolute URL, go with it.
        new Uri(RenderUrl);
        url = RenderUrl;
    }
    catch (UriFormatException) {}

    // If a fully-qualified URL wasn't supplied, treat it as relative
    // to the application root (ApplicationPath is "/" for a root app)
    if (url == null)
    {
        string appPath = Page.Request.ApplicationPath;
        url = Page.Request.Url.GetLeftPart(UriPartial.Authority) +
            appPath + (appPath.EndsWith("/") ? "" : "/") + RenderUrl;
    }

    // In both cases, pass the challenge ID so the handler can find the text
    url += (url.IndexOf('?') >= 0 ? "&" : "?") +
        ID_KEY + "=" + id.ToString("N");

    // Add the WMP player control to the output (1x1 keeps it unobtrusive)
    string wmpId = "wmp" + Guid.NewGuid().ToString("N");
    HtmlGenericControl player = new HtmlGenericControl("object");
    player.Attributes["ID"] = wmpId;
    player.Attributes["CLASSID"] = 
        "CLSID:6BF52A52-394A-11d3-B153-00C04F79FAA6";
    player.Attributes["height"] = "1";
    player.Attributes["width"] = "1";
    player.InnerHtml = 
        "<PARAM name=\"URL\" value=\"" + url + "\">" +
        "<PARAM name=\"autoStart\" value=\"" + _autoStart + "\">";
    Controls.Add(player);

    // Add a button to play the sound on demand without posting back
    if (_showPlayButton)
    {
        Button playButton = new Button();
        if (!this.Width.IsEmpty) playButton.Width = this.Width;
        if (!this.Height.IsEmpty) playButton.Height = this.Height;
        playButton.Text = Text;
        playButton.EnableViewState = false;
        playButton.CausesValidation = false;
        playButton.Attributes["OnClick"] = 
            wmpId + ".controls.play(); return false;";
        Controls.Add(playButton);
    }
}

The HTML rendered to the browser by this method consists of an object tag that creates an embedded client-side Windows Media Player control. The URL the player loads is built from the AudioHipChallenge.RenderUrl property. WMP needs a fully qualified URL to connect, so a relative RenderUrl is resolved against the application root. The AudioHipChallenge.AutoStart property configures the player to play the WAV automatically when the page loads. If the AudioHipChallenge.ShowPlayButton property is true, an additional Button control is rendered that plays the WAV when clicked, allowing the user to hear the challenge as many times as needed.
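The URL resolution described above is easy to get subtly wrong. A root application's ApplicationPath is "/", which invites a doubled slash. The challenge ID also has to be appended whether or not RenderUrl is absolute, because the handler reads it from the query string. The following sketch isolates that logic as a pure function. ChallengeUrl, Build, and the "id" key used in the checks are hypothetical stand-ins for the control's own members and its ID_KEY constant. Note that Uri.TryCreate requires .NET 2.0 or later.

```csharp
using System;

// Hypothetical helper mirroring RenderChallenge's URL-building logic.
public static class ChallengeUrl
{
    public static string Build(string authority, string applicationPath,
        string renderUrl, string idKey, Guid id)
    {
        string baseUrl;
        Uri absolute;

        // Only treat http/https URLs as fully qualified. (On Unix, .NET
        // can also parse "/path" as an absolute file: URI, which must not
        // be handed to the player as-is.)
        if (Uri.TryCreate(renderUrl, UriKind.Absolute, out absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp ||
             absolute.Scheme == Uri.UriSchemeHttps))
        {
            baseUrl = renderUrl;
        }
        else
        {
            // Resolve relative to the application root without producing
            // "//" when the application lives at "/".
            string separator = applicationPath.EndsWith("/") ? "" : "/";
            baseUrl = authority + applicationPath + separator +
                renderUrl.TrimStart('/');
        }

        // Always pass the challenge ID so the handler can look up the text.
        string joiner = baseUrl.IndexOf('?') >= 0 ? "&" : "?";
        return baseUrl + joiner + idKey + "=" + id.ToString("N");
    }
}
```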

For the embedded player to retrieve the WAV file, AudioHipChallengeHandler must be mapped as the IHttpHandler for requests to the URL specified in the RenderUrl property. As with ImageHipChallengeHandler, this mapping can be configured in the application's Web.config file.

<httpHandlers>
    <add verb="*" path="AudioHipChallenge.aspx"
        type="Msdn.Web.UI.WebControls.AudioHipChallengeHandler, Hip"/>
</httpHandlers>

AudioHipChallengeHandler's implementation of ProcessRequest is very straightforward. The challenge ID is retrieved from the query string and is used to retrieve the challenge text from the cache. If the text is available (meaning that the ID is valid and that the associated content hasn't expired), a temporary file is created to store the WAV. The WAV data is created from the decrypted challenge word and the temporary file is streamed to the client, after which the temporary file is deleted so as not to pollute the file system with unnecessary garbage.

public void ProcessRequest(HttpContext context)
{
    // Get the challenge ID; without one there is nothing to speak
    string idValue = context.Request.QueryString[AudioHipChallenge.ID_KEY];
    if (idValue == null) return;

    // Get the challenge text (null if the ID is unknown or expired)
    string text = HipChallenge.GetChallengeText(new Guid(idValue));

    if (text != null)
    {
        // Get a path for the temporary audio file.
        FileInfo tempAudio = new FileInfo(Path.Combine(Path.GetTempPath(),
            "aud" + Guid.NewGuid().ToString("N") + ".wav"));
        try
        {
            // Speak the data to the file
            SpeakToFile(text, tempAudio);

            // Send the audio to the client; readIntoMemory=true buffers
            // the file so it can safely be deleted below
            HttpResponse resp = context.Response;
            resp.ContentType = "audio/wav";
            resp.WriteFile(tempAudio.FullName, true);
        }
        finally
        {
            // Delete the temporary audio file
            tempAudio.Delete();
        }
    }
}
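The create, stream, and delete sequence in ProcessRequest is a general pattern for temporary files. Here is a minimal sketch of it without ASP.NET. It assumes a callback that stands in for SpeakToFile and returns the file's bytes in place of HttpResponse.WriteFile. TempFileRoundTrip and Run are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper showing the temp-file lifecycle used by the handler.
public static class TempFileRoundTrip
{
    public static byte[] Run(Action<string> writeFile)
    {
        // Path.Combine inserts the right separator for the platform.
        string path = Path.Combine(Path.GetTempPath(),
            "aud" + Guid.NewGuid().ToString("N") + ".wav");
        try
        {
            // Stand-in for SpeakToFile.
            writeFile(path);

            // Read the whole file into memory before it is deleted,
            // which is the purpose of passing readIntoMemory=true to
            // HttpResponse.WriteFile in the handler.
            return File.ReadAllBytes(path);
        }
        finally
        {
            // Runs even if writing or reading throws.
            if (File.Exists(path)) File.Delete(path);
        }
    }
}
```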

If you look at AudioHipChallengeHandler.SpeakToFile, you'll notice that there's very little code involved in what is actually a complicated task. Fortunately for me, I didn't have to write my own TTS engine and was able to take advantage of the Microsoft-provided speech libraries already available on my system. These libraries are programmatically exposed as a set of COM components that are easily accessed from a .NET client through the wonders of COM interop.


Figure 7. Importing the Microsoft Speech Object Library

Using tlbimp.exe from the .NET Framework SDK or the "Add Reference..." option in Visual Studio .NET, you can import the Microsoft Speech Object Library (sapi.dll) into your own project (by default, the wrapper will be named Interop.SpeechLib.dll), as shown in Figure 7.

SpVoice is the core class necessary for TTS. I first retrieve a list of the voices installed on the system using SpVoice.GetVoices. This method returns an ISpeechObjectTokens collection, from which I pick a random voice and store it in the SpVoice object's Voice property. An SpAudioFormat is then created for the output file stream and configured to use the GSM 6.10 11kHz mono audio compressor. This compressor produces fairly small WAV files and, usefully for our purposes, also distorts the generated voice. Finally, an SpFileStream is opened for a file on disk and set as the voice's output stream, and the speaking rate is slowed. SpVoice.Speak then converts the specified text to speech and writes it to the file. Because Speak is called asynchronously, WaitUntilDone blocks until the audio has been written.

private void SpeakToFile(string text, FileInfo audioPath)
{
    SpFileStream spFileStream = new SpFileStream();
    try
    {
        // Create the speech engine and set it to a random voice
        SpVoice speech = new SpVoice();
        ISpeechObjectTokens voices = 
            speech.GetVoices(string.Empty, string.Empty);
        speech.Voice = voices.Item(NextRandom(voices.Count));

        // Set the format type to be heavily compressed.
        SpAudioFormatClass format = new SpAudioFormatClass();
        format.Type = SpeechAudioFormatType.SAFTGSM610_11kHzMono;
        spFileStream.Format = format;

        // Open the output stream and speak to it
        spFileStream.Open(audioPath.FullName,
            SpeechStreamFileMode.SSFMCreateForWrite, false);
        speech.AudioOutputStream = spFileStream;
        speech.Rate = -5; // Ranges from -10 to 10
        speech.Speak(text, SpeechVoiceSpeakFlags.SVSFlagsAsync);
        speech.WaitUntilDone(System.Threading.Timeout.Infinite);
    }
    finally
    {
        // Close the output file
        spFileStream.Close();
    }
}
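SpeakToFile calls a NextRandom helper whose implementation isn't shown. One possible version is sketched below as an assumption, not the article's actual code. It draws bytes from a cryptographic random number generator, in keeping with the use of RNGCryptoServiceProvider for the images. It then uses rejection sampling so that every index from 0 to max - 1 is equally likely, avoiding the bias a plain modulo would introduce. SecureRandom is a hypothetical class name.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical implementation of a NextRandom(int max) helper.
public static class SecureRandom
{
    private static readonly RandomNumberGenerator _rng =
        RandomNumberGenerator.Create();

    // Returns a uniformly distributed integer in [0, max).
    public static int NextRandom(int max)
    {
        if (max <= 0) throw new ArgumentOutOfRangeException("max");

        byte[] buffer = new byte[4];

        // Largest multiple of max that fits in a uint; values at or above
        // it would make some results more likely, so they are redrawn.
        uint limit = uint.MaxValue - (uint.MaxValue % (uint)max);
        uint value;
        do
        {
            // Defensive lock in case the generator is shared across
            // request threads.
            lock (_rng) { _rng.GetBytes(buffer); }
            value = BitConverter.ToUInt32(buffer, 0);
        } while (value >= limit);

        return (int)(value % (uint)max);
    }
}
```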

AudioHipChallenge also provides the SpellWords property, which can be used to force the control to generate an audio file that speaks the spelling of the word rather than its pronunciation. This is done by overriding the HipChallenge.ChooseWord method that selects the next word to be spoken.

protected override string ChooseWord()
{
    // Get a word
    string word = base.ChooseWord();

    // If the user has opted to have words spelled, generate
    // a string that contains the spelling and return that instead.
    if (_spellWords) 
    {
        char [] letters = word.ToCharArray();
        StringBuilder sb = new StringBuilder(letters.Length*3);
        foreach(char letter in letters)
        {
            int pos = (int)(Char.ToLower(letter) - 'a');
            if (pos >= 0 && pos < 26)
            {
                sb.Append(_spelledLetters[pos]);
                sb.Append("; ");
            }
        }
        return sb.ToString();
    }
    // Otherwise, just return the word
    else return word;
}

The overridden method first calls the base method to get the next word. If the SpellWords property is true, it splits the word into its characters and builds a new string in which each letter is followed by a semicolon, forcing the TTS engine to speak each letter individually. Characters other than the letters a through z are skipped. Unfortunately for my purposes, the TTS engine doesn't pronounce each character the way I had hoped (although the pronunciation it chooses does make sense in some scenarios). To force the pronunciation I want, the string is built from an array that maps each letter to its expected pronunciation: "hay" for 'a', "bee" for 'b', and so on.

private static string [] _spelledLetters = 
{
    "hay", "bee", "see", "dee", "ee", "ef", "gee", 
    "haych", "eye", "jay", "kay", "el", "em", "en", 
    "oh", "pee", "queue", "are", "es", "tee", "you", "vee", 
    "double you", "ex", "why", "zee"
};

The array has 26 elements, one for each letter, and stores them in alphabetical order. Thus, to retrieve the pronunciation string for a particular letter, all I have to do is subtract 'a' from the lower case version of that letter and I end up with the correct index into the array.
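The letter-to-index arithmetic can be checked in isolation. This sketch reproduces the spelling branch of ChooseWord as a standalone function. Speller and Spell are hypothetical names. The sketch uses char.ToLowerInvariant (.NET 2.0 and later) instead of Char.ToLower so that culture-specific casing rules, such as Turkish mapping 'I' to a dotless 'ı', can't push a letter outside the 'a' to 'z' range.

```csharp
using System;
using System.Text;

// Hypothetical standalone version of the SpellWords logic.
public static class Speller
{
    // Same pronunciations, in alphabetical order, as _spelledLetters.
    private static readonly string[] SpelledLetters =
    {
        "hay", "bee", "see", "dee", "ee", "ef", "gee",
        "haych", "eye", "jay", "kay", "el", "em", "en",
        "oh", "pee", "queue", "are", "es", "tee", "you", "vee",
        "double you", "ex", "why", "zee"
    };

    public static string Spell(string word)
    {
        StringBuilder sb = new StringBuilder(word.Length * 3);
        foreach (char letter in word)
        {
            // Subtracting two chars yields an int index into the array.
            int pos = char.ToLowerInvariant(letter) - 'a';

            // Digits, punctuation, and accented letters are skipped.
            if (pos >= 0 && pos < SpelledLetters.Length)
            {
                sb.Append(SpelledLetters[pos]);
                sb.Append("; ");
            }
        }
        return sb.ToString();
    }
}
```

For example, Spell("Ab1") returns "hay; bee; ", since the digit has no entry in the table.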

That's it for the implementation of the control itself. From a developer's perspective, using the AudioHipChallenge is very straightforward. An instance of the control is added to a page along with a TextBox into which a user can enter his solution. A HipValidator is added to the page, its ControlToValidate property set to the ID of the TextBox and its HipChallenge property set to the ID of the AudioHipChallenge control. Finally, in the page's Load event handler, words are added to the AudioHipChallenge.Word collection. Simple and easy to use.

This is a simple implementation of an audio-based challenge, and there are certainly plenty of avenues to explore if you want to build something more robust. For example, you could overlay the spoken text on top of a background conversation and add reverb to frustrate even the best speech recognition engines. You should also evaluate how well your users can understand the words being spoken. If you opt to go the route of having individual letters read, evaluate how well your users do with letters that sound similar, such as 'B' versus 'P', and 'D' versus 'T'; you might decide to go with numerical digits instead.
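To illustrate the background-overlay idea, here is a minimal sketch of mixing a background track into the spoken challenge. It assumes both signals have already been decoded to 16-bit mono PCM samples at the same sample rate. In practice, that would mean writing the SAPI output in an uncompressed format rather than GSM 6.10 before mixing. NoiseMixer and Mix are hypothetical names. Summed samples are clamped to the 16-bit range so that loud passages distort rather than wrap around.

```csharp
using System;

// Hypothetical mixer for 16-bit mono PCM sample arrays.
public static class NoiseMixer
{
    public static short[] Mix(short[] speech, short[] background,
        double backgroundGain)
    {
        short[] mixed = new short[speech.Length];
        for (int i = 0; i < speech.Length; i++)
        {
            // Loop the background if it is shorter than the speech.
            double noise = background.Length > 0
                ? background[i % background.Length] * backgroundGain
                : 0.0;

            double sum = speech[i] + noise;

            // Clamp to the 16-bit range instead of overflowing.
            if (sum > short.MaxValue) sum = short.MaxValue;
            if (sum < short.MinValue) sum = short.MinValue;

            mixed[i] = (short)sum;
        }
        return mixed;
    }
}
```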

Does It Work?

An Internet search reveals that a fair amount of research is being done into breaking these puzzles, recently resulting in decent success rates. The so-called "EZ-Gimpy" puzzle like the one in use by Yahoo has been broken by researchers with a success rate as high as 93 percent (http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html), while simple TTS puzzles such as the one created in this article have been solved by recognizers almost as well as by humans (http://csdl.computer.org/comp/proceedings/ictai/2003/2038/00/20380226abs.htm). Does that mean we should give up and scrap the whole thing? Probably not. Even if the puzzles can be solved by a computer, they still significantly raise the bar for an attacker. Developers of automated programs to attack a CAPTCHA-protected site will need to incorporate the recognition engines into their applications and scripts, requiring a significant investment of resources from the attacker. In addition, computer-based recognition engines are computationally intensive, also raising the cost of an attack. Eventually these automated solutions could be distributed to "script kiddies" around the Internet, at which point the compromised puzzle could be changed to something more difficult to solve, and the cycle begins again. Like many related problems, it's an arms race between the defenders and the attackers.

Other attacks have been proposed that take advantage of cheap labor instead of attempting to actually break a certain puzzle. After all, if a problem is difficult for a computer to solve, why not pay a human to do it? One might imagine a room full of minimum-wage employees solving these puzzles all day long, but that's not a particularly cost-effective approach for an attacker. "Pay" doesn't necessarily involve money, however, and could be based on a barter system. The classic example is an attacker who sets up an adult Web site and offers free viewing materials to anyone willing to solve a puzzle. The attacker can then transfer the puzzle from the attacked site to the user of the adult site, prompt the user to solve it, and submit the human-generated solution back to the attacked site. While certainly feasible, this approach can also require a significant investment of resources. Unless the adult site gets a non-trivial amount of traffic, it can be thwarted by some of the defenses discussed in this article, such as putting a short time limit on a puzzle.

Denial of service (DoS) attacks are also an issue for pages that require significant server resources. Both the image and audio generation logic presented in this article consume a fair amount of CPU. As with any resource-intensive Web page, an attacker could try to mount a DoS attack by bombarding the ImageHipChallengeHandler and AudioHipChallengeHandler endpoints with requests. Fortunately, there are several plausible ways to mitigate such attacks. One is to cache the puzzles for some period of time so that requests for a puzzle with the same challenge text yield the same image (you'd want to keep the cache timeout small enough that other attacks wouldn't benefit). Another technique sometimes used to lessen the server impact of DoS attacks is to add a random but relatively significant delay to the Web page's response by calling Thread.Sleep before any processing happens. A normal user won't mind if a request takes an extra second to return, but an attacker attempting to pin the CPU at 100 percent won't have as easy a time doing so. Keep in mind, though, that a sleeping request still occupies an ASP.NET worker thread, so this approach trades CPU load for thread-pool pressure.
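The caching mitigation might look something like the following minimal sketch, in which rendered output is reused for identical challenge text until a short time-to-live expires. RenderCache is a hypothetical class, and the render callback stands in for the image or audio generation code. The clock is injected so that expiry can be tested. In an ASP.NET application, HttpRuntime.Cache with an absolute expiration would be the more natural fit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical short-lived cache of rendered challenges, keyed by text.
public sealed class RenderCache
{
    private sealed class Entry
    {
        public byte[] Data;
        public DateTime Expires;
    }

    private readonly Dictionary<string, Entry> _entries =
        new Dictionary<string, Entry>();
    private readonly Func<string, byte[]> _render;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _now;

    public RenderCache(Func<string, byte[]> render, TimeSpan ttl,
        Func<DateTime> now)
    {
        _render = render;
        _ttl = ttl;
        _now = now;
    }

    public byte[] Get(string challengeText)
    {
        lock (_entries)
        {
            DateTime now = _now();
            Entry entry;

            // Reuse a still-fresh rendering of the same text.
            if (_entries.TryGetValue(challengeText, out entry) &&
                entry.Expires > now)
                return entry.Data;

            // Missing or expired: render once and remember the result.
            // Expired entries are overwritten here but never purged
            // otherwise; a production cache would evict them.
            entry = new Entry { Data = _render(challengeText), Expires = now + _ttl };
            _entries[challengeText] = entry;
            return entry.Data;
        }
    }
}
```

Rendering happens while the lock is held, so the cache performs only one render at a time. In this sketch that is a deliberate trade of throughput for bounded CPU use.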

As noted earlier, HIP puzzles are very difficult to get right and will be broken eventually. As a result, there is significant churn in implementation as both the puzzles and the attacks progress (all the more reason to build your HIP framework such that new puzzles can be swapped in as needed and with little effort). This means that HIP solutions need to be watched carefully, actively monitored for successful attacks and usability problems. All this points towards centralizing a good solution into a hosted service that your site can then consume. However, until such a service exists, is trustworthy, is time-tested, and is cost-effective, the approaches described in this article can get you and keep you up and running.

Also keep in mind that while attacks against these types of systems exist (they almost always do for any system), real-world deployments have shown CAPTCHAs to be very effective, on both large and small sites, deterring all but the most determined of attackers.

Of course, arguments against the use of such systems aren't necessarily limited to their vulnerabilities. Some people find these puzzles just plain annoying and are dissuaded from using a site that presents them. Before implementing this on your own site, you might consider reading up on some of the work being done to better this area. The paper by Pinkas and Sander mentioned at the beginning of this article discusses some good ideas that can be used to make the user experience more pleasurable, specifically relating to using CAPTCHAs in a login scenario.

No article on CAPTCHA would be complete without mentioning the accessibility restrictions imposed by such a solution. Current CAPTCHA puzzles rely on human senses for solution generation. Visual puzzles require good eyesight, while aural puzzles require good hearing, and obviously a significant portion of humans would be stymied by one or both of these types of puzzles (for a more in-depth analysis, see the W3C's Inaccessibility of Visually-Oriented Anti-Robot Tests, available online at http://www.w3.org/TR/turingtest/). Even those with good eyesight might have trouble with certain visual puzzles; for example, a puzzle that relies on a user differentiating greens from reds might cause trouble amongst the (approximately) 10 percent of the male population that is colorblind. Allowing the presentation of multiple puzzle types (and allowing the user to choose based on their capabilities) is important for a site to best accommodate as large a portion of the user population as possible. Most Web users do have satisfactory use of at least one required sense so that they could solve either an aural or visual problem, but as computers and the Web become more and more accessible, this won't always be the case. For now, any site that chooses to use CAPTCHAs should allow a user to solve either kind. In the future, one could imagine puzzles that don't involve either of those senses, possibly ones based on smell or on taste, or even puzzles that rely on logic rather than on the presentation of the puzzle itself. Of course, if Turing was correct, eventually computers will reach a state of sophistication where no human could tell a human apart from a computer, at which point all of these systems would become obsolete. That is, unless computers become smarter than humans, at which point maybe a computer could still tell a computer from a human, even if a human couldn't. Food for thought.

Stephen Toub is the Technical Editor for MSDN Magazine, for which he also writes the .NET Matters column.

© Microsoft Corporation. All rights reserved.