International Active Server Pages

Seth Pollack
Lead Program Manager, Internet Information Server, Microsoft Corp.

April 2, 1997

For years, software houses have been producing localized releases of their products, in which user interface elements, documentation, and so forth are translated into various languages in order to reach new markets. Active Server Pages (ASP) is no exception. We feel it's important to reach beyond the English-speaking community, so we are shipping versions in French, German, Spanish, Swedish, Dutch, and Japanese. In fact, you should be able to download these versions at the end of March 1997 from the Microsoft Internet Information Server (IIS) Web site.

However, the Internet brings a whole new set of globalization challenges. For example, say I put a Web server in the basement of my house in Seattle, plug it into the Internet, and publish a Web site on Slugs of the Northwest. Even though my server sits physically in the United States, runs English language software, and so forth, I'm likely to have slug enthusiasts from around the world hitting my site. How can I cater to these people, who may speak different languages, and run client software localized to those languages? Although the Internet highlights this issue, the same problem strikes in the intranets of multinational organizations.

Don't hold your breath: there aren't any magic answers. But there are some techniques you can use, including a new feature that wasn't in the original IIS version 3.0.

The simplest solution to this problem is to provide the site in only one language. But many Web sites want to do better than this by providing versions of their site in several languages. There is certainly a sizable cost in terms of development and maintenance in taking this step, but it allows you to reach many more people.

In this case, how does a user get funneled to the content in the right language? The simplest way is to make the user explicitly choose which area to visit, e.g., "click here for English, click here for Nihongo". The user's selection sends her into a different portion of your content tree. Or you might have the user choose the language just once, the first time she visits your site, and then persistently associate that choice with her using the Internet Personalization System through ASP. Another approach is to take advantage of the Accept-Language HTTP request header, which the server exposes to scripts as HTTP_ACCEPT_LANGUAGE. Browsers send this header to the server with every request, specifying which language the user would like to see content returned in (for example, "EN" for English). Under ASP, you can check this header and make a decision programmatically, without bothering the user.
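The header-based approach can be sketched as follows. This is a minimal illustration in Python rather than ASP script, so that it is self-contained. The function name `pick_language`, the `supported` set, and the default of "en" are all hypothetical. It assumes the header value is a comma-separated list of language tags with optional `q` weights, and it ignores the `*` wildcard.

```python
def pick_language(accept_language, supported, default="en"):
    """Return the supported language the browser prefers most, else default."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Sort by descending quality, then by original order in the header.
        candidates.append((-quality, position, tag))

    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "fr-ca" falls back to "fr"
        if primary in supported:
            return primary
    return default


# Hypothetical site that offers English and Japanese content trees.
chosen = pick_language("ja,en;q=0.5", {"en", "ja"})
```

In an ASP page the header value would be read from the request's server variables, and the result would typically drive a redirect into the matching portion of the content tree.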

So this all sounds pretty reasonable. There is a catch, however, in building global sites with ASP, and it has to do with character set conversions. To grasp the issue, we need to take a peek inside ASP for a moment. Internally, ASP and the language engines it calls, such as Visual Basic® Scripting Edition (VBScript) and JScript™, all speak in Unicode strings. However, Web pages currently consist of content that can be in ANSI, DBCS, or other character encoding schemes. Therefore, when form or query string values come in from the browser in an HTTP request, they must be converted from the character set used by the browser into Unicode for processing by ASP scripts. These conversions map characters from one code page (a set of characters organized in some scheme, e.g., ANSI) to another. For example, the single-byte value that refers to the letter "a" in ANSI will be converted to the 16-bit value that refers to that same letter "a" in Unicode. Similarly, when output is sent back to the browser, any strings returned by scripts must be converted from Unicode back to the code page used by the client.
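To make the inbound and outbound conversions concrete, here is a small sketch in Python, not ASP's actual implementation. The helper names `inbound` and `outbound` are hypothetical, Python's `cp1252` codec stands in for an ANSI code page, and little-endian UTF-16 stands in for the 16-bit Unicode strings used internally.

```python
def inbound(raw_bytes, code_page):
    """Browser bytes -> Unicode string, as done before script code sees a value."""
    return raw_bytes.decode(code_page)


def outbound(text, code_page):
    """Unicode string -> bytes in the client's code page, as done on output."""
    return text.encode(code_page)


# "café" as a browser using Windows code page 1252 would send it: é is byte 0xE9.
form_value = b"caf\xe9"
text = inbound(form_value, "cp1252")

# The same letter "a" occupies one byte in code page 1252 but two bytes
# in 16-bit Unicode (shown here as little-endian UTF-16).
ansi_a = "a".encode("cp1252")
unicode_a = "a".encode("utf-16-le")

# Script code works on the Unicode string; the result is converted back on output.
reply = outbound(text.upper(), "cp1252")
```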

In the ASP shipped in December 1996, these internal conversions are done using the default code page of the Web server. This works great if the users and the server are all speaking the same language (more precisely, if they use the same code page). However, if you have a Japanese client hitting an English server, the code page translations mentioned above won't work, because ASP will try to interpret the Japanese characters using the server's English code page.
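The failure mode can be reproduced outside ASP. The sketch below is a Python illustration under stated assumptions: the Japanese browser is assumed to send Shift-JIS, and the English server's default code page is assumed to be 1252. Decoding the same bytes with the wrong code page yields a different, meaningless string.

```python
# Text typed by a user of a Japanese browser, sent as Shift-JIS bytes (assumed).
japanese = "日本語"
sent = japanese.encode("shift_jis")

# An English server decodes the bytes with its own default code page (assumed 1252).
# errors="replace" keeps the sketch from raising on bytes that 1252 leaves undefined.
misread = sent.decode("cp1252", errors="replace")

# Decoding with the code page the browser actually used recovers the text.
recovered = sent.decode("shift_jis")

print(len(japanese), "characters were sent;", len(misread), "characters were misread")
```

Each Japanese character arrives as two bytes, so the wrong code page also changes the length of the string, which breaks any script logic that counts or slices characters.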

The solution: For the ASP DBCS release due at the end of March 1997, we've added a way to control the code page that ASP uses to do these inbound and outbound string translations. This can be set in one of two ways. First, a special flag may be set in the ASP file, using the <% @ %> compiler directive block that can appear at the beginning of any ASP file. (This block is used today to control the inline scripting language choice for the page.) The flag is of the form CODEPAGE = nnn. As before, this compiler directive block must appear before any executable script, and can only occur once in an .ASP file and its included files. For example, the block to set the code page to 1252 might look like the following:

<% @ LANGUAGE=VBScript CODEPAGE=1252 %>

Alternatively, in script code a new Session.CodePage property is available that sets the code page to use for string translations for the current session. Setting this property overrides the value set by the CODEPAGE directive. For example:

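' Show the code page currently in effect for this session
Response.Write (Session.CodePage)
' Switch this session's string translations to code page 1252
Session.CodePage = 1252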

Either of these values may be set to any code page that is available on the server machine. The default is the system's default ANSI code page (CP_ACP for the Win32 programmers out there).

How are these code page settings applied? First of all, any static content (HTML) in the .ASP file is not affected at all; it is returned exactly as authored. Any static strings in the script code (and in fact the script code itself) will be converted based on the CODEPAGE setting in the .ASP file. Think of CODEPAGE as the way an author (or better yet, the authoring tool, which should be able to stick this in the .ASP file automatically) tells ASP the code page in which the .ASP file was authored.

Any dynamic content (Response.Write(x) calls, where the x is a variable) is converted using the value of Session.CodePage, which defaults to the CODEPAGE setting but can be overridden. Why do we need this override? Because the code page used to author the script may not be the same as the code page you want to use to send its output to a particular client. For example, the author may have written the .ASP page in a tool that generates JIS, but the end user's browser may want UTF-8. With this code page control feature, ASP now enables correct handling of code page conversion.
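The split between the two settings can be modeled in a few lines. This is a simplified sketch in Python, not how ASP is implemented. The function `render` and its parameter names are hypothetical, and Python codec names (`iso2022_jp` for JIS, `utf-8`) stand in for numeric code page identifiers.

```python
def render(script_literal, codepage, session_codepage=None):
    """Model the two settings described above.

    script_literal   -- bytes of a string literal as stored in the .ASP file
    codepage         -- the CODEPAGE the file was authored in
    session_codepage -- Session.CodePage; defaults to the CODEPAGE setting
    """
    if session_codepage is None:
        session_codepage = codepage
    # Step 1: the literal is converted to Unicode using the authoring code page.
    as_unicode = script_literal.decode(codepage)
    # Step 2: output written by the script is converted using Session.CodePage.
    return as_unicode.encode(session_codepage)


# The page was authored in a tool that saves JIS, but the browser wants UTF-8.
greeting = "こんにちは"
literal_in_file = greeting.encode("iso2022_jp")

same_as_authored = render(literal_in_file, "iso2022_jp")
for_utf8_client = render(literal_in_file, "iso2022_jp", session_codepage="utf-8")
```

Without the override, the output bytes match the authoring code page. With it, the same script serves a client that expects a different encoding.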

Where do we want to take this moving forward? There are more problems to address. A language specified in the HTTP_ACCEPT_LANGUAGE header, such as Japanese, might map to several code pages (JIS, Shift-JIS, UTF, and so on), making it difficult to decide programmatically which code page to use based on that header. Having browsers additionally send, in the request, the code page they are using would make this possible. The best solution would be standardizing on Unicode in the browsers, so that in the future these code page translations would become unnecessary.
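A short Python sketch shows why the language alone is not enough. The list of encodings is an assumption chosen to mirror the examples in the text, with Python's `iso2022_jp` codec standing in for JIS. The same Japanese string produces a different byte sequence in each encoding, so a server that knows only "Japanese" cannot tell which bytes the browser expects.

```python
text = "日本語"

# Encodings a Japanese-language browser might plausibly be using (assumed list).
candidates = ["iso2022_jp", "shift_jis", "utf-8"]

# Encode the same text once per candidate encoding.
encoded = {name: text.encode(name) for name in candidates}

for name, data in encoded.items():
    print(name, len(data), "bytes")

# Every encoding round-trips correctly only when decoded with itself.
round_trips = all(encoded[name].decode(name) == text for name in candidates)
```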

Author Seth Pollack is the lucky guy who got to choose the codename "Denali" for the original Active Server Pages work, as he was planning a climbing trip to Mt. McKinley (a.k.a. Denali) at the time the project started.