Managing Data Conversion Between a Unicode Server and a Non-Unicode Client

This topic describes how to preserve the integrity of character data when the server-side data storage is in Unicode, but the client-side application that interacts with the data uses a specific code page.

Data Input

When non-Unicode data is sent from the client to be stored on the server in Unicode, data from any client with any code page can be stored correctly if one of the following conditions is true:

  • Character strings are sent to the server as parameters of a remote procedure call (RPC).

  • String constants are preceded by the capital letter N. This prefix is required regardless of whether your client-side application is Unicode-aware. Without the N prefix, SQL Server converts the string to the code page that corresponds to the default collation of the database, and any characters not found in that code page are lost.
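
The loss that occurs without the N prefix can be approximated outside SQL Server. The following Python sketch is only a simulation, not server behavior captured from a real instance: it assumes the database's default collation corresponds to code page 1252, and it uses Python's cp1252 codec with '?' substitution to stand in for the server-side conversion. The function name to_database_code_page is hypothetical.

```python
def to_database_code_page(text: str, code_page: str = "cp1252") -> str:
    """Simulate a string literal without the N prefix.

    The text is forced through the database code page (assumed here to be
    1252). Characters that the code page cannot represent become '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


print(to_database_code_page("Tokyo"))      # Tokyo
print(to_database_code_page("café"))       # café
print(to_database_code_page("Tokyo 東京"))  # Tokyo ??
```

In this sketch the Japanese characters cannot be recovered after the conversion, which is why the literal must be marked as Unicode before it reaches the server.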

Data Retrieval

If the client application is not Unicode-enabled and retrieves the data into non-Unicode buffers, the client can retrieve or modify only data that can be represented by the client computer's code page. ASCII characters can therefore always be retrieved, because their representation is the same in all code pages, whereas retrieval of any non-ASCII data depends on code-page-to-code-page conversion.
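
The claim about ASCII can be checked with a short Python sketch. It assumes Python's cp1252, cp932, cp950, and cp1251 codecs are acceptable stand-ins for the corresponding Windows code pages; the helper name same_bytes_everywhere is hypothetical.

```python
CODE_PAGES = ["cp1252", "cp932", "cp950", "cp1251"]


def same_bytes_everywhere(text: str, code_pages=CODE_PAGES) -> bool:
    """Return True if text encodes to identical bytes in every code page."""
    encodings = {text.encode(cp) for cp in code_pages}
    return len(encodings) == 1


# Plain ASCII text has one byte representation across these code pages.
print(same_bytes_everywhere("Order 42 shipped"))  # True

# A non-ASCII character depends on the code page: the same letter has
# different byte values in code page 1252 and in OEM code page 437.
print("é".encode("cp1252") == "é".encode("cp437"))  # False
```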

For example, suppose you have an application that currently runs only in the United States (U.S.) and is then also deployed to Japan. Because the SQL Server database is Unicode-aware, both the English and the Japanese text can be stored in the same tables, even though the application has not yet been modified to handle text as Unicode. As long as the application meets one of the two conditions listed under Data Input, Japanese users can use the non-Unicode application to input and retrieve Japanese data, and U.S. users can input and retrieve English data. All data from both sets of users is stored intact in the same column of the database and is represented as Unicode. In this situation, a Unicode-enabled reporting application that generates reports spanning the complete data set can be deployed. However, U.S. users cannot view the Japanese rows, because the application cannot display any characters that do not exist in their code page (1252).
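
The scenario can be sketched in Python as a rough model. The assumptions are: the rows list stands for a Unicode column, the Japanese client uses code page 932, the U.S. client uses code page 1252, and unrepresentable characters arrive as '?'. The sample rows and the retrieve function are hypothetical.

```python
# Two rows stored intact as Unicode in the same column.
rows = ["Invoice approved", "請求書を承認しました"]


def retrieve(stored_rows, client_code_page):
    """Simulate fetching Unicode rows into non-Unicode client buffers."""
    result = []
    for value in stored_rows:
        # Conversion to the client code page; unmappable characters become '?'.
        buffer = value.encode(client_code_page, errors="replace")
        result.append(buffer.decode(client_code_page))
    return result


jp_view = retrieve(rows, "cp932")   # Japanese client
us_view = retrieve(rows, "cp1252")  # U.S. client

print(jp_view == rows)        # True: both rows survive on the Japanese client
print(us_view[0] == rows[0])  # True: the English row is intact
print(us_view[1] == rows[1])  # False: the Japanese row is not representable
```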

This situation might be acceptable if the two groups of users do not have to view each other's records. If an application user must be able to view or modify records with text that cannot be represented by a single code page, there is no alternative but to modify the application so that it can use Unicode.

Web-based Applications

If the client-side program is Web-based or connects to an Active Server Pages (ASP) page, metadata must be specified on both the client-side HTML page and the server-side ASP page. These specifications determine how character strings are converted between the server, the ASP engine, and the client browser.

On the client-side HTML page, a META tag must specify the encoding scheme of the client by means of a CHARSET code, so that character data is converted to that encoding. For example, the following HTML page instructs the client to convert character data to code page 950 (Chinese Traditional) by specifying big5 as the CHARSET code.

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=big5">
<!-- 
     
-->
</HEAD>
<BODY>
<!--
   body
-->
</BODY>
</HTML>
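
The relationship between the big5 CHARSET code and code page 950 can be illustrated with a small Python sketch. It assumes Python's big5 and cp950 codecs as stand-ins for the browser's and the server's encoders. The two codecs are not identical for every character, because code page 950 includes vendor extensions, so the comparison below holds for the common Traditional Chinese sample text and is not a general guarantee.

```python
text = "繁體中文"  # sample Traditional Chinese text (hypothetical page content)

big5_bytes = text.encode("big5")    # what a CHARSET=big5 page would carry
cp950_bytes = text.encode("cp950")  # what a code page 950 conversion produces

# For this sample the two encodings agree byte for byte.
print(big5_bytes == cp950_bytes)          # True

# Bytes labeled big5 decode correctly when treated as code page 950.
print(big5_bytes.decode("cp950") == text)  # True
```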

On the server-side ASP page, you must tell the ASP Web application which code page the client browser is using. You can set either the Session.CodePage property or the @CodePage directive. Either method handles the conversion of data sent from the server to the client, and of data received from the client in both GET and POST requests. The following example shows both methods specifying conversion to and from the code page of the client, which is 950 (Chinese Traditional).

<%@ Language=VBScript codepage=950 %>
<%  Session.CodePage=950 %>
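
Why the server must know the client's code page can be shown with a Python sketch of a GET request value. This is a conceptual model only and does not use or reproduce any ASP API: it assumes the browser percent-encodes form data in code page 950, and the function names browser_encode and server_decode are hypothetical.

```python
from urllib.parse import quote, unquote

CLIENT_CODE_PAGE = "cp950"  # assumed code page of the client browser


def browser_encode(value: str) -> str:
    """Percent-encode a form value the way a code page 950 client would."""
    return quote(value, encoding=CLIENT_CODE_PAGE)


def server_decode(raw: str, code_page: str) -> str:
    """Decode a request value using the code page the server was told to use."""
    return unquote(raw, encoding=code_page, errors="replace")


original = "中文"
raw = browser_encode(original)

# Server configured with the client's code page: the value is recovered.
print(server_decode(raw, "cp950") == original)   # True

# Server left on a different code page: the value is misread.
print(server_decode(raw, "cp1252") == original)  # False
```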

Finally, remember to prefix any string literals sent to SQL Server with the capital letter N.