Go Global

Make the .NET World a Friendlier Place with the Many Faces of the CultureInfo Class

Michael Kaplan

This article discusses:
  • What CultureInfo can do for you
  • Collation, casing, formatting, and resource loading
  • CultureInfo changes in the .NET Framework 2.0
This article uses the following technologies:
.NET Framework and C#

Contents

Getting CultureInfo from Properties
CultureInfo from Initial Settings
Getting CultureInfo from Input Methods
Creating New CultureInfo Objects
Uppercasing and Lowercasing
Collation
Formatting and Parsing
Resource Loading
Encoding
Input Languages
Recent Changes
Custom Cultures and Beyond

One of the most widely used classes in the Microsoft® .NET Framework is CultureInfo, a class whose objects are used for resource loading, formatting, parsing, casing, sorting, and other conventions that change as the language, location, or writing system is changed. It's a relatively complex class and can be tricky to use correctly in every situation.

In this article I'll walk through some of these scenarios and provide enough information about the behavior, best practices, and consequences of a wrong decision to allow you to make the right choices for your use of CultureInfo and its related classes in the System.Globalization namespace in your future projects.

It all starts with the creation of the object, and there are many ways to get a CultureInfo object. You can use the cultures available through the built-in CultureInfo.CurrentCulture, CultureInfo.CurrentUICulture, or CultureInfo.InvariantCulture properties. In addition, you can use a built-in CultureInfo based on selected or installed input methods. You can use an application-created CultureInfo instance, or alternatively, you can use no culture at all.

Getting CultureInfo from Properties

The CultureInfo instance that is returned from the CultureInfo.CurrentCulture property is based on the locale the user selected in Windows® Regional Options (shown in Figure 1). Programmers call this the "user locale"; in Windows XP and Windows Server™ 2003 Regional Options it appears as the language selected under Standards and formats. Your code can change this value at the thread level, but it does not change on its own while the application is running if the user picks a different language in Regional Options. In fact, even changes to individual settings are not detected unless the CultureInfo.ClearCachedData method is called.

Figure 1 Windows Regional Options as CurrentCulture

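Changing the culture at the thread level, as described above, can be as simple as assigning Thread.CurrentThread.CurrentCulture. The following is a minimal sketch that assumes you want the change to be temporary. The helper name CultureScope.WithCulture is hypothetical. It restores the previous culture in a finally block so that an exception cannot leave the thread stuck in the wrong culture.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical helper: runs a piece of work under a different CurrentCulture
// on the calling thread, then restores whatever culture was there before.
public static class CultureScope
{
    public static T WithCulture<T>(CultureInfo culture, Func<T> work)
    {
        Thread thread = Thread.CurrentThread;
        CultureInfo saved = thread.CurrentCulture;
        try
        {
            // Affects formatting, parsing, casing, and sorting on this thread only.
            thread.CurrentCulture = culture;
            return work();
        }
        finally
        {
            thread.CurrentCulture = saved;
        }
    }
}
```

For example, `CultureScope.WithCulture(new CultureInfo("de-DE"), () => 1234.5.ToString("N1"))` formats the number with German separators. The thread's own culture is unchanged afterward.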

Unfortunately, no specific event is raised when the user changes settings in Regional Options. An event is raised when the current input language changes, but not when the general settings change. However, Windows broadcasts the WM_SETTINGCHANGE message whenever the user changes this setting. The code shown in Figure 2 helps a Windows Forms application built on the .NET Framework respond to such changes. It listens for WM_SETTINGCHANGE with the string "intl" in the LPARAM. When that message arrives, the code either switches the thread to the new user locale or refreshes the individual settings that may have changed. You need code like this if you want a .NET-based application to respond to changes that the user makes through Control Panel.

Figure 2 Responding to Regional Options Changes

using System;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows.Forms;

// MainForm is a placeholder for your application's main form.
public class MainForm : Form {
  private const int WM_SETTINGCHANGE = 0x001A;

  [DllImport("kernel32.dll", ExactSpelling=true)]
  private static extern int GetUserDefaultLCID();

  // The user default locale as of the last WM_SETTINGCHANGE handled.
  private CultureInfo m_ciOld = new CultureInfo(GetUserDefaultLCID());

  protected override void WndProc(ref Message m) {
    switch(m.Msg)
    {
      // change in a systemwide or policy setting
      case WM_SETTINGCHANGE:
        if(m.LParam != IntPtr.Zero) {
          int localeCur = GetUserDefaultLCID();
          string val = Marshal.PtrToStringAuto(m.LParam);

          if(val == "intl") {
            // change in locale settings
            Thread thread = Thread.CurrentThread;

            if(thread.CurrentCulture.LCID != localeCur &&
              thread.CurrentCulture.LCID == m_ciOld.LCID) {
              // The user default locale has changed and the thread
              // was still using the old one, so switch the current
              // culture to the new user default.
              thread.CurrentCulture = new CultureInfo(localeCur);
            }
            else
            {
              // Some individual setting has changed, so clear
              // the cached data to pick up that change.
              thread.CurrentCulture.ClearCachedData();
            }

            m_ciOld = new CultureInfo(localeCur);
          }
        }
        break;
    }

    base.WndProc(ref m);
  }
}

When deciding whether to use the CurrentCulture or another option, keep in mind that the CurrentCulture setting reflects the user's preferences for how dates, times, and numbers are formatted and how text is sorted.

CultureInfo from Initial Settings

The initial value of the CurrentUICulture property is based on the user interface language of Windows. When Windows has the Multilingual User Interface (MUI) installed, the user can choose this language; otherwise it is the language into which Windows is localized. As the application developer, you provide the user interface and decide which language or languages it is available in. If you want your users to be able to switch among the languages you provide, you can even supply your own user interface for language selection. In most cases, the CurrentUICulture is only appropriate when you're choosing a language for your user interface. Although it usually has the same value as the CurrentCulture, it can differ. Whenever you have a user interface available in the CurrentUICulture language, always respect that choice.
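One way to respect the CurrentUICulture when you ship only some languages is to walk up the culture's parent chain until you reach a language you have. The sketch below assumes your application knows the list of UI culture names it ships. UiLanguage.Pick and its parameters are hypothetical names. The loop stops at the invariant culture, which has an empty name and is its own parent.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: given the user's preferred UI culture and the UI
// languages the application actually ships, walk the parent chain
// (for example fr-CA -> fr) until a shipped language is found.
public static class UiLanguage
{
    public static CultureInfo Pick(CultureInfo preferred, string[] shipped, string fallback)
    {
        // The invariant culture has an empty name and is its own parent,
        // so stop there to avoid looping forever.
        for (CultureInfo c = preferred; c.Name.Length > 0; c = c.Parent)
        {
            foreach (string name in shipped)
            {
                // Culture names are identifiers, so compare them ordinally.
                if (string.Equals(name, c.Name, StringComparison.OrdinalIgnoreCase))
                    return c;
            }
        }
        return new CultureInfo(fallback);
    }
}
```

A typical call would be `Thread.CurrentThread.CurrentUICulture = UiLanguage.Pick(CultureInfo.CurrentUICulture, new string[] { "en", "fr", "de" }, "en");`, made once at startup.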

Getting CultureInfo from Input Methods

An input language is a culture/keyboard layout pair that determines how the physical keys on a keyboard map to characters in a language. The System.Windows.Forms.InputLanguage class provides the following:

  • Read-only access to a collection containing the full list of input languages installed on the computer, available through the InstalledInputLanguages property.
  • Read-only access to the default input language, available through the DefaultInputLanguage property. This will be the initial input language for any new thread or process.
  • Read-write access to the input language for the current thread, available through the CurrentInputLanguage property.

In addition, Windows Forms provides two events, InputLanguageChanging and InputLanguageChanged, both of which provide an InputLanguage object that has within it a CultureInfo. That CultureInfo can be useful for any operation in which the language the user is typing might be relevant. This might include storing information about the language or choosing an appropriate dictionary or thesaurus.

There are many times when you need a culture that does not change based on user settings or changes in application settings. That is why the InvariantCulture was created and exposed to developers.
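A common use of the InvariantCulture is writing values that must be read back later regardless of whose settings are active, such as numbers and dates in a settings file. The following is a minimal sketch. SettingsText is a hypothetical class name. The "R" (round-trip) and "s" (sortable) format strings are standard .NET formats.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for persisting numbers and dates as text.
// Everything goes through InvariantCulture, so a file written under one
// user's Regional Options can be read back under any other.
public static class SettingsText
{
    public static string FromDouble(double value)
    {
        // "R" (round-trip) keeps enough digits to reproduce the same double.
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    public static double ToDouble(string text)
    {
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    public static string FromDate(DateTime value)
    {
        // "s" is the sortable pattern yyyy-MM-ddTHH:mm:ss.
        return value.ToString("s", CultureInfo.InvariantCulture);
    }

    public static DateTime ToDate(string text)
    {
        return DateTime.ParseExact(text, "s", CultureInfo.InvariantCulture);
    }
}
```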

As important as it can be to adhere to one of the cultures I've mentioned, there are times that no culture (even the invariant one) is an appropriate choice. Some of these scenarios are discussed in the later section about collation. It is important to recognize when such a decision is needed.

Creating New CultureInfo Objects

Although most applications do not need it, sometimes it is very useful to be able to create a CultureInfo object that is not based on any particular user settings. Some of these scenarios include enumerating available CultureInfo choices, checking the behavior of an application with other culture settings, reproducing bugs that have been reported with specific cultures, and creating the objects to assign to the current culture or current UI culture based on some externally provided information (such as in HTTP requests to an ASP.NET page). Now that you know how to get CultureInfo, the following sections explain the operations you can perform using CultureInfo objects.
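As an illustration of the HTTP scenario, here is a sketch that creates a CultureInfo from an Accept-Language header value such as "fr-CA,fr;q=0.8,en;q=0.5". AcceptLanguage.Pick is a hypothetical helper, not an ASP.NET API. It assumes that entries are ranked by their q value, that ties keep header order, and that q=0 means "not acceptable." Tags that the runtime cannot turn into a culture are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: picks a culture from an HTTP Accept-Language header.
// Entries are tried in order of descending quality (q) value; ties keep
// the header's order.
public static class AcceptLanguage
{
    public static CultureInfo Pick(string header)
    {
        var entries = new List<KeyValuePair<string, double>>();
        foreach (string part in header.Split(','))
        {
            string[] pieces = part.Split(';');
            string tag = pieces[0].Trim();
            double q = 1.0;
            for (int i = 1; i < pieces.Length; i++)
            {
                string p = pieces[i].Trim();
                // q values always use '.', so parse them with InvariantCulture.
                // A malformed q value leaves q at 0, which excludes the entry.
                if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase))
                    double.TryParse(p.Substring(2), NumberStyles.AllowDecimalPoint,
                                    CultureInfo.InvariantCulture, out q);
            }
            if (tag.Length > 0 && tag != "*" && q > 0)
                entries.Add(new KeyValuePair<string, double>(tag, q));
        }

        // OrderByDescending is a stable sort, so equal q values keep header order.
        foreach (var entry in entries.OrderByDescending(x => x.Value))
        {
            try
            {
                return new CultureInfo(entry.Key);
            }
            catch (ArgumentException)
            {
                // Not a culture this runtime recognizes; try the next one.
            }
        }
        return null;
    }
}
```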

Uppercasing and Lowercasing

Although casing is arguably the easiest operation, it is nevertheless one that has caused a great many bugs due to misunderstandings. Casing is the operation that moves text between uppercase and lowercase. It gets tricky because there are two languages (Turkish and Azeri) that use Turkic casing rules. As depicted in Figure 3, in those two cultures, #1 and #2 are cased variations of each other, as are #3 and #4. In every other culture, #1 and #4 are cased variants of each other.

Figure 3 Turkic Casing Rules

Number  Letter  Unicode Code Point  Character Name
1       I       U+0049              LATIN CAPITAL LETTER I
2       ı       U+0131              LATIN SMALL LETTER DOTLESS I
3       İ       U+0130              LATIN CAPITAL LETTER I WITH DOT ABOVE
4       i       U+0069              LATIN SMALL LETTER I

If you uppercase or lowercase a string using Turkic rules and later compare the result with strings that were cased without them, your code can fail to recognize strings that should match.

In deciding whether to use the Turkic casing rules, it is important to look at the specific scenario. Obviously, a Turkish or Azeri user would expect strings displayed for them to follow the Turkic rules when casing is performed, and just as obviously, operations involving file names and registry keys should never use them. It is important to distinguish between what is visible to the user and what needs to be consistent with an external source or with anyone else running the application.
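The display-versus-identifier distinction can be made explicit in code. In the following sketch, the class and method names are hypothetical. Turkish is hard-coded only to make the difference visible. In a real application, display text would be cased with CultureInfo.CurrentCulture. Identifiers are uppercased with the invariant culture or compared with StringComparison.OrdinalIgnoreCase, so they never depend on the user's culture.

```csharp
using System;
using System.Globalization;

// Sketch of the display-versus-identifier split.
public static class Casing
{
    // Hard-coded here only to show the Turkic behavior.
    static readonly CultureInfo Turkish = new CultureInfo("tr-TR");

    // Text shown to the user: Turkic rules map 'i' to U+0130 (İ).
    public static string UpperForDisplay(string s)
    {
        return s.ToUpper(Turkish);
    }

    // File names, registry keys, and similar identifiers: never culture-sensitive.
    public static string UpperForIdentifier(string s)
    {
        return s.ToUpperInvariant();
    }

    // Case-insensitive identifier comparison without a separate casing step.
    public static bool SameIdentifier(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```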

No other cultures have alternate casing rules, and I don't know of any other languages that do. However, if you are able to properly work with this one exceptional case, then you will be able to handle any that come up in the future. Properly using CurrentCulture to handle Turkish will allow any such future culture to work as users would expect.

Collation

Although I'm discussing collation in terms of CultureInfo, all of the relevant methods come out of System.Globalization.CompareInfo. However, since the only way to get a CompareInfo is to either use the one hanging off of a CultureInfo or to pass a culture name or ID to the static CompareInfo.GetCompareInfo method, it's easier to talk about how to choose the collation method in terms of cultures.

Collating, or sorting, is simply the ordering of items. It is one of the most fundamental features, and everyone expects it to work properly; ideally, it should be entirely transparent. When users click a column header in a list view like the one in Windows Explorer, they assume that the column will be sorted according to their cultural and language expectations.

Luckily, the bulk of the work has been done for you, and you just have to select the appropriate culture and settings. Three possibilities together handle virtually every conceivable scenario: CurrentCulture, InvariantCulture, and CompareOptions.Ordinal.
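The three choices just listed map onto standard comparers that have been available since the .NET Framework 2.0. The wrapper class and method names in this sketch are hypothetical. SortForDisplay passes the CurrentCulture explicitly even though it is the default. SortOrdinal orders by code point, so uppercase letters sort before lowercase ones.

```csharp
using System;
using System.Globalization;

// Sketch of the three collation choices.
public static class Collation
{
    // 1. Data shown to the user: sort by the user's conventions, stated explicitly.
    public static void SortForDisplay(string[] items)
    {
        Array.Sort(items, StringComparer.Create(CultureInfo.CurrentCulture, false));
    }

    // 2. Linguistically sensible, but the same order whatever the user settings.
    public static void SortStable(string[] items)
    {
        Array.Sort(items, StringComparer.InvariantCulture);
    }

    // 3. Binary order: fastest, and no character is treated as ignorable.
    public static void SortOrdinal(string[] items)
    {
        Array.Sort(items, StringComparer.Ordinal);
    }

    // The same ordinal semantics through CompareInfo and CompareOptions.Ordinal.
    public static bool SameOrdinal(string a, string b)
    {
        return CultureInfo.InvariantCulture.CompareInfo.Compare(a, b, CompareOptions.Ordinal) == 0;
    }
}
```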

CurrentCulture is used any time data is presented to the user and you want that data to be ordered or compared intuitively. It is the default, but it's often better to specify it explicitly: doing so prevents warnings from static analysis tools such as FxCop and shows that you have considered all the options.

InvariantCulture is used any time you still want proper results from a linguistic standpoint, but need results that do not change their order when user settings do. This is an uncommon requirement for ordering; it's much more common for comparisons.

CompareOptions.Ordinal is an ordinal flag (as is the new OrdinalIgnoreCase flag added in the .NET Framework 2.0) and is used for unchanging binary comparisons of strings. This type of comparison is not only the fastest but also the best choice when characters that have no weight in sorting (such as the directional formatting characters used in bidirectional rendering) must be given some weight rather than being ignored.

There is also a possible fourth requirement: deciding whether two file names are equal. Unfortunately, none of the three methods I just mentioned gives a complete answer to that question, in either managed or unmanaged code. The only way to get a correct answer is to attempt to create the file and trap the exception if it already exists.

Parse versus ParseExact

There are two methods you can use to parse your strings: Parse and ParseExact. The behavior of Parse has its roots in COM (which in turn inherited it from older versions of Visual Basic), where a string was converted to a date no matter what the cost. In those older products, you may remember, this behavior was sometimes disparagingly referred to as "evil date parsing." The risk of improper parsing is an unfortunate side effect, one that is visible to anyone who has to work with both dd/mm/yy and mm/dd/yy dates. The DateTime.Parse method in the .NET Framework has goals much like its predecessors, and unfortunately it suffers from some of the same problems: the code is slower because the extra checking takes time, and there will always be some new format that is not properly detected.

DateTime.ParseExact, on the other hand, accepts only the exact format or formats you specify, interpreted with the DateTimeFormatInfo of the provider you pass, and nothing else. There is no forgiveness for data that does not match (whether gratuitous spaces should be forgiven makes for some interesting arguments in certain hallways at Microsoft). Simply put, its goal is more along the lines of "here is the format; here are the strings in that format. Just do the job." This makes it faster and its semantics more exact, so it is the better choice whenever you don't need the flexibility of DateTime.Parse.

After formatting and parsing, the final features I'll discuss are resource loading, encoding based on locales, and input methods. You can think of these as more understandable features, or as features less likely to break in ways that are difficult to decipher.

Formatting and Parsing

Of all the uses of cultures, the formatting and parsing functions are the most difficult to manage. They are ideally converse operations, which is why they are lumped together here. There are two situations in which parsing and formatting are not converse operations. The first is when a custom format is used to convert a number, date, or time to a string, and the identical format string is not used to parse that string and retrieve the data. The second is when the formats in use are not the defaults and Parse is used rather than ParseExact, since the heuristics in Parse can, at times, read information improperly.

When it comes to parsing, you may want to use ParseExact instead of Parse whenever you can, to protect your own code. Flexibility is great when you need it, but when you don't, it's better not to risk the problems associated with it. The sidebar "Parse versus ParseExact" explains the differences between the two methods.
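The difference between the two parsers is easy to see with an ambiguous date. In the following sketch, the Dates class and its methods are hypothetical. Loose uses DateTime.Parse, which reads "03/04/2005" as March 4 under en-US and as April 3 under en-GB. TryStrict uses DateTime.TryParseExact with an assumed list of accepted formats and rejects anything else.

```csharp
using System;
using System.Globalization;

// Hypothetical helper contrasting the two parsers.
public static class Dates
{
    // Parse: heuristic. The same string can yield different dates per culture.
    public static DateTime Loose(string s, CultureInfo culture)
    {
        return DateTime.Parse(s, culture);
    }

    // TryParseExact: the input must match one of these formats, or parsing fails.
    public static bool TryStrict(string s, out DateTime result)
    {
        string[] formats = { "yyyy-MM-dd", "yyyy-MM-dd'T'HH:mm:ss" };
        return DateTime.TryParseExact(s, formats, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out result);
    }
}
```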

The heart of this support is the IFormatProvider interface, which several classes in the System.Globalization namespace implement (including CultureInfo, NumberFormatInfo, and DateTimeFormatInfo). This interface has one method, GetFormat, which the parsing and formatting code uses to acquire the correct format information. Most of the formatting and parsing methods have at least one overload that accepts an IFormatProvider.
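To see what formatting code does with an IFormatProvider, you can call GetFormat yourself. In this sketch, Providers is a hypothetical wrapper. It shows that a CultureInfo hands out both a NumberFormatInfo and a DateTimeFormatInfo on request, and that GetFormat returns null when a provider has no object of the requested type.

```csharp
using System;
using System.Globalization;

// Sketch: a CultureInfo is an IFormatProvider that can supply either
// specialized format object when formatting or parsing code asks for it.
public static class Providers
{
    public static NumberFormatInfo NumbersOf(IFormatProvider provider)
    {
        // GetFormat returns null when the provider has no such object.
        return (NumberFormatInfo)provider.GetFormat(typeof(NumberFormatInfo));
    }

    public static DateTimeFormatInfo DatesOf(IFormatProvider provider)
    {
        return (DateTimeFormatInfo)provider.GetFormat(typeof(DateTimeFormatInfo));
    }
}
```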

Using a single interface for all of these has a downside. The code has to handle the case where a NumberFormatInfo is passed to DateTime.Parse, which it does by ignoring the parameter. (In current versions of FxCop, however, passing one also silences the warning about providing an IFormatProvider.) The same issue arises with formatting, and with a DateTimeFormatInfo passed to the number formatting and parsing methods. I usually suggest passing a CultureInfo, since the more specialized DateTimeFormatInfo and NumberFormatInfo objects can always be extracted from it, unless you have specifically customized an object and are sure you are passing the right one. With any luck, future versions of FxCop will warn you when you pass a parameter that is nothing but a slower version of null (slower because the code has to check whether it is one of two other types first).
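Until tools catch this mistake, a guard of your own can. The following sketch is a hypothetical wrapper, not a Framework API. It refuses any provider whose GetFormat does not return a DateTimeFormatInfo. Without such a check, DateTime parsing would fall back to the current culture without any warning.

```csharp
using System;
using System.Globalization;

// Hypothetical guard: accept only providers that actually supply date
// formats, instead of letting a NumberFormatInfo be silently ignored.
public static class DateInput
{
    public static DateTime ParseExactChecked(string s, string format, IFormatProvider provider)
    {
        if (provider == null || provider.GetFormat(typeof(DateTimeFormatInfo)) == null)
            throw new ArgumentException("provider does not supply a DateTimeFormatInfo", "provider");
        return DateTime.ParseExact(s, format, provider);
    }
}
```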

Resource Loading

For apps that do the work to support other languages, the simple rule of using the CurrentUICulture as a default is what makes this item easier to deal with. If you localize either an ASP.NET app or a Windows Forms app, the resource loading methods will use this setting as their default. You can, of course, override that default by passing a separate CultureInfo that uses a different language.

Make sure to set the CurrentUICulture to match the user's expectations. Whether you use the Windows UI language, the Web server's impression of the user based on the HTTP_ACCEPT_LANGUAGE header, or some interface choice that you provide depends entirely on your application. One good rule is to follow the same model as Windows MUI: provide a list of the languages the application supports, using the native name of each language (CultureInfo.NativeName). This amazingly simple plan is really common sense, for if a user cannot read a language's name, why would they want to switch their interface to that language?
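Building such a list takes only a few lines. The LanguageMenu helper below is hypothetical. It assumes your application keeps the names of its supported UI cultures in an array. Each entry pairs the culture name, which is what you store, with its native name, which is what you display.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for a language-selection menu in the style of
// Windows MUI: each supported language is listed under its own native name.
public static class LanguageMenu
{
    public static List<KeyValuePair<string, string>> Build(string[] cultureNames)
    {
        var items = new List<KeyValuePair<string, string>>();
        foreach (string name in cultureNames)
        {
            CultureInfo culture = new CultureInfo(name);
            // Key: the culture name to store; value: the text to display.
            items.Add(new KeyValuePair<string, string>(culture.Name, culture.NativeName));
        }
        return items;
    }
}
```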

Encoding

Encoding is supported through the System.Text.Encoding class and the various classes that derive from it. This topic is worthy of many articles, but the culture-specific support is limited to the ANSI, OEM, Mac, and EBCDIC code pages of each language. In the real world, when you need to convert anything to or from a legacy code page, you must know what the code page is rather than guess it from a culture, so the utility of a culture-based encoding plan is quite limited. It is best to treat encodings in a code page-based way, using the exact code page in which the data is encoded.

Input Languages

Support for input languages (otherwise known as input methods) is provided through the InputLanguage class. The class is limited, since through the .NET Framework you can only use or switch to input languages that the user has already installed. Every InputLanguage object has a Handle property. Although the size of a handle depends on the platform (32 bits on a 32-bit platform and 64 bits on a 64-bit platform), the InputLanguage Handle only ever uses 32 bits, and the low-order half of that number is actually an LCID value. The culture attached to the InputLanguage is created from that LCID.

In an application, the culture can be quite useful any time you must make a language-specific choice, such as which spelling or grammar checker to use. Unmanaged applications like Microsoft Word use the LCID half of the input language handle for this very purpose: they tag the text being typed with the language under which the user installed the input method.
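Extracting that LCID half from a handle is a simple mask. In this sketch, the InputLanguageIds class is hypothetical. Its input would typically be InputLanguage.Handle. The code masks the 64-bit value rather than casting the handle to int, so the result is the same even when a handle with its high bit set has been sign-extended on a 64-bit platform.

```csharp
using System;
using System.Globalization;

// Sketch of the bit manipulation described above. The input is an
// input-language handle (such as InputLanguage.Handle); only its low-order
// 16 bits, the language identifier, are used.
public static class InputLanguageIds
{
    public static int LanguageIdFromHandle(IntPtr handle)
    {
        // Masking also discards any sign extension of the handle on 64-bit platforms.
        return (int)(handle.ToInt64() & 0xFFFF);
    }

    public static CultureInfo CultureFromHandle(IntPtr handle)
    {
        return new CultureInfo(LanguageIdFromHandle(handle));
    }
}
```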

If you follow this route, you may want to also have logic to work with characters based on the characters entered rather than purely by the language choice. This will avoid a hazard that even Word does not handle well, such as when a user puts a keyboard under an entirely unrelated language (like an Arabic keyboard under the Hungarian language). Within the .NET Framework, there is no way to retrieve the keyboard language, but you can use P/Invoke to do the work for you, as shown in Figure 4.

Figure 4 Retrieving the Keyboard Language

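using System.Globalization;
using System.Runtime.InteropServices;
using System.Text;

// KeyboardInfo is a placeholder host class for the P/Invoke declaration.
public class KeyboardInfo {
    [DllImport("user32.dll", CharSet=CharSet.Unicode)]
    private static extern bool GetKeyboardLayoutName(StringBuilder pwszKLID);

    // Buffer size for a keyboard layout identifier (KLID): eight
    // hexadecimal digits plus the terminating null.
    private const int KL_NAMELENGTH = 9;

    public static CultureInfo CultureOfCurrentLayout() {
        StringBuilder sbKLID = new StringBuilder(KL_NAMELENGTH);

        if(GetKeyboardLayoutName(sbKLID)) {
            // The KLID is a string such as "00000409"; parse all of it as hex.
            int klid = int.Parse(
               sbKLID.ToString(),
               NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

            // strip all but the bottom half of the number, which holds
            // the language the layout was designed for
            klid &= 0xffff;

            return new CultureInfo(klid, false);
        }

        return null;
    }
}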

This function returns a CultureInfo representing the language, region, and script for which the keyboard layout was designed, or null if the layout name cannot be retrieved.

Recent Changes

Windows-Only Cultures The list of supported cultures in the .NET Framework has been more extensive than the list in various versions of Windows. Starting with Windows XP SP2, however, the reverse became true when 25 new locales were added. This causes a serious problem for both the CurrentCulture/CurrentUICulture properties and the InputLanguage class, which had always been able to simply create a culture from the locale identifier (LCID) value that Windows supplied.

The solution to this problem, which will ship with the release of the .NET Framework 2.0, is to synthesize a CultureInfo object any time one attempts to create a culture that is unavailable in the Framework, but available in Windows. All of the principles discussed in prior sections will allow your applications to work well with these "Windows-only" CultureInfo objects.

The New, Improved CultureTypes Enumeration In the .NET Framework 1.x, the CultureTypes enumeration was defined as:

[Flags]
public enum CultureTypes 
{
    NeutralCultures        = 0x0001,
    SpecificCultures       = 0x0002,
    InstalledWin32Cultures = 0x0004,
    AllCultures            = NeutralCultures | SpecificCultures |
                             InstalledWin32Cultures,
}

This does not leave much room for the addition of new types of cultures (Windows-only, Custom, and Replacement). Therefore, the updated CultureTypes enumeration that will ship with the .NET Framework 2.0 is defined in Figure 5.

Figure 5 Updated CultureTypes Enumeration

[Flags]
public enum CultureTypes 
{
    NeutralCultures         = 0x0001,  // Neutral cultures such as "en", 
                                       // "de", and "zh".
    SpecificCultures        = 0x0002,  // Non-neutral cultures such as 
                                       // "en-US" and "zh-TW".
    InstalledWin32Cultures  = 0x0004,  // Win32-installed cultures in the 
                                       // system that exist in the 
                                       // Framework, too.
    
    AllCultures             = NeutralCultures | SpecificCultures | 
                              InstalledWin32Cultures,
    
    UserCustomCulture       = 0x0008,  // User-defined custom culture
    ReplacementCultures     = 0x0010,  // User-defined replacement 
                                       // custom culture.
    WindowsOnlyCultures     = 0x0020,  // Culture exists in Win32 but 
                                       // not in the Framework.
    FrameworkCultures       = 0x0040,  // Language tag matches a culture 
                                       // that ships with the Framework.
}

The updates not only make room for the new culture types but also allow for future growth. Existing code that uses AllCultures will continue to work, but new code should use the new CultureTypes enumeration members. Because CultureTypes is a flags enumeration, its members can be used in combination. For example, a UserCustomCulture can also have either the NeutralCultures or the SpecificCultures flag associated with it. When calling CultureInfo.GetCultures, you should pass the most restrictive type that covers your requirements.
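Passing a restrictive type to GetCultures looks like the following sketch. CultureLists.Names is a hypothetical helper. It asks only for the kinds of cultures requested, such as SpecificCultures or NeutralCultures, rather than AllCultures. It sorts the resulting names ordinally because they are identifiers, not display text.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch: ask GetCultures only for the kinds of cultures you need.
public static class CultureLists
{
    public static List<string> Names(CultureTypes types)
    {
        var names = new List<string>();
        foreach (CultureInfo c in CultureInfo.GetCultures(types))
            names.Add(c.Name);
        // Culture names are identifiers, so sort them ordinally.
        names.Sort(StringComparer.Ordinal);
        return names;
    }
}
```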

Custom Cultures and Beyond

The ability to create custom cultures, also new in the .NET Framework 2.0, is accomplished by creating a new culture with the CultureAndRegionInfoBuilder class. Following the best practices I've suggested in this article will make your managed application much more likely to work properly with custom cultures!

CultureInfo is a rich and powerful class, encompassing almost every common aspect of how users interact with computers and applications. With the power this class provides, it is important to make the right choice about which CultureInfo to use in each situation. Often it is that little bit of extra thought that maximizes both the efficiency and the effectiveness of your CultureInfo usage.

Michael Kaplan is a technical lead at Microsoft, working on both Windows and the .NET Framework, particularly on collation, keyboards, locales, and Unicode support. He is the developer/owner of MSKLC, the Microsoft Keyboard Layout Creator, written in C#. He can be reached at blogs.msdn.com/michkap.