International Features
WideCharToMultiByte
Maps a wide character string to a new character string. The new character string is not necessarily from a multibyte character set.
Note: The ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 (code page 65001) or UTF-16, instead of a specific code page, unless legacy standards or data formats prevent the use of Unicode. If use of Unicode is not possible, applications should tag the data stream with the appropriate encoding name when protocols allow it. HTML, XML, and HTTP files allow tagging, but text files do not.
int WideCharToMultiByte(
  UINT CodePage,
  DWORD dwFlags,
  LPCWSTR lpWideCharStr,
  int cchWideChar,
  LPSTR lpMultiByteStr,
  int cbMultiByte,
  LPCSTR lpDefaultChar,
  LPBOOL lpUsedDefaultChar
);
Parameters
- CodePage
- [in] Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.
| Value | Meaning |
| CP_ACP | The current system Windows ANSI code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_MACCP | The current system Macintosh code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. Note: This value is used primarily in legacy code and should not generally be needed since modern Macintosh computers use Unicode for encoding. |
| CP_OEMCP | The current system OEM code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_SYMBOL | Windows 2000 and later: Symbol code page (42). |
| CP_THREAD_ACP | Windows 2000 and later: The Windows ANSI code page for the current thread. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_UTF7 | Windows 98/Me, Windows NT 4.0 and later: UTF-7. Use this value only when forced by a 7-bit transport mechanism. Use of UTF-8 is preferred. With this value set, lpDefaultChar and lpUsedDefaultChar must be set to null pointers. |
| CP_UTF8 | Windows 98/Me, Windows NT 4.0 and later: UTF-8. With this value set, lpDefaultChar and lpUsedDefaultChar must be set to null pointers. |
Note: On Windows 95, the Microsoft Layer for Unicode enables WideCharToMultiByte to support CP_UTF7 and CP_UTF8.
- dwFlags
- [in] Flags indicating the conversion type. The application can specify a combination of the following values. The function performs more quickly when none of these flags is set. The application should specify WC_NO_BEST_FIT_CHARS and WC_COMPOSITECHECK with the specific value WC_DEFAULTCHAR to retrieve all possible conversion results. If all three values are not provided, some results will be missing.
| Value | Meaning |
| WC_NO_BEST_FIT_CHARS | Windows 98/Me and Windows 2000 and later: Translate any Unicode characters that do not translate directly to multibyte equivalents to the default character specified by lpDefaultChar. In other words, if translating from Unicode to multibyte and back to Unicode again does not yield the same Unicode character, the function uses the default character. This flag can be used by itself or in combination with the other defined flags. |
| WC_COMPOSITECHECK | Convert composite characters, consisting of a base character and a nonspacing character, each with different character values. Translate these characters to precomposed characters, which have a single character value for a base-nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character.
Your application can combine WC_COMPOSITECHECK with any one of the following flags, with the default being WC_SEPCHARS. These flags determine the behavior of the function when no precomposed mapping for a base-nonspacing character combination in a wide character string is available. If none of these flags is supplied, the function behaves as if the WC_SEPCHARS flag is set. For more information, see WC_COMPOSITECHECK and related flags in the Remarks section.
WC_DISCARDNS: Discard nonspacing characters during conversion.
WC_SEPCHARS: Default. Generate separate characters during conversion.
WC_DEFAULTCHAR: Replace exceptions with the default character during conversion.
|
| WC_ERR_INVALID_CHARS | Windows Vista and later: Fail if an invalid input character is encountered. In that case, a call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. If this flag is not set, the function silently drops illegal code points. Note that this flag only applies when CodePage is specified as CP_UTF8. It cannot be used with other code page values. |
For the code pages listed below, dwFlags must be 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.
| 50220 | 50227 | 57002 through 57011 |
| 50221 | 50229 | 65000 (UTF-7) |
| 50222 | 52936 | 42 (Symbol) |
| 50225 | 54936 | |
Note: For the code page 65001 (UTF-8), dwFlags must be set to either 0 or WC_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.
- lpWideCharStr
- [in] Pointer to the wide character string to convert.
- cchWideChar
- [in] Size, in WCHAR values, of the string indicated by lpWideCharStr. If this parameter is set to -1, the function assumes the string to be null-terminated and calculates the length automatically, including the null terminator. If cchWideChar is set to 0, the function fails.
- lpMultiByteStr
- [out] Pointer to a buffer that receives the converted string.
- cbMultiByte
- [in] Size, in bytes, of the buffer indicated by lpMultiByteStr. If this parameter is set to 0, the function returns the required buffer size for lpMultiByteStr and makes no use of the output parameter itself.
- lpDefaultChar
- [in] Pointer to the character to use if a wide character cannot be represented in the specified code page. The application sets this parameter to a null pointer if the function is to use a system default value. To obtain the system default character, the application can call the GetCPInfo or GetCPInfoEx function.
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.
- lpUsedDefaultChar
- [out] Pointer to a flag that indicates if the function has used a default character in the conversion. The flag is set to TRUE if one or more characters in the source string cannot be represented in the specified code page. Otherwise, the flag is set to FALSE. This parameter can be set to a null pointer.
For the CP_UTF7 and CP_UTF8 settings for CodePage, this parameter must be set to a null pointer. Otherwise, the function fails with ERROR_INVALID_PARAMETER.
Return Values
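Returns the number of bytes written to the buffer pointed to by lpMultiByteStr if successful. The number includes the byte for the terminating null character only when the input includes one, that is, when cchWideChar is -1 or the specified length counts the terminator.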
If the function succeeds and cbMultiByte is 0, the return value is the required size, in bytes, for the buffer indicated by lpMultiByteStr.
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:
- ERROR_INSUFFICIENT_BUFFER
- ERROR_INVALID_FLAGS
- ERROR_INVALID_PARAMETER
- ERROR_NO_UNICODE_TRANSLATION
Remarks
Security Alert
Using the WideCharToMultiByte function incorrectly can compromise the security of your application. Calling this function can easily cause a buffer overrun because the size of the input buffer indicated by lpWideCharStr equals the number of WCHAR values in the string, while the size of the output buffer indicated by lpMultiByteStr equals the number of bytes. To avoid a buffer overrun, your application must specify a buffer size appropriate for the data type the buffer receives.
Data converted from Unicode UTF-16 to non-Unicode code pages (code pages other than UTF-7 or UTF-8) is subject to data loss, because a code page might not be able to represent every character used in the specific Unicode data. For more information, see Security Considerations: International Features.
For strings that require validation, such as file, resource, and user names, the application should always use the WC_NO_BEST_FIT_CHARS flag with WideCharToMultiByte. This flag prevents the function from mapping characters to characters that appear similar but have very different semantics. In some cases, the semantic change can be extreme. For example, the symbol for "∞" (infinity) maps to 8 (eight) in some code pages.
The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns ERROR_INVALID_PARAMETER.
WideCharToMultiByte does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the null terminator for the input string.
If cbMultiByte is less than cchWideChar, this function writes the number of characters specified by cbMultiByte to the buffer indicated by lpMultiByteStr. However, if CodePage is set to CP_SYMBOL and cbMultiByte is less than cchWideChar, the function writes no characters to lpMultiByteStr.
The WideCharToMultiByte function operates most efficiently when both lpDefaultChar and lpUsedDefaultChar are set to null pointers. The following table shows the behavior of the function for the four possible combinations of these parameters.
| lpDefaultChar | lpUsedDefaultChar | Result |
| NULL | NULL | No default checking. These parameter settings are the most efficient ones for use with this function. |
| non-NULL | NULL | Uses the specified default character, but does not set lpUsedDefaultChar. |
| NULL | non-NULL | Uses the system default character and sets lpUsedDefaultChar if necessary. |
| non-NULL | non-NULL | Uses the specified default character and sets lpUsedDefaultChar if necessary. |
Windows Vista and later: This function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written for earlier versions of Windows that relies on this behavior to encode random non-text binary data might run into problems. However, code that uses this function to produce valid UTF-8 strings will behave the same way on Windows Vista and later as on earlier Windows operating systems.
Windows 95 and Windows NT 4.0: The WC_NO_BEST_FIT_CHARS flag is not available on these operating systems. If your application must run on these platforms, you can "round-trip" the string using MultiByteToWideChar. Any code point that does not round-trip is a best-fit character.
Windows 95/98/Me: A version of WideCharToMultiByte is included in these operating systems, but a more extensive version of the function is supported by the Microsoft Layer for Unicode. To use this version, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.
WC_COMPOSITECHECK and related flags
As discussed in the Unicode Normalization topic, Unicode allows multiple representations of the same string (interpreted linguistically). For example, Capital A with dieresis (umlaut) can be represented either precomposed as a single Unicode code point "Ä" (U+00C4) or decomposed as the combination of Capital A and the combining dieresis character ("A" + "¨", that is U+0041 U+0308). However, most code pages provide only composed characters.
The WC_COMPOSITECHECK flag causes the WideCharToMultiByte function to test for decomposed Unicode characters and attempt to compose them before converting them to the requested code page. This flag is only available for conversion to single byte (SBCS) or double byte (DBCS) code pages (code pages < 50000, excluding code page 42). If your application needs to convert decomposed Unicode data to single byte or double byte code pages, this flag might be useful. However, not all characters can be converted this way and it is more reliable to save and store such data as Unicode.
When an application is using WC_COMPOSITECHECK, some character combinations might remain incomplete or might have additional nonspacing characters left over. For example, A + ¨ + ¨ combines to Ä + ¨. Using the WC_DISCARDNS flag causes the function to discard additional nonspacing characters. Using the WC_DEFAULTCHAR flag causes the function to use the default replacement character (typically "?") instead. Using the WC_SEPCHARS flag causes the function to attempt to convert each additional nonspacing character to the target code page. Usually this flag also causes the use of the replacement character ("?"). However, for code page 1258 (Vietnamese) and 20269, nonspacing characters exist and can be used. The conversions for these code pages are not perfect. Some combinations do not convert correctly to code page 1258, and WC_COMPOSITECHECK corrupts data in code page 20269. As mentioned earlier, it is more reliable to design your application to save and store such data as Unicode.
Windows normally represents Unicode strings with precomposed data, making the use of the WC_COMPOSITECHECK flag unnecessary. The less common applications that create decomposed data cannot accurately represent many decomposed character combinations in most code pages. Unicode is the preferred way to save and store such data.
Example
For an example, see Looking Up a User's Full Name.
Windows NT/2000/XP/Vista: Included in Windows NT 3.1 and later.
Windows 95/98/Me: Included in Windows 95 and later.
Header: Declared in Winnls.h; include Windows.h.
Library: Use Kernel32.lib.
See Also
Unicode and Character Sets, Unicode and Character Set Functions, MultiByteToWideChar