International Features
MultiByteToWideChar
Maps a character string to a wide character (Unicode UTF-16) string. The character string mapped by this function is not necessarily from a multibyte character set.
Note: The ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications should use Unicode, such as UTF-8 (code page 65001) or UTF-16, instead of a specific code page, unless legacy standards or data formats prevent the use of Unicode. If use of Unicode is not possible, applications should tag the data stream with the appropriate encoding name when protocols allow it. HTML, XML, and HTTP files allow tagging, but text files do not.
int MultiByteToWideChar(
UINT CodePage,
DWORD dwFlags,
LPCSTR lpMultiByteStr,
int cbMultiByte,
LPWSTR lpWideCharStr,
int cchWideChar
);
Parameters
- CodePage
- [in] Code page to use in performing the conversion. This parameter can be set to the value of any code page that is installed or available in the operating system. For a list of code pages, see Code Page Identifiers. Your application can also specify one of the values shown in the following table.
| Value | Meaning |
| CP_ACP | The current system Windows ANSI code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_MACCP | The current system Macintosh code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. Note: This value is used primarily in legacy code and should not generally be needed since modern Macintosh computers use Unicode for encoding. |
| CP_OEMCP | The current system OEM code page. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_SYMBOL | Windows 2000 and later: Symbol code page (42). |
| CP_THREAD_ACP | Windows 2000 and later: The Windows ANSI code page for the current thread. This value can be different on different computers, even on the same network. It can be changed on the same computer, leading to stored data becoming irrecoverably corrupted. This value is only intended for temporary use and permanent storage should be done using UTF-16 or UTF-8 if possible. |
| CP_UTF7 | Windows 98/Me, Windows NT 4.0 and later: UTF-7. Use this value only when forced by a 7-bit transport mechanism. Use of UTF-8 is preferred. |
| CP_UTF8 | Windows 98/Me, Windows NT 4.0 and later: UTF-8. |
Note: On Windows 95, the Microsoft Layer for Unicode enables MultiByteToWideChar to support CP_UTF7 and CP_UTF8.
- dwFlags
- [in] Flags indicating the conversion type. The application can specify a combination of the following values, with MB_PRECOMPOSED being the default. MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive. MB_USEGLYPHCHARS and MB_ERR_INVALID_CHARS can be set regardless of the state of the other flags.
| Value | Meaning |
| MB_PRECOMPOSED | Default; do not use with MB_COMPOSITE. Always use precomposed characters, that is, characters having a single character value for a base or nonspacing character combination. For example, in the character è, the e is the base character and the accent grave mark is the nonspacing character. If a single Unicode code point is defined for a character, the application should use it instead of a separate base character and a nonspacing character. For example, Ä is represented by the single Unicode code point LATIN CAPITAL LETTER A WITH DIAERESIS (U+00C4). |
| MB_COMPOSITE | Always use decomposed characters, that is, characters in which a base character and one or more nonspacing characters each have distinct code point values. For example, Ä is represented by A + ¨: LATIN CAPITAL LETTER A (U+0041) + COMBINING DIAERESIS (U+0308). Note that this flag cannot be used with MB_PRECOMPOSED. |
| MB_ERR_INVALID_CHARS | Windows 2000 Service Pack 4, Windows XP and later: Fail if an invalid input character is encountered. A call to GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Windows Vista and later: The function does not drop illegal code points if the application does not set this flag. |
| MB_USEGLYPHCHARS | Use glyph characters instead of control characters. |
For the code pages listed below, dwFlags must be set to 0. Otherwise, the function fails with ERROR_INVALID_FLAGS.
| 50220 | 50227 | 57002 through 57011 |
| 50221 | 50229 | 65000 (UTF-7) |
| 50222 | 52936 | 42 (Symbol) |
| 50225 | 54936 | |
Note: For UTF-8, dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.
- lpMultiByteStr
- [in] Pointer to the character string to convert.
- cbMultiByte
- [in] Size, in bytes, of the string indicated by the lpMultiByteStr parameter. Alternatively, this parameter can be set to -1 if the string is null-terminated. Note that, if cbMultiByte is 0, the function fails.
If this parameter is -1, the function processes the entire input string, including the null terminator. Therefore, the resulting wide character string has a null terminator, and the length returned by the function includes the terminating null character.
If this parameter is set to a positive integer, the function processes exactly the specified number of bytes. If the provided size does not include a null terminator, the resulting wide character string is not null-terminated, and the returned length does not include the terminating null character.
- lpWideCharStr
- [out] Pointer to a buffer that receives the converted string.
- cchWideChar
- [in] Size, in WCHAR values, of the buffer indicated by lpWideCharStr. If this value is 0, the function returns the required buffer size, in WCHAR values, including any terminating null character, and makes no use of the lpWideCharStr buffer.
Return Values
Returns the number of WCHAR values written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size for the buffer indicated by lpWideCharStr.
The function returns 0 if it does not succeed. To get extended error information, the application can call GetLastError. GetLastError can return one of the following error codes:
- ERROR_INSUFFICIENT_BUFFER
- ERROR_INVALID_FLAGS
- ERROR_INVALID_PARAMETER
- ERROR_NO_UNICODE_TRANSLATION
ERROR_NO_UNICODE_TRANSLATION is returned if the input string is a UTF-8 string that contains invalid characters and the MB_ERR_INVALID_CHARS flag is set.
Remarks
Security Alert
Using the MultiByteToWideChar function incorrectly can compromise the security of your application. Calling this function can easily cause a buffer overrun because the size of the input buffer indicated by lpMultiByteStr equals the number of bytes in the string, while the size of the output buffer indicated by lpWideCharStr equals the number of WCHAR values. To avoid a buffer overrun, your application must specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.
The default behavior of this function is to translate to a precomposed form of the input character string. If a precomposed form does not exist, the function attempts to translate to a composite form.
The lpMultiByteStr and lpWideCharStr pointers must not be the same. If they are the same, the function fails, and GetLastError returns the value ERROR_INVALID_PARAMETER.
MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the null terminator for the input string.
The function fails if MB_ERR_INVALID_CHARS is set and an invalid character is encountered in the source string. An invalid character is one of the following:
- A character that is not the default character in the source string but translates to the default character when MB_ERR_INVALID_CHARS is not set
- For DBCS strings, a character that has a lead byte but no valid trail byte
When an invalid character is found and MB_ERR_INVALID_CHARS is set, the function returns 0, and GetLastError returns ERROR_NO_UNICODE_TRANSLATION.
Windows Vista and later: This function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written for earlier versions of Windows that relies on this behavior to encode random non-text binary data might run into problems. However, code that uses this function on valid UTF-8 strings will behave the same way on Windows Vista and later as on earlier Windows operating systems.
Windows XP and later: To prevent the security problem of the non-shortest-form versions of UTF-8 characters, MultiByteToWideChar deletes these characters.
Windows 95/98/Me: A version of MultiByteToWideChar is included in these operating systems, but a more extensive version of the function is supported by the Microsoft Layer for Unicode. To use this version, you must add certain files to your application, as outlined in Microsoft Layer for Unicode on Windows 95/98/Me Systems.
Example
For an example, see Looking Up a User's Full Name.
Requirements
Windows NT/2000/XP/Vista: Included in Windows NT 3.1 and later.
Windows 95/98/Me: Included in Windows 95 and later.
Header: Declared in Winnls.h; include Windows.h.
Library: Use Kernel32.lib.
See Also
Unicode and Character Sets, Unicode and Character Set Functions, WideCharToMultiByte