The first one is the wide character to multi-byte character conversion function. The function prototype is as follows:
int WideCharToMultiByte(
    UINT CodePage,
    DWORD dwFlags,
    LPCWSTR lpWideCharStr,
    int cchWideChar,
    LPSTR lpMultiByteStr,
    int cbMultiByte,
    LPCSTR lpDefaultChar,
    LPBOOL lpUsedDefaultChar
);
This function converts a wide-character string to a new string in the specified encoding, such as ANSI or UTF-8. The resulting string is not necessarily in a multi-byte character set. Parameters:
CodePage: Specify the character set code page to be converted to. It can be any installed or system-provided character set. You can also use one of the code pages shown below.
CP_ACP Current system ANSI code page
CP_MACCP Current system Macintosh code page
CP_OEMCP Current system OEM (original equipment manufacturer) code page
CP_SYMBOL Symbol code page (42), available in Windows 2000 and later
CP_THREAD_ACP ANSI code page of the current thread, available in Windows 2000 and later
CP_UTF7 UTF-7, both lpDefaultChar and lpUsedDefaultChar must be NULL when setting this value
CP_UTF8 UTF-8, both lpDefaultChar and lpUsedDefaultChar must be NULL when setting this value
I think the most commonly used ones are CP_ACP and CP_UTF8. The former converts wide characters to ANSI and the latter converts to UTF8.
dwFlags: Specifies how to handle characters that cannot be converted directly. The function runs faster when no flags are set, so I always pass 0. The possible values are listed below:
WC_NO_BEST_FIT_CHARS Converts any Unicode character that has no direct multi-byte equivalent to the default character specified by lpDefaultChar. In other words, if converting a character from Unicode to multi-byte and back again would not yield the same Unicode character, the function uses the default character instead. This option can be used alone or together with other options.
WC_COMPOSITECHECK Converts composite characters to precomposed characters. It can be combined with any one of the last three options below; if none of them is given, the behavior is the same as with WC_SEPCHARS.
WC_ERR_INVALID_CHARS Makes the function fail when it encounters an invalid character; GetLastError then returns ERROR_NO_UNICODE_TRANSLATION. Without this flag the function does not fail on invalid characters; it drops them or, on Windows Vista and later, replaces them with U+FFFD. This option can only be used with UTF-8.
WC_DISCARDNS Discards nonspacing characters during conversion. Used together with WC_COMPOSITECHECK.
WC_SEPCHARS Generates separate characters during conversion. This is the default conversion option. Used together with WC_COMPOSITECHECK.
WC_DEFAULTCHAR Replaces exception characters with the default character (most commonly '?') during conversion. Used together with WC_COMPOSITECHECK.
When WC_COMPOSITECHECK is specified, the function converts composite characters to precomposed characters. A composite character consists of a base character and a nonspacing character (such as the diacritics of European languages or the tone marks of Chinese Pinyin), each with its own character value. A precomposed character has a single character value that represents the combination of a base character and a nonspacing character. For example, è can be stored either as the base character e (U+0065) followed by the combining grave accent (U+0300), or as the single precomposed character è (U+00E8).
When WC_COMPOSITECHECK is specified, you can also use one of the last three options in the table above to customize the conversion rules for precomposed characters.
These options determine what the function does when it encounters a composite character in the wide string that has no precomposed equivalent. They are used together with WC_COMPOSITECHECK. If none of them is specified, the function defaults to WC_SEPCHARS.
For the following code pages, dwFlags must be 0, otherwise the function returns error code ERROR_INVALID_FLAGS.
50220 50221 50222 50225 50227 50229 52936 54936 57002 to 57011 65000(UTF7) 42(Symbol)
For UTF-8, dwFlags must be 0 or WC_ERR_INVALID_CHARS; otherwise the function fails and sets the error code ERROR_INVALID_FLAGS, which you can retrieve by calling GetLastError.
lpWideCharStr: The wide string to be converted.
cchWideChar: The length of the wide string to be converted, in characters. -1 means the string is null-terminated and is converted up to and including the terminating null.
lpMultiByteStr: The buffer that receives the converted string.
cbMultiByte: The size of the output buffer in bytes. If it is 0, lpMultiByteStr is ignored and the function returns the required buffer size.
lpDefaultChar: Pointer to the character to use when a character cannot be represented in the specified code page. If NULL, the system default character is used. If this parameter is non-NULL for a code page that requires it to be NULL (CP_UTF7 or CP_UTF8), the function fails and sets the error code ERROR_INVALID_PARAMETER.
lpUsedDefaultChar: Pointer to a BOOL flag that indicates whether the default character was used. If this parameter is non-NULL for a code page that requires it to be NULL (CP_UTF7 or CP_UTF8), the function fails and sets the error code ERROR_INVALID_PARAMETER. The function is faster when both lpDefaultChar and lpUsedDefaultChar are NULL.
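The restrictions scattered through the parameter list above (dwFlags must be 0 for some code pages, and the two default-character pointers must be NULL for UTF-7 and UTF-8) can be collected into a small validation helper. This is only a sketch: wc_flags_allowed and wc_default_char_must_be_null are hypothetical names, not part of the Windows API, and the code page list simply mirrors the one given in this article. The fallback #define lines are there only so the sketch compiles without <windows.h>.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Fallback definitions for building without the Windows SDK headers.
   The values are the ones the SDK headers use. */
#ifndef CP_UTF8
#define CP_SYMBOL 42
#define CP_UTF7   65000
#define CP_UTF8   65001
#endif
#ifndef WC_ERR_INVALID_CHARS
#define WC_ERR_INVALID_CHARS 0x00000080
#endif

/* Hypothetical helper: returns 1 if dwFlags is acceptable for code_page
   according to the code page rules described in this article, else 0. */
int wc_flags_allowed(unsigned int code_page, unsigned long flags)
{
    /* Code pages for which the article says dwFlags must be 0. */
    static const unsigned int zero_only[] = {
        50220, 50221, 50222, 50225, 50227, 50229, 52936, 54936,
        CP_UTF7, CP_SYMBOL
    };
    size_t i;

    /* UTF-8 accepts 0 or WC_ERR_INVALID_CHARS only. */
    if (code_page == CP_UTF8)
        return flags == 0 || flags == WC_ERR_INVALID_CHARS;

    /* The range 57002 to 57011 also requires 0. */
    if (code_page >= 57002 && code_page <= 57011)
        return flags == 0;

    for (i = 0; i < sizeof zero_only / sizeof zero_only[0]; ++i) {
        if (code_page == zero_only[i])
            return flags == 0;
    }

    /* Other code pages are not restricted by these particular rules. */
    return 1;
}

/* Hypothetical helper: returns 1 if lpDefaultChar and lpUsedDefaultChar
   must both be NULL for this code page. */
int wc_default_char_must_be_null(unsigned int code_page)
{
    return code_page == CP_UTF7 || code_page == CP_UTF8;
}
```

For example, wc_flags_allowed(CP_UTF8, WC_ERR_INVALID_CHARS) returns 1, while wc_flags_allowed(CP_UTF7, WC_ERR_INVALID_CHARS) returns 0. The helper checks only the code page restrictions, not whether a flag combination is meaningful in itself.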
Return value: If the function succeeds and cbMultiByte is not 0, it returns the number of bytes written to lpMultiByteStr (including the terminating null if the input included one, as it does when cchWideChar is -1); if cbMultiByte is 0, it returns the number of bytes required for the conversion. If the function fails, it returns 0.
Note: Improper use of WideCharToMultiByte can compromise the security of a program. Calling this function can easily cause a buffer overrun, because the size of the input buffer pointed to by lpWideCharStr is given in wide characters, while the size of the output buffer pointed to by lpMultiByteStr is given in bytes. To avoid an overrun, make sure you specify an appropriate size for the output buffer. My approach is to first call WideCharToMultiByte with cbMultiByte set to 0 to get the required buffer size, allocate the buffer, and then call WideCharToMultiByte again to fill it, as shown in the code below. In addition, converting from Unicode (UTF-16) to a non-Unicode character set may lose data, because the target character set may be unable to represent some of the Unicode characters.
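/* Requires <windows.h> and <stdlib.h>. */
const wchar_t* pwszUnicode = L"Hello, world! Hello, China!";
int iSize;
char* pszMultiByte;
/* First call: cbMultiByte is 0, so only the required size in bytes is returned (terminating null included). */
iSize = WideCharToMultiByte(CP_ACP, 0, pwszUnicode, -1, NULL, 0, NULL, NULL);
pszMultiByte = (iSize > 0) ? (char*)malloc(iSize) : NULL;
/* Second call: fill the buffer. */
if (pszMultiByte != NULL)
    WideCharToMultiByte(CP_ACP, 0, pwszUnicode, -1, pszMultiByte, iSize, NULL, NULL);
/* ... use pszMultiByte, then release it ... */
free(pszMultiByte);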
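The two-call pattern above can be wrapped into a reusable function with error checking. The following is a minimal sketch, not a definitive implementation: wide_to_multibyte is a hypothetical name, the caller is assumed to free() the result, and the _WIN32 guard is there only so the file still compiles on other platforms. It also shows one way to use lpUsedDefaultChar to detect data loss.

```c
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: converts a null-terminated wide string to the given
   code page. Returns a malloc'ed, null-terminated string that the caller
   must free(), or NULL on failure. If used_default is non-NULL, it reports
   whether any character was replaced by the default character. */
char *wide_to_multibyte(UINT code_page, const wchar_t *src, BOOL *used_default)
{
    BOOL lossy = FALSE;
    /* CP_UTF7 and CP_UTF8 require lpUsedDefaultChar to be NULL. */
    LPBOOL p_lossy =
        (code_page == CP_UTF7 || code_page == CP_UTF8) ? NULL : &lossy;
    int size;
    char *dst;

    /* First call: cbMultiByte is 0, so the function only reports the
       required size in bytes (terminator included, since cchWideChar is -1). */
    size = WideCharToMultiByte(code_page, 0, src, -1, NULL, 0, NULL, NULL);
    if (size <= 0)
        return NULL;

    dst = (char *)malloc((size_t)size);
    if (dst == NULL)
        return NULL;

    /* Second call: fill the buffer, passing its size in bytes. */
    if (WideCharToMultiByte(code_page, 0, src, -1, dst, size,
                            NULL, p_lossy) == 0) {
        free(dst);
        return NULL;
    }

    if (used_default != NULL)
        *used_default = lossy;
    return dst;
}
#endif
```

With a code page such as CP_ACP, the BOOL passed as used_default is set to TRUE when at least one character had to be replaced by the default character. For CP_UTF7 and CP_UTF8 it is always set to FALSE, because the API does not report default-character use for those code pages.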
The second one is the multi-byte character to wide character conversion function. The function prototype is as follows:
int MultiByteToWideChar(
    UINT CodePage,
    DWORD dwFlags,
    LPCSTR lpMultiByteStr,
    int cbMultiByte,
    LPWSTR lpWideCharStr,
    int cchWideChar
);
This function converts a multi-byte string to a wide (Unicode) string. The string to be converted is not necessarily in a multi-byte character set.
For the parameters, return value and precautions of this function, please refer to the description of the function WideCharToMultiByte above. Here is only a brief explanation of dwFlags.
dwFlags: Specifies whether to convert to precomposed or composite wide characters, whether to use glyph characters in place of control characters, and how to handle invalid characters.
MB_PRECOMPOSED Always uses precomposed characters; that is, when a single precomposed character exists, the decomposed base character plus nonspacing character is not used. This is the default option of the function and cannot be used together with MB_COMPOSITE.
MB_COMPOSITE Always uses composite characters, that is, a base character followed by a nonspacing character.
MB_ERR_INVALID_CHARS Makes the function fail when it encounters an invalid character, with the error code ERROR_NO_UNICODE_TRANSLATION. Without this flag, invalid characters are dropped or, on Windows Vista and later, replaced with U+FFFD.
MB_USEGLYPHCHARS Uses glyph characters instead of control characters.
For the following code pages, dwFlags must be 0; otherwise the function fails with the error code ERROR_INVALID_FLAGS.
50220 50221 50222 50225 50227 50229 52936 54936 57002 to 57011 65000(UTF7) 42(Symbol)
For UTF-8, dwFlags must be 0 or MB_ERR_INVALID_CHARS; otherwise the function fails with the error code ERROR_INVALID_FLAGS.
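The same two-call pattern applies in this direction, except that the output size is now counted in wide characters rather than bytes. Here is a minimal sketch, assuming UTF-8 input: utf8_to_wide is a hypothetical helper name, the caller is assumed to free() the result, and the _WIN32 guard only keeps the file compilable on other platforms. It passes MB_ERR_INVALID_CHARS, one of the two dwFlags values allowed for UTF-8, so that invalid input makes the conversion fail instead of being silently altered.

```c
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: converts a null-terminated UTF-8 string to a wide
   (UTF-16) string. Returns a malloc'ed, null-terminated wide string that
   the caller must free(), or NULL on failure. */
wchar_t *utf8_to_wide(const char *src)
{
    int count;
    wchar_t *dst;

    /* First call: cchWideChar is 0, so the return value is the required
       size in wide characters, not bytes (terminator included, since
       cbMultiByte is -1). */
    count = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                src, -1, NULL, 0);
    if (count <= 0)
        return NULL; /* e.g. ERROR_NO_UNICODE_TRANSLATION for invalid input */

    /* Allocate count wide characters, i.e. count * sizeof(wchar_t) bytes. */
    dst = (wchar_t *)malloc((size_t)count * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;

    /* Second call: fill the buffer, passing its size in wide characters. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            src, -1, dst, count) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
}
#endif
```

Note the multiplication by sizeof(wchar_t) in the allocation: forgetting it is the mirror image of the byte/character mix-up described for WideCharToMultiByte.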
I have never used the following functions, so I will only briefly describe them.
int GetTextCharset( HDC hdc );
This function gets the character set of the font currently selected into the specified device context. It is equivalent to GetTextCharsetInfo(hdc, NULL, 0).
Return value: The character set identifier on success, or DEFAULT_CHARSET on failure.