I have a function that converts strings from various encodings into the UTF-16 that Windows uses internally. For that I used the MultiByteToWideChar API. But I just discovered that the following:

    // See how much data we need
    // UINT nCodePage = 1201; // just as an example
    UINT nchLen = ::MultiByteToWideChar(nCodePage, 0, pByteArrayToConvert, ncbSzByteArrayToConvert, NULL, 0);
    if (!nchLen)
    {
        // Failed
    }

fails for the following Unicode code pages with error code ERROR_INVALID_PARAMETER (87):

> 1200 utf-16 Unicode UTF-16, little endian byte order
> 1201 unicodeFFFE Unicode UTF-16, big endian byte order
> 12000 utf-32 Unicode UTF-32, little endian byte order
> 12001 utf-32BE Unicode UTF-32, big endian byte order

Any idea why and how to do those conversions?

Recommended answer

Windows does not support UTF-32 at all, you have to implement that manually.

MultiByteToWideChar() does not support conversions from UTF-16 or UTF-32. On the other hand, for code pages 1200 and 1201, your input data is already in UTF-16. MultiByteToWideChar() outputs UTF-16LE data, so for code page 1200 just return the input data as-is, and for code page 1201 simply swap the byte order of each UTF-16 code unit.
But for code pages 12000 and 12001, you will have to convert the data manually (or use a 3rd party library, or the STL's built-in UTF-16/32 conversions if you are using C++11 or later).

Try something like this:

    UINT BytesToUTF16LE(UINT nCodePage, LPCSTR lpMultiByteStr, int cbMultiByte, LPWSTR lpWideCharStr, int cchWideChar)
    {
        UINT nchLen;

        switch (nCodePage)
        {
            case 1200: // UTF-16LE
            case 1201: // UTF-16BE
            {
                if ((!lpMultiByteStr) || (cbMultiByte < 0) || (cchWideChar < 0))
                {
                    ::SetLastError(ERROR_INVALID_PARAMETER);
                    return 0;
                }

                cbMultiByte /= 2; // from here on: number of UTF-16 code units
                nchLen = cbMultiByte;

                if (lpWideCharStr)
                {
                    if ((UINT) cchWideChar < nchLen)
                    {
                        ::SetLastError(ERROR_INSUFFICIENT_BUFFER);
                        return 0;
                    }

                    if (nCodePage == 1200)
                        CopyMemory(lpWideCharStr, lpMultiByteStr, nchLen * 2);
                    else
                    {
                        // UTF-16BE: swap the two bytes of every code unit
                        const UINT16 *pCodeUnits = (const UINT16 *) lpMultiByteStr;
                        for (int i = 0; i < cbMultiByte; ++i)
                        {
                            lpWideCharStr[i] = (WCHAR) (
                                ((pCodeUnits[i] << 8) & 0xFF00) |
                                ((pCodeUnits[i] >> 8) & 0x00FF)
                            );
                        }
                    }
                }

                ::SetLastError(0);
                break;
            }

            case 12000: // UTF-32LE
            case 12001: // UTF-32BE
            {
                if ((!lpMultiByteStr) || (cbMultiByte < 0) || (cchWideChar < 0))
                {
                    ::SetLastError(ERROR_INVALID_PARAMETER);
                    return 0;
                }

                const UINT32 *pCodePoints = (const UINT32 *) lpMultiByteStr;
                cbMultiByte /= 4; // from here on: number of UTF-32 code points
                nchLen = 0;

                for (int i = 0; i < cbMultiByte; ++i)
                {
                    UINT32 CodePoint = pCodePoints[i];
                    if (nCodePage == 12001)
                    {
                        // UTF-32BE: reverse the four bytes
                        CodePoint = (
                            ((CodePoint >> 24) & 0x000000FF) |
                            ((CodePoint >> 8 ) & 0x0000FF00) |
                            ((CodePoint << 8 ) & 0x00FF0000) |
                            ((CodePoint << 24) & 0xFF000000)
                        );
                    }

                    if (CodePoint < 0x10000)
                    {
                        // BMP code point: a single UTF-16 code unit
                        if (lpWideCharStr)
                        {
                            if (cchWideChar < 1)
                            {
                                ::SetLastError(ERROR_INSUFFICIENT_BUFFER);
                                return 0;
                            }
                            *lpWideCharStr++ = (WCHAR) (CodePoint & 0xFFFF);
                            --cchWideChar;
                        }
                        ++nchLen;
                    }
                    else if (CodePoint <= 0x10FFFF)
                    {
                        // Supplementary code point: a surrogate pair
                        if (lpWideCharStr)
                        {
                            if (cchWideChar < 2)
                            {
                                ::SetLastError(ERROR_INSUFFICIENT_BUFFER);
                                return 0;
                            }
                            CodePoint -= 0x10000;
                            *lpWideCharStr++ = (WCHAR) (0xD800 + ((CodePoint >> 10) & 0x3FF));
                            *lpWideCharStr++ = (WCHAR) (0xDC00 + (CodePoint & 0x3FF));
                            cchWideChar -= 2;
                        }
                        nchLen += 2;
                    }
                    else
                    {
                        ::SetLastError(ERROR_NO_UNICODE_TRANSLATION);
                        return 0;
                    }
                }

                ::SetLastError(0);
                break;
            }

            default:
                nchLen = ::MultiByteToWideChar(nCodePage, 0, lpMultiByteStr, cbMultiByte, lpWideCharStr, cchWideChar);
                break;
        }

        return nchLen;
    }

Then you can do this:

    UINT nchLen = BytesToUTF16LE(nCodePage, pByteArrayToConvert, ncbSzByteArrayToConvert, NULL, 0);
    if ((!nchLen) && (GetLastError() != 0))
    {
        // Failed
    }
    ...
    BytesToUTF16LE(nCodePage, pByteArrayToConvert, ncbSzByteArrayToConvert, ...);
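If you want to check the byte-swapping and surrogate-pair arithmetic without the Windows headers, here is a minimal portable sketch of the same idea. The function names `Utf16BytesToUtf16` and `Utf32BytesToUtf16` are hypothetical helpers made up for this illustration, not part of any API. They return a `std::u16string` instead of filling a caller-supplied buffer, and they assemble each unit byte by byte, so they do not depend on host endianness or buffer alignment. One deliberate difference, stated as an assumption: the UTF-32 helper also rejects surrogate values (0xD800 to 0xDFFF), which the function above lets through.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: copies UTF-16 input (LE or BE) into native code units.
// A trailing odd byte is ignored, mirroring the integer division above.
std::u16string Utf16BytesToUtf16(const unsigned char* bytes, std::size_t size, bool bigEndian)
{
    std::u16string out;
    for (std::size_t i = 0; i + 2 <= size; i += 2)
    {
        const unsigned hi = bigEndian ? bytes[i] : bytes[i + 1];
        const unsigned lo = bigEndian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}

// Hypothetical helper: decodes UTF-32 input (LE or BE) into UTF-16 code units.
// Trailing bytes that do not form a whole 32-bit unit are ignored.
std::u16string Utf32BytesToUtf16(const unsigned char* bytes, std::size_t size, bool bigEndian)
{
    std::u16string out;
    for (std::size_t i = 0; i + 4 <= size; i += 4)
    {
        const std::uint32_t b0 = bytes[i], b1 = bytes[i + 1];
        const std::uint32_t b2 = bytes[i + 2], b3 = bytes[i + 3];
        std::uint32_t cp = bigEndian
            ? (b0 << 24) | (b1 << 16) | (b2 << 8) | b3
            : (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;

        // Assumption: surrogates are treated as invalid input here.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-32 code point");

        if (cp < 0x10000)
        {
            // BMP code point: one code unit
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Supplementary code point: high surrogate, then low surrogate
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the UTF-32LE bytes `00 F6 01 00` (U+1F600) come out as the surrogate pair 0xD83D 0xDE00, and the same code point in UTF-32BE (`00 01 F6 00`) gives the identical result when `bigEndian` is true.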
10-27 10:09