Converting a UnicodeString to an AnsiString

Problem description

In the olden times, I had a function that would convert a WideString to an AnsiString of the specified code-page:

    function WideStringToString(const Source: WideString; CodePage: UINT): AnsiString;
    ...
    begin
      ...
      // Convert source UTF-16 string (WideString) to the destination using the code-page
      strLen := WideCharToMultiByte(CodePage, 0,
        PWideChar(Source), Length(Source), // Source
        PAnsiChar(cpStr), strLen,          // Destination
        nil, nil);
      ...
    end;

And everything worked. I passed the function a Unicode string (i.e. UTF-16 encoded data) and converted it to an AnsiString, with the understanding that the bytes in the AnsiString represented characters from the specified code-page.

For example:

    TUnicodeHelper.WideStringToString('Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ', 1252);

would return the Windows-1252 encoded string:

    The qùíçk brown fôx jumped ovêr the lázÿ dog

Not every character in the source has an exact Windows-1252 equivalent, but the Windows WideCharToMultiByte function does a pretty good job of best-fit mapping, as it is designed to do.

Now the after times

Now we are in the after times. WideString is now a pariah, with UnicodeString being the goodness. It's an inconsequential change, as the Windows function only needed a pointer to a series of WideChar anyway (which a UnicodeString also is). So we change the declaration to use UnicodeString instead:

    function WideStringToString(const Source: UnicodeString; CodePage: UINT): AnsiString;
    begin
      ...
    end;

Now we come to the return value. I have an AnsiString that contains the bytes:

    54 68 65 20 71 F9 ED E7  The qùíç
    6B 20 62 72 6F 77 6E 20  k brown
    66 F4 78 20 6A 75 6D 70  fôx jump
    65 64 20 6F 76 EA 72 20  ed ovêr
    74 68 65 20 6C E1 7A FF  the lázÿ
    20 64 6F 67               dog

In the olden times that was fine. I kept track of what code-page the AnsiString actually contained; I had to remember that the returned AnsiString was not encoded using the computer's locale (e.g. Windows-1258), but was instead encoded using another code-page (the CodePage code page).

But in Delphi XE6 an AnsiString also secretly contains the code-page:

    codePage: 1258
    length: 44
    value: The qùíçk brown fôx jumped ovêr the lázÿ dog

This code-page is wrong. Delphi is specifying the code-page of my computer, rather than the code-page that the string is actually encoded in.
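
The hidden code-page tag described above can be read back with the RTL function StringCodePage. The following is only a minimal sketch, assuming a Windows console build on a Unicode Delphi (2009 or later); the program and variable names are made up for illustration, and the value printed for the plain AnsiString depends on the machine's locale.

```pascal
program InspectCodePage;

{$APPTYPE CONSOLE}

var
  Source: UnicodeString;
  Ansi: AnsiString;
  Utf8: UTF8String;
begin
  // 'caf' followed by U+00E9, written as a character code so the example
  // does not depend on the encoding of the source file.
  Source := 'caf' + #$00E9;

  // Runtime conversions: the RTL encodes the text and tags each result.
  Ansi := AnsiString(Source);
  Utf8 := UTF8String(Source);

  // A plain AnsiString is normally tagged with the system default code page,
  // so these two lines usually show the same, machine-dependent number.
  Writeln('AnsiString code page:  ', StringCodePage(Ansi));
  Writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);

  // UTF8String is declared as AnsiString(65001).
  Writeln('UTF8String code page:  ', StringCodePage(Utf8));
end.
```

The tag says which code page the RTL believes the bytes are in; nothing checks that the bytes really are in that code page, which is the gap the rest of the question runs into.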

Technically this is not a problem. I always understood that the AnsiString was in a particular code-page; I just had to be sure to pass that information along.

So when I wanted to decode the string, I had to pass along the code-page with it:

    s := TUnicodeHelper.StringToWideString(s, 1252);

with

    function StringToWideString(s: AnsiString; CodePage: UINT): UnicodeString;
    begin
      ...
      MultiByteToWideChar(...);
      ...
    end;

Then one person screws everything up

The problem was that in the olden times I declared a type called Utf8String:

    type
      Utf8String = type AnsiString;

Because it was common enough to have:

    function TUnicodeHelper.WideStringToUtf8(const s: UnicodeString): Utf8String;
    begin
      Result := WideStringToString(s, CP_UTF8);
    end;

and the reverse:

    function TUnicodeHelper.Utf8ToWideString(const s: Utf8String): UnicodeString;
    begin
      Result := StringToWideString(s, CP_UTF8);
    end;

Now in XE6 I have a function that takes a Utf8String. If some existing code somewhere were to take a UTF-8 encoded AnsiString and try to convert it to UnicodeString using Utf8ToWideString, it would fail:

    s: AnsiString;
    s := UnicodeStringToString('Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ', CP_UTF8);
    ...
    ws: UnicodeString;
    ws := Utf8ToWideString(s); // Delphi will treat s as CP1252, and convert it to UTF-8

Or worse is the breadth of existing code that does:

    s: Utf8String;
    s := UnicodeStringToString('Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ', CP_UTF8);

The returned string will become totally mangled:

the function returns AnsiString(1252) (an AnsiString tagged as encoded using the current code-page);
the return result is stored in an AnsiString(65001) string (Utf8String);
Delphi converts the UTF-8 encoded string into UTF-8 as though it was 1252.

How to move forward

Ideally my UnicodeStringToString(string, codePage) function (which returns an AnsiString) could set the CodePage inside the string to match the actual code-page using something like SetCodePage:

    function UnicodeStringToString(s: UnicodeString; CodePage: UINT): AnsiString;
    begin
      ...
      WideCharToMultiByte(...);
      ...

      // Adjust the codepage contained in the AnsiString to match reality
      // SetCodePage(Result, CodePage, False); SetCodePage only works on RawByteString
      if Length(Result) > 0 then
        PStrRec(PByte(Result) - SizeOf(StrRec)).codePage := CodePage;
    end;

Except that manually mucking around with the internal structure of an AnsiString is horribly dangerous.

So what about returning RawByteString?

It has been said, over and over, by a lot of people who aren't me, that RawByteString is meant to be the universal recipient; it wasn't meant to be used as a return type:

    function UnicodeStringToString(s: UnicodeString; CodePage: UINT): RawByteString;
    begin
      ...
      WideCharToMultiByte(...);
      ...

      // Adjust the codepage contained in the AnsiString to match reality
      SetCodePage(Result, CodePage, False); // SetCodePage only works on RawByteString
    end;

This has the virtue of being able to use the supported and documented SetCodePage.

But if we're going to cross a line and start returning RawByteString, surely Delphi already has a function that can convert a UnicodeString to a RawByteString and vice versa:

    function WideStringToString(const s: UnicodeString; CodePage: UINT): RawByteString;
    begin
      Result := SysUtils.Something(s, CodePage);
    end;

    function StringToWideString(const s: RawByteString; CodePage: UINT): UnicodeString;
    begin
      Result := SysUtils.SomethingElse(s, CodePage);
    end;

But what is it?

Or what else should I do?

This was a long-winded set of background for a trivial question. The real question is, of course, what should I be doing instead? There is a lot of code out there that depends on UnicodeStringToString and the reverse.

tl;dr:

I can convert a UnicodeString to UTF-8 by doing:

    Utf8Encode('Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ');

and I can convert a UnicodeString to the current code-page by using:

    AnsiString('Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ');

But how do I convert a UnicodeString to an arbitrary (unspecified) code-page?

My feeling is that since everything really is an AnsiString:

    Utf8String = AnsiString(65001);
    RawByteString = AnsiString(65535);

I should bite the bullet, bust open the AnsiString structure, and poke the correct code-page into it:

    function StringToAnsi(const s: UnicodeString; CodePage: UINT): AnsiString;
    begin
      LocaleCharsFromUnicode(CodePage, ..., s, ...);
      ...
      if Length(Result) > 0 then
        PStrRec(PByte(Result) - SizeOf(StrRec)).codePage := CodePage;
    end;

Then the rest of the VCL will fall in line.
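
The mangling described in the question (UTF-8 bytes read as Windows-1252 and then encoded to UTF-8 a second time) can be reproduced with explicit calls, without relying on any implicit string conversion. This is only an illustrative sketch: it assumes a Windows build where code page 1252 is available, uses TEncoding from System.SysUtils rather than the question's helper functions, and DecodeAsWindows1252 is a hypothetical name introduced here.

```pascal
program DoubleEncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Decodes the given bytes as if they were Windows-1252 text. For UTF-8 input
// this is deliberately the wrong interpretation; it mimics what happens when
// a string is tagged with a code page its bytes are not actually in.
function DecodeAsWindows1252(const Bytes: TBytes): UnicodeString;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(1252);
  try
    Result := Enc.GetString(Bytes);
  finally
    Enc.Free; // an encoding obtained from GetEncoding is owned by the caller
  end;
end;

var
  Original, Misread: UnicodeString;
  GoodUtf8, DoubleEncoded: TBytes;
begin
  Original := 'caf' + #$00E9;                        // "caf" + U+00E9

  GoodUtf8 := TEncoding.UTF8.GetBytes(Original);     // 63 61 66 C3 A9
  Misread := DecodeAsWindows1252(GoodUtf8);          // C3 and A9 become two characters
  DoubleEncoded := TEncoding.UTF8.GetBytes(Misread); // 63 61 66 C3 83 C2 A9

  Writeln('UTF-8 length:          ', Length(GoodUtf8));      // 5
  Writeln('Double-encoded length: ', Length(DoubleEncoded)); // 7
  Writeln('Text survived:         ',
    TEncoding.UTF8.GetString(DoubleEncoded) = Original);     // FALSE
end.
```

TEncoding works on TBytes rather than on AnsiString, so it sidesteps the code-page tag entirely; whether that suits code that already passes AnsiString values around is a separate design decision.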

Solution

In this particular case, using RawByteString is an appropriate solution:

    function WideStringToString(const Source: UnicodeString; CodePage: UINT): RawByteString;
    var
      strLen: Integer;
    begin
      Result := ''; // a managed Result is not guaranteed to start out empty
      strLen := LocaleCharsFromUnicode(CodePage, 0,
        PWideChar(Source), Length(Source), nil, 0, nil, nil);
      if strLen > 0 then
      begin
        SetLength(Result, strLen);
        LocaleCharsFromUnicode(CodePage, 0,
          PWideChar(Source), Length(Source), PAnsiChar(Result), strLen, nil, nil);
        SetCodePage(Result, CodePage, False);
      end;
    end;

This way, the RawByteString holds the codepage, and assigning the RawByteString to any other string type, whether that be AnsiString or UTF8String or whatever, will allow the RTL to automatically convert the RawByteString data from its current codepage to the destination string's codepage (which includes conversions to UnicodeString).

If you absolutely must return an AnsiString (which I do not recommend), you can still use SetCodePage() via a typecast:

    function WideStringToString(const Source: UnicodeString; CodePage: UINT): AnsiString;
    var
      strLen: Integer;
    begin
      Result := ''; // a managed Result is not guaranteed to start out empty
      strLen := LocaleCharsFromUnicode(CodePage, 0,
        PWideChar(Source), Length(Source), nil, 0, nil, nil);
      if strLen > 0 then
      begin
        SetLength(Result, strLen);
        LocaleCharsFromUnicode(CodePage, 0,
          PWideChar(Source), Length(Source), PAnsiChar(Result), strLen, nil, nil);
        SetCodePage(PRawByteString(@Result)^, CodePage, False);
      end;
    end;

The reverse is much easier: just use the codepage already stored in an (Ansi|RawByte)String (just make sure those codepages are always accurate), since the RTL already knows how to retrieve and use the codepage for you:

    function StringToWideString(const Source: AnsiString): UnicodeString;
    begin
      Result := UnicodeString(Source);
    end;

    function StringToWideString(const Source: RawByteString): UnicodeString;
    begin
      Result := UnicodeString(Source);
    end;

That being said, I would suggest dropping the helper functions altogether and just using typed strings instead.

Let the RTL handle conversions for you:

    type
      Win1252String = type AnsiString(1252);

    var
      s: UnicodeString;
      a: Win1252String;
    begin
      s := 'Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ';
      a := Win1252String(s);
      s := UnicodeString(a);
    end;

    var
      s: UnicodeString;
      u: UTF8String;
    begin
      s := 'Ŧĥε ｑùíçķ ƀřǭŵņ fôｘ ǰűmpεď ōvêŗ ţħě łáƶÿ ďơǥ';
      u := UTF8String(s);
      s := UnicodeString(u);
    end;
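
As a quick check of the RawByteString approach from the answer, the sketch below encodes a short string, reads back the code-page tag with StringCodePage, and converts back with a plain UnicodeString typecast. It assumes a Windows console build on Delphi XE or later (for LocaleCharsFromUnicode); EncodeToCodePage is a hypothetical name for the same size, convert, tag sequence, not an RTL function.

```pascal
program CodePageRoundTrip;

{$APPTYPE CONSOLE}

// Same technique as the answer: ask for the required size, convert into the
// buffer, then tag the bytes with the code page they were encoded in.
function EncodeToCodePage(const Source: UnicodeString; CodePage: Word): RawByteString;
var
  Len: Integer;
begin
  Result := '';
  Len := LocaleCharsFromUnicode(CodePage, 0,
    PWideChar(Source), Length(Source), nil, 0, nil, nil);
  if Len > 0 then
  begin
    SetLength(Result, Len);
    LocaleCharsFromUnicode(CodePage, 0,
      PWideChar(Source), Length(Source), PAnsiChar(Result), Len, nil, nil);
    SetCodePage(Result, CodePage, False); // retag only, do not convert the bytes
  end;
end;

var
  Source: UnicodeString;
  Encoded: RawByteString;
begin
  Source := 'caf' + #$00E9; // "caf" + U+00E9, independent of source file encoding

  Encoded := EncodeToCodePage(Source, 1252);
  Writeln(Length(Encoded), ' bytes, code page ', StringCodePage(Encoded)); // 4 bytes, code page 1252
  // The typecast lets the RTL decode using the tag stored in the string.
  Writeln('Round trip intact: ', UnicodeString(Encoded) = Source);

  Encoded := EncodeToCodePage(Source, CP_UTF8);
  Writeln(Length(Encoded), ' bytes, code page ', StringCodePage(Encoded)); // 5 bytes, code page 65001
  Writeln('Round trip intact: ', UnicodeString(Encoded) = Source);
end.
```

Because the tag travels with the data, the decoding side needs no CodePage parameter, which is the property the answer relies on.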