
Question

The simple question again: given a std::string, determine which of its characters are digits, symbols, whitespace, etc., with respect to the user's language and regional settings (locale).

I managed to split the string into characters using the Boost.Locale boundary analysis tools:

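// Needs <string> and <boost/locale.hpp>.
// Note: from C++20 on, u8"" literals are char8_t and no longer initialize a std::string.
std::string text = u8"生きるか死ぬか";

// Index over the user-perceived characters of 'text' under a Japanese UTF-8 locale.
boost::locale::boundary::segment_index<std::string::const_iterator> characters(
    boost::locale::boundary::character,
    text.begin(), text.end(),
    boost::locale::generator()("ja_JP.UTF-8"));

for (const auto& ch : characters) {
    // Each 'ch' is a segment (a pair of iterators into 'text') covering
    // a single character of the Japanese text; it may span several bytes.
}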

However, I don't see any way to go further and determine whether ch is a digit, a symbol, or anything else. There are Boost string classification algorithms, but these don't seem to work with whatever *segment_index::iterator is.

Nor can I use std::isalpha(std::locale), because I'm not sure whether a Boost segment can be converted to char or wchar_t.

Is there any neat way to classify symbols?

Recommended answer

There are a number of functions and objects supporting this in <locale>, but... The example text you give looks like UTF-8, which is a multibyte encoding, and the functions in <locale> don't work with multibyte encodings.
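
To illustrate the point about multibyte encodings, here is a minimal sketch in standard C++ only. The helper names count_alpha_bytes and count_code_points are made up for this example and appear in neither the question nor any library. std::isalpha(char, locale) is handed one char at a time, so it only ever sees the individual bytes of a UTF-8 character, never the character itself.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical helper: counts the bytes (char units) of 'text' that
// std::isalpha accepts under 'loc'. Each call sees a single byte only.
std::size_t count_alpha_bytes(const std::string& text, const std::locale& loc) {
    std::size_t n = 0;
    for (char c : text) {
        if (std::isalpha(c, loc)) ++n;
    }
    return n;
}

// Hypothetical helper: counts UTF-8 code points by counting every byte
// that is not a continuation byte (bit pattern 10xxxxxx).
// Assumes 'text' is well-formed UTF-8.
std::size_t count_code_points(const std::string& text) {
    std::size_t n = 0;
    for (unsigned char c : text) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the example string from the question, text.size() reports 21 bytes while count_code_points reports 7 code points, so a byte-wise classifier is looking at fragments rather than characters.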

I'd suggest you get the ICU library and use it. Amongst other things, it allows testing for all of the properties defined in the Unicode Character Database. It also has macros or functions for iterating over a string (or at least an array of char), extracting one UTF-32 code point at a time (which is what you'd want to test).
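
The "one code point at a time" step can be sketched without ICU. The function decode_utf8 below is a hand-rolled illustration in standard C++, not ICU's implementation, and its name is an assumption of this example. It does not reject overlong encodings or surrogates, which a production decoder such as ICU's U8_NEXT macro takes care of. Each resulting code point is the kind of value you would then pass to ICU's property functions, for example u_isdigit, u_isalpha or u_isspace from unicode/uchar.h.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes a UTF-8 byte string into UTF-32 code points.
// Malformed or truncated sequences yield U+FFFD (replacement character).
// Simplified: overlong forms and surrogate code points are not rejected.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
                ok = false;
                break;  // leave the offending byte to be re-examined
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}
```

Applied to the string from the question, this yields seven code points, starting with U+751F (生) and U+304D (き). Classifying those values, rather than the raw bytes, is what the ICU suggestion amounts to.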

