Windows C runtime toupper is slow when a locale is set

Problem description

I'm diagnosing an edge case in a cross-platform (Windows and Linux) application where toupper is substantially slower on Windows. I'm assuming the same applies to tolower.

Originally I tested this with a simple C program on each platform, without setting any locale information or even including the locale header file, and there was very little performance difference. The test was a million-iteration loop that called toupper() on each character of a string.

After including the header file and adding the line below, it is much slower on Windows and calls a lot of the MS C runtime library's locale-specific functions. This is fine, but the performance hit is really bad. On Linux this doesn't appear to have any effect at all on performance.

setlocale(LC_ALL, ""); // system default locale

If I set either of the following instead, it runs as fast as Linux, but it does appear to skip all the locale functions.

setlocale(LC_ALL, NULL); // should be interpreted as the same as below?
OR
setlocale(LC_ALL, "C"); 
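On the question in the comment above: in standard C, passing NULL as the second argument to setlocale only queries the current locale and leaves it unchanged, so a program that has made no other setlocale call is still in the startup "C" locale. A minimal sketch of that check follows; the helper names query_locale and in_c_locale are hypothetical and only used for illustration.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns the name of the current locale.
   A NULL second argument makes setlocale a pure query. */
const char *query_locale(void)
{
    return setlocale(LC_ALL, NULL);
}

/* Hypothetical helper: returns 1 if the program is running in the "C" locale. */
int in_c_locale(void)
{
    const char *name = query_locale();
    return name != NULL && strcmp(name, "C") == 0;
}
```

Calling in_c_locale() before and after setlocale(LC_ALL, NULL) should give the same answer, which would be consistent with the NULL call being as fast as never setting a locale at all.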

Note: Visual Studio 2015 on Windows 10; G++ on Linux running CentOS.

I have tried Dutch locale settings with the same outcome: slow on Windows, no speed difference on Linux.

Am I doing something wrong, is there a bug with the locale settings on Windows, or is it the other way around and Linux isn't doing what it should? I haven't debugged the Linux app, as I'm not as familiar with Linux and don't know exactly what it's doing internally. What should I test next to sort this out?

Code below for testing (Linux):

// C++ is only used for timing.  The original program is in C.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <chrono>
#include <locale.h>

using namespace std::chrono;

void strToUpper(char *strVal);

int main()
{

    typedef high_resolution_clock Clock;
    high_resolution_clock::time_point t1 = Clock::now();

    // set locale
    //setlocale(LC_ALL,"nl_NL");
    setlocale(LC_ALL,"en_US");

    // testing string
    char str[] = "the quick brown fox jumps over the lazy dog";

    for (int i = 0; i < 1000000; i++)
    {
        strToUpper(str);
    }

    high_resolution_clock::time_point t2 = Clock::now();
    duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
    printf("chrono time %2.6f:\n",time_span.count());
}

void strToUpper(char *strVal)
{
    unsigned char *t;
    t = (unsigned char *)strVal;

    while (*t)
    {
        *t = toupper(*t);
        t++;   // advance the pointer; the dereference in *t++ was unused
    }
}

For Windows, change the locale setup to:

// set locale
//setlocale(LC_ALL,"nld_nld");
setlocale(LC_ALL, "english_us");

You can see the locale change from the decimal separator in the printed time: full stop vs. comma.

EDIT - Profiling data

The profile shows that most of the time is spent in child calls made from _toupper_l. Without the locale information set, the toupper call does NOT call _toupper_l, which makes it very quick.

Recommended answer

Identical (and fairly good) performance with LANG=C vs. LANG=anything else is expected for the glibc implementation used by Linux.

Your Linux results make sense. Your testing method is probably OK. Use a profiler to see how much time your microbenchmark spends inside the Windows functions. If the Windows implementation does turn out to be the problem, maybe there's a function that can convert whole strings at once, like the C++ boost::to_upper_copy<std::string> (unless that's even slower, see below).

Also note that upcasing ASCII strings can be SIMD vectorized pretty efficiently. I wrote a case-flip function for a single vector in another answer, using C SSE intrinsics; it can be adapted to upcase instead of flipcase. This should be a huge speedup if you spend a lot of time upcasing strings that are more than 16 bytes long and that you know are ASCII.

Actually, Boost's to_upper_copy() appears to compile to extremely slow code, like 10x slower than toupper. See the linked answer for my vectorized strtoupper(dst, src), which is ASCII-only but could be extended with a fallback when non-ASCII src bytes are detected.
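As a rough illustration of the SIMD idea, here is a minimal sketch using SSE2 intrinsics. It is not the code from the linked answer: the names upcase_si128, ascii_upcase, and ascii_upcase_str are hypothetical, and the sketch assumes an x86 target with SSE2 and input that is known to be ASCII (bytes outside 'a'..'z' are copied unchanged, with no locale handling).

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86/x86-64 target */
#include <stddef.h>
#include <string.h>

/* Upcase 16 bytes at once: bytes in 'a'..'z' get bit 0x20 cleared. */
static __m128i upcase_si128(__m128i src)
{
    /* Shift 'a'..'z' down to the bottom of the signed range: -128 .. -103. */
    __m128i shifted = _mm_sub_epi8(src, _mm_set1_epi8((char)('a' + 128)));
    /* All-ones in every byte that is NOT a lowercase ASCII letter. */
    __m128i not_lower = _mm_cmpgt_epi8(shifted, _mm_set1_epi8((char)(-128 + 25)));
    /* 0x20 only in the lowercase positions. */
    __m128i flip = _mm_andnot_si128(not_lower, _mm_set1_epi8(0x20));
    return _mm_xor_si128(src, flip);
}

/* Copy len bytes from src to dst, upcasing ASCII letters. */
void ascii_upcase(char *dst, const char *src, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), upcase_si128(v));
    }
    for (; i < len; i++) {  /* scalar tail for the last 0..15 bytes */
        unsigned char c = (unsigned char)src[i];
        dst[i] = (char)((c >= 'a' && c <= 'z') ? c - 0x20 : c);
    }
}

/* Convenience wrapper for NUL-terminated strings; dst needs strlen(src)+1 bytes. */
void ascii_upcase_str(char *dst, const char *src)
{
    size_t len = strlen(src);
    ascii_upcase(dst, src, len);
    dst[len] = '\0';
}
```

Using an explicit length keeps the 16-byte loads inside the buffer, at the cost of one strlen pass in the wrapper.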

How does your current code handle UTF-8? There's not much gain in supporting non-ASCII locales if you assume that all characters are a single byte. IIRC, Windows uses UTF-16 for most stuff, which is unfortunate because it turned out that the world wanted more than 2^16 codepoints. UTF-16 is a variable-length encoding of Unicode, like UTF-8, but without UTF-8's advantage of being a superset of ASCII. Fixed-width encodings have a lot of advantages, but unfortunately you can't assume fixed width even with UTF-16. Java made this mistake, too, and is stuck with UTF-16.
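To make the variable-length point concrete, here is a small sketch of how a code point maps to UTF-16 code units: anything above U+FFFF needs a surrogate pair, so one character is not always one 16-bit unit. The function name utf16_encode is hypothetical and only used for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is
   not a valid scalar value (a surrogate, or above U+10FFFF). */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {            /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                 /* 20 bits remain, split 10/10 */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, 'A' (U+0041) fits in one unit, while U+1F600 becomes the pair 0xD83D 0xDE00.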

The glibc implementation of toupper is:

#define __ctype_toupper \
     ((int32_t *) _NL_CURRENT (LC_CTYPE, _NL_CTYPE_TOUPPER) + 128)
int toupper (int c) {
    return c >= -128 && c < 256 ? __ctype_toupper[c] : c;
}

The asm from x86-64 Ubuntu 15.10's /lib/x86_64-linux-gnu/libc.so.6 is:

## disassembly from  objconv -fyasm -v2 /lib/x86_64-linux-gnu/libc.so.6 /dev/stdout 2>&1
toupper:
    lea     edx, [rdi+80H]                          ; 0002E300 _ 8D. 97, 00000080
    movsxd  rax, edi                                ; 0002E306 _ 48: 63. C7
    cmp     edx, 383                                ; 0002E309 _ 81. FA, 0000017F
    ja      ?_01766                                 ; 0002E30F _ 77, 19
    mov     rdx, qword [rel ?_37923]                ; 0002E311 _ 48: 8B. 15, 00395AA8(rel)
    sub     rax, -128                               ; 0002E318 _ 48: 83. E8, 80
    mov     rdx, qword [fs:rdx]                     ; 0002E31C _ 64 48: 8B. 12
    mov     rdx, qword [rdx]                        ; 0002E320 _ 48: 8B. 12
    mov     rdx, qword [rdx+48H]                    ; 0002E323 _ 48: 8B. 52, 48
    mov     eax, dword [rdx+rax*4]                  ; 0002E327 _ 8B. 04 82   ## the final table lookup, indexing an array of 4B ints
?_01766:
    rep ret                                         ; actual objconv output shows the prefix on a separate line

So it takes an early-out if the arg isn't in the -128 to 0xFF range that the code checks (so this branch should predict perfectly as not-taken). Otherwise it finds the table for the current locale, which involves three pointer dereferences: one load from a global, one thread-local, and one more dereference. Then it actually indexes into the table of 4-byte entries.
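The same range-check-then-table-lookup shape can be sketched in portable C. This is an illustration only, not glibc's code: the names upper_table and table_toupper are hypothetical, and the table is filled for plain ASCII rather than loaded from the current locale.

```c
#include <stdint.h>

/* 384 entries cover arguments -128..255; index 0 corresponds to c == -128. */
static int32_t upper_table[384];
static int table_ready = 0;

/* Fill the table with an ASCII-only mapping (assumption: no locale data). */
static void init_upper_table(void)
{
    for (int c = -128; c < 256; c++) {
        int32_t up = c;
        if (c >= 'a' && c <= 'z')
            up = c - ('a' - 'A');
        upper_table[c + 128] = up;
    }
    table_ready = 1;
}

/* Early-out for out-of-range arguments, otherwise a single table lookup. */
int table_toupper(int c)
{
    if (!table_ready)
        init_upper_table();
    return (c >= -128 && c < 256) ? upper_table[c + 128] : c;
}
```

With this layout the per-call cost does not depend on which mapping the table holds, which fits the observation that the Linux timings are the same with and without a locale set.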

This is the entire library function; the toupper label in the disassembly is what your code calls. (Well, there is a layer of indirection through the PLT because of dynamic linking, but after the first call triggers lazy symbol lookup, it's just one extra jmp instruction between your code and those 11 insns in the library.)
