Finding an accurate way to micro-benchmark small code paths written in C++ and running on Linux/OSX

Question

I'm looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I've written in C++. I'm running on Linux and OSX, and using GCC. What facilities are there for sub-millisecond accuracy? I'm thinking that a simple test that runs the code path many times (several tens of millions?) will give me enough consistency to get a good reading. If anyone knows of preferable methods, please feel free to suggest them.

Recommended answer

You can use the "rdtsc" processor instruction on x86/x86_64. On multicore systems, check for the "constant_tsc" capability (listed among the CPU flags in /proc/cpuinfo on Linux). It means the time-stamp counter ticks at a constant rate regardless of dynamic frequency changes, so readings should stay consistent across cores; the related "nonstop_tsc" flag indicates that the counter also keeps running in deep sleep states.
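As a rough illustration of the rdtsc approach, the sketch below reads the counter before and after a loop and reports the average ticks per call. It assumes GCC or Clang on x86/x86_64, where the `__rdtsc()` intrinsic is provided by `<x86intrin.h>`; the helper name `average_ticks` is hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <x86intrin.h>  // __rdtsc (GCC/Clang, x86 / x86_64 only)

// Hypothetical helper: calls `fn` `iterations` times and returns the
// average number of TSC ticks per call, including loop overhead.
template <typename Fn>
double average_ticks(Fn&& fn, std::uint64_t iterations) {
    assert(iterations > 0);
    const std::uint64_t start = __rdtsc();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const std::uint64_t stop = __rdtsc();
    return static_cast<double>(stop - start) /
           static_cast<double>(iterations);
}
```

The result is in TSC ticks, not nanoseconds. The code under test should also have an observable side effect (for example a write to a volatile variable), otherwise the optimizer may remove it.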

If your processor does not support constant_tsc, be sure to bind your program to a single core (the taskset utility on Linux).
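Besides launching the benchmark through taskset (for example `taskset -c 0 ./bench`, where `./bench` is a placeholder binary name), the pinning can be done from inside the program. The following is a minimal Linux-only sketch using `sched_setaffinity` from glibc; the helper name `pin_to_core` is hypothetical, and OSX does not offer this call.

```cpp
#include <cassert>
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_ZERO, CPU_SET (Linux)

// Hypothetical helper: restricts the calling thread to a single core so
// that successive rdtsc readings come from the same counter.
// Returns true on success.
bool pin_to_core(int core) {
    assert(core >= 0 && core < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Calling it once at the start of `main`, before any timing begins, is probably the simplest arrangement.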

When using rdtsc on out-of-order CPUs (everything except Intel Atom and perhaps some other low-end CPUs), put a serializing ("ordering") instruction such as "cpuid" before it. This forces earlier instructions to complete before the counter is read, so they cannot be reordered around the measurement.
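One way to sketch the cpuid-then-rdtsc sequence is with GCC-style inline assembly, as below. This assumes GCC or Clang on x86/x86_64; the function name `serialized_rdtsc` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: executes cpuid (a serializing instruction) and then
// reads the time-stamp counter, so earlier instructions have completed
// before the counter is sampled.
inline std::uint64_t serialized_rdtsc() {
    std::uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
    // cpuid leaf 0; the results are ignored, only the serializing effect matters.
    asm volatile("cpuid"
                 : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                 :
                 : "memory");
    std::uint32_t lo = 0, hi = 0;
    // rdtsc returns the low 32 bits in eax and the high 32 bits in edx.
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Note that cpuid itself takes a noticeable number of cycles (considerably more inside a virtual machine), so it is worth measuring an empty region first and subtracting that overhead from the results.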

Also, Mac OS X has "Shark", which can measure some hardware events in your code.

On RDTSC and out-of-order CPUs, see Section 18 of Agner Fog's excellent manual (the main site is http://www.agner.org/optimize/).
