Why is using a struct Vector3I instead of three ints much slower in C#?

Question

I'm processing lots of data in a 3D grid, so I wanted to implement a simple iterator instead of three nested loops. However, I ran into a performance problem. First, I implemented a simple loop using only int x, y and z variables. Then I implemented a Vector3I structure and used that, and the calculation time doubled. Now I'm struggling with the question: why is that? What did I do wrong?

Example for reproduction:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

public struct Vector2I
{
    public int X;
    public int Y;
    public int Z;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public Vector2I(int x, int y, int z)
    {
        this.X = x;
        this.Y = y;
        this.Z = z;
    }
}

public class IterationTests
{
    private readonly int _countX;
    private readonly int _countY;
    private readonly int _countZ;
    private Vector2I _Vector = new Vector2I(0, 0, 0);


    public IterationTests()
    {
        _countX = 64;
        _countY = 64;
        _countZ = 64;
    }

    [Benchmark]
    public void NestedLoops()
    {
        int countX = _countX;
        int countY = _countY;
        int countZ = _countZ;

        int result = 0;

        for (int x = 0; x < countX; ++x)
        {
            for (int y = 0; y < countY; ++y)
            {
                for (int z = 0; z < countZ; ++z)
                {
                    result += ((x ^ y) ^ (~z));
                }
            }
        }
    }

    [Benchmark]
    public void IteratedVariables()
    {
        int countX = _countX;
        int countY = _countY;
        int countZ = _countZ;

        int result = 0;

        int x = 0, y = 0, z = 0;
        while (true)
        {
            result += ((x ^ y) ^ (~z));

            ++z;
            if (z >= countZ)
            {
                z = 0;
                ++y;

                if (y >= countY)
                {
                    y = 0;
                    ++x;

                    if (x >= countX)
                    {
                        break;
                    }
                }
            }
        }
    }

    [Benchmark]
    public void IteratedVector()
    {
        int countX = _countX;
        int countY = _countY;
        int countZ = _countZ;

        int result = 0;

        Vector2I iter = new Vector2I(0, 0, 0);
        while (true)
        {
            result += ((iter.X ^ iter.Y) ^ (~iter.Z));

            ++iter.Z;
            if (iter.Z >= countZ)
            {
                iter.Z = 0;
                ++iter.Y;

                if (iter.Y >= countY)
                {
                    iter.Y = 0;
                    ++iter.X;

                    if (iter.X >= countX)
                    {
                        break;
                    }
                }
            }
        }
    }

    [Benchmark]
    public void IteratedVectorAvoidNew()
    {
        int countX = _countX;
        int countY = _countY;
        int countZ = _countZ;

        int result = 0;

        Vector2I iter = _Vector;

        iter.X = 0;
        iter.Y = 0;
        iter.Z = 0;
        while (true)
        {
            result += ((iter.X ^ iter.Y) ^ (~iter.Z));

            ++iter.Z;
            if (iter.Z >= countZ)
            {
                iter.Z = 0;
                ++iter.Y;

                if (iter.Y >= countY)
                {
                    iter.Y = 0;
                    ++iter.X;

                    if (iter.X >= countX)
                    {
                        break;
                    }
                }
            }
        }
    }
}

public static class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<IterationTests>();
    }
}

The results I measured:

                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
            NestedLoops | 333.9 us | 4.6837 us | 4.3811 us |
      IteratedVariables | 291.0 us | 0.8792 us | 0.6864 us |
         IteratedVector | 702.1 us | 4.8590 us | 4.3073 us |
 IteratedVectorAvoidNew | 725.8 us | 6.4850 us | 6.0661 us |

Note: 'IteratedVectorAvoidNew' is there because of a discussion suggesting that the problem might lie in the new operator of Vector3I. Originally, I used a custom iteration loop and measured with a stopwatch.

Additionally, here is a benchmark of iterating over a 256×256×256 area:

                 Method |     Mean |     Error |    StdDev |
----------------------- |---------:|----------:|----------:|
            NestedLoops | 18.67 ms | 0.0504 ms | 0.0446 ms |
      IteratedVariables | 18.80 ms | 0.2006 ms | 0.1877 ms |
         IteratedVector | 43.66 ms | 0.4525 ms | 0.4232 ms |
 IteratedVectorAvoidNew | 43.36 ms | 0.5316 ms | 0.4973 ms |
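
As a rough sanity check on both tables, the means can be converted into time per loop iteration. The sketch below assumes the CPU runs at its nominal 2.40 GHz for the whole benchmark (an assumption; the actual clock behavior was not measured), and `PerIterationCost`, `PerIteration` and `Report` are hypothetical names used only for this illustration:

```csharp
using System;

public static class PerIterationCost
{
    // Converts a mean time in nanoseconds into nanoseconds per loop iteration.
    public static double PerIteration(double meanNs, long iterations)
    {
        return meanNs / iterations;
    }

    // Prints the per-iteration time and the equivalent cycle count
    // for an assumed constant clock frequency in GHz.
    private static void Report(string name, double meanNs, long iterations, double ghz)
    {
        double ns = PerIteration(meanNs, iterations);
        Console.WriteLine($"{name}: {ns:F2} ns/iteration, about {ns * ghz:F1} cycles");
    }

    public static void Main()
    {
        const double ghz = 2.40;           // nominal clock of the Q6600, assumed constant
        long small = 64L * 64 * 64;        // iterations in the 64x64x64 run
        long large = 256L * 256 * 256;     // iterations in the 256x256x256 run

        // Means taken from the two tables above, converted to nanoseconds.
        Report("IteratedVariables 64^3 ", 291.0e3, small, ghz);
        Report("IteratedVector    64^3 ", 702.1e3, small, ghz);
        Report("IteratedVariables 256^3", 18.80e6, large, ghz);
        Report("IteratedVector    256^3", 43.66e6, large, ghz);
    }
}
```

With these inputs the variable-based loop comes out at roughly 1.1 ns per iteration and the struct-based loop at roughly 2.6 to 2.7 ns, for both grid sizes. The slowdown therefore looks like a constant per-iteration cost rather than a one-time setup cost such as the new operator.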

My environment:

  • Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
  • Windows 10 (64-bit)
  • Visual Studio 2017
  • Language: C#
  • Yes, I selected the Release configuration

Notes:

My current task is to rewrite existing code so that it a) supports more features and b) is faster. I'm also working on lots of data, and this is the current bottleneck of the whole application, so no, it's not a premature optimization.

As for rewriting the nested loops into one: I'm not trying to optimize there. I just need to write such iterations many times, so I simply wanted to simplify the code, nothing more. But because it's a performance-critical part of the code, I'm measuring such changes in design. Now, when I see that simply storing three variables in a struct doubles the processing time, I'm quite scared of using structs like that.

Recommended answer

This comes down to the difference between a memory access and a register access.

TL;DR:
With raw variables everything can be placed into registers, whereas with the struct, in the code the JIT generated here, every field is accessed on the stack, which is a memory access. Accessing a register is significantly faster than accessing memory.

Now, on to the full explanation:

C# is JIT-compiled at run time (this is slightly different from the JVM, but that isn't important right now), so we can look at the actual assembly that gets generated.
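
One way to obtain such a listing, sketched here under the assumption that the installed BenchmarkDotNet version ships the `DisassemblyDiagnoser` attribute, is to let the benchmark harness export the JIT output itself. The names `DisassemblyExample` and `DisassemblyProgram` are hypothetical, and the method returns its result so that the JIT cannot discard the loop as dead code:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark class; the point is the diagnoser attribute.
[DisassemblyDiagnoser] // asks BenchmarkDotNet to export the JIT-generated assembly
public class DisassemblyExample
{
    private const int Count = 64; // same grid size as in the question

    [Benchmark]
    public int IteratedVariables()
    {
        int result = 0;
        int x = 0, y = 0, z = 0;
        while (true)
        {
            result += (x ^ y) ^ (~z);

            // Advance z, then y, then x, like the loop in the question.
            if (++z < Count) continue;
            z = 0;
            if (++y < Count) continue;
            y = 0;
            if (++x >= Count) break;
        }
        return result; // returned so the loop is not dead code
    }
}

public static class DisassemblyProgram
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<DisassemblyExample>();
    }
}
```

The listing should then appear among the exported benchmark artifacts. The available options and output file names vary between BenchmarkDotNet versions, so check the documentation for the version in use.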

For this I am only comparing IteratedVariables and IteratedVector, because these two are enough to get the general gist. First we have IteratedVariables:

                    ; int countX = 64;
push ebp
mov  ebp, esp
push edi
push esi
push ebx
                    ; int result = 0;
xor ebx, ebx
                    ; int x = 0, y = 0, z = 0;
xor edi, edi
                    ; int x = 0, y = 0, z = 0;
xor ecx, ecx
xor esi, esi
                    ; while(true) {
                    ;     result += ((x ^ y) ^ (~z));
LOOP:
    mov eax, edi
    xor eax, ecx
    mov edx, esi
    not edx
    xor eax, edx
    add ebx, eax
                    ; ++z;
    inc esi
                    ; if(z >= countZ)
    cmp esi, 40h
    jl  LOOP
                    ; {
                    ;     z = 0;
    xor esi, esi
                    ; ++y;
    inc ecx
                    ; if(y >= countY)
    cmp ecx, 40h
    jl  LOOP
                    ; {
                    ;     y = 0;
    xor ecx, ecx
                    ; ++x;
    inc edi
                    ; if(x >= countX)
    cmp edi, 40h
    jl  LOOP
                    ; {
                    ;     break;
                    ; } } } }
                    ; return result;
mov eax, ebx
pop ebx
pop esi
pop edi
pop ebp
ret

I've done a little work to clean up the code. All of the comments (lines marked with a semicolon (;)) come from the actual C# code (they were generated for me), and I've trimmed them a bit for brevity. The primary thing to notice here is that everything in the loop accesses a register; there is no raw memory access (a raw memory access can be recognized by [] around a register name or address expression).

In the second example (IteratedVector) we see a slightly different piece of code:

                                    ; int countX = 64;
push ebp
mov  ebp, esp
sub  esp, 0Ch
xor  eax, eax
mov  dword ptr [ebp-0Ch], eax
mov  dword ptr [ebp-8],   eax
mov  dword ptr [ebp-4],   eax
                                    ; int result = 0;
xor ecx,ecx
                                    ; Vector3i iter = new Vector3i(0, 0, 0);
mov dword ptr [ebp-0Ch], ecx
mov dword ptr [ebp-8],   ecx
mov dword ptr [ebp-4],   ecx
                                    ; while(true) {
                                    ;     result += ((iter.X ^ iter.Y) ^ (~iter.Z));
LOOP:
    mov eax, dword ptr [ebp-0Ch]
    xor eax, dword ptr [ebp-8]
    mov edx, dword ptr [ebp-4]
    not edx
    xor eax, edx
    add ecx, eax
                                    ; ++iter.Z;
    lea eax, [ebp-4]
    inc dword ptr [eax]
                                    ; if(iter.Z >= countZ)
    cmp dword ptr [ebp-4], 40h
    jl  LOOP
                                    ; {
                                    ;     iter.Z = 0;
    xor edx, edx
    mov dword ptr [ebp-4], edx
                                    ; ++iter.Y;
    lea eax, [ebp-8]
    inc dword ptr [eax]
                                    ; if(iter.Y >= countY)
    cmp dword ptr [ebp-8], 40h
    jl  LOOP
                                    ; {
                                    ;     iter.Y = 0;
    xor edx, edx
    mov dword ptr [ebp-8], edx
                                    ; ++iter.X;
    lea eax, [ebp-0Ch]
    inc dword ptr [eax]
                                    ; if(iter.X >= countX)
    cmp dword ptr [ebp-0Ch], 40h
    jl  LOOP
                                    ; {
                                    ;     break;
                                    ; } } } }
                                    ; return result;
mov eax, ecx
mov esp, ebp
pop ebp
ret

Here you will notice lots of raw memory accesses. They are identified by the square brackets ([]) and also carry the tag dword ptr; don't worry too much about what that means, just read it as a memory access. The code here is riddled with them: they appear everywhere a field of the struct is read or written.

This is why the struct code is so much slower: registers are part of the CPU core itself, whereas memory is further away. Even if the data is in the CPU cache, it is still significantly slower to access than registers.
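
If the struct is still wanted at the API boundary, one possible workaround (a sketch, not something the answer above proposes) is to copy its fields into plain local ints before the hot loop, so that the loop body has the same shape as IteratedVariables. Whether the JIT then keeps those locals in registers depends on the runtime and JIT version, so this needs to be measured rather than assumed. `GridPoint`, `LocalCopySketch`, `SumWithStruct` and `SumWithLocals` are hypothetical names used only for this illustration:

```csharp
using System;

// Hypothetical stand-in for the three-int struct from the question.
public struct GridPoint
{
    public int X;
    public int Y;
    public int Z;
}

public static class LocalCopySketch
{
    // Iterates through the struct fields directly, as IteratedVector does.
    public static int SumWithStruct(int countX, int countY, int countZ)
    {
        int result = 0;
        GridPoint iter = default(GridPoint);
        while (true)
        {
            result += (iter.X ^ iter.Y) ^ (~iter.Z);

            if (++iter.Z < countZ) continue;
            iter.Z = 0;
            if (++iter.Y < countY) continue;
            iter.Y = 0;
            if (++iter.X >= countX) break;
        }
        return result;
    }

    // Same traversal, but the counters are copied out of the struct once
    // and live in plain locals for the duration of the loop.
    public static int SumWithLocals(GridPoint start, int countX, int countY, int countZ)
    {
        int x = start.X, y = start.Y, z = start.Z;
        int result = 0;
        while (true)
        {
            result += (x ^ y) ^ (~z);

            if (++z < countZ) continue;
            z = 0;
            if (++y < countY) continue;
            y = 0;
            if (++x >= countX) break;
        }
        return result;
    }

    public static void Main()
    {
        int viaStruct = SumWithStruct(64, 64, 64);
        int viaLocals = SumWithLocals(default(GridPoint), 64, 64, 64);

        // Both variants visit the same cells in the same order, so the sums match.
        Console.WriteLine(viaStruct == viaLocals);
    }
}
```

Like the loops in the question, both methods assume every count is at least 1, because the loop body runs once before any bound is checked.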
