Atomic operations and code generation for gcc


Question

I am currently looking at some assembly generated by gcc for atomic operations. I tried the following short sequence:

int x1;
int x2;

int foo;

void test()
{
  __atomic_store_n( &x1, 1, __ATOMIC_SEQ_CST );
  if( __atomic_load_n( &x2, __ATOMIC_SEQ_CST ))
    return;

  foo = 4;
}

In his "atomic<> Weapons" talk, in the part on code generation, Herb Sutter mentions that the x86 manual mandates using xchg for atomic stores and a plain mov for atomic reads. So I was expecting something along the lines of:

test():
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $1, %eax
    xchg    %eax, x1(%rip)
    movl    x2(%rip), %eax
    testl   %eax, %eax
    setne   %al
    testb   %al, %al
    je      .L2
    jmp     .L1
.L2:
    movl    $4, foo(%rip)
.L1:
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Here the memory fence is implicit because of the lock semantics of the xchg instruction.

However, if I compile this using gcc -march=core2 -S test.cc, I get the following:

test():
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $1, %eax
    movl    %eax, x1(%rip)
    mfence
    movl    x2(%rip), %eax
    testl   %eax, %eax
    setne   %al
    testb   %al, %al
    je      .L2
    jmp     .L1
.L2:
    movl    $4, foo(%rip)
.L1:
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

So instead of an xchg operation, gcc uses a mov + mfence combination here. What is the reason for this code generation, which differs from the one mandated by the x86 architecture according to Herb Sutter?

Recommended answer

The xchg instruction has implied lock semantics when the destination is a memory location. This means you can atomically swap the contents of a register with the contents of a memory location.

The example in the question is doing an atomic store, not a swap. The x86 memory model guarantees that, in a multi-processor/multi-core system, stores done by one thread will be seen in that order by other threads, so a plain mov is sufficient for the store itself. Having said that, there are older Intel CPUs and some clones with bugs in this area, and an xchg is required as a workaround on those CPUs. See the "Significant optimizations" section of the Wikipedia article on spinlocks:
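To make the store-versus-swap distinction concrete, here is a minimal sketch using the same GCC `__atomic` builtins as the question. The function names are hypothetical and chosen only for illustration. The comments describe what GCC typically emits on x86-64; the exact instructions depend on the compiler version and flags, so treat them as an expectation to verify with `-S` rather than a guarantee.

```cpp
#include <cassert>  // only needed for usage_example() below

int x1 = 0;

// Release store: relies only on x86's in-order visibility of stores,
// so it is typically compiled to a plain mov.
void store_release(int v) { __atomic_store_n(&x1, v, __ATOMIC_RELEASE); }

// Sequentially consistent store: also has to be ordered before later
// loads, so the compiler adds a full barrier (mov + mfence, or an xchg).
void store_seq_cst(int v) { __atomic_store_n(&x1, v, __ATOMIC_SEQ_CST); }

// A real swap: writes v and returns the previous value atomically.
// On x86 this maps naturally onto xchg with a memory operand.
int swap_value(int v) { return __atomic_exchange_n(&x1, v, __ATOMIC_SEQ_CST); }

// Single-threaded usage check of the three helpers (hypothetical helper).
void usage_example() {
  store_release(1);
  assert(__atomic_load_n(&x1, __ATOMIC_SEQ_CST) == 1);
  store_seq_cst(2);
  assert(swap_value(3) == 2);  // the swap hands back the old value
  assert(__atomic_load_n(&x1, __ATOMIC_SEQ_CST) == 3);
}
```

Compiling this with `g++ -O2 -S` and comparing the three functions is a quick way to see which of them receives a barrier on a given compiler.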

which states:

The simple implementation above works on all CPUs using the x86 architecture. However, a number of performance optimizations are possible:

On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the slower locked XCHG. This is due to subtle memory ordering rules which support this, even though MOV is not a full memory barrier. However, some processors (some Cyrix processors, some revisions of the Intel Pentium Pro (due to bugs), and earlier Pentium and i486 SMP systems) will do the wrong thing and data protected by the lock could be corrupted. On most non-x86 architectures, explicit memory barrier or atomic instructions (as in the example) must be used. On some systems, such as IA-64, there are special "unlock" instructions which provide the needed memory ordering.
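The optimization described in the quote can be sketched as follows, again with the GCC `__atomic` builtins. This is a minimal illustration and not the code from the Wikipedia article: `lock_word`, `spin_try_lock`, and `usage_example` are hypothetical names, and only `spin_unlock` is a name taken from the quote. The lock is taken with an atomic exchange and released with an ordinary release store, which on current x86 processors is expected to compile to an unlocked mov.

```cpp
#include <cassert>  // only needed for usage_example() below

int lock_word = 0;  // hypothetical lock variable: 0 = free, 1 = held

// Try once to take the lock. The exchange writes 1 and returns the old
// value; seeing 0 means this caller acquired the lock.
bool spin_try_lock() {
  return __atomic_exchange_n(&lock_word, 1, __ATOMIC_ACQUIRE) == 0;
}

// Spin until the lock is acquired.
void spin_lock() {
  while (!spin_try_lock()) {
    // busy-wait; a production lock would add a pause/backoff here
  }
}

// Release the lock with a plain release store instead of a locked xchg.
void spin_unlock() { __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE); }

// Single-threaded usage check (hypothetical helper).
void usage_example() {
  spin_lock();
  assert(lock_word == 1);
  assert(!spin_try_lock());  // already held, so a second attempt fails
  spin_unlock();
  assert(lock_word == 0);
}
```

On the affected older processors listed in the quote, the unlock would need a locked instruction instead of the release store.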

The memory barrier, mfence, ensures that all earlier stores have completed (the store buffers in the CPU core are empty and the values have reached the cache or memory). It also ensures that no later loads execute out of order.
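The role of the barrier can be spelled out in source form. The sketch below mirrors the shape of test() from the question, but uses relaxed accesses with an explicit `__atomic_thread_fence` between the store and the load, which GCC typically compiles to an mfence (or an equivalent locked instruction) on x86-64. The function name and the boolean return value are assumptions added for illustration, and this version is not a drop-in replacement for the sequentially consistent operations in the language-level memory model; it only shows where the store-to-load ordering comes from.

```cpp
#include <cassert>  // only needed for usage_example() below

int x1 = 0;
int x2 = 0;
int foo = 0;

// Same shape as test() in the question, with the barrier written out.
// Returns true if foo was written (hypothetical return value).
bool test_with_explicit_fence() {
  __atomic_store_n(&x1, 1, __ATOMIC_RELAXED);  // plain store
  __atomic_thread_fence(__ATOMIC_SEQ_CST);     // full barrier before the load
  if (__atomic_load_n(&x2, __ATOMIC_RELAXED))  // plain load
    return false;
  foo = 4;
  return true;
}

// Single-threaded usage check (hypothetical helper).
void usage_example() {
  assert(test_with_explicit_fence());  // x2 is 0, so foo gets written
  assert(x1 == 1 && foo == 4);
}
```

Without the fence, x86 allows the load of x2 to be satisfied while the store to x1 is still sitting in the store buffer, which is the one reordering a sequentially consistent store has to rule out.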

The fact that a MOV is sufficient to unlock the mutex (no serialization or memory barrier required) was "officially" clarified in a reply to Linus Torvalds by an Intel architect back in 1999.

I guess it was later discovered that this did not work on some older x86 processors.
