Why does GCC not use LOAD (without fence) and STORE + SFENCE for sequential consistency?

Question

Here are four approaches to achieving sequential consistency on x86/x86_64:


  1. LOAD (without fence) and STORE + MFENCE
  2. LOAD (without fence) and LOCK XCHG
  3. MFENCE + LOAD and STORE (without fence)
  4. LOCK XADD(0) and STORE (without fence)

As it is written here:

  • Load Seq_Cst: MOV (from memory)
  • Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory), MFENCE

Note: there is an alternative mapping of C/C++11 to x86 which, instead of locking (or fencing) the Seq Cst store, locks/fences the Seq Cst load:


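  • Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE, MOV (from memory)
  • Store Seq Cst: MOV (into memory)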

GCC 4.8.2 (disassembled with GDB on x86_64) uses the first approach for C++11 std::memory_order_seq_cst, i.e. LOAD (without fence) and STORE + MFENCE:

std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8  <+0x0058>         mov    0x38(%rsp),%eax
0x4613ec  <+0x005c>         mov    %eax,0x20(%rsp)
0x4613f0  <+0x0060>         mfence

Assuming MFENCE = LFENCE + SFENCE, the GCC output above can be rewritten as: LOAD (without fence) and STORE + LFENCE + SFENCE.
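For reference, the store from the question can be wrapped into a small self-contained program. This is a minimal sketch, not part of the original post: the function names store_then_load, exchange_seq_cst, and demo are hypothetical. It shows the C++-level operations that correspond to approaches 1 and 2; which instructions the compiler actually emits depends on the compiler version and target, although GCC on x86 typically compiles a seq_cst exchange to XCHG.

```cpp
#include <atomic>
#include <cassert>

// Approach 1 at the C++ level: a seq_cst store followed by a seq_cst load.
// (In the question, GCC 4.8.2 emitted MOV + MFENCE for the store.)
int store_then_load(std::atomic<int>& a, int temp) {
    a.store(temp, std::memory_order_seq_cst);
    return a.load(std::memory_order_seq_cst);
}

// Approach 2 at the C++ level: an atomic exchange, which also writes the
// value but additionally returns the previous contents.
int exchange_seq_cst(std::atomic<int>& a, int temp) {
    return a.exchange(temp, std::memory_order_seq_cst);
}

// Hypothetical driver mirroring the snippet from the question.
void demo() {
    std::atomic<int> a{5};
    int temp = 0;
    assert(store_then_load(a, temp) == 0);  // the store is visible to the load
    assert(exchange_seq_cst(a, 7) == 0);    // exchange returns the old value
    assert(a.load() == 7);                  // and leaves the new one in place
}
```

Compiling both functions with -O2 -S and comparing the assembly is one way to check which of the four mappings a given compiler chooses.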

Questions:

  1. Why do we not need LFENCE before the LOAD, yet need LFENCE after the STORE?
  2. Why does GCC not use the approach LOAD (without fence) and STORE + SFENCE for std::memory_order_seq_cst?


Recommended answer

std::atomic<int>::store is mapped to the compiler intrinsic __atomic_store_n. (This and other atomic-operation intrinsics are documented here: Built-in functions for memory model aware atomic operations.) The _n suffix makes it type-generic; the back-end actually implements variants for specific sizes in bytes. int on x86 is AFAIK always 32 bits long, so that means we're looking for the definition of __atomic_store_4. The internals manual for this version of GCC says that the __atomic_store operations correspond to machine description patterns named atomic_storemode; the mode corresponding to a 4-byte integer is "SI" (that's documented here), so we are looking for something called "atomic_storesi" in the x86 machine description. And that brings us to config/i386/sync.md, specifically this bit:

(define_expand "atomic_store<mode>"
  [(set (match_operand:ATOMIC 0 "memory_operand")
        (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
                        (match_operand:SI 2 "const_int_operand")]
                       UNSPEC_MOVA))]
  ""
{
  enum memmodel model = (enum memmodel) (INTVAL (operands[2]) & MEMMODEL_MASK);

  if (<MODE>mode == DImode && !TARGET_64BIT)
    {
      /* For DImode on 32-bit, we can use the FPU to perform the store.  */
      /* Note that while we could perform a cmpxchg8b loop, that turns
         out to be significantly larger than this plus a barrier.  */
      emit_insn (gen_atomic_storedi_fpu
                 (operands[0], operands[1],
                  assign_386_stack_local (DImode, SLOT_TEMP)));
    }
  else
    {
      /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
      if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
        {
          emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
                                                operands[0], operands[1],
                                                operands[2]));
          DONE;
        }

      /* Otherwise use a store.  */
      emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
                                           operands[2]));
    }
  /* ... followed by an MFENCE, if required.  */
  if (model == MEMMODEL_SEQ_CST)
    emit_insn (gen_mem_thread_fence (operands[2]));
  DONE;
})

Without going into a great deal of detail, the bulk of this is a C function body that will be called to generate the low-level "RTL" intermediate representation of the atomic store operation. When it's invoked by your example code, <MODE>mode != DImode, model == MEMMODEL_SEQ_CST, and TARGET_SSE2 is true, so it will call gen_atomic_store<mode>_1 and then gen_mem_thread_fence. The latter function always generates mfence. (There is code in this file to produce sfence, but I believe it is only used for explicitly-coded _mm_sfence (from <xmmintrin.h>).)
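The control flow of the expander can be mirrored in ordinary C++ to make the branch taken for the question's example explicit. This is only a hypothetical model, not GCC code: MemModel, Target, expand_atomic_store, and example_from_question are made-up names, and the returned strings merely echo the gen_* generator names from the excerpt.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's memmodel enum and target flags.
enum class MemModel { Relaxed, Release, SeqCst };
struct Target { bool is_64bit; bool has_sse2; };

// Returns the names of the generators the expander would call, in order.
std::vector<std::string> expand_atomic_store(bool is_dimode, MemModel model,
                                             Target t) {
    std::vector<std::string> emitted;
    if (is_dimode && !t.is_64bit) {
        // 64-bit store on a 32-bit target: go through the FPU.
        emitted.push_back("atomic_storedi_fpu");
    } else {
        // Seq-cst store without MFENCE available: use XCHG and stop (DONE).
        if (model == MemModel::SeqCst && !(t.is_64bit || t.has_sse2)) {
            emitted.push_back("atomic_exchange");
            return emitted;
        }
        // Otherwise a plain store.
        emitted.push_back("atomic_store_1");
    }
    // Seq-cst stores are followed by a full fence.
    if (model == MemModel::SeqCst)
        emitted.push_back("mem_thread_fence");
    return emitted;
}

// The case from the question: int (SImode), seq_cst, x86_64.
void example_from_question() {
    std::vector<std::string> steps =
        expand_atomic_store(false, MemModel::SeqCst, Target{true, true});
    assert(steps.size() == 2);
    assert(steps[0] == "atomic_store_1");
    assert(steps[1] == "mem_thread_fence");
}
```

For the question's example the model yields a plain store followed by the fence, which matches the mov/mfence pair in the disassembly.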

The comments suggest that someone thought MFENCE was required in this case. I conclude that either you are mistaken to think a load fence is not required, or this is a missed optimization bug in GCC. It is not, for instance, an error in how you are using the compiler.
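One way to make concrete what the instruction sequence for a seq_cst store has to guarantee is the classic store-buffering litmus test. The sketch below is an illustration and not part of the original answer; run_store_buffering_once and both_zero_observed are hypothetical names. The C++ memory model forbids r1 == 0 and r2 == 0 together when all four operations are seq_cst, so whatever the compiler emits for the store must rule that outcome out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome { int r1; int r2; };

// Runs one round of the store-buffering test with seq_cst operations.
Outcome run_store_buffering_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    // After join, both threads have written their results.
    assert(r1 != -1 && r2 != -1);
    return {r1, r2};
}

// Returns true if any round observed the outcome that seq_cst forbids.
bool both_zero_observed(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        Outcome o = run_store_buffering_once();
        if (o.r1 == 0 && o.r2 == 0)
            return true;
    }
    return false;
}
```

Weakening the orderings to release stores and acquire loads would make the both-zero outcome permitted by the language, so that variant is the one to experiment with when evaluating a cheaper fence than MFENCE.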


09-17 15:13