Why doesn't GCC use LOAD (without fence) and STORE + SFENCE for sequential consistency?

Question

Here are four approaches to implementing sequential consistency on x86/x86_64:

1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD(0) and STORE (without fence)

As it is written here:

Load Seq_Cst: MOV (from memory)

Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory), MFENCE

Note: there is an alternative mapping of C/C++11 to x86, which instead of locking (or fencing) the Seq Cst store locks/fences the Seq Cst load:

Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE, MOV (from memory)

Store Seq Cst: MOV (into memory)

GCC 4.8.2 (GDB in x86_64) uses the first (1) approach for C++11 std::memory_order_seq_cst, i.e. LOAD (without fence) and STORE + MFENCE:

std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8  <+0x0058>         mov    0x38(%rsp),%eax
0x4613ec  <+0x005c>         mov    %eax,0x20(%rsp)
0x4613f0  <+0x0060>         mfence
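
To reproduce a disassembly like the one above, a minimal sketch such as the following can be compiled to assembly (for example with g++ -std=c++11 -O2 -S) and inspected. The function names store_seq_cst and load_seq_cst are hypothetical, chosen only for illustration, and the exact instructions emitted depend on the GCC version and target flags.

```cpp
#include <atomic>

std::atomic<int> a{0};

// Hypothetical helper: a sequentially consistent store.
// The question reports that GCC 4.8.2 on x86_64 emits MOV followed by MFENCE here.
void store_seq_cst(int value) {
    a.store(value, std::memory_order_seq_cst);
}

// Hypothetical helper: a sequentially consistent load.
// Under approach (1) this is expected to be a plain MOV from memory.
int load_seq_cst() {
    return a.load(std::memory_order_seq_cst);
}
```

Keeping the store and the load in separate functions makes it easier to see which of the two carries the fence in the generated assembly.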

As we know, MFENCE = LFENCE + SFENCE. We can then rewrite this code as: LOAD (without fence) and STORE + LFENCE + SFENCE

Questions:

Why don't we need LFENCE before the LOAD, yet need LFENCE after the STORE?

Why doesn't GCC use the approach LOAD (without fence) and STORE + SFENCE for std::memory_order_seq_cst?

Recommended answer

std::atomic<int>::store is mapped to the compiler intrinsic __atomic_store_n. (This and the other atomic-operation intrinsics are documented in the GCC manual under "Built-in functions for memory model aware atomic operations".) The _n suffix makes it type-generic; the back end actually implements variants for specific sizes in bytes. int on x86 is AFAIK always 32 bits long, so we're looking for the definition of __atomic_store_4. The internals manual for this version of GCC says that the __atomic_store operations correspond to machine description patterns named atomic_storemode; the mode corresponding to a 4-byte integer is "SI" (documented in the GCC internals manual's section on machine modes), so we are looking for something called "atomic_storesi" in the x86 machine description. That brings us to config/i386/sync.md, specifically this bit:
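
The relationship between the library call and the builtin can be seen by calling the builtin directly on a plain int. This is a minimal sketch, assuming GCC or a compatible compiler (such as Clang) that provides the __atomic builtins; the names plain_value, atomic_value, and the three helper functions are hypothetical. How libstdc++ routes std::atomic<int>::store to the builtin internally is not shown here.

```cpp
#include <atomic>

int plain_value = 0;               // hypothetical plain int, accessed only via builtins
std::atomic<int> atomic_value{0};  // hypothetical std::atomic counterpart

// Store through the type-generic builtin, requesting sequential consistency.
void store_via_builtin(int v) {
    __atomic_store_n(&plain_value, v, __ATOMIC_SEQ_CST);
}

// The same kind of store expressed through the C++11 library interface.
void store_via_std_atomic(int v) {
    atomic_value.store(v, std::memory_order_seq_cst);
}

// Read the plain int back through the matching load builtin.
int load_via_builtin() {
    return __atomic_load_n(&plain_value, __ATOMIC_SEQ_CST);
}
```

Comparing the assembly of store_via_builtin and store_via_std_atomic is one way to check, on a given compiler, that both go through the same expansion.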

(define_expand "atomic_store<mode>"
  [(set (match_operand:ATOMIC 0 "memory_operand")
        (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
                        (match_operand:SI 2 "const_int_operand")]
                       UNSPEC_MOVA))]
  ""
{
  enum memmodel model = (enum memmodel) (INTVAL (operands[2]) & MEMMODEL_MASK);

  if (<MODE>mode == DImode && !TARGET_64BIT)
    {
      /* For DImode on 32-bit, we can use the FPU to perform the store.  */
      /* Note that while we could perform a cmpxchg8b loop, that turns
         out to be significantly larger than this plus a barrier.  */
      emit_insn (gen_atomic_storedi_fpu
                 (operands[0], operands[1],
                  assign_386_stack_local (DImode, SLOT_TEMP)));
    }
  else
    {
      /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
      if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
        {
          emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
                                                operands[0], operands[1],
                                                operands[2]));
          DONE;
        }

      /* Otherwise use a store.  */
      emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
                                           operands[2]));
    }
  /* ... followed by an MFENCE, if required.  */
  if (model == MEMMODEL_SEQ_CST)
    emit_insn (gen_mem_thread_fence (operands[2]));
  DONE;
})

Without going into a great deal of detail, the bulk of this is a C function body that will be called to generate the low-level "RTL" intermediate representation of the atomic store operation. When it's invoked by your example code, <MODE>mode != DImode, model == MEMMODEL_SEQ_CST, and TARGET_SSE2 is true, so it will call gen_atomic_store<mode>_1 and then gen_mem_thread_fence. The latter function always generates mfence. (There is code in this file to produce sfence, but I believe it is only used for explicitly-coded _mm_sfence (from <xmmintrin.h>).)
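
The control flow of the expander quoted above can be paraphrased as ordinary C++. This is only an illustrative model, not GCC code: the StoreRequest struct, the expand_atomic_store function, and the string labels are hypothetical stand-ins for the target macros and gen_* calls in sync.md.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the conditions the expander tests.
struct StoreRequest {
    bool is_dimode;     // <MODE>mode == DImode (8-byte integer)
    bool target_64bit;  // TARGET_64BIT
    bool target_sse2;   // TARGET_SSE2
    bool seq_cst;       // model == MEMMODEL_SEQ_CST
};

// Returns labels for the patterns the expander would emit, in order.
std::vector<std::string> expand_atomic_store(const StoreRequest& r) {
    std::vector<std::string> emitted;
    if (r.is_dimode && !r.target_64bit) {
        // 8-byte store on a 32-bit target goes through the FPU.
        emitted.push_back("atomic_storedi_fpu");
    } else {
        if (r.seq_cst && !(r.target_64bit || r.target_sse2)) {
            // No MFENCE available: use XCHG and stop (the early DONE).
            emitted.push_back("atomic_exchange");
            return emitted;
        }
        // Otherwise a plain store.
        emitted.push_back("atomic_store_1");
    }
    // Seq-cst stores are followed by a thread fence.
    if (r.seq_cst) {
        emitted.push_back("mem_thread_fence");
    }
    return emitted;
}
```

For the example in the question (a 4-byte int on x86_64 with a seq_cst store), this model yields "atomic_store_1" followed by "mem_thread_fence", matching the MOV plus MFENCE seen in the disassembly.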

The comments suggest that someone thought MFENCE was required in this case. I conclude that either you are mistaken to think a load fence is not required, or this is a missed optimization bug in GCC. It is not, for instance, an error in how you are using the compiler.