Question
Here are four approaches to achieving sequential consistency on x86/x86_64:
- LOAD (without fence) and STORE + MFENCE
- LOAD (without fence) and LOCK XCHG
- MFENCE + LOAD and STORE (without fence)
- LOCK XADD(0) and STORE (without fence)
As written here:
- Load Seq_Cst: MOV (from memory)
- Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory), MFENCE
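The store-side mapping above can be written with standard std::atomic operations. This is a sketch: the helper names are hypothetical, and the instruction comments describe typical GCC output on x86-64 rather than anything the C++ standard guarantees. All three functions are ordinary seq_cst operations; only the compiler's choice of instructions differs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helpers illustrating the first mapping.
// Instruction comments describe typical GCC x86-64 output, not a language guarantee.

// Seq_Cst load: a plain MOV from memory.
int load_seq_cst(const std::atomic<int>& a) {
    return a.load(std::memory_order_seq_cst);
}

// Seq_Cst store, primary form: an exchange, which x86 implements with XCHG
// (XCHG with a memory operand is implicitly locked). The old value is discarded.
void store_seq_cst_xchg(std::atomic<int>& a, int v) {
    a.exchange(v, std::memory_order_seq_cst);
}

// Seq_Cst store, alternative form: MOV into memory followed by MFENCE,
// as in the GCC 4.8.2 disassembly shown in this question.
void store_seq_cst_mov_mfence(std::atomic<int>& a, int v) {
    a.store(v, std::memory_order_seq_cst);
}
```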
Note: there is an alternative mapping of C/C++11 to x86, which, instead of locking (or fencing) the Seq Cst store, locks/fences the Seq Cst load:
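- Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE, MOV (from memory)
- Store Seq Cst: MOV (into memory)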
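The load side of the alternative mapping described in the note can be sketched the same way. A minimal sketch with a hypothetical helper name: fetch_add(0) is a read-modify-write that leaves the value unchanged and returns it, which GCC typically lowers to LOCK XADD on x86. Being a write, it needs a non-const atomic and takes exclusive ownership of the cache line even though nothing changes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a Seq_Cst load expressed as an idempotent RMW.
// Typically LOCK XADD on x86 with GCC; the C++ semantics are simply
// "seq_cst read-modify-write that adds 0".
int load_seq_cst_xadd0(std::atomic<int>& a) {
    return a.fetch_add(0, std::memory_order_seq_cst);
}
```

Portable C++ cannot request "plain MOV for a seq_cst store" directly; the mapping is the compiler's choice. The two mappings are each correct on their own but must not be mixed: a MOV store from this scheme combined with a MOV load from the first scheme leaves no barrier between a store and a later load.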
GCC 4.8.2 (disassembled with GDB on x86_64) uses the first approach for C++11 std::memory_order_seq_cst, i.e. LOAD (without fence) and STORE + MFENCE:
std::atomic<int> a;
int temp = 0;
a.store(temp, std::memory_order_seq_cst);
0x4613e8 <+0x0058> mov 0x38(%rsp),%eax
0x4613ec <+0x005c> mov %eax,0x20(%rsp)
0x4613f0 <+0x0060> mfence
As we know, MFENCE = LFENCE + SFENCE, so we can rewrite this code as: LOAD (without fence) and STORE + LFENCE + SFENCE.
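What the fence after a seq_cst store is for can be seen in the store-buffering (SB) litmus test: each thread stores to one variable and then loads the other. Under sequential consistency at least one load must see 1, so r1 == 0 && r2 == 0 is forbidden. On x86 a plain MOV store can wait in the store buffer while the following load executes, which is the reordering the barrier after the store rules out. A sketch; the function name and the iteration count are arbitrary. A passing run only shows the forbidden outcome was not observed; the guarantee itself comes from the C++ memory model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test with seq_cst operations only.
// Returns false if the SC-forbidden outcome (r1 == 0 && r2 == 0) is ever observed.
bool sb_seq_cst_forbids_both_zero(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;  // written by the threads, read after join()
        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);  // GCC 4.8: MOV + MFENCE
            r1 = y.load(std::memory_order_seq_cst); // plain MOV
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            return false;  // would indicate a sequential-consistency violation
        }
    }
    return true;
}
```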
Questions:
- Why don't we need LFENCE before the LOAD, yet do need LFENCE after the STORE?
- Why doesn't GCC use the approach LOAD (without fence) and STORE + SFENCE for std::memory_order_seq_cst?
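The same SB shape can be written with relaxed accesses and an explicit fence between each thread's store and its later load. Two seq_cst fences are totally ordered, so whichever fence comes second is followed by a load that must see the other thread's store, and r1 == 0 && r2 == 0 stays forbidden. GCC typically emits MFENCE for std::atomic_thread_fence(std::memory_order_seq_cst) on x86-64. A release fence would not do here: it does not order an earlier store before a later load, and on x86 it typically compiles to no instruction, because TSO already provides release ordering. For ordinary memory the only reordering x86 permits is a later load passing an earlier store, so the barrier has to be one that blocks exactly that. A sketch with an arbitrary function name and iteration count.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Fence-based store-buffering test: relaxed accesses separated by a seq_cst
// fence (a StoreLoad barrier) in each thread.
bool sb_fences_forbid_both_zero(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x(0), y(0);
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // typically MFENCE
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            return false;  // forbidden by the seq_cst fence rules
        }
    }
    return true;
}
```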
Recommended answer
std::atomic<int>::store is mapped to the compiler intrinsic __atomic_store_n. (This and other atomic-operation intrinsics are documented here: Built-in functions for memory model aware atomic operations.) The _n suffix makes it type-generic; the back-end actually implements variants for specific sizes in bytes. int on x86 is AFAIK always 32 bits long, so that means we're looking for the definition of __atomic_store_4. The internals manual for this version of GCC says that the __atomic_store operations correspond to machine description patterns named atomic_store<mode>; the mode corresponding to a 4-byte integer is "SI" (that's documented here), so we are looking for something called "atomic_storesi" in the x86 machine description. And that brings us to config/i386/sync.md, specifically this bit:
(define_expand "atomic_store<mode>"
  [(set (match_operand:ATOMIC 0 "memory_operand")
        (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
                        (match_operand:SI 2 "const_int_operand")]
                       UNSPEC_MOVA))]
  ""
{
  enum memmodel model = (enum memmodel) (INTVAL (operands[2]) & MEMMODEL_MASK);
  if (<MODE>mode == DImode && !TARGET_64BIT)
    {
      /* For DImode on 32-bit, we can use the FPU to perform the store. */
      /* Note that while we could perform a cmpxchg8b loop, that turns
         out to be significantly larger than this plus a barrier. */
      emit_insn (gen_atomic_storedi_fpu
                 (operands[0], operands[1],
                  assign_386_stack_local (DImode, SLOT_TEMP)));
    }
  else
    {
      /* For seq-cst stores, when we lack MFENCE, use XCHG. */
      if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
        {
          emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
                                                operands[0], operands[1],
                                                operands[2]));
          DONE;
        }
      /* Otherwise use a store. */
      emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
                                           operands[2]));
    }
  /* ... followed by an MFENCE, if required. */
  if (model == MEMMODEL_SEQ_CST)
    emit_insn (gen_mem_thread_fence (operands[2]));
  DONE;
})
Without going into a great deal of detail, the bulk of this is a C function body that will be called to generate the low-level "RTL" intermediate representation of the atomic store operation. When it's invoked by your example code, <MODE>mode != DImode, model == MEMMODEL_SEQ_CST, and TARGET_SSE2 is true, so it will call gen_atomic_store<mode>_1 and then gen_mem_thread_fence. The latter function always generates mfence. (There is code in this file to produce sfence, but I believe it is only used for explicitly-coded _mm_sfence (from <xmmintrin.h>).)
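The path through the atomic_store<mode> expander can be modelled as an ordinary function that returns the names of the generator functions it would call, in order. This is a hypothetical model for illustration, not GCC code: it takes an operand size in bytes in place of <MODE>mode (DImode being the 8-byte case), and MemModel is a simplified stand-in for GCC's enum memmodel.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for GCC's memory-model enum.
enum class MemModel { Relaxed, Release, SeqCst };

// Mirrors the control flow of the atomic_store<mode> expander shown above.
// It emits no code; it only lists which gen_* functions would be called.
std::vector<std::string> atomic_store_expansion(int size_bytes, bool target_64bit,
                                                bool target_sse2, MemModel model) {
    std::vector<std::string> calls;
    if (size_bytes == 8 && !target_64bit) {
        // DImode on 32-bit: perform the store through the FPU.
        calls.push_back("gen_atomic_storedi_fpu");
    } else {
        if (model == MemModel::SeqCst && !(target_64bit || target_sse2)) {
            // No MFENCE available: use an exchange and stop (the inner DONE).
            calls.push_back("gen_atomic_exchange<mode>");
            return calls;
        }
        // Otherwise use a plain store.
        calls.push_back("gen_atomic_store<mode>_1");
    }
    // ... followed by a fence, but only for seq_cst.
    if (model == MemModel::SeqCst) {
        calls.push_back("gen_mem_thread_fence");
    }
    return calls;
}
```

For the question's case, atomic_store_expansion(4, true, true, MemModel::SeqCst) yields the store followed by gen_mem_thread_fence, matching the mov/mfence pair in the disassembly; dropping both TARGET_64BIT and TARGET_SSE2 switches the seq_cst case to the exchange path instead.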
The comments suggest that someone thought MFENCE was required in this case. I conclude that either you are mistaken to think a load fence is not required, or this is a missed optimization bug in GCC. It is not, for instance, an error in how you are using the compiler.
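The intrinsic this answer starts from can also be called directly, which is a convenient way to inspect the generated code without going through std::atomic. A sketch assuming GCC or Clang (both provide the __atomic builtins); shared_value, publish and read_published are hypothetical names. Based on the analysis above, GCC 4.8 on x86-64 would be expected to emit the same mov/mfence pair for publish as in the question.

```cpp
#include <cassert>

// int is 4 bytes on the x86 and x86-64 ABIs, so these calls take the
// 4-byte ("SI" mode) path.
static_assert(sizeof(int) == 4, "this sketch assumes a 4-byte int");

int shared_value = 0;  // hypothetical variable, for illustration only

void publish(int v) {
    // Type-generic builtin; the _n form takes the value directly.
    __atomic_store_n(&shared_value, v, __ATOMIC_SEQ_CST);
}

int read_published() {
    return __atomic_load_n(&shared_value, __ATOMIC_SEQ_CST);
}
```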