
Problem description

Let's assume that variable a = 0

Processor1: a = 1
Processor2: print(a)

Processor1 executes its instruction first; then, in the next cycle, Processor2 reads the variable to print it. So which is it:

  1. Processor2 will stall until the cache coherence operation completes, and it will print 1

P1:   |--a=1--|---cache--coherence---|----------------
P2:   ------|stalls due to coherence-|--print(a=1)---|
time: ----------------------------------------------->

  2. Processor2 will operate before the cache coherence operation completes and will have a stale view of memory until then. So it will print 0?

    P1:   |--a=1--|---cache--coherence---|
    P2:   ----------|---print(a=0)---|----
    time: ------------------------------->
    

In other words, can a processor have a stale view of memory until cache coherence operations are completed?

Recommended answer

All modern ISAs use (a variant of) MESI for cache coherency. This maintains coherency at all times in the shared view of memory (through cache) that all processors have.

See for example Can I force cache coherency on a multicore x86 CPU? It's a common misconception that stores go into cache while other cores still have old copies of the cache line, and then "cache coherence" has to happen.

But that's not the case: to modify a cache line, a CPU needs exclusive ownership of the line (the Modified or Exclusive state of MESI). This is only possible after receiving responses to a Read For Ownership (RFO) that invalidates all other copies of the cache line, if it was in Shared or Invalid state before. See Will two atomic writes to different locations in different threads always be seen in the same order by other threads? for example.
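The write path described above can be illustrated with a toy model (purely a sketch, not a real protocol implementation; class and variable names are made up): a core must invalidate every other copy via an RFO before its line can enter Modified state, so a subsequent read from another core always observes the coherent value.

```python
# Toy MESI model: a write requires a Read For Ownership that invalidates
# all other cores' copies before the line can enter Modified state.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class ToyCache:
    def __init__(self, bus):
        self.state = INVALID
        self.value = None
        self.bus = bus
        bus.append(self)

    def write(self, value):
        if self.state not in (MODIFIED, EXCLUSIVE):
            # RFO: every other copy is invalidated before we can modify.
            for other in self.bus:
                if other is not self:
                    other.state = INVALID
        self.state = MODIFIED
        self.value = value

    def read(self, memory):
        if self.state == INVALID:
            # Miss: if another core holds the line Modified, it writes
            # back first, and both copies end up in Shared state.
            for other in self.bus:
                if other is not self and other.state == MODIFIED:
                    memory["a"] = other.value
                    other.state = SHARED
            self.value = memory["a"]
            self.state = SHARED
        return self.value

bus = []
memory = {"a": 0}
core1, core2 = ToyCache(bus), ToyCache(bus)

core1.write(1)             # core1: a = 1 (RFO, line now Modified)
print(core2.read(memory))  # core2 misses and sees the coherent value: 1
```

Note that there is no interleaving in this model where core2 reads a line that core1 has already modified and still sees 0: coherence is maintained at every step.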

However, memory models allow local reordering of stores and loads. Sequential consistency would be too slow, so CPUs always allow at least StoreLoad reordering. See also Is mov + mfence safe on NUMA? for lots of details about the TSO (total store order) memory model used on x86. Many other ISAs use an even weaker model.
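The StoreLoad reordering that TSO permits comes from the store buffer, and can be sketched with a toy model (illustrative only; the classes and names are hypothetical): each core's stores sit in a private buffer before draining to shared memory, so a later load on the same core can read shared memory before the core's own earlier store is globally visible.

```python
# Toy store-buffer model: stores are buffered per core before draining to
# shared memory, so a load can observe memory *before* an earlier store by
# another core has drained -- the StoreLoad reordering TSO allows.
from collections import deque

class Core:
    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()

    def store(self, addr, val):
        self.store_buffer.append((addr, val))  # not yet globally visible

    def load(self, addr):
        # Store forwarding: a core sees its own buffered stores first.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        while self.store_buffer:
            a, v = self.store_buffer.popleft()
            self.memory[a] = v  # now globally visible

memory = {"x": 0, "y": 0}
c1, c2 = Core(memory), Core(memory)
c1.store("x", 1)     # buffered, not yet visible to c2
c2.store("y", 1)     # buffered, not yet visible to c1
r1 = c1.load("y")    # 0: c2's store hasn't drained yet
r2 = c2.load("x")    # 0: c1's store hasn't drained yet
print(r1, r2)        # 0 0 -- both loads "passed" the earlier stores
```

The `r1 == r2 == 0` outcome is exactly the result forbidden by sequential consistency but allowed on real x86 hardware, which is why Dekker-style algorithms need a full barrier there.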

For an unsynchronized reader in this case, there are three possibilities if both are running on separate cores:

• load(a) happens on core#2 before the cache line is invalidated, so it reads the old value and thus effectively happens before the a=1 store in the global order. The load can hit in L1d cache.
• load(a) happens after core#1 has committed the store to its L1d cache but hasn't written it back yet. Core#2's read request triggers core#1 to write back to a shared level of cache (e.g. L3) and puts the line into Shared state. The load will definitely miss in L1d.
• load(a) happens after write-back to memory (or at least to L3) has already happened, so it doesn't have to wait for core#1 to write back. The load will miss in L1d unless hardware prefetch has brought the line back in for some reason. But usually that only happens as part of sequential accesses (e.g. to an array).
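The question's two proposed outcomes map onto these cases, as this toy timeline shows (a hypothetical model, not real hardware; all names are made up): the value printed depends only on whether core#2's load is ordered before or after core#1's store, i.e. before or after the RFO invalidates core#2's cached copy.

```python
# Toy timeline of the question's race: print(a) legitimately shows 0 or 1
# depending on how the load orders against the store in the global order.
def run(load_before_invalidate):
    memory_a = 0
    core2_cache = {"valid": True, "a": 0}   # core2 starts with a==0 cached

    def core1_store():
        nonlocal memory_a
        core2_cache["valid"] = False        # RFO invalidates core2's copy
        memory_a = 1                        # store commits, then writes back

    def core2_load():
        if core2_cache["valid"]:
            return core2_cache["a"]         # hit: old but coherent value
        return memory_a                     # miss: fetch the current value

    if load_before_invalidate:
        result = core2_load()               # case 1: reads 0
        core1_store()
    else:
        core1_store()
        result = core2_load()               # cases 2/3: reads 1
    return result

print(run(True), run(False))   # 0 1
```

Both outcomes are coherent: at no point does core#2 read a value that contradicts the global order of operations; it simply isn't told where in that order its load lands.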

So yes, the load will stall if the other core had already committed the store to cache before this core tried to load it.

See also Size of store buffers on Intel hardware? What exactly is a store buffer? for more about the effect of the store buffer on everything, including memory reordering.

It doesn't matter here because you have a write-only producer and a read-only consumer. The producer core doesn't wait for its store to become globally visible before continuing, and it can see its own store right away, before it becomes globally visible. It does matter when you have each thread looking at stores done by the other thread; then you need barriers, or sequentially consistent atomic operations (which compilers implement with barriers). See https://preshing.com/20120515/memory-reordering-caught-in-the-act
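A properly synchronized version of the question's producer/consumer pair can be sketched like this (using Python's `threading.Event` as a stand-in for a release/acquire handoff; this is an illustrative sketch, not the questioner's code): once the consumer synchronizes with the producer, it is guaranteed to see a = 1, never a stale 0.

```python
# Synchronized producer/consumer: the Event.set()/wait() pair acts like a
# release/acquire handoff, so the consumer always observes the store a = 1.
import threading

a = 0
ready = threading.Event()

def producer():
    global a
    a = 1            # the store
    ready.set()      # "release": publishes the store to the consumer

def consumer(out):
    ready.wait()     # "acquire": synchronizes-with set()
    out.append(a)    # guaranteed to read 1, never stale 0

out = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(out,))
t2.start(); t1.start()
t1.join(); t2.join()
print(out[0])   # 1
```

Without the Event (i.e. the original unsynchronized code), the consumer could legitimately run first and append 0; the synchronization is what turns "0 or 1" into "always 1".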

See also Can num++ be atomic for 'int num'? for how an atomic RMW works with MESI; that's instructive for understanding the concept. (e.g. an atomic RMW can work by having a core hang on to a cache line in Modified state, delaying its response to RFOs or requests to share it until the write part of the RMW has committed.)
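That "hang on to the line" idea can be sketched with a toy class (illustrative only; the lock stands in for Modified-state ownership, and all names are hypothetical): while one core holds the line, no other core can read or modify it between the read and write halves of the RMW, so concurrent increments never lose updates.

```python
# Toy atomic RMW: holding the lock models a core keeping the cache line in
# Modified state, deferring RFO/share responses until the write commits.
import threading

class ToyLine:
    def __init__(self, value=0):
        self.value = value
        self._owner = threading.Lock()  # stands in for line ownership

    def atomic_add(self, n):
        with self._owner:      # no other "core" can touch the line here
            v = self.value     # read
            self.value = v + n # modify + write, then ownership released

line = ToyLine()

def worker():
    for _ in range(1000):
        line.atomic_add(1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(line.value)   # 4000 -- no increments lost
```

Without the exclusive ownership, the separate read and write steps could interleave between threads and drop increments, which is exactly the bug `num++` has when it isn't atomic.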

