What is the point of MESI on Intel 64 and IA-32?

Question


  • The point of MESI is to retain the notion of a shared memory system.
  • However, store buffers complicate things:
  • Once data hits the MESI-implementing caches, memory is coherent from that point downstream.
  • Upstream of that, however, each core may disagree about what is in memory location X, depending on what is in each core's local store buffer.
  • As such, it seems that, from the viewpoint of each core, the state of memory is different: it is not coherent.
  • So, why do we bother "partially" enforcing coherency with MESI?

Edit: A substantial edit was made after some further narrowing down of what was really confusing me. I have tried to keep the general notion of the question the same, to preserve the relevance of the great answers received.

Answer

The point of MESI on x86 is the same as on pretty much any multi-core or multi-CPU system: to enforce cache coherency. There is no "partial coherency" in the cache coherency part of the equation on x86: the caches are fully coherent. The possible re-orderings, then, result from the interaction between the coherent caching system and core-local components such as the load/store subsystem (especially store buffers) and other out-of-order machinery.

The result of that interaction is the architected strong memory model that x86 provides, with only limited re-ordering. Without coherent caches, you couldn't reasonably implement this model at all, or almost any model that was anything other than completely weak.

Your question seems to embed the assumption that there are only two possible states: "coherent" and "everything else". It also mixes two ideas: cache coherency, which mostly deals with the caches specifically and is mostly a hidden detail, and the memory consistency model, which is architecturally defined and must be implemented by every implementation of the architecture. Wikipedia explains that one difference between cache coherency and memory consistency is that the rules for the former apply to only one location at a time, whereas consistency rules apply across locations. In practice, the more important distinction is that the memory consistency model is the only one of the two that is architecturally documented.
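The "one location at a time" nature of coherency can be illustrated with a small sketch. It is written against the C++11 memory model rather than against x86 directly, and the function name `count_backward_reads` is invented for this illustration. Even with relaxed atomics, which promise nothing about ordering across different locations, a reader of a single location should never see its value go backwards:

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one writer stores 1..writes into a single location,
// one reader polls it. Returns how many times the reader saw an older value
// after a newer one. Per-location coherence requires the answer to be 0.
inline long count_backward_reads(int writes) {
    std::atomic<int> x{0};
    long backward = 0;

    std::thread writer([&] {
        for (int v = 1; v <= writes; ++v) {
            x.store(v, std::memory_order_relaxed);
        }
    });

    std::thread reader([&] {
        int last = 0;
        while (last < writes) {  // stop once the final value is observed
            int now = x.load(std::memory_order_relaxed);
            if (now < last) ++backward;  // would be a coherence violation
            last = now;
        }
    });

    writer.join();
    reader.join();
    return backward;
}
```

Calling `count_backward_reads(100000)` should return 0 on any conforming implementation. Nothing here constrains how accesses to two different locations are ordered relative to each other; that is the job of the memory consistency model.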

Briefly, Intel (and AMD likewise) define a specific memory consistency model, x86-TSO - which is relatively strong as far as memory models go, but is still weaker than sequential consistency. The primary behaviors weakened compared to sequential consistency are:


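  • Later loads can pass earlier stores.
  • Stores can be seen in an order different from the total store order, but only by the cores that performed one of the stores.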
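The first of these re-orderings is usually demonstrated with the "store buffer" litmus test. The following is a minimal sketch of it in C++, with names (`SbTally`, `run_store_buffer_litmus`) made up for this illustration. It uses relaxed atomics, which on x86 typically compile to plain `mov` instructions, although the compiler is also free to reorder relaxed operations itself:

```cpp
#include <atomic>
#include <thread>

// Tally of outcomes over all runs (hypothetical type for this sketch).
struct SbTally {
    long both_zero = 0;  // r0 == 0 and r1 == 0: each load passed the earlier store
    long other = 0;      // at least one thread saw the other thread's store
};

// Runs the store-buffer litmus test `iterations` times with relaxed atomics.
inline SbTally run_store_buffer_litmus(int iterations) {
    SbTally tally;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0}, ready{0};
        int r0 = -1, r1 = -1;

        // Line both threads up so their store/load pairs overlap in time.
        auto wait_for_both = [&] {
            ready.fetch_add(1);
            while (ready.load() != 2) std::this_thread::yield();
        };

        std::thread t0([&] {
            wait_for_both();
            x.store(1, std::memory_order_relaxed);   // store to x ...
            r0 = y.load(std::memory_order_relaxed);  // ... then load y
        });
        std::thread t1([&] {
            wait_for_both();
            y.store(1, std::memory_order_relaxed);   // store to y ...
            r1 = x.load(std::memory_order_relaxed);  // ... then load x
        });
        t0.join();
        t1.join();

        if (r0 == 0 && r1 == 0) ++tally.both_zero; else ++tally.other;
    }
    return tally;
}
```

Under sequential consistency `both_zero` would always be 0, because at least one of the stores would have to come first in the single global order. x86-TSO allows the both-zero outcome, since each store can still be sitting in its core's store buffer when the other core's load executes. Whether a given run actually observes it depends on timing and on the machine.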

In order to implement this memory model, various parts of the system must play by the rules. On all recent x86 implementations, this means ordered load and store buffers, which avoid the disallowed re-orderings. The use of a store buffer results in the two re-orderings mentioned above: without allowing those, the implementation would be very restricted and probably much slower. In practice it also means fully coherent data caches, since many of the guarantees (e.g., no load-load reordering) would be very difficult to implement without them.

To wrap it all up:

  • Memory consistency is different from cache coherency: the former is what is documented and forms part of the programming model.
  • In practice, x86 implementations have fully coherent caches, which helps them implement the x86-TSO memory model, which is fairly strong but weaker than sequential consistency.
  • Finally, perhaps the answer you were looking for, in different words: a memory model weaker than sequential consistency is still very useful, since you can program against it, and in cases where you need sequential consistency for some particular operation(s) you insert the right memory barriers.
  • If you program against a language-supplied memory model, such as Java's or C++11's, you don't need to worry about the hardware specifics but rather about the language memory model, and the compiler inserts the barriers required to map the language memory model's semantics onto the hardware's. The stronger the hardware model, the fewer barriers are required.
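As a hedged illustration of the last two points, here is a sketch that runs the store-buffer pattern (each thread stores to one location and then loads the other) in two barrier-protected forms. The names `Barrier` and `count_both_zero` are invented for this example. In one variant the operations are `seq_cst` and the compiler is left to insert whatever the hardware needs; in the other the barrier is written explicitly as a fence between a relaxed store and a relaxed load. On x86, compilers commonly implement these with a locked instruction such as `xchg` or with `mfence`, but the exact choice is up to the compiler.

```cpp
#include <atomic>
#include <thread>

// Which way the barrier is expressed (hypothetical enum for this sketch).
enum class Barrier { SeqCstOps, ExplicitFence };

// Runs the store-then-load pattern on two threads `iterations` times and
// returns how often both loads returned 0. The C++ memory model forbids
// that outcome for both variants, so the result should be 0.
inline long count_both_zero(int iterations, Barrier kind) {
    long both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0}, ready{0};
        int r0 = -1, r1 = -1;

        auto store_then_load = [&](std::atomic<int>& mine,
                                   std::atomic<int>& theirs) {
            // Line both threads up so the accesses overlap in time.
            ready.fetch_add(1);
            while (ready.load() != 2) std::this_thread::yield();

            if (kind == Barrier::SeqCstOps) {
                // Compiler-inserted barrier: sequentially consistent accesses.
                mine.store(1, std::memory_order_seq_cst);
                return theirs.load(std::memory_order_seq_cst);
            }
            // Hand-written barrier between a relaxed store and a relaxed load.
            mine.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            return theirs.load(std::memory_order_relaxed);
        };

        std::thread t0([&] { r0 = store_then_load(x, y); });
        std::thread t1([&] { r1 = store_then_load(y, x); });
        t0.join();
        t1.join();

        if (r0 == 0 && r1 == 0) ++both_zero;
    }
    return both_zero;
}
```

Both `count_both_zero(1000, Barrier::SeqCstOps)` and `count_both_zero(1000, Barrier::ExplicitFence)` should return 0. The barrier is only paid for where sequential consistency is actually needed; ordinary loads and stores elsewhere in the program keep the cheaper x86-TSO behavior.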

If your memory model was completely weak, i.e., not really placing any restrictions on cross-core reordering, I suppose you could implement it directly on a non-cache coherent system in a cheap way for normal operations, but then memory barriers potentially become very expensive since they would need to flush a potentially large part of the local private cache.

Various chips may implement the model differently internally, and in particular some chips may implement stronger semantics than the model requires (i.e., some allowed re-orderings can never be observed on them), but absent bugs none will implement a weaker one.

On the name x86-TSO: this is the name given to the model in the paper that formalizes it. I use it because Intel themselves don't give the model a name, and because the paper's definition is more formal than Intel's, which describes the model less formally as a series of litmus tests.

On memory barriers: in practice on x86 you usually use locked instructions (using the lock prefix) rather than separate barriers, although standalone barriers also exist. Here I just use the term "barriers" to refer to both standalone barriers and the barrier semantics embedded in locked instructions.
