Problem Description

Let's say I'm running a Service Fabric cluster on 5 D1 class (1 core, 3.5GB RAM, 50GB SSD) VMs, and that I'm running 2 reliable services on this cluster, one stateless and one stateful. Let's assume that the replica target is 3.

  1. How do I calculate how much my reliable collections can hold?

Let's say I add one or more stateful services. Since I don't really know how the framework distributes services, do I need to take the most conservative approach and assume that all of my stateful services may run on a single node, and that their cumulative memory needs to be below the RAM available on a single machine?

Recommended Answer

TLDR - Estimating the expected capacity of a cluster is part art, part science. You can likely get a good lower bound which you may be able to push higher, but for the most part deploying things, running them, and collecting data under your workload's conditions is the best way to answer this question.

1) In general, the collections on a given machine are bounded by the amount of available memory or the amount of available disk space on a node, whichever is lower. Today we keep all data in the collections in memory and also persist it to disk. So the maximum amount that your collections across the cluster can hold is generally (Amount of available memory in the cluster) / (Target Replica Set Size).

Note that "Available Memory" is whatever is left over from other code running on the machines, including the OS. In your example above, though, you're not running across all of the nodes - you'll only be able to use 3 of them. So, (unrealistically) assuming zero overhead from these other factors, you could expect to be able to put about 3.5 GB of data into that stateful service replica before you ran out of memory on the nodes on which it was running. There would still be 2 nodes in the cluster left empty.

Let's take another example. Let's say that it is about the same as your example above, except in this case you set up the stateful service to be partitioned. Let's say you picked a partition count of 5. So now on each node, you have a primary replica and 2 secondary replicas from other partitions. In this case, each partition would only be able to hold a maximum of around 1.16 GB of state, but now overall you can pack 5.83 GB of state into the cluster (since all nodes can now be utilized fully). Incidentally, just to prove out the math works, that's (3.5 GB of memory per node * 5 nodes in the cluster) [17.5] / (target replica set size of 3) = 5.83.
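
The two worked examples above can be checked with a quick calculation. This is a rough sketch under the answer's stated assumptions (5 nodes of 3.5 GB each, target replica set size 3, zero overhead); the simple `min()` model is illustrative, not an official sizing formula.

```python
# Assumed cluster shape from the examples above.
MEM_PER_NODE_GB = 3.5
NODES = 5
REPLICA_SET_SIZE = 3

def cluster_capacity_gb(partitions):
    """Rough upper bound on state a stateful service can hold.

    With a single partition, every replica holds the full state and
    must fit on one node, so the service is capped by one node's
    memory. With enough partitions to spread replicas across all
    nodes, the cap is (total cluster memory) / (replica set size).
    """
    spread_bound = MEM_PER_NODE_GB * NODES / REPLICA_SET_SIZE
    single_node_bound = MEM_PER_NODE_GB * partitions
    return min(spread_bound, single_node_bound)
```

With these figures, `cluster_capacity_gb(1)` gives the 3.5 GB of the unpartitioned example, and `cluster_capacity_gb(5)` gives the 5.83 GB of the 5-partition example.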

In all of these examples, we've also assumed that memory consumption for all partitions and all replicas is the same. A lot of the time that turns out to not be true (at least temporarily) - some partitions can end up with more or less work to do and hence have uneven resource consumption. We also assumed that the secondaries were always the same as the primaries. In the case of the amount of state, it's probably fair to assume that these will track fairly evenly, though for other resource consumption it may not (just something to keep in mind). In the case of uneven consumption, this is really where the rest of Service Fabric's Cluster Resource Management will help, since we can come to know about the consumption of different replicas and pack them efficiently into the cluster to make use of the available space. Automatic reporting of consumption of resources related to state in the collections is on our radar and something we want to do, so in the future, this would be automatic but today you'd have to report this consumption on your own.

2) By default, we will balance the services according to the default metrics (more about metrics is here). So by default, the different replicas of those two different services could end up on the same machine, but in your example you'll end up with 4 nodes each holding 1 replica from one of the services, and then 1 node holding a replica from each of the two different services. This means that each service (each with 1 partition, as per your example) would only be able to consume 1.75 GB of memory, for a total of 3.5 GB in the cluster. This is again less than the total available memory of the cluster, since there are some portions of nodes that you're not utilizing.
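
That placement arithmetic can be written out explicitly. A small sketch, assuming the same illustrative figures as above:

```python
# 2 services x 3 replicas = 6 replicas on 5 nodes: by pigeonhole, at
# least one node hosts replicas from both services, and that shared
# node caps each service at half of one node's memory.
MEM_PER_NODE_GB = 3.5
SERVICES = 2
REPLICAS_PER_SERVICE = 3
NODES = 5

total_replicas = SERVICES * REPLICAS_PER_SERVICE   # 6
shared_node_replicas = -(-total_replicas // NODES) # ceil(6 / 5) = 2
per_service_cap_gb = MEM_PER_NODE_GB / shared_node_replicas  # 1.75
cluster_total_gb = per_service_cap_gb * SERVICES             # 3.5
```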

Note that this is the maximum possible consumption, and it presumes no consumption outside the service itself. Taking this as your maximum is not advisable. You'll want to reduce it for several reasons, but the most practical reason is to ensure that in the presence of upgrades and failures there's sufficient available capacity in the cluster. As an example, let's say that you have 5 Upgrade Domains and 5 Fault Domains. Now let's say that a fault domain's worth of nodes fails while you have an upgrade going on in an upgrade domain. This means that (a little less than) 40% of your cluster capacity can be gone at any time, and you probably want enough room left over on the remaining nodes to continue. This means that if your cluster previously could hold 5.83 GB of state (from our prior calculations), in reality you probably don't want to put more than about 3.5 GB of state in it, since with more than that the service may not be able to get back to 100% healthy (note also that we don't build replacement replicas immediately, so the nodes would have to be down for your ReplicaRestartWaitDuration before you ran into this case). More information about metrics, capacity, buffered capacity (which you can use to ensure that room is left on nodes for the failure cases), and fault and upgrade domains is covered in this article.
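
The headroom math in that scenario can be sketched as follows; the node and domain counts are the illustrative ones assumed above, and "one fault domain down during an upgrade" is the worst case the answer describes:

```python
# 5 nodes, one per fault/upgrade domain: one FD failing while one UD
# is being upgraded takes out ~2 of 5 nodes (~40% of capacity), so
# size state against the surviving nodes only.
MEM_PER_NODE_GB = 3.5
NODES = 5
REPLICA_SET_SIZE = 3

nodes_unavailable = 2                  # 1 failed FD + 1 upgrading UD
surviving = NODES - nodes_unavailable  # 3 nodes left
safe_state_gb = MEM_PER_NODE_GB * surviving / REPLICA_SET_SIZE  # 3.5
```

This reproduces the ~3.5 GB "safe" figure quoted above, versus the 5.83 GB theoretical maximum.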

There are some other things that practically will limit the amount of state you'll be able to store. You'll want to do several things:

  • Estimate the size of your data. You can make a reasonable up-front estimate by calculating the size of each field your objects hold. Be sure to take 64-bit references into account. This gives you a lower-bound starting point.
  • Account for storage overhead. Each object you store in a collection comes with some overhead for storing that object. In the reliable collections, depending on the collection and the operations currently in flight (copies, enumerations, updates, etc.), this overhead can range from roughly 100 to 700 bytes per item (row) stored in the collections. Know also that we're always looking for ways to reduce the overhead we introduce.
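
The two estimation steps above can be combined into a back-of-the-envelope calculation. The field sizes and item count here are made-up inputs; only the 100-700 byte per-row overhead range comes from the answer itself.

```python
OVERHEAD_RANGE_BYTES = (100, 700)  # per stored item (row), per the answer

def estimate_collection_bytes(field_sizes, item_count, overhead_per_item):
    """Lower-bound estimate: payload fields plus one 64-bit reference
    per object, plus the per-row storage overhead."""
    payload = sum(field_sizes) + 8  # 8 bytes for a 64-bit reference
    return item_count * (payload + overhead_per_item)

# Hypothetical object: 8-byte key, 8-byte timestamp, 64-byte string,
# stored 1,000,000 times.
low = estimate_collection_bytes([8, 8, 64], 1_000_000, OVERHEAD_RANGE_BYTES[0])
high = estimate_collection_bytes([8, 8, 64], 1_000_000, OVERHEAD_RANGE_BYTES[1])
```

Even for this small object, the overhead range alone moves the estimate from roughly 188 MB to roughly 788 MB, which is why measuring real consumption (next paragraph) matters.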

We also strongly recommend running your service over some period of time and measuring actual resource consumption via performance counters. Simulating some sort of real workload and then measuring the actual usage of the metrics you care about will serve you pretty well. The reason we recommend this in particular is that you will be able to see consumption from things like which CLR object heap your objects end up placed in, how often the GC is running, whether there are leaks, or other things like this which will impact the amount of memory you can actually utilize.

I know that this has been a long answer but I hope you find it helpful and complete.
