This post covers the question "How are containers created based on vcores and memory in MapReduce2?" together with a reference answer; it may be useful to readers facing the same problem.

Problem description


I have a tiny cluster composed of 1 master (namenode, secondarynamenode, resourcemanager) and 2 slaves (datanode, nodemanager).

I have set in the yarn-site.xml of the master:

  • yarn.scheduler.minimum-allocation-mb : 512
  • yarn.scheduler.maximum-allocation-mb : 1024
  • yarn.scheduler.minimum-allocation-vcores : 1
  • yarn.scheduler.maximum-allocation-vcores : 2

I have set in the yarn-site.xml of the slaves:

  • yarn.nodemanager.resource.memory-mb : 2048
  • yarn.nodemanager.resource.cpu-vcores : 4

Then in the master, I have set in mapred-site.xml (see the XML sketch after this list):

  • mapreduce.map.memory.mb : 512
  • mapreduce.map.java.opts : -Xmx500m
  • mapreduce.map.cpu.vcores : 1
  • mapreduce.reduce.memory.mb : 512
  • mapreduce.reduce.java.opts : -Xmx500m
  • mapreduce.reduce.cpu.vcores : 1
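For reference, each bullet above corresponds to a property entry of this shape in the named XML file (a sketch of two of the mapred-site.xml entries; the yarn-site.xml entries on master and slaves follow the same pattern):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
  </property>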

So it is my understanding that when running a job, the MapReduce ApplicationMaster will try to create as many 512 MB / 1 vCore containers as possible on both slaves. Since each slave has only 2048 MB and 4 vCores available, that gives room for 4 containers per slave. This is precisely what is happening on my jobs, so no problem so far.

However, when I increase mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores from 1 to 2, there should theoretically only be enough vCores available to create 2 containers per slave, right? But no, I still get 4 containers per slave.

I then tried to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb from 512 to 768. This leaves room for 2 containers per slave (2048 / 768 = 2).

Whether the vCores are set to 1 or 2 for mappers and reducers, this always produces 2 containers per slave at 768 MB and 4 containers at 512 MB. So what are vCores for? The ApplicationMaster doesn't seem to care.
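A quick sanity check of the counts above, as a toy Java sketch (my own illustration, not YARN code; it assumes containers are limited only by the NodeManager's advertised resources):

    // Toy sanity check (not YARN code) of expected container counts per slave.
    public class ContainerCount {
        public static void main(String[] args) {
            int nodeMemMb = 2048, nodeVcores = 4;  // yarn.nodemanager.resource.*

            // If memory alone limits placement:
            System.out.println(nodeMemMb / 512);   // 4 containers at 512 MB
            System.out.println(nodeMemMb / 768);   // 2 containers at 768 MB

            // If vCores were enforced too (2 vCores per task):
            System.out.println(Math.min(nodeMemMb / 512, nodeVcores / 2)); // 2, not the observed 4
        }
    }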

Also, when setting the memory to 768 and vCores to 2, I have this info displayed on the NodeManager UI for a mapper container:

The 768 MB has turned into 1024 TotalMemoryNeeded, and the 2 vCores are ignored and displayed as 1 TotalVCoresNeeded.

So to break down the "how does it work" question into multiple questions:

  1. Is only memory used (and vCores ignored) to calculate the number of containers?
  2. Is the mapreduce.map.memory.mb value only a completely abstract value for calculating the number of containers (and that's why it can be rounded up to the next power of 2)? Or does it represent real memory allocation in some way?
  3. Why do we specify some -Xmx value in mapreduce.map.java.opts? Why doesn't YARN use the value from mapreduce.map.memory.mb to allocate memory to the container?
  4. What is TotalVCoresNeeded and why is it always equal to 1? I tried to change mapreduce.map.cpu.vcores in all nodes (master and slaves) but it never changes.
Solution

I will answer this question on the assumption that the scheduler used is CapacityScheduler.

CapacityScheduler uses a ResourceCalculator to calculate the resources needed for an application. There are two types of resource calculators:

  1. DefaultResourceCalculator: takes only memory into account for resource calculations (i.e. for calculating the number of containers)
  2. DominantResourceCalculator: takes both memory and CPU into account for resource calculations

By default, the CapacityScheduler uses the DefaultResourceCalculator. If you want to use the DominantResourceCalculator, then you need to set the following property in the capacity-scheduler.xml file:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
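One practical note (an assumption about the usual workflow, not part of the original answer): changes to capacity-scheduler.xml typically take effect after running yarn rmadmin -refreshQueues or restarting the ResourceManager.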

Now, to answer your questions:

  1. If DominantResourceCalculator is used, then both memory and vCores are taken into account for calculating the number of containers.

  2. mapreduce.map.memory.mb is not an abstract value. It is taken into consideration while calculating the resources.

    The DominantResourceCalculator class has a normalize() function, which normalizes the resource request using minimumResource (determined by the config yarn.scheduler.minimum-allocation-mb), maximumResource (determined by the config yarn.scheduler.maximum-allocation-mb) and a step factor (also determined by yarn.scheduler.minimum-allocation-mb).

    The code for normalizing memory looks like below (check org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.java):

        // Clamp the request into [minimumResource, maximumResource],
        // after rounding it up to a multiple of stepFactor.
        int normalizedMemory = Math.min(
            roundUp(Math.max(r.getMemory(), minimumResource.getMemory()),
                    stepFactor.getMemory()),
            maximumResource.getMemory());


Where:

r = Requested memory

The logic works like below:

a. Take the max of (requested resource, minimum resource) = max(768, 512) = 768

b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024

roundUp does: ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024 (integer division)

c. min(roundUp result, maximumResource) = min(1024, 1024) = 1024

So finally, the allotted memory is 1024 MB, which is what you are getting.

For the sake of simplicity, you can say that roundUp increments the demand in steps of 512 MB (which is the minimum resource).
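To make the arithmetic concrete, here is a minimal, runnable sketch of this normalization using plain ints instead of YARN's Resource objects (the class and method names are mine, not YARN's):

    // Minimal sketch of the normalize() arithmetic with the question's settings.
    public class NormalizeDemo {
        // Round value up to the nearest multiple of step (integer arithmetic).
        static int roundUp(int value, int step) {
            return ((value + step - 1) / step) * step;
        }

        static int normalize(int requested, int minimum, int maximum, int step) {
            return Math.min(roundUp(Math.max(requested, minimum), step), maximum);
        }

        public static void main(String[] args) {
            // Memory: min 512, max 1024, step 512 (the yarn.scheduler.* settings above).
            System.out.println(normalize(768, 512, 1024, 512)); // 1024, as seen in the UI
            System.out.println(normalize(512, 512, 1024, 512)); // 512, so 4 fit in 2048 MB
        }
    }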

  3. Since the mapper is a Java process, mapreduce.map.java.opts is used for specifying the heap size for the mapper.

Whereas mapreduce.map.memory.mb is the total memory used by the container.

The value of mapreduce.map.java.opts should be less than mapreduce.map.memory.mb.

This answer explains the relationship further: What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?
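As an illustration of that relationship (the roughly 80% heap-to-container ratio below is a common rule of thumb, not something mandated by YARN), the two settings are typically paired like this in mapred-site.xml:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>   <!-- container size seen by the scheduler -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx820m</value>   <!-- JVM heap, leaving headroom for non-heap memory -->
  </property>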

  4. When you use DominantResourceCalculator, it uses the normalize() function to calculate the vCores needed as well.

    The code for that is similar to the normalization of memory:

        int normalizedCores = Math.min(
            roundUp(Math.max(r.getVirtualCores(), minimumResource.getVirtualCores()),
                    stepFactor.getVirtualCores()),
            maximumResource.getVirtualCores());
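Plugging the question's vCore settings into the same formula (minimum 1, maximum 2, step 1): min(roundUp(max(2, 1), 1), 2) = min(2, 2) = 2. So under DominantResourceCalculator the request for 2 vCores would be honored; under the default DefaultResourceCalculator this path is never exercised, which is consistent with the UI always showing TotalVCoresNeeded = 1.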
    
