


I need to disable parallel execution of YARN applications in a Hadoop cluster. YARN currently uses its default settings, so several jobs can run in parallel. I see no advantage in this, because both jobs run slower.


I found the setting yarn.scheduler.capacity.maximum-applications, which limits the maximum number of applications, but it affects both submitted and running apps (as stated in the docs). I'd like to keep submitted apps in the queue until the currently running application has finished. How can this be done?



1) Change Scheduler to FairScheduler


Most Hadoop distributions use CapacityScheduler by default (Cloudera uses FairScheduler as its default scheduler). Add this property to yarn-site.xml:
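
A minimal sketch of that entry, assuming the stock Apache Hadoop property name and scheduler class (verify both against your distribution's documentation). It goes inside the `<configuration>` element of yarn-site.xml:

```xml
<!-- Assumption: standard Apache Hadoop property and class name -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```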



2) Set default Queue

Fair Scheduler creates a queue per user. That is, if three different users submit jobs, three individual queues are created and the resources are shared among them. Disable this by adding the following property to yarn-site.xml:
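
A sketch of the property this step most likely refers to, assuming the standard Fair Scheduler setting that controls per-user queues:

```xml
<!-- Assumption: standard Fair Scheduler property; false sends all jobs to the shared "default" queue -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
```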


This ensures that all jobs go into a single default queue.


Now that the job queue has been limited to a single default queue, restrict the maximum number of applications that can run in that queue to 1.


Create a file named fair-scheduler.xml under $HADOOP_CONF_DIR and add these entries:
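
A minimal sketch of such an allocation file, assuming the standard `queueMaxAppsDefault` element of the Fair Scheduler allocation format. It sets the default limit on concurrently running applications for any queue that does not define its own `maxRunningApps`:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Assumption: default cap on running apps per queue; 1 means one app runs at a time -->
  <queueMaxAppsDefault>1</queueMaxAppsDefault>
</allocations>
```

With only the default queue in use, this caps the whole cluster at one running application.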



Also, add this property to yarn-site.xml:
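
A sketch of the property that points the scheduler at the allocation file, assuming the standard `yarn.scheduler.fair.allocation.file` setting. The path shown is a hypothetical placeholder; substitute the actual location of fair-scheduler.xml in your $HADOOP_CONF_DIR:

```xml
<!-- Assumption: /etc/hadoop/conf is only an example path, not taken from the original post -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>
```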




Restart YARN services after adding these properties.


When multiple applications are submitted, the application ACCEPTED first is treated as the active application and the rest are queued as pending applications. The pending applications remain in the ACCEPTED state until the RUNNING application is FINISHED. The active application is allowed to use all the available resources.

Reference: Hadoop: Fair Scheduler

