
Queues and capacity scheduler

The Capacity Scheduler lets you create queues at different levels of a hierarchy and give each one a different share of the cluster resources.

At the beginning, you have only one queue, which is root.
All of the following is defined in conf/capacity-scheduler.xml (/etc/hadoop/conf.empty/capacity-scheduler.xml in my HDP 2.1.3) or in YARN Configs/Scheduler in Ambari.


Let's start with two queues: a "production" queue and a "development" queue, both direct sub-queues of root.

Queues definition
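As a minimal sketch of what this could look like in capacity-scheduler.xml (the short names prod and dev are the ones used in the rest of this post; the fragment goes inside the file's existing configuration element):

```xml
<!-- Declare the two children of the root queue -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
```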


Now, maybe we have 2 teams in the dev department: Engineers and Data Scientists.
Let's split the dev queue into two sub-queues: eng and ds.
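A hedged sketch of that split, assuming the dev queue sits directly under root, so that its full path is root.dev:

```xml
<!-- Turn dev into a parent queue with two children -->
<property>
  <name>yarn.scheduler.capacity.root.dev.queues</name>
  <value>eng,ds</value>
</property>
```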

Queues capacity

We now have to define the capacity of each queue, as a percentage of its parent. Note that the capacities of the queues at each level must add up to 100 (the capacity of their parent); otherwise the scheduler won't start.
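Here is a sketch of such a definition, using the 70/30 and 60/40 split discussed in this post. The 40 for ds is my assumption: it is simply what remains once eng takes 60.

```xml
<!-- Children of root: 70 + 30 = 100 -->
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>

<!-- Children of dev: 60 + 40 = 100 (percentages of dev, not of the cluster) -->
<property>
  <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.ds.capacity</name>
  <value>40</value>
</property>
```

With these values, eng would be guaranteed 60% of 30%, so 18% of the whole cluster.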


So prod will have 70% of the cluster resources and dev will have 30%.
Not quite, in fact! If a job runs in dev while prod is idle, dev can take up to 100% of the cluster.
This makes sense, because we don't want the cluster to be under-utilized.

As you can imagine, eng is guaranteed 60% of dev's capacity, and can reach 100% of dev if ds is empty.

We may want to cap how far dev can grow beyond its guaranteed capacity (by default it can reach 100%), because we never want this queue to use too many resources.
For that purpose, use the maximum-capacity parameter.
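A sketch of such a cap, set through the scheduler's maximum-capacity property on the queue. The value 50 is purely illustrative and not taken from my cluster:

```xml
<!-- dev is guaranteed its capacity but may never exceed 50% of its parent (hypothetical value) -->
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
  <value>50</value>
</property>
```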

Queues status

Each queue's state must be set to RUNNING; if it is set to STOPPED, you won't be able to submit new jobs to that queue.
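As a sketch, the state is set per queue path; prod is shown here, and the other queues would follow the same pattern:

```xml
<!-- Accept new applications on prod; STOPPED would reject them -->
<property>
  <name>yarn.scheduler.capacity.root.prod.state</name>
  <value>RUNNING</value>
</property>
```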


Queues ACLs

The most important thing to understand is that ACLs are inherited. That means that you can't restrict permissions in a sub-queue, only extend them!
The most common mistake is leaving ACLs set to * (meaning all users) at the root level: consequently, any user will be able to submit jobs to any queue. The default is


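The ACL format is a bit tricky: a comma-separated list of users, then a single space, then a comma-separated list of groups. A value that starts with a space therefore lists groups only.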


Then, on each queue, you can set 3 parameters: acl_submit_applications, acl_administer_queue and acl_administer_jobs.

Any user in the dev group can submit jobs, but only John can administer the queue.
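A hedged sketch of a configuration matching that description. The login john is an assumption for John's account, and I am also assuming the root ACLs are locked down to a single space (which Hadoop ACLs read as "nobody"), since otherwise the inherited * would make the dev settings pointless:

```xml
<!-- Root: a single space means no user and no group -->
<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <value> </value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
  <value> </value>
</property>

<!-- dev: leading space = no individual users, then the group "dev" -->
<property>
  <name>yarn.scheduler.capacity.root.dev.acl_submit_applications</name>
  <value> dev</value>
</property>
<!-- dev: only the (hypothetical) user john administers the queue -->
<property>
  <name>yarn.scheduler.capacity.root.dev.acl_administer_queue</name>
  <value>john</value>
</property>
```

The leading space in the dev value is significant, so it is worth checking the result with mapred queue -showacls after a refresh.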

You can see the “real” authorizations in a terminal :

[vagrant@gw ~]$ su ambari-qa
[ambari-qa@gw ~]# mapred queue -showacls
Queue acls for user : ambari-qa
Queue Operations
[root@gw ~]#

Of course, yarn.acl.enable has to be set to true (in yarn-site.xml) for the ACLs to be enforced.
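For reference, a sketch of that setting as it would appear in yarn-site.xml:

```xml
<!-- Without this, queue ACLs are not checked at all -->
<property>
  <name>yarn.acl.enable</name>
  <value>true</value>
</property>
```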

Another point: you don't have to restart YARN after each scheduler modification, except when deleting existing queues. If you're only adding queues or adjusting some settings, just refresh the queues (the kinit line is only needed on a Kerberized cluster):

[root@gw ~]# kinit -kt /etc/security/keytabs/yarn.headless.keytab yarn@EXAMPLE.COM
[root@gw ~]# yarn rmadmin -refreshQueues

You can see the queues in two ways :
– in the CLI

[root@gw ~]# mapred queue -list
15/03/05 09:13:11 INFO client.RMProxy: Connecting to ResourceManager at
Queue Name : dev
Queue State : running
Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : ds
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : eng
Queue State : running
Scheduling Info : Capacity: 70.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : prod
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0

– in the UI: go to the ResourceManager UI (Ambari YARN/Quick links), then click on Scheduler.