In the Hadoop world, and especially in the YARN sub-world, it can be tricky to understand how to tune memory settings.
Based on this blogpost, here are the main concepts to understand:
– YARN is based on containers
– containers can be MAP containers or REDUCE containers
– total memory available to YARN on each node: yarn.nodemanager.resource.memory-mb=40960
– minimum memory allocated to a container: yarn.scheduler.minimum-allocation-mb=2048
Combining the two values above gives the maximum possible number of containers per node: total memory / minimum container memory = 40960 / 2048 = 20 containers.
– memory used by a (map|reduce) container: mapreduce.(map|reduce).memory.mb=4096
each (map|reduce) container spawns a JVM for the map or reduce task, so we cap the heap size of that JVM: mapreduce.(map|reduce).java.opts=-Xmx3072m (the 3 GB heap figure used in the example below).
The last parameter is yarn.nodemanager.vmem-pmem-ratio (default 2.1), which is the maximum ratio of virtual memory to physical memory allowed for a container.
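Collected together, these settings live in yarn-site.xml and mapred-site.xml. Here is a minimal sketch using the values from this post (the -Xmx3072m heap value is implied by the 3 GB heap figure in the example below):

```xml
<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3072m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3072m</value>
</property>
```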
For a map task we’ll therefore have:
- 4 GB RAM allocated
- 3 GB JVM heap space upper limit
- 4 × 2.1 = 8.4 GB virtual memory upper limit
The effective maximum number of containers is therefore driven by the map or reduce container size, not the minimum allocation! So here we’ll have 40 GB / 4 GB = 10 mappers per node.
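The arithmetic above can be sanity-checked with a short Python sketch (the variable names are mine, not Hadoop's; values are in MB, matching the property values above):

```python
# Memory arithmetic for the YARN settings used in this post (all values in MB).
node_memory_mb = 40960     # yarn.nodemanager.resource.memory-mb
min_allocation_mb = 2048   # yarn.scheduler.minimum-allocation-mb
map_container_mb = 4096    # mapreduce.map.memory.mb
vmem_pmem_ratio = 2.1      # yarn.nodemanager.vmem-pmem-ratio

# Absolute ceiling on containers per node, if every container were minimum-sized.
max_containers = node_memory_mb // min_allocation_mb

# Actual number of concurrent map containers per node at 4 GB each.
max_mappers = node_memory_mb // map_container_mb

# Virtual-memory ceiling for one 4 GB map container.
vmem_limit_mb = map_container_mb * vmem_pmem_ratio

print(max_containers)            # 20
print(max_mappers)               # 10
print(vmem_limit_mb / 1024)      # 8.4 GB
```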