Posts tagged with: hadoop

containers, memory and sizing

In the Hadoop world, and especially in the YARN sub-world, it can be tricky to understand how to configure memory settings correctly.

Based on this blogpost, here are the main concepts to understand:

– YARN is based on containers

– containers can be MAP containers or REDUCE containers

– total memory for YARN on each node : yarn.nodemanager.resource.memory-mb=40960

– min memory used by a container : yarn.scheduler.minimum-allocation-mb=2048

the combination of the above gives us the maximum number of containers : total memory / container memory = number of containers, i.e. 40960 / 2048 = 20 containers at most per node.
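As a quick sanity check, the division above can be sketched in a few lines of Python (the values are the example settings from this post):

```python
# Example values from the settings above (in MB)
total_node_memory_mb = 40960   # yarn.nodemanager.resource.memory-mb
min_container_mb = 2048        # yarn.scheduler.minimum-allocation-mb

# Maximum number of minimum-sized containers one node can host
max_containers = total_node_memory_mb // min_container_mb
print(max_containers)  # 20
```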

– memory used by a (map|reduce) container : mapreduce.(map|reduce).memory.mb=4096

each (map|reduce) container spawns a JVM for its map or reduce task, so we limit the heap size of the map|reduce JVM :

– mapreduce.(map|reduce).java.opts=-Xmx3072m

The last parameter is yarn.nodemanager.vmem-pmem-ratio (default 2.1), the maximum ratio of virtual memory to physical memory that a container is allowed to use.

Here we’ll have for a Map task :

  • 4 GB RAM allocated
  • 3 GB JVM heap space upper limit
  • 4*2.1 = 8.4 GB virtual memory upper limit
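Putting the three limits together for one Map container, a minimal sketch using the example values from this post:

```python
# Example settings from this post (in MB)
map_container_mb = 4096   # mapreduce.map.memory.mb
map_heap_mb = 3072        # JVM heap limit (-Xmx3072m) for the map task
vmem_pmem_ratio = 2.1     # yarn.nodemanager.vmem-pmem-ratio

# Virtual memory ceiling enforced by the NodeManager
vmem_limit_mb = map_container_mb * vmem_pmem_ratio
print(vmem_limit_mb)      # 8601.6 MB, i.e. ~8.4 GB

# The JVM heap must fit inside the physical container,
# leaving headroom for non-heap JVM memory
assert map_heap_mb < map_container_mb
```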


The maximum number of containers per node therefore depends on whether you count Map or Reduce containers ! So here we’ll have 40/4 = 10 mappers per node.
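The same division as before, but with map containers instead of minimum-sized ones:

```python
# Example values from this post (in MB)
total_node_memory_mb = 40960   # yarn.nodemanager.resource.memory-mb
map_container_mb = 4096        # mapreduce.map.memory.mb

mappers_per_node = total_node_memory_mb // map_container_mb
print(mappers_per_node)        # 10
```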