
check file content with ascii codes

If you want to inspect a file byte by byte (checking that spaces really are spaces, which kind of commas and quotes it contains, etc.), you can look at the ASCII codes with the hexdump command:

$ hexdump -C /etc/passwd
00000000 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72 6f 6f 74 3a |root:x:0:0:root:|
00000010 2f 72 6f 6f 74 3a 2f 62 69 6e 2f 62 61 73 68 0a |/root:/bin/bash.|
00000020 64 61 65 6d 6f 6e 3a 78 3a 31 3a 31 3a 64 61 65 |daemon:x:1:1:dae|
00000030 6d 6f 6e 3a 2f 75 73 72 2f 73 62 69 6e 3a 2f 62 |mon:/usr/sbin:/b|
00000040 69 6e 2f 73 68 0a 62 69 6e 3a 78 3a 32 3a 32 3a |in/sh.bin:x:2:2:|
00000050 62 69 6e 3a 2f 62 69 6e 3a 2f 62 69 6e 2f 73 68 |bin:/bin:/bin/sh|
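
As a small illustration of what such a dump can reveal, the sketch below writes a line with Windows-style line endings to a temporary file and dumps it. The sample file and the dump_bytes helper are made up for this example, and the od fallback is only there for systems without hexdump.

```shell
# Hypothetical helper: dump a file as hex bytes, preferring hexdump.
dump_bytes() {
  if command -v hexdump >/dev/null 2>&1; then
    hexdump -C "$1"
  else
    od -A x -t x1 "$1"   # fallback, hex bytes only
  fi
}

# Sample file (an assumption for the demo): one line ending in CR LF.
sample=$(mktemp)
printf 'hello\r\n' > "$sample"

# The carriage return shows up as 0d right before the newline 0a.
dump_bytes "$sample"
```

A file that looks fine in an editor but breaks a parser often turns out to contain such a stray 0d, or a non-ASCII space or quote.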

test your Hadoop mapping rules

You may hit impersonation issues caused by wrong auth-to-local rules.

These rules translate a principal into a user short name, so you may want to make sure that, for example, hive/worker01@REALM correctly translates to hive.

To do that:

[root@worker ~]# hadoop \

Name: HTTP/worker01fqdn@REALM to HTTP
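
A hedged sketch of such a check follows. The expected_short_name helper is hypothetical and only computes the first component of a principal, which is what simple rules are usually expected to return. The HadoopKerberosName class name is not taken from the text above: it is the Hadoop utility I know for printing a mapping in the "Name: ... to ..." form, so treat it as an assumption for your version.

```shell
# Hypothetical helper: first component of a Kerberos principal.
# This does NOT evaluate auth-to-local rules, it only states the expectation.
expected_short_name() {
  primary=${1%%@*}                 # drop the @REALM part
  printf '%s\n' "${primary%%/*}"   # drop the /host part
}

principal="hive/worker01@REALM"
echo "expected: $(expected_short_name "$principal")"

# Only runs where a configured Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop org.apache.hadoop.security.HadoopKerberosName "$principal"
fi
```

If the short name printed by Hadoop differs from the expected one, the auth-to-local rules are the first place to look.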

HDFS ls and Out of Memory (GC Overhead limit)

If an ls fails with an error like

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringBuffer.toString(
at org.apache.hadoop.fs.Path.makeQualified(

you can increase the client heap size:

HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls -R /
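
The VAR=value prefix applies to that single command only. Here is a minimal sketch of the difference with export, using sh -c as a stand-in for the hdfs client so that it runs anywhere:

```shell
# Start from a clean state for the demo.
unset HADOOP_CLIENT_OPTS

# Prefix form: the variable exists only in the child process.
HADOOP_CLIENT_OPTS="-Xmx4g" sh -c 'echo "child sees: $HADOOP_CLIENT_OPTS"'
echo "shell afterwards: ${HADOOP_CLIENT_OPTS:-<unset>}"

# Export form: every later command started from this shell inherits it.
export HADOOP_CLIENT_OPTS="-Xmx4g"
sh -c 'echo "child sees: $HADOOP_CLIENT_OPTS"'
```

The prefix form is handy for a one-off heavy listing; the export form suits a session where several such commands follow.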

set date and time in VirtualBox

If you set the date manually in a VirtualBox VM, the clock is reset to the host’s date and time.

This behaviour is caused by the VirtualBox Guest Additions, which keep the guest clock in sync with the host, so you first need to stop that service:

sudo service vboxadd-service stop

You’ll then be able to change the date:

date --set="8 Apr 2016 18:00:00"
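
Before touching the clock, it can help to check how date will parse the string. A small sketch, assuming GNU date (the -d option only prints the parsed date, it changes nothing):

```shell
# Parse the string without setting anything (GNU date).
wanted="8 Apr 2016 18:00:00"
date -d "$wanted" "+%Y-%m-%d %H:%M:%S"

# Once the test is done, starting the service again should resume the
# synchronization with the host (needs root, so left commented out here).
# sudo service vboxadd-service start
```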

adjust log4j log level for a class

Among all the adjustments you can make in log4j (and there are a lot), you may want to change the verbosity level of one particular class.

For example, I wanted to decrease the verbosity of CapacityScheduler because my ResourceManager log was full of:

2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler ( - Null container completed...
2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler ( - Null container completed...
2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler ( - Null container completed...
2015-12-15 12:11:06,668 INFO capacity.CapacityScheduler ( - Null container completed...

I first found the CapacityScheduler “full name” (meaning with the package) referenced under

I restarted the ResourceManager, and voilà!
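
For reference, a per-class override in a log4j 1.x properties file has the form log4j.logger.<fully qualified class name>=<LEVEL>. The sketch below appends such a line to a scratch file. The WARN level and the file location are assumptions for illustration; on a real cluster you would edit the ResourceManager’s log4j.properties, possibly through your management tool.

```shell
# Scratch file standing in for the real log4j.properties (assumption).
LOG4J_FILE=${LOG4J_FILE:-$(mktemp)}

# Fully qualified name of the class whose verbosity we want to lower.
class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

# With WARN, the INFO messages of that class are no longer logged.
echo "log4j.logger.${class}=WARN" >> "$LOG4J_FILE"

grep CapacityScheduler "$LOG4J_FILE"
```

Other classes keep the level set by the root logger, so only the noisy one is quietened.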


scp keeping rights and permissions with rsync

We have all had to scp something while keeping owner, group, permissions, etc.

scp has no option for that (its -p flag preserves times and modes, but not owner or group), so you may want to use rsync instead, here to copy the content of a local directory to machine1:

[root@localhost ~]# rsync -avI /etc/hadoop/ machine1:/etc/hadoop/

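Here are the chosen options:

-a = archive mode (equals -rlptgoD)
-v = verbose
-p = preserve permissions (implied by -a)
-o = preserve owner (implied by -a, only effective when the receiving side runs as root)
-g = preserve group (implied by -a)
-r = recurse into directories (implied by -a)
-I = ignore times: don’t skip files that match in size and modification time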
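
To see the effect without a second machine, here is a minimal local sketch. The two scratch directories and the sample file are assumptions standing in for /etc/hadoop/ on each side; owner and group are not exercised since that needs root.

```shell
# Skip quietly where rsync is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed"; exit 0; }

# Scratch directories standing in for /etc/hadoop/ on both machines.
src=$(mktemp -d)
dst=$(mktemp -d)
printf '<configuration/>\n' > "$src/core-site.xml"
chmod 640 "$src/core-site.xml"

# Trailing slash on the source: copy the directory's content,
# not the directory itself.
rsync -aI "$src/" "$dst/"

# The mode (rw-r-----) should be identical on both sides.
ls -l "$src/core-site.xml" "$dst/core-site.xml"
```

Adding -n (dry run) to the real command is a cheap way to preview what would be transferred before overwriting a remote configuration.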

remount without noexec attribute

On CentOS, /var is usually mounted with the noexec attribute. This is annoying when executing scripts located on that mount point, like Ambari scripts!

So if you get “permission denied” when executing a script there, simply remount the mount point:

$ sudo mount -o remount,exec /var

Don’t forget to update your /etc/fstab file accordingly, so that the change is permanent and not lost at each reboot.
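
To check whether a mount point currently carries noexec, before or after the remount, something like the following sketch can help. The mount_exec_state helper is hypothetical; it reads a mounts-style table, /proc/mounts by default, which assumes Linux.

```shell
# Hypothetical helper: print "noexec" or "exec" for a mount point.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts).
mount_exec_state() {
  awk -v mp="$1" '
    $2 == mp { found = 1; state = ($4 ~ /(^|,)noexec(,|$)/) ? "noexec" : "exec" }
    END { if (found) print state; else print "not a mount point" }
  ' "${2:-/proc/mounts}"
}

mount_exec_state /var
```

If /var is not a separate file system on your machine, the helper says so, and the noexec attribute (if any) comes from the parent mount.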

hdfs disk usage for humans

hdfs dfs -du is a powerful command, but its output is not very readable…

Here is a trick to list your subdirectories, sorted by size, in human-readable format:

[root ~]# hdfs dfs -du -s -h "/*" | awk '{print $1 $2 " " $3}' | sort -h
39.8G /mr-history
216.9G /backup
362.5G /app-logs
20.0T /user
76.0T /tmp
138.6T /apps
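
The awk and sort part of that pipeline can be tried without a cluster. In this sketch, sample_du is a hypothetical stand-in for the hdfs command, fed with a few of the values above in the "size unit path" layout that the awk script assumes; sort -h needs a sort that supports human-readable sizes, such as the GNU one.

```shell
# Sample lines in the "size unit path" layout assumed by the awk script.
sample_du() {
  printf '%s\n' \
    '20.0 T /user' \
    '39.8 G /mr-history' \
    '138.6 T /apps' \
    '216.9 G /backup'
}

# Glue size and unit together, then let sort -h order by magnitude.
sample_du | awk '{print $1 $2 " " $3}' | sort -h
```

If your Hadoop version prints additional columns, the field numbers in the awk script will probably need adjusting.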

bash profile to quickly launch a VirtualBox instance

When testing Hadoop on virtual machines, you usually have to launch instances in VirtualBox and open a terminal to ssh in.

Since you only need a terminal and not the VirtualBox windows, you may update your .bash_profile like this:

$ cat ~/.bash_profile
function hdp22() {
  if [[ $1 == "start" ]]; then
    VBoxManage startvm "Hortonworks Sandbox with HDP 2.2 Preview" --type headless && ssh
  elif [[ $1 == "stop" ]]; then
    VBoxManage controlvm "Hortonworks Sandbox with HDP 2.2 Preview" savestate
  else
    echo "Usage : hdp22 start|stop"
  fi
}

Type source ~/.bash_profile to load the function into your current shell without logging out and back in; you’ll then just have to type hdp22 start to launch the VM and ssh into it.
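
The same idea can be written a bit more generically. This is only a sketch: vmctl, VM_NAME and SSH_TARGET are hypothetical names, and the SSH_TARGET value in particular is a placeholder to replace with however you reach your own VM.

```shell
# Hypothetical names: adjust both to your own setup.
VM_NAME="Hortonworks Sandbox with HDP 2.2 Preview"
SSH_TARGET="root@sandbox"   # placeholder, replace with your own target

vmctl() {
  case "$1" in
    start)
      # Boot without a GUI window, then open a session.
      VBoxManage startvm "$VM_NAME" --type headless && ssh "$SSH_TARGET"
      ;;
    stop)
      # Save the VM state to disk so the next start resumes from it.
      VBoxManage controlvm "$VM_NAME" savestate
      ;;
    *)
      echo "Usage: vmctl start|stop"
      return 1
      ;;
  esac
}
```

Note that ssh may be refused until the guest has finished booting, so a retry after a few seconds can be needed on a cold start.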