Monthly Archives: July 2015

hdfs disk usage for humans

hdfs du is a powerful command, but its raw output is not very easy to read…

Here is a trick to list your subdirectories, sorted by size, in human-readable format:

[root ~]# hdfs dfs -du -s -h "/*" | awk '{print $1 $2 " " $3}' | sort -h
39.8G /mr-history
216.9G /backup
362.5G /app-logs
20.0T /user
76.0T /tmp
138.6T /apps
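As a rough sketch of what the pipeline does, the awk and sort stages can be tried on sample lines without a cluster. The helper name format_du is made up for illustration, and the sample lines are assumed to have the shape "number unit path". $NF is swapped in for $3 so that the path is still picked up if your Hadoop version prints extra columns before it (paths containing spaces would be cut short). sort -h assumes GNU sort.

```shell
# Hypothetical helper: glue the size and its unit together, keep the last
# field as the path, then sort by human-readable size (GNU sort -h).
format_du() {
  awk '{print $1 $2 " " $NF}' | sort -h
}

# Sample input shaped like the du output before formatting (assumed shape).
printf '%s\n' '20.0 T /user' '39.8 G /mr-history' '362.5 G /app-logs' | format_du
```

Against a real cluster the same helper would be fed with hdfs dfs -du -s -h "/*" | format_du.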

bash profile function to quickly launch a VirtualBox instance

When testing Hadoop on virtual machines, you usually have to launch instances in VirtualBox and open a terminal to ssh in.

Since you only need a terminal and not the VirtualBox windows, you can add a function like this to your .bash_profile:

$ cat ~/.bash_profile
function hdp22() {
  if [[ $1 == "start" ]]; then
    # Boot the VM without opening a VirtualBox window, then ssh in
    VBoxManage startvm "Hortonworks Sandbox with HDP 2.2 Preview" --type headless && ssh
  elif [[ $1 == "stop" ]]; then
    # Save the VM state so the next start resumes where you left off
    VBoxManage controlvm "Hortonworks Sandbox with HDP 2.2 Preview" savestate
  else
    echo "Usage : hdp22 start|stop"
  fi
}

Run source ~/.bash_profile to load the function into your current shell without having to log in again; after that, you just have to type hdp22 start to launch the VM and ssh into it.
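One possible refinement, sketched under assumptions: only start the VM when it is not already running. The names SANDBOX_VM, sandbox_running and sandbox_start are hypothetical and not part of the original profile; the check relies on VBoxManage list runningvms printing one line per running VM with the name in double quotes.

```shell
# Hypothetical variable holding the VM name used in the profile function.
SANDBOX_VM="Hortonworks Sandbox with HDP 2.2 Preview"

# Succeeds when the VM name shows up in the list of running VMs.
sandbox_running() {
  VBoxManage list runningvms | grep -qF "\"$SANDBOX_VM\""
}

# Start the VM headless only if it is not running yet.
sandbox_start() {
  if sandbox_running; then
    echo "already running"
  else
    VBoxManage startvm "$SANDBOX_VM" --type headless
  fi
}
```

Calling sandbox_start twice in a row would then print "already running" the second time instead of asking VirtualBox to start a VM that is already up.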

quickly create a sample hbase table

This is a quick and easy way to generate data in an HBase table.

First create your table in the HBase shell:

create 't1', 'f1'

Then create an hbase_load.txt file:

$ cat hbase_load.txt

# 10 x 10 x 10 iterations = 1000 puts, each storing a random 64-letter value
for i in '1'..'10' do \
for j in '1'..'10' do \
for k in '1'..'10' do \
rnd=(0...64).map { (65 + rand(26)).chr }.join
put 't1', "#{i}-#{j}-#{k}", "f1:#{j}#{k}", "#{rnd}"
end \
end \
end

Then generate the 1000 rows:

cat hbase_load.txt | hbase shell
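As an alternative sketch, the same put statements can be generated from plain shell with awk, which makes it easy to inspect or count them before loading anything. This swaps the Ruby loops for an awk program; gen_puts is a hypothetical name, and the table and column family names are the ones created above.

```shell
# Hypothetical helper: print 10 x 10 x 10 = 1000 HBase shell put statements,
# each carrying a random value of 64 uppercase letters.
gen_puts() {
  awk 'BEGIN {
    srand()
    for (i = 1; i <= 10; i++)
      for (j = 1; j <= 10; j++)
        for (k = 1; k <= 10; k++) {
          rnd = ""
          for (n = 0; n < 64; n++)
            rnd = rnd sprintf("%c", 65 + int(rand() * 26))
          # \047 is the octal escape for a single quote
          printf "put \047t1\047, \047%d-%d-%d\047, \047f1:%d%d\047, \047%s\047\n", i, j, k, j, k, rnd
        }
  }'
}

# Count the generated statements before sending them anywhere.
gen_puts | wc -l
```

If the count looks right, gen_puts | hbase shell should load the rows in the same way as the file above, and echo "count 't1'" | hbase shell can be used afterwards to check how many rows ended up in the table.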