query the Ambari database

Some of you may want to see what’s in the Ambari database.
By default, the data is stored in PostgreSQL, and for those who are not familiar with it, here are the basics: connect as user ambari to the ambari database; the default password is bigdata.

[vagrant@gw ~]$ # find password
[vagrant@gw ~]$ sudo cat /etc/ambari-server/conf/password.dat
bigdata

[vagrant@gw ~]$ # connect to ambari DB
[vagrant@gw ~]$ psql -U ambari ambari
Password for user ambari: bigdata
psql (8.4.20)
Type "help" for help.

ambari=> -- show all tables
ambari=> \d
              List of relations
 Schema |              Name             | Type  |  Owner
--------+-------------------------------+-------+----------
 ambari | adminpermission               | table | postgres
 ambari | adminprincipal                | table | postgres
 ambari | adminprincipaltype            | table | postgres
 ambari | adminprivilege                | table | postgres
--More--

ambari=> -- describe table hosts
ambari=> \d hosts
Table "ambari.hosts"
Column | Type | Modifiers
------------------------+--------------------------+-----------
host_name | character varying(255) | not null
cpu_count | integer | not null
ph_cpu_count | integer |
cpu_info | character varying(255) | not null
discovery_status | character varying(2000) | not null
host_attributes | character varying(20000) | not null
ipv4 | character varying(255) |
ipv6 | character varying(255) |
public_host_name | character varying(255) |
last_registration_time | bigint | not null
os_arch | character varying(255) | not null
os_info | character varying(1000) | not null
os_type | character varying(255) | not null
rack_info | character varying(255) | not null
total_mem | bigint | not null
Indexes:
"hosts_pkey" PRIMARY KEY, btree (host_name)
Referenced by:
TABLE "clusterhostmapping" CONSTRAINT "clusterhostmapping_cluster_id" FOREIGN KEY (host_name) REFERENCES hosts(host_name)
TABLE "configgrouphostmapping" CONSTRAINT "fk_cghm_hname" FOREIGN KEY (host_name) REFERENCES hosts(host_name)
--More--

ambari=> select host_name, ipv4, public_host_name, total_mem from hosts;
    host_name    |   ipv4    | public_host_name | total_mem
-----------------+-----------+------------------+-----------
 nn.example.com  | 10.0.2.15 | nn.example.com   |   1922680
 dn1.example.com | 10.0.2.15 | dn1.example.com  |   1922680
 gw.example.com  | 10.0.2.15 | gw.example.com   |   1922680
(3 rows)

ambari=> -- quit
ambari=> \q
[vagrant@gw ~]$
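
If you just need one value, psql can also run a query non-interactively. A minimal sketch, reusing the hosts table shown above (PGPASSWORD and -c are standard psql/libpq features):

[vagrant@gw ~]$ PGPASSWORD=bigdata psql -U ambari ambari -c "select host_name, total_mem from hosts;"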

Decommission and recommission with the Ambari API

It’s pretty easy to find out how to decommission a component through the Ambari API.

For example, let’s decommission a couple of DataNodes:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{
  "RequestInfo":{
    "context":"Decommission DataNodes dn1,dn2",
    "command":"DECOMMISSION",
    "parameters":{
      "slave_type":"DATANODE",
      "excluded_hosts":"dn1.example.com,dn2.example.com"
    },
    "operation_level":{
      "level":"HOST_COMPONENT",
      "cluster_name":"MY_CLUSTER"
    }
  },
  "Requests/resource_filters":[
    {
      "service_name":"HDFS",
      "component_name":"NAMENODE"
    }
  ]
}' http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests
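
The POST response contains the id of the request Ambari just created. You can poll that request to follow its progress; a minimal sketch (the /requests/<id> resource is part of the same Ambari API, and 42 stands for whatever id you got back):

curl -u admin:admin -H 'X-Requested-By: ambari' \
  http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests/42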

I then tried to recommission these DataNodes using a “RECOMMISSION” command, and it failed.

The trick is to keep the DECOMMISSION command but to replace excluded_hosts (the list of hosts to decommission) with included_hosts:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{
  "RequestInfo":{
    "context":"Recommission DataNodes dn1,dn2",
    "command":"DECOMMISSION",
    "parameters":{
      "slave_type":"DATANODE",
      "included_hosts":"dn1.example.com,dn2.example.com"
    },
    "operation_level":{
      "level":"HOST_COMPONENT",
      "cluster_name":"MY_CLUSTER"
    }
  },
  "Requests/resource_filters":[
    {
      "service_name":"HDFS",
      "component_name":"NAMENODE"
    }
  ]
}' http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests
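
To check that the DataNodes actually changed state, you can look them up in the NameNode JMX (see the next section): the LiveNodes attribute of the NameNodeInfo bean reports each DataNode's admin state. A hedged sketch, assuming the NameNode web UI runs on nn.example.com:50070 as in this cluster:

curl -s 'http://nn.example.com:50070/jmx?get=Hadoop:service=NameNode,name=NameNodeInfo::LiveNodes'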

get Hadoop metrics with JMX

JMX is widely used in the Java world to expose metrics and KPIs.

In Hadoop, the most commonly queried JMX endpoint is the NameNode’s:

http://NAMENODE_FQDN:50070/jmx

This provides an overview of all the things accessible through JMX:


You can get a list of all metrics provided by JMX:

http://sandbox.hortonworks.com:50070/jmx?qry=Hadoop:*


And finally, which is very useful if you’re looking for something very specific, you can get a single metric by specifying the bean name and the attribute:

http://NAMENODE_FQDN:50070/jmx?get=MXBeanName::AttributeName

For example, if we want to get the CapacityUsed metric, we’ll call JMX with MXBeanName set to Hadoop:service=NameNode,name=FSNamesystemState and AttributeName set to CapacityUsed.

So we call http://sandbox.hortonworks.com:50070/jmx?get=Hadoop:service=NameNode,name=FSNamesystemState::CapacityUsed
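
This makes it easy to wire into scripts or monitoring. A minimal sketch, assuming the stock JMX servlet output (a JSON object with a "beans" array) and using a Python one-liner to pull the value out:

curl -s 'http://sandbox.hortonworks.com:50070/jmx?get=Hadoop:service=NameNode,name=FSNamesystemState::CapacityUsed' \
  | python -c 'import json,sys; print(json.load(sys.stdin)["beans"][0]["CapacityUsed"])'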



hdfs disk usage for humans

hdfs dfs -du is a powerful command, but its output is not always very readable…

Here is a trick to list your subdirectories, sorted by size, in human-readable format:

[root ~]# hdfs dfs -du -s -h "/*" | awk '{print $1 $2 " " $3}' | sort -h
39.8G /mr-history
216.9G /backup
362.5G /app-logs
20.0T /user
76.0T /tmp
138.6T /apps
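
The same trick works below any directory, not just the root. For instance, to see the biggest children of /user first (sort -r reverses the order, head keeps the top ones):

[root ~]# hdfs dfs -du -s -h "/user/*" | awk '{print $1 $2 " " $3}' | sort -h -r | head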


bash profile to quickly launch a VirtualBox instance

When testing Hadoop on virtual machines, you usually have to launch instances in VirtualBox and open a terminal to ssh into them.

Since you only need a terminal and not the VirtualBox windows, you can update your .bash_profile like this:

$ cat ~/.bash_profile
function hdp22() {
    if [[ $1 == "start" ]]; then
        VBoxManage startvm "Hortonworks Sandbox with HDP 2.2 Preview" --type headless && ssh 127.0.0.1
    elif [[ $1 == "stop" ]]; then
        VBoxManage controlvm "Hortonworks Sandbox with HDP 2.2 Preview" savestate
    else
        echo "Usage: hdp22 start|stop"
    fi
}


Run source ~/.bash_profile to load the function without opening a new terminal; after that, typing hdp22 start launches the VM and sshs into it.
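
If you’re not sure of the exact VM name to put in the function, VBoxManage can list the registered VMs (a standard VirtualBox command; the names depend on your setup):

$ VBoxManage list vms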



quickly create a sample HBase table

This is a quick and easy way to generate data in an HBase table.

First, create your table in the HBase shell:

create 't1', 'f1'

Then create a hbase_load.txt file:

cat hbase_load.txt

for i in '1'..'10' do \
  for j in '1'..'10' do \
    for k in '1'..'10' do \
      rnd = (0...64).map { (65 + rand(26)).chr }.join
      put 't1', "#{i}-#{j}-#{k}", "f1:#{j}#{k}", "#{rnd}"
    end \
  end \
end

And generate the 1,000 rows:

cat hbase_load.txt | hbase shell
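
You can then check that the data landed; a quick sketch using the shell’s standard count command:

echo "count 't1'" | hbase shell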

make the Hive console verbose

If a query fails in your Hive CLI with no more detail than FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask, then you may want to add some verbosity… Simply launch hive with the logger redirected to the console:

hive -hiveconf hive.root.logger=INFO,console
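
If INFO is still not verbose enough, the same switch accepts the other standard log4j levels, for instance:

hive -hiveconf hive.root.logger=DEBUG,console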