
one-shot backup of all config files with the Ambari API

Before performing an upgrade, for example, or more generally at any time, it's a good idea to keep backups of your config files.

In an upgrade situation, this lets you run a quick diff to see whether parameters have been reset, or whether there are new parameters to take care of.

The Ambari API lets you retrieve all config files, and we'll use the handy configs.sh script provided by Hortonworks with HDP to perform the backups.

First, config types: when looking at the output of http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/, we can find them under Clusters / desired_configs.

(screenshot: Ambari API result)

So let's dig into that part: go to http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs

(screenshot: Ambari API desired_configs)

Now let's get only the file names:

[root@sandbox ~]# curl -s -u admin:admin http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2
ams-env
ams-hbase-env
ams-hbase-log4j
ams-hbase-policy
ams-hbase-security-site
ams-hbase-site
ams-log4j
ams-site
capacity-scheduler
cluster-env
core-site
...
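
The grep/cut pipeline depends on the exact indentation of Ambari's pretty-printed JSON, so it's worth seeing what it keys on. Here is a sketch against a saved sample (the file name and its contents are illustrative, mimicking the desired_configs fragment returned above):

```shell
# Hypothetical sample of the desired_configs fragment returned by the API
cat > /tmp/desired_configs_sample.json <<'EOF'
{
  "Clusters" : {
    "desired_configs" : {
      "core-site" : {
        "tag" : "version1"
      },
      "hdfs-site" : {
        "tag" : "version1"
      }
    }
  }
}
EOF
# Same extraction as above: keep lines that open a JSON object,
# drop the wrapper keys, then take the quoted config type name
grep '" : {' /tmp/desired_configs_sample.json \
  | grep -v Clusters | grep -v desired_configs \
  | cut -d'"' -f2
```

This prints core-site and hdfs-site, one per line; the real call simply feeds the curl output into the same pipeline.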

Now, for each file, we can get a backup with the following command:

[root@sandbox ~]# /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin -port 8080 get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {'

where CONFIG_TYPE is core-site, for example.

So we can now have a complete backup:

#!/bin/bash
AMBARI_HOST=sandbox.hortonworks.com
CLUSTER_NAME=Sandbox
AMBARI_USER=admin
AMBARI_PASSWORD=admin
AMBARI_PORT=8080
timeNow=`date +%Y%m%d_%H%M%S`
RESULT_DIR=/root/migrationHDP/configs.sh/$timeNow
mkdir -p $RESULT_DIR
for CONFIG_TYPE in `curl -s -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2`; do
echo "backing up $CONFIG_TYPE"
/var/lib/ambari-server/resources/scripts/configs.sh -u $AMBARI_USER -p $AMBARI_PASSWORD -port $AMBARI_PORT get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {' > $RESULT_DIR/$CONFIG_TYPE.conf
done
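
If the Ambari call fails, the grep filters can leave you with an empty file, so a quick sanity check after the run is cheap insurance. Here is a sketch with hypothetical paths (the demo directory and files stand in for $RESULT_DIR):

```shell
# Demo directory standing in for $RESULT_DIR (hypothetical paths)
RESULT_DIR=/tmp/config_backup_demo
mkdir -p $RESULT_DIR
printf '"fs.defaultFS" : "hdfs://nn:8020"\n' > $RESULT_DIR/core-site.conf
: > $RESULT_DIR/hdfs-site.conf   # simulate a backup that came back empty
# Flag any empty backup file
for f in $RESULT_DIR/*.conf; do
  [ -s "$f" ] || echo "WARNING: empty backup: $f"
done
```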

 

Note that you can also write everything to a single file to make the diff easier, prefixing each section with the CONFIG_TYPE for a better view:

#!/bin/bash
AMBARI_HOST=sandbox.hortonworks.com
CLUSTER_NAME=Sandbox
AMBARI_USER=admin
AMBARI_PASSWORD=admin
AMBARI_PORT=8080
timeNow=`date +%Y%m%d_%H%M%S`
RESULT_DIR=/root/migrationHDP/configs.sh/$timeNow
mkdir -p $RESULT_DIR
for CONFIG_TYPE in `curl -s -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2`; do
echo "backing up $CONFIG_TYPE"
/var/lib/ambari-server/resources/scripts/configs.sh -u $AMBARI_USER -p $AMBARI_PASSWORD -port $AMBARI_PORT get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {' | sed "1i ##### $CONFIG_TYPE #####" >> $RESULT_DIR/all.conf
done
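
With two timestamped backup directories (before and after the upgrade), a recursive diff is enough to spot changed, added or removed parameters. Here is a sketch with hypothetical directories and a single changed property:

```shell
# Hypothetical before/after backup directories mimicking $RESULT_DIR
mkdir -p /tmp/cfg_before /tmp/cfg_after
printf '"fs.defaultFS" : "hdfs://old:8020"\n' > /tmp/cfg_before/core-site.conf
printf '"fs.defaultFS" : "hdfs://new:8020"\n' > /tmp/cfg_after/core-site.conf
# diff exits non-zero when something changed, hence the || true
diff -r /tmp/cfg_before /tmp/cfg_after || true
```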

query the Ambari database

Some of you may want to see what's in the Ambari database.
By default, data is stored in Postgres, and for those who are not familiar with it, here are the basics: connect with user ambari to the ambari DB; the default password is bigdata.

[vagrant@gw ~]$ # find password
[vagrant@gw ~]$ sudo cat /etc/ambari-server/conf/password.dat
bigdata

[vagrant@gw ~]$ # connect to ambari DB
[vagrant@gw ~]$ psql -U ambari ambari
Password for user ambari: bigdata
psql (8.4.20)
Type "help" for help.

ambari=> -- show all tables
ambari=> \d
List of relations
Schema | Name | Type | Owner
--------+-------------------------------+-------+----------
ambari | adminpermission | table | postgres
ambari | adminprincipal | table | postgres
ambari | adminprincipaltype | table | postgres
ambari | adminprivilege | table | postgres
--More--

ambari=> -- describe table hosts
ambari=> \d hosts
Table "ambari.hosts"
Column | Type | Modifiers
------------------------+--------------------------+-----------
host_name | character varying(255) | not null
cpu_count | integer | not null
ph_cpu_count | integer |
cpu_info | character varying(255) | not null
discovery_status | character varying(2000) | not null
host_attributes | character varying(20000) | not null
ipv4 | character varying(255) |
ipv6 | character varying(255) |
public_host_name | character varying(255) |
last_registration_time | bigint | not null
os_arch | character varying(255) | not null
os_info | character varying(1000) | not null
os_type | character varying(255) | not null
rack_info | character varying(255) | not null
total_mem | bigint | not null
Indexes:
"hosts_pkey" PRIMARY KEY, btree (host_name)
Referenced by:
TABLE "clusterhostmapping" CONSTRAINT "clusterhostmapping_cluster_id" FOREIGN KEY (host_name) REFERENCES hosts(host_name)
TABLE "configgrouphostmapping" CONSTRAINT "fk_cghm_hname" FOREIGN KEY (host_name) REFERENCES hosts(host_name)
--More--

ambari=> select host_name, ipv4, public_host_name, total_mem from hosts;
host_name | ipv4 | public_host_name | total_mem
-----------------+-----------+------------------+-----------
nn.example.com | 10.0.2.15 | nn.example.com | 1922680
dn1.example.com | 10.0.2.15 | dn1.example.com | 1922680
gw.example.com | 10.0.2.15 | gw.example.com | 1922680
(3 rows)

ambari=> -- quit
ambari=> \q
[vagrant@gw ~]$

Decommission and recommission in Ambari API

It's pretty easy to find out how to decommission something with the Ambari API.

For example, let's decommission two datanodes:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{
"RequestInfo":{
"context":"Decommission DataNodes dn1,dn2",
"command":"DECOMMISSION",
"parameters":{
"slave_type":"DATANODE",
"excluded_hosts":"dn1.example.com,dn2.example.com"
},
"operation_level":{
"level":"HOST_COMPONENT",
"cluster_name":"MY_CLUSTER"
}
},
"Requests/resource_filters":[
{
"service_name":"HDFS",
"component_name":"NAMENODE"
}
]
}' http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests

 

I then tried to recommission these datanodes using the “RECOMMISSION” command, and it failed.

The trick here is to keep the DECOMMISSION command but replace excluded_hosts (the list of hosts to decommission) with included_hosts.

 

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{
"RequestInfo":{
"context":"Recommission DataNodes dn1,dn2",
"command":"DECOMMISSION",
"parameters":{
"slave_type":"DATANODE",
"included_hosts":"dn1.example.com,dn2.example.com"
},
"operation_level":{
"level":"HOST_COMPONENT",
"cluster_name":"MY_CLUSTER"
}
},
"Requests/resource_filters":[
{
"service_name":"HDFS",
"component_name":"NAMENODE"
}
]
}' http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests
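
Since the two payloads differ only in the excluded_hosts/included_hosts key, a small helper avoids maintaining two near-identical JSON blobs. This is a hypothetical sketch (the function name and the hardcoded cluster name are mine, not part of the Ambari API); it prints the payload, which you can then feed to the curl call above with -d @-:

```shell
# Hypothetical helper: build the DECOMMISSION payload for either direction.
# First argument is the parameter key: excluded_hosts (decommission)
# or included_hosts (recommission); second is the host list.
build_payload() {
  local action=$1 hosts=$2
  cat <<EOF
{
  "RequestInfo":{
    "context":"${action} DataNodes ${hosts}",
    "command":"DECOMMISSION",
    "parameters":{
      "slave_type":"DATANODE",
      "${action}":"${hosts}"
    },
    "operation_level":{
      "level":"HOST_COMPONENT",
      "cluster_name":"MY_CLUSTER"
    }
  },
  "Requests/resource_filters":[
    { "service_name":"HDFS", "component_name":"NAMENODE" }
  ]
}
EOF
}

# Print the two payloads instead of POSTing them
build_payload excluded_hosts "dn1.example.com,dn2.example.com"
build_payload included_hosts "dn1.example.com,dn2.example.com"
```

For example: build_payload included_hosts "dn1.example.com,dn2.example.com" | curl -u admin:admin -H 'X-Requested-By: ambari' -X POST -d @- http://gw.example.com:8080/api/v1/clusters/MY_CLUSTER/requests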

get metrics with Ambari API

[vagrant@gw ~]$ curl -u admin:admin -X GET http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com/host_components/NAMENODE?fields=metrics/jvm
{
 "href" : "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com/host_components/NAMENODE?fields=metrics/jvm",
 "HostRoles" : {
 "cluster_name" : "hdp-cluster",
 "component_name" : "NAMENODE",
 "host_name" : "nn.example.com"
 },
 "host" : {
 "href" : "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com"
 },
 "metrics" : {
 "jvm" : {
 "HeapMemoryMax" : 1052770304,
 "HeapMemoryUsed" : 56104392,
 "NonHeapMemoryMax" : 318767104,
 "NonHeapMemoryUsed" : 49148216,
 "gcCount" : 190,
 "gcTimeMillis" : 4599,
 "logError" : 0,
 "logFatal" : 0,
 "logInfo" : 16574,
 "logWarn" : 2657,
 "memHeapCommittedM" : 1004.0,
 "memHeapUsedM" : 53.473206,
 "memMaxM" : 1004.0,
 "memNonHeapCommittedM" : 133.625,
 "memNonHeapUsedM" : 46.87139,
 "threadsBlocked" : 0,
 "threadsNew" : 0,
 "threadsRunnable" : 7,
 "threadsTerminated" : 0,
 "threadsTimedWaiting" : 54,
 "threadsWaiting" : 7
 }
 }
}

The metrics you may want to watch are HeapMemoryMax and HeapMemoryUsed.
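
From those two values you can derive heap utilisation; here is a quick sketch using the numbers from the sample output above (hardcoded here, since the real values come from the API call):

```shell
# Values copied from the sample metrics/jvm output above
HEAP_USED=56104392
HEAP_MAX=1052770304
awk -v used=$HEAP_USED -v max=$HEAP_MAX \
  'BEGIN { printf "heap: %.1f%%\n", 100 * used / max }'
# prints heap: 5.3%
```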


Host is in invalid state

I have had a few "Host is in invalid state" messages from Ambari; in that situation you cannot restart the component or do anything with it.

The last time it occurred was on ZOOKEEPER_CLIENT, so here is how to put the component back into its original state.

To get the ZOOKEEPER_CLIENT status:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT

If it's not in the INSTALLED state:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT

 


Ambari tips & tricks

Restarting some components

(including clients, which you can't put into any state other than "INSTALLED"):

curl -uadmin:admin -H 'X-Requested-By: ambari' -X POST -d '
{
"RequestInfo":{
"command":"RESTART",
"context":"Restart ZooKeeper Client and ZooKeeper Server",
"operation_level":{
"level":"HOST",
"cluster_name":"hdp-cluster"
}
},
"Requests/resource_filters":[
{
"service_name":"ZOOKEEPER",
"component_name":"ZOOKEEPER_CLIENT",
"hosts":"gw.example.com"
},
{
"service_name":"ZOOKEEPER",
"component_name":"ZOOKEEPER_SERVER",
"hosts":"gw.example.com,nn.example.com,dn1.example.com"
}
]
}' http://gw.example.com:8080/api/v1/clusters/hdp-cluster/requests

As indicated in the wiki, the RESTART command also refreshes the configs.

 

Delete a host from Ambari

// get all COMPONENTS for the host

[root@host ~]# curl -u admin:admin -H "X-Requested-By:ambari" -i -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/hosts/$HOSTNAME/host_components

// delete all COMPONENTS for this HOST
[root@host ~]# for COMPONENT in ZOOKEEPER_CLIENT YARN_CLIENT PIG OOZIE_CLIENT NODEMANAGER MAPREDUCE2_CLIENT HIVE_CLIENT HDFS_CLIENT HCAT HBASE_REGIONSERVER HBASE_CLIENT GANGLIA_MONITOR DATANODE; do curl -u admin:admin -H "X-Requested-By:ambari" -i -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/hosts/$HOSTNAME/host_components/$COMPONENT; done
// delete HOST
[root@host ~]# curl -u admin:admin -H "X-Requested-By:ambari" -i -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/hosts/$HOSTNAME
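
Rather than hardcoding the component list, you can derive it from the GET above. Here is a sketch against a trimmed, hypothetical fragment of the host_components response (real entries also carry href and host_name fields):

```shell
# Trimmed, hypothetical fragment of the host_components response
cat > /tmp/host_components_sample.json <<'EOF'
{
  "items" : [
    { "HostRoles" : { "component_name" : "DATANODE" } },
    { "HostRoles" : { "component_name" : "NODEMANAGER" } }
  ]
}
EOF
# Extract the component names to feed into the DELETE loop above
grep -o '"component_name" : "[^"]*"' /tmp/host_components_sample.json \
  | cut -d'"' -f4
```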

Delete a service

(for example STORM)

// get the components for that service
[vagrant@gw ~]$ curl -u admin:admin -X GET  http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/STORM
// stop the service
[vagrant@gw ~]$ curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/STORM
// stop each component on each host
[vagrant@gw ~]$ for COMPONENT_NAME in DRPC_SERVER NIMBUS STORM_REST_API STORM_UI_SERVER SUPERVISOR; do curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/gw.example.com/host_components/${COMPONENT_NAME}; done
// stop service components
[vagrant@gw ~]$ for COMPONENT_NAME in DRPC_SERVER NIMBUS STORM_REST_API STORM_UI_SERVER SUPERVISOR; do curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}' http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/STORM/components/${COMPONENT_NAME}; done
// delete the service
[vagrant@gw ~]$ curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/STORM

Add a component

For example, let's add an HBase RegionServer.

// add the component
[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/gw.example.com/host_components/HBASE_REGIONSERVER

// then install (note the '\'' escaping: the query string contains single quotes inside a single-quoted shell argument)
[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Install RegionServer","query":"HostRoles/component_name.in('\''HBASE_REGIONSERVER'\'')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}' http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/gw.example.com/host_components

Get host components for a service

[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X GET "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts?host_components/HostRoles/service_name=HDFS&fields=host_components/HostRoles/service_name"

Note the quotes around the URL: it contains a &, which the shell would otherwise interpret as a background operator.

 


custom Nagios alert in Ambari

The exercise here is to write a very simple Nagios plugin and integrate it into the Ambari web UI.

We'll check whether the cluster is in safe mode, and surface that alert in Ambari.

First, let's write the plugin. In the same directory you'll find all the scripts used by Ambari, which you can duplicate and adapt.

[vagrant@gw ~]$ sudo vi /usr/lib64/nagios/plugins/check_safemode.sh
#!/bin/bash
# OK (exit 0) when HDFS is out of safe mode, alert (exit 1) otherwise
ret=$(hadoop dfsadmin -safemode get)
if [[ $ret == *OFF ]]; then
  echo "OK: $ret"
  exit 0
fi
echo "KO: $ret"
exit 1

Notice that you have to echo something before every exit in the plugin; otherwise Nagios will raise an alert about the missing output.
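
A slightly more defensive variant (a sketch of mine, not the plugin shipped with HDP) also reports when the dfsadmin call itself fails, instead of matching its error output against *OFF:

```shell
#!/bin/bash
# Hypothetical, more defensive rewrite of check_safemode.sh:
# report a failing dfsadmin call explicitly instead of silently
# treating its error output as "not in safe mode".
check_safemode() {
  local ret
  if ! ret=$(hadoop dfsadmin -safemode get 2>&1); then
    echo "KO: dfsadmin failed: $ret"
    return 1
  fi
  case $ret in
    *OFF) echo "OK: $ret"; return 0 ;;
    *)    echo "KO: $ret"; return 1 ;;
  esac
}
```

Call check_safemode from the plugin body and exit with its return code.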

Now define the command that executes the plugin:

[vagrant@gw ~]$ sudo vi /etc/nagios/objects/hadoop-commands.cfg

...
define command{
command_name check_safemode
command_line $USER1$/check_wrapper.sh $USER1$/check_safemode.sh -H $HOSTADDRESS$
}

Get the hostgroup name (from /etc/nagios/objects/hadoop-hostgroups.cfg) on which the plugin will be executed, for example nagios-server (a single server, since it's a cluster-wide HDFS check!).

In /etc/nagios/objects/hadoop-servicegroups.cfg, find the servicegroup the plugin will belong to.
Here, we'll put this alert in the HDFS service.

Now the alert entry:

[vagrant@gw ~]$ sudo vi /etc/nagios/objects/hadoop-services.cfg
...
# NAGIOS SERVER HDFS Checks
...
define service {
hostgroup_name nagios-server
use hadoop-service
service_description HDFS::Is Cluster in Safe Mode
servicegroups HDFS
check_command check_safemode
normal_check_interval 2
retry_check_interval 1
max_check_attempts 1
}

Notice that normal_check_interval is the number of minutes between checks.

Then restart Nagios:

[vagrant@gw ~]$ sudo service nagios restart

The alert will appear in Ambari:
(screenshot: Nagios safe mode off)

To test it, let's put the cluster in safe mode:

[vagrant@gw ~]$ sudo su hdfs
[hdfs@gw vagrant]$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs
[hdfs@gw vagrant]$ hadoop dfsadmin -safemode enter
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON

Now, in about a minute, you'll see that the alert is on:

(screenshot: Nagios safe mode on)

Then you can leave safe mode to go back to OK!

[hdfs@gw vagrant]$ hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is OFF

Note that this is just for demonstration purposes: the plugin does not implement Kerberos, for example, unlike the check_nodemanager_health plugin.

You may also note that Nagios writes its output to /var/nagios/status.dat, which Ambari collects and reads to display this information.

Adapted from Hortonworks documentation

 


My Ambari Notes

When running the Enable NameNode HA wizard, I encountered a weird exception at the Install JournalNodes step.

When looking in ambari-server.log, I noticed this exception:

07:57:09,617 ERROR [qtp1251571107-6336] AbstractResourceProvider:244 - Caught AmbariException when creating a resource
org.apache.ambari.server.ServiceComponentNotFoundException: ServiceComponent not found, clusterName=hdp-cluster, serviceName=HDFS, serviceComponentName=JOURNALNODE

So the solution was to "manually" create the ServiceComponent through the Ambari API:

[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/HDFS/components/JOURNALNODE

HTTP/1.1 201 Created
Set-Cookie: AMBARISESSIONID=1frjfvu8yjb5fylf4w5m449c7;Path=/
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Type: text/plain
Content-Length: 0
Server: Jetty(7.6.7.v20120910)