Monthly Archives: March 2015

Host is in invalid state

I have had some “Host is in invalid state” messages from Ambari; in that situation you cannot restart the component or do anything with it.

The last time it occurred was on ZOOKEEPER_CLIENT, so here is how to put the component back into its original state :

To get the ZOOKEEPER_CLIENT status :

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT

If it’s not in the INSTALLED state, put it back :

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT
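The two calls differ only in the HTTP verb and payload, so a small helper can build the URL. This is just a sketch; AMBARI_HOST and the example cluster/host names below are hypothetical placeholders, matching the variables used above:

```shell
#!/bin/sh
# Build the Ambari REST URL for a host component.
# $1 = cluster name, $2 = host name, $3 = component name.
# AMBARI_HOST must be set in the environment.
ambari_hc_url() {
  echo "http://${AMBARI_HOST}:8080/api/v1/clusters/$1/hosts/$2/host_components/$3"
}

# Hypothetical values, for illustration only:
AMBARI_HOST=ambari.example.com
ambari_hc_url mycluster node1.example.com ZOOKEEPER_CLIENT
# -> http://ambari.example.com:8080/api/v1/clusters/mycluster/hosts/node1.example.com/host_components/ZOOKEEPER_CLIENT
```

You would then pass that URL to the GET and PUT curl commands shown above.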


HBase tips & tricks

Activate compression :

alter 'test', {NAME => 'mycolumnfamily', COMPRESSION => 'SNAPPY'}

Existing data will only be rewritten with the new compression at the next major compaction (you can trigger one with major_compact 'test').


Data block encoding of keys/values

alter 'test', {NAME => 'mycolumnfamily', DATA_BLOCK_ENCODING => 'FAST_DIFF'}


Change the split policy for a table (since HBase 0.94, the default split policy changed from ConstantSizeRegionSplitPolicy, based on hbase.hregion.max.filesize, to IncreasingToUpperBoundRegionSplitPolicy)

alter 'access_demo', {METHOD => 'table_att', CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}

Remember that a split will occur when the data size of a column family grows beyond the threshold defined by the policy.


Hadoop HDFS commands

Leaving SafeMode :

$ bin/hadoop dfsadmin -safemode leave


Failover NameNode :

Check the current state of a NameNode, for instance nn1 :

[root@nn ~]# hdfs haadmin -getServiceState nn1

Force a NameNode transition to Active :

[root@nn ~]# hdfs haadmin -transitionToActive --forcemanual nn1


Failures : what I learned

In my HA cluster, the NameNodes failed to start with the following :

2015-03-16 15:11:44,724 ERROR namenode.EditLogInputStream ( - caught exception initializing
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$LogHeaderCorruptException: Unexpected version of the file system log file: -620756992. Current version = -56.
2015-03-16 15:11:45,057 FATAL namenode.NameNode ( - Exception in namenode join
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 88798

at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 88618; expected file to go up to 14352384
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(
... 12 more
2015-03-16 15:11:45,066 INFO util.ExitUtil ( - Exiting with status 1
2015-03-16 15:11:45,079 INFO namenode.NameNode ( - SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at

The procedure is to use the recovery mode :

[root@dn1 ~]# kinit -kt /etc/security/keytabs/nn.service.keytab nn/

[root@dn1 ~]# /usr/bin/hadoop namenode -recover

15/03/16 15:33:01 ERROR namenode.MetaRecoveryContext: We failed to read txId 88798
15/03/16 15:33:01 INFO namenode.MetaRecoveryContext:
Enter 'c' to continue, skipping the bad section in the log
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)

15/03/16 15:33:13 INFO namenode.MetaRecoveryContext: Continuing
15/03/16 15:33:13 INFO namenode.FSImage: Edits file of size 1048576 edits # 0 loaded in 11 seconds
15/03/16 15:33:13 INFO namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
15/03/16 15:33:13 INFO namenode.NameCache: initialized with 9 entries 212 lookups
15/03/16 15:33:13 INFO namenode.FSNamesystem: Finished loading FSImage in 18865 msecs
15/03/16 15:33:13 INFO namenode.FSImage: Save namespace ...
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 88618
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000000000088386, cpktTxId=0000000000000088386)
15/03/16 15:33:15 INFO namenode.MetaRecoveryContext: RECOVERY COMPLETE
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for active state
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for standby state
15/03/16 15:33:15 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at

Then we just have to start the NameNode (you can expect some missing blocks though).

Your virtual cluster on your machine

Playing with HDP sandbox is cool. This is a quick way to see how things are going, try some Hive requests, and so on.

At some point, you may want to build your own virtual cluster on your machine : several virtual machines forming a “real” cluster, to have something closer to reality.

Of course you can start from scratch, but there’s a quicker and easier way : playing with structor.

Structor is a project hosted by Hortonworks, based on Vagrant (basically a tool that manages VMs on VirtualBox or VMware, dealing with OS images), and there are a lot of forks extending its functionality.
I based my stuff on Andrew Miller’s fork, with some minor modifications.

Ok, let’s start with getting structor :

~$ mkdir Repository && cd Repository
Repository~ $ git clone

This will create a structor directory in which you’ll find a Vagrantfile which contains everything needed to build your cluster !
You just have to specify an HDP version, an Ambari version and a profile name to build the corresponding cluster.

For building my 3-node cluster, I just type

structor~ $ ./ambari-cluster hdp-2.1.3 ambari-1.6.1 3node-full

In the ./profiles directory you’ll find some predefined profiles : 1node-min, 1node-full, 3node-min and 3node-full. Feel free to create and submit other profiles, but remember that your VMs will need some memory to be usable ! I put 2GB for each VM, so a 3-node cluster will take 6GB (7GB with a proxy VM).

I added some slight modifications so that VirtualBox VMs are named after their configuration, letting you keep multiple ambari-clusters in VirtualBox.

Things you have to know :

1. add the hosts on /etc/hosts on your Linux or Mac, or the equivalent on Windows (look at the doc on my github)

2. the private key is stored in each VM

3. to ssh in a machine, cd to your structor directory and do a vagrant ssh HOSTNAME

structor~ $ vagrant ssh gw
Loading profile /Users/ledel/Repository/structor/profiles/3node.profile
Last login: Tue Mar 10 12:12:38 2015 from
Welcome to your Vagrant-built virtual machine.

[vagrant@gw ~]$

4. to suspend the cluster, just

structor~ $ vagrant suspend


to wake the cluster up :

structor~ $ vagrant up



5. To copy files from/to your local machine, the current Vagrant folder is shared with each VM on /vagrant

Enjoy !

HBase regions merge

HBase writes data to multiple servers, called Region Servers.

Each region server contains one or several regions, and data is allocated to these regions; HBase controls which region server serves which region(s).

The number of regions can be defined at table creation :

[hbase@gw vagrant]$ kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
[hbase@gw vagrant]$ hbase shell
hbase(main):001:0> create 'table2', 'columnfamily1', {NUMREGIONS => 5, SPLITALGO => 'HexStringSplit'}

We previously determined that 5 regions would be appropriate, given the number of region servers and the desired region size. Two basic split algorithms are supplied, HexStringSplit and UniformSplit (but you can add your own).
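To get an intuition for what HexStringSplit does, it divides the hex key space [00000000, ffffffff] into equal slices. Here is a rough local sketch of that idea (a hypothetical computation; the exact boundaries HBase prints may differ slightly):

```shell
# Divide the 32-bit hex key space into 5 equal regions and print the
# 4 split points between them (rough sketch of HexStringSplit's idea).
awk 'BEGIN { n = 5; for (i = 1; i < n; i++) printf "%08x\n", int(i * 4294967296 / n) }'
# first split point -> 33333333
```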

You can provide your own splits :

hbase(main):001:0> create 'table2', 'columnfamily1', SPLITS => ['a', 'b', 'c']

So this table2 has been created with our 5 regions, let’s go to HBase webUI to see what it looks like :

We do have our 5 regions; see the keys repartition, and note the region names format : table_name,start_key,end_key,timestamp.ENCODED_REGIONNAME.

So now, if we want to merge regions, we can use merge_region in the hbase shell.
The regions have to be adjacent.

hbase(main):010:0> merge_region '234a12e83e203f2e3158c39e1da6b6e7', '89dd2d5a88e1b2b9787e3254b85b91d3'
0 row(s) in 0.0140 seconds


Notice that the ENCODED_REGIONNAME of the resulting region is a new one.

hbase(main):012:0> merge_region 'bfad503057fca37bd60b5a83109f7dc6','e37d7ab5513e06268459c76d5e7335e4'
0 row(s) in 0.0040 seconds

Finally, let’s merge all the remaining regions !

hbase(main):013:0> merge_region '0f5fc22bf0beacbf83c1ad562324c778','af6d7af861f577ba456cff88bf5e5e38','3f1e029afd907bc62f5e5fb8b6e1b5cf','3f1e029afd907bc62f5e5fb8b6e1b5cf'
0 row(s) in 0.0290 seconds

Then we can see that only one region remains :



For the record, you can create an HBase table pre-split if you know the repartition of your keys : either by passing SPLITS, or by providing a SPLITS_FILE which contains the split points (so number of lines = number of regions - 1).
Be aware of the order : SPLITS_FILE placed before the {…} block won’t work.

[hbase@gw vagrant]$ printf 'a\nb\nc\n' > /tmp/splits.txt
[hbase@gw vagrant]$ kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
[hbase@gw vagrant]$ hbase shell
hbase(main):011:0> create 'test_split', { NAME=> 'cf', VERSIONS => 1, TTL => 69200 }, SPLITS_FILE => '/tmp/splits.txt'
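A quick local sanity check on the splits file (printf is used rather than echo because bash’s echo does not expand \n escapes, which would put all three split points on a single line):

```shell
# Three split points -> the table will be created with four regions.
printf 'a\nb\nc\n' > /tmp/splits.txt
grep -c '' /tmp/splits.txt   # counts lines; prints 3
```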

And the result :


Queues and capacity scheduler

The Capacity Scheduler allows you to create queues at different levels, allocating them different ratios of usage.

At the beginning, you have only one queue, which is root.
All of the following is defined in conf/capacity-scheduler.xml (/etc/hadoop/conf.empty/capacity-scheduler.xml in my HDP 2.1.3) or in YARN Configs/Scheduler in Ambari.


Let’s start with two queues, “production” and “development”, both sub-queues of root.

Queues definition


Now, maybe we have 2 teams in the dev department : Engineers and Data Scientists.
Let’s split the dev queue into two sub-queues : eng and ds.

Queues capacity

We now have to define the percentage capacity of all these queues; note that at each level the total must be 100 (the root capacity), otherwise you won’t be able to start the scheduler.


So prod will have 70% of the cluster resources and dev will have 30%.
Not really, in fact ! If a job is running in dev and prod is unused, then dev will take 100% of the cluster.
This makes sense, because we don’t want the cluster to be under-utilized.

As you can imagine, eng will take 60% of dev’s capacity, and can reach 100% of dev if ds is empty.

We may want to limit dev to a maximum extended capacity (the default is thus 100%) because we never want this queue to use too many resources.
For that purpose, use the maximum-capacity parameter.
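Put together, the corresponding capacity-scheduler.xml fragment would look roughly like this (a sketch, not a complete file; queue names and percentages are taken from the example above, and the 50% cap on dev is an arbitrary illustration value):

```xml
<!-- declare the queue hierarchy -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.queues</name>
  <value>eng,ds</value>
</property>

<!-- sibling capacities must sum to 100 at each level -->
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.ds.capacity</name>
  <value>40</value>
</property>

<!-- hard cap : dev can never use more than 50% of the cluster
     (the default maximum-capacity is 100) -->
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
  <value>50</value>
</property>
```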

Queues status

Must be set to RUNNING; if set to STOPPED you won’t be able to submit new jobs to that queue.


Queues ACLs

The most important thing to understand is that ACLs are inherited. That means that you can’t restrict permissions, only extend them !
The most common mistake is an ACL set to * (meaning all users) at the root level : consequently, any user will be able to submit jobs to any queue. And * is precisely the default at the root level.


The ACL format is a bit tricky : the value is “user1,user2 group1,group2” — a comma-separated user list, then a space, then a comma-separated group list. A value starting with a space means “no users, only these groups”.


Then, on each queue, you can set 3 parameters : acl_submit_applications, acl_administer_queue and acl_administer_jobs. For instance, on the dev queue, set acl_submit_applications to “ dev” and acl_administer_queue to “john ”.

Any user of the dev group can submit jobs, but only john can administer the queue.
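As a sketch, those dev-queue ACLs would be written like this (the property names are standard Capacity Scheduler settings; the user and group names are the hypothetical ones from the example):

```xml
<!-- leading space : no users, members of group "dev" may submit -->
<property>
  <name>yarn.scheduler.capacity.root.dev.acl_submit_applications</name>
  <value> dev</value>
</property>
<!-- trailing space : user "john" only, no groups, may administer -->
<property>
  <name>yarn.scheduler.capacity.root.dev.acl_administer_queue</name>
  <value>john </value>
</property>
```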

You can see the “real” authorizations in a terminal :

[vagrant@gw ~]$ su ambari-qa
[ambari-qa@gw ~]# mapred queue -showacls
Queue acls for user : ambari-qa
Queue Operations
[root@gw ~]#

Of course, yarn.acl.enable has to be set to true.

Another thing : you don’t have to restart YARN for every scheduler modification, except for deleting existing queues; if you’re only adding queues or adjusting some settings, just type a simple

[root@gw ~]# kinit -kt /etc/security/keytabs/yarn.headless.keytab yarn@EXAMPLE.COM
[root@gw ~]# yarn rmadmin -refreshQueues

You can see the queues in two ways :
– in the CLI

[root@gw ~]# mapred queue -list
15/03/05 09:13:11 INFO client.RMProxy: Connecting to ResourceManager at
Queue Name : dev
Queue State : running
Scheduling Info : Capacity: 60.000004, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : ds
Queue State : running
Scheduling Info : Capacity: 30.000002, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : eng
Queue State : running
Scheduling Info : Capacity: 70.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0
Queue Name : prod
Queue State : running
Scheduling Info : Capacity: 40.0, MaximumCapacity: 100.0, CurrentCapacity: 0.0

– in the UI : go to the ResourceManager UI (Ambari YARN/Quick links), then click on Scheduler :


Ambari tips & tricks

Restarting some components

(including clients, which you can’t put into any state other than “INSTALLED”) :

curl -uadmin:admin -H 'X-Requested-By: ambari' -X POST -d '
"context":"Restart ZooKeeper Client and HDFS Client",

As indicated in the wiki, the RESTART command refreshes the configs.
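For reference, a full RESTART request following the Ambari API wiki is POSTed to http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/requests with a payload roughly like this (cluster and host names below are hypothetical):

```json
{
  "RequestInfo": {
    "command": "RESTART",
    "context": "Restart ZooKeeper Client and HDFS Client",
    "operation_level": { "level": "HOST", "cluster_name": "mycluster" }
  },
  "Requests/resource_filters": [
    { "service_name": "ZOOKEEPER", "component_name": "ZOOKEEPER_CLIENT", "hosts": "node1.example.com" },
    { "service_name": "HDFS", "component_name": "HDFS_CLIENT", "hosts": "node1.example.com" }
  ]
}
```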


Delete a host from Ambari

// get all COMPONENTS for the host

[root@uabdfes03 ~]# curl -u admin:admin -H "X-Requested-By:ambari" -i -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/hosts/$HOSTNAME/host_components

// delete all COMPONENTS for this HOST
// delete HOST
[root@host ~]# curl -u admin:admin -H "X-Requested-By:ambari" -i -X DELETE http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER/hosts/$HOSTNAME
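A hedged sketch of the component-deletion step : extract the component names from the GET response, then issue one DELETE per component before deleting the host. The JSON below is a hypothetical sample standing in for the curl output:

```shell
# Extract "component_name" values from an Ambari host_components response.
extract_components() {
  grep -o '"component_name" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Sample response (in real use : curl ... /host_components | extract_components)
cat <<'EOF' | extract_components
{"items":[
  {"HostRoles":{"component_name":"DATANODE","host_name":"node1"}},
  {"HostRoles":{"component_name":"ZOOKEEPER_CLIENT","host_name":"node1"}}
]}
EOF
# -> DATANODE
#    ZOOKEEPER_CLIENT
```

Each printed name would then go into a curl -u admin:admin -H "X-Requested-By:ambari" -X DELETE against .../hosts/$HOSTNAME/host_components/$COMPONENT.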

Delete a service

(for example STORM)

// get the components for that service
[vagrant@gw ~]$ curl -u admin:admin -X GET
// stop the service
[vagrant@gw ~]$ curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
//stop each component on each host
[vagrant@gw ~]$ for COMPONENT_NAME in DRPC_SERVER NIMBUS STORM_REST_API STORM_UI_SERVER SUPERVISOR; do curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop Component"},"Body":{"HostRoles":{"state":"INSTALLED"}}}'${COMPONENT_NAME}; done
// stop service components
[vagrant@gw ~]$ for COMPONENT_NAME in DRPC_SERVER NIMBUS STORM_REST_API STORM_UI_SERVER SUPERVISOR; do curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Stop All Components"},"Body":{"ServiceComponentInfo":{"state":"INSTALLED"}}}'${COMPONENT_NAME}; done
// delete the service
[vagrant@gw ~]$ curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE

Add a component

For example, we want to add an HBase RegionServer.

// add the component
[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST

// then install
[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context": "Install RegionServer","query":"HostRoles/'HBASE_REGIONSERVER')"}, "Body":{"HostRoles": {"state": "INSTALLED"}}}'

Get host components for a service

[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X GET


Kerberos Tips & Tricks

Read a keytab to see principals :

[root@gw ~]# ktutil
ktutil: read_kt /etc/security/keytabs/nn.service.keytab
ktutil: list
slot KVNO Principal

---- ---- ---------------------------------------------------------------------
1 3 nn/
2 3 nn/
3 3 nn/
4 3 nn/
5 3 nn/
6 3 nn/
7 3 nn/
8 3 nn/
ktutil: quit
[root@gw ~]#

Service keytabs are for a service on a specific machine.
Therefore, if you want to add an existing service to another node, you must create that service principal for the additional node.

[root@ ~]# ipa service-add zookeeper/newnode@MY_CLUSTER
[root@~]# ipa-getkeytab -s IPASERVER -p zookeeper/newnode@MY_CLUSTER -k zk.service.keytab.newnode
[root@~]# chmod 400 zk.service.keytab.newnode
[root@~]# scp zk.service.keytab.newnode NEWNODE:/etc/security/keytabs/.
[root@NEWNODE ~]# mv /etc/security/keytabs/zk.service.keytab{.newnode,}
[root@NEWNODE ~]# chown zookeeper:hadoop /etc/security/keytabs/zk.service.keytab

If you run ipa-getkeytab against an existing keytab, it will add the new key to the keytab, not replace it.


If for some reason IPA doesn’t work :

// adding principal
[root@gw ~]# kadmin.local -q "addprinc -randkey hbase/" -x ipa-setup-override-restrictions
// then get the keytab
[root@gw ~]# kadmin.local -q "xst -k /home/vagrant/tmp_keytabs/hbase.service.keytab.nn hbase/"


Hadoop CLI tips & tricks

Here are some Hadoop CLI tips & tricks

For a manual switch between the Active and Standby NameNodes, you have to take the ServiceIds into consideration; they are nn1 and nn2 by default.

If nn1 is the Active and nn2 the Standby NameNode, switch nn2 to Active with :

[vagrant@gw ~]$ sudo -u hdfs hdfs haadmin -failover nn1 nn2