check Ambari or any HTTP-backed app with telnet

For some reason I wanted to check that Ambari was working correctly, but I didn’t have any browser access.

Checking that it was listening on its 8080 port is easy with

$ netstat -anpe | grep 8080
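
If netstat isn't installed (increasingly common on recent distributions), ss from iproute2 gives the same information:

$ ss -lntp | grep 8080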

If you really want to check that Ambari is answering requests, you can telnet to the host and type the following, followed by an empty line to end the HTTP headers:

GET / HTTP/1.1
Host: <AMBARI_FQDN>

localhost$ telnet ambari.mycluster.com 8080
Trying 10.195.196.48...
Connected to ambari.mycluster.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: ambari.mycluster.com

HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 02 Oct 2015 18:12:56 GMT
Accept-Ranges: bytes
Content-Length: 2012
Server: Jetty(8.1.17.v20150415)

<!--
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
-->

[...]

check if a port is open without using telnet

On several machines in a corporate IT environment, telnet is not installed, so your usual telnet HOST PORT command won’t work.

You can replace that with a netcat command :

[root@dn25 ~]# nc -z nn.fqdn.com 8020
Connection to nn.fqdn.com 8020 port [tcp/intu-ec-svcdisc] succeeded!

Hint : use the -u option to specify a UDP port.
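
If you need to test a whole list of ports, a quick loop does the trick; just a sketch, where the port list is an example (NameNode RPC and UI, Ambari, ZooKeeper):

[root@dn25 ~]# for port in 8020 50070 8080 2181; do nc -z -w 2 nn.fqdn.com $port && echo "port $port is open" || echo "port $port is closed"; done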


one-shot backup all config files with Ambari API

Before performing an upgrade, for example, or more generally, it’s a good idea to take backups of your config files.

In an upgrade situation, this allows you to perform a quick diff to see if any parameters have been reset, or if there are new parameters to take care of.

The Ambari API lets you get all the config files, and we’ll use the amazing configs.sh script provided by Hortonworks in HDP to perform the backups.

First, the config types : when looking at the output of http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/, we can find them under Clusters/desired_configs.

(screenshot: Ambari API result)

So let’s dig into that part : go to http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs

(screenshot: Ambari API desired_configs)

Now let’s get only file names :

[root@sandbox ~]# curl -s -u admin:admin http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2
ams-env
ams-hbase-env
ams-hbase-log4j
ams-hbase-policy
ams-hbase-security-site
ams-hbase-site
ams-log4j
ams-site
capacity-scheduler
cluster-env
core-site
...

Now for each file we can get a backup with the following command :

[root@sandbox ~]# /var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin -port 8080 get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {'

where CONFIG_TYPE = core-site for example.

So we can now have a complete backup:

#!/bin/bash
AMBARI_HOST=sandbox.hortonworks.com
CLUSTER_NAME=Sandbox
AMBARI_USER=admin
AMBARI_PASSWORD=admin
AMBARI_PORT=8080
timeNow=`date +%Y%m%d_%H%M%S`
RESULT_DIR=/root/migrationHDP/configs.sh/$timeNow
mkdir -p $RESULT_DIR
for CONFIG_TYPE in `curl -s -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2`; do
  echo "backing up $CONFIG_TYPE"
  /var/lib/ambari-server/resources/scripts/configs.sh -u $AMBARI_USER -p $AMBARI_PASSWORD -port $AMBARI_PORT get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {' > $RESULT_DIR/$CONFIG_TYPE.conf
done

Note that you can also output everything to a single file to make the diff easier, prefixing each block with the CONFIG_TYPE for a better view :

#!/bin/bash
AMBARI_HOST=sandbox.hortonworks.com
CLUSTER_NAME=Sandbox
AMBARI_USER=admin
AMBARI_PASSWORD=admin
AMBARI_PORT=8080
timeNow=`date +%Y%m%d_%H%M%S`
RESULT_DIR=/root/migrationHDP/configs.sh/$timeNow
mkdir -p $RESULT_DIR
for CONFIG_TYPE in `curl -s -u $AMBARI_USER:$AMBARI_PASSWORD http://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/?fields=Clusters/desired_configs | grep '" : {' | grep -v Clusters | grep -v desired_configs | cut -d'"' -f2`; do
  echo "backing up $CONFIG_TYPE"
  /var/lib/ambari-server/resources/scripts/configs.sh -u $AMBARI_USER -p $AMBARI_PASSWORD -port $AMBARI_PORT get $AMBARI_HOST $CLUSTER_NAME $CONFIG_TYPE | grep '^"' | grep -v '^"properties" : {' | sed "1i ##### $CONFIG_TYPE #####" >> $RESULT_DIR/all.conf
done
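
After the upgrade, run the same script again into a new timestamped directory and diff the two snapshots; a minimal sketch (the two timestamps below are examples):

[root@sandbox ~]# diff -r /root/migrationHDP/configs.sh/20151214_101010 /root/migrationHDP/configs.sh/20151216_101010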

adjust log4j log level for a class

Among all the adjustments you can make in log4j (and there are a lot), you may want to change the verbosity level of a particular class.

For example, I wanted to decrease the verbosity of the CapacityScheduler because the ResourceManager log was full of

2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-12-15 12:11:06,667 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...
2015-12-15 12:11:06,668 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(1190)) - Null container completed...

I first found the CapacityScheduler “full name” (meaning with its package), then referenced it in the log4j configuration with the following line :

log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler=WARN

Restarted the ResourceManager, and voilà!
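
The same log4j 1.x pattern works for any class or package; a quick sketch (the DEBUG line below is hypothetical, pick the class and level you need):

# log4j.logger.<fully.qualified.class.or.package>=<LEVEL>
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler=DEBUG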


installing an NFS gateway on the Sandbox

The NFS gateway is a neat way to access HDFS without an HDFS client: HDFS then appears mounted on the local filesystem like any other directory.

We have to start by allowing the NFS user to impersonate the users who will access our cluster, so let’s add hadoop.proxyuser.nfsserver.groups and hadoop.proxyuser.nfsserver.hosts in HDFS / Configs / Custom core-site.xml

(screenshot: NFS proxyuser)
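
In key/value form, those two properties look something like this (the * wildcards are just a permissive example; restrict them on a real cluster):

hadoop.proxyuser.nfsserver.groups = *
hadoop.proxyuser.nfsserver.hosts = *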

Then we add our Kerberos credentials in the custom hdfs-site.xml (of course your Sandbox is kerberized, isn’t it?)

(screenshot: NFS Kerberos credentials)

In the same custom hdfs-site.xml, add the following properties, which respectively indicate a temporary spool directory (used to re-order sequential writes before writing to HDFS) and the access control policy (here anyone can read/write, but you could use another policy of the form MACHINE_NAME RW_POLICY, where RW_POLICY is either rw (read & write) or ro (read-only)).

(screenshot: NFS mount points)
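
For reference, these are the dump directory and the exports policy; the exact property names vary a little across Hadoop releases (dfs.nfs3.dump.dir / nfs.dump.dir and dfs.nfs.exports.allowed.hosts / nfs.exports.allowed.hosts), so check the documentation of your version. A sketch with example values:

dfs.nfs3.dump.dir = /tmp/.hdfs-nfs
dfs.nfs.exports.allowed.hosts = * rw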

Of course we have to add a principal and get a keytab for our NFS gateway.
Notice I had to use dfs.nfs.keytab.file and dfs.nfs.kerberos.principal for the nfs3 gateway to launch.
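
In key/value form, that gives something like this (the keytab path and principal are examples, adapt them to your realm):

dfs.nfs.keytab.file = /etc/security/keytabs/nfs.service.keytab
dfs.nfs.kerberos.principal = nfs/_HOST@EXAMPLE.COM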

We have to launch portmap and nfs3 :

[root@sandbox ~]# hadoop-daemon.sh start portmap

[root@sandbox ~]# hadoop-daemon.sh start nfs3

and create a new directory to mount HDFS on :

[root@sandbox ~]# mkdir -p /media/hdfs

[root@sandbox ~]# mount -t nfs -o vers=3,proto=tcp,nolock 10.0.2.15:/ /media/hdfs/

We can check that NFS is functional :

[root@sandbox ~]# ls -l /media/hdfs/
total 5
drwxrwxrwx 3 yarn hadoop 96 2015-12-03 14:42 app-logs
drwxr-xr-x 5 hdfs hdfs 160 2015-04-24 15:11 apps
drwxr-xr-x 3 hdfs hdfs 96 2015-04-24 15:56 demo
drwxr-xr-x 3 hdfs hdfs 96 2015-04-24 14:53 hdp
drwxr-xr-x 3 mapred hdfs 96 2015-04-24 14:52 mapred
drwxrwxrwx 4 hdfs hdfs 128 2015-04-24 14:52 mr-history
drwxr-xr-x 3 hdfs hdfs 96 2015-04-24 15:41 ranger
drwxr-xr-x 3 hdfs hdfs 96 2015-04-24 14:57 system
drwxrwxrwx 14 hdfs hdfs 448 2015-12-14 15:24 tmp
drwxr-xr-x 11 hdfs hdfs 352 2015-04-24 15:33 user

[root@sandbox ~]# cp ./test01 /media/hdfs/tmp/
[root@sandbox ~]# ls -l /media/hdfs/tmp/
total 16
drwx------ 3 ambari-qa hdfs 96 2015-12-03 14:44 ambari-qa
drwx-wx-wx 6 ambari-qa hdfs 192 2015-04-24 15:32 hive
-rw-r--r-- 1 root hdfs 87 2015-12-14 16:24 test01
drwxrwxrwx 8 hdfs hdfs 256 2015-04-24 15:31 udfs
drwx------ 3 ambari-qa hdfs 96 2015-12-03 14:44 yarn

Perfect ! :)


scp keeping rights and permissions with rsync

We’ve all had to scp something at some point while keeping owner, group, permissions, etc.

There’s no such option in scp, so you may want to use rsync instead, for example to copy the content of the local /etc/hadoop/2.3.2.0-2950 directory to machine1 :

[root@localhost ~]# rsync -avI /etc/hadoop/2.3.2.0-2950/ machine1:/etc/hadoop/2.3.2.0-2950

Here are the chosen options :

-a = archive mode (equals -rlptgoD)
-v = verbose
-p = preserve permissions
-o = preserve owner
-g = preserve group
-r = recurse into directories
-I = don’t skip files that have already been transferred
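
If you want to preview what would be copied before actually touching machine1, add rsync's dry-run flag (-n) to the same command:

[root@localhost ~]# rsync -avIn /etc/hadoop/2.3.2.0-2950/ machine1:/etc/hadoop/2.3.2.0-2950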


install a new ShareLib for Oozie

The latest versions of Oozie deal with the ShareLib using a new layout.
The ShareLib is used by Oozie to pick up the JARs that jobs need to run in their containers. To let old jobs finish while the ShareLib is being upgraded, Oozie now uses a timestamped version of the ShareLib on HDFS.

[root@localhost ~]# su - oozie
[oozie@localhost ~]$ export OOZIE_URL=http://OOZIE_SERVER_FQDN:11000/oozie
[oozie@localhost ~]$ /usr/hdp/current/oozie-server/bin/oozie-setup.sh sharelib create -fs hdfs://MY_DEFAULT_FS_NAME

You’ll get the new ShareLib : 

--> the destination path for sharelib is: /user/oozie/share/lib/lib_20151207105251

Let’s have a look at what we got :

[root@localhost ~]# hdfs dfs -ls /user/oozie/share/lib
Found 11 items
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/distcp
drwxr-xr-x - oozie hadoop 0 2015-10-05 19:05 /user/oozie/share/lib/hbase
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/hcatalog
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/hive
drwxr-xr-x - oozie hadoop 0 2015-11-26 19:31 /user/oozie/share/lib/lib_20151126193054
drwxrwx--- - oozie hadoop 0 2015-12-07 10:53 /user/oozie/share/lib/lib_20151207105251
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/mapreduce-streaming
drwxr-xr-x - oozie hadoop 0 2015-10-19 14:41 /user/oozie/share/lib/oozie
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/pig
-rwxr-xr-x 3 oozie hadoop 1393 2015-10-01 12:54 /user/oozie/share/lib/sharelib.properties
drwxr-xr-x - oozie hadoop 0 2015-10-12 15:28 /user/oozie/share/lib/sqoop

So we found our old ShareLib (lib_20151126193054) and our new ShareLib (lib_20151207105251), populated with the current HDP version.

Remember that modifications made to the previous ShareLib are not automagically copied to the new one, so take a close look at the JARs.
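
A quick way to spot JARs that exist in one ShareLib but not in the other; just a sketch, using the two timestamps listed above:

[oozie@localhost ~]$ hdfs dfs -ls -R /user/oozie/share/lib/lib_20151126193054 | awk -F/ '{print $NF}' | grep '\.jar$' | sort > sharelib_old.txt
[oozie@localhost ~]$ hdfs dfs -ls -R /user/oozie/share/lib/lib_20151207105251 | awk -F/ '{print $NF}' | grep '\.jar$' | sort > sharelib_new.txt
[oozie@localhost ~]$ diff sharelib_old.txt sharelib_new.txt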

Finally, you can tell Oozie to switch to the latest ShareLib (it will take the latest timestamp) :

[oozie@localhost ~]$ oozie admin -sharelibupdate

You can then check that the path has been updated by listing the JARs and looking at their paths :

[oozie@localhost ~]$ oozie admin -shareliblist hive | head


remount without noexec attribute

The CentOS /var mount point is often mounted with the noexec attribute. This is annoying when executing scripts located on that mount point, like the Ambari scripts!

So if you get “permission denied” errors when executing a script there, simply remount the mount point :

$ sudo mount -o remount,exec /var

Don’t forget to modify your /etc/fstab accordingly so that the change is permanent and not lost at each reboot.
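
The matching /etc/fstab entry would look something like this (device and filesystem type are examples; the point is that noexec disappears from the mount options):

# before
/dev/mapper/vg00-var  /var  ext4  defaults,nosuid,noexec  0 2
# after
/dev/mapper/vg00-var  /var  ext4  defaults,nosuid         0 2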


kill zombie dead regionservers

I had this dn24 RegionServer marked as dead in the HBase UI, but that machine had been decommissioned and removed from the cluster months ago.

After some digging, it appeared that it was still showing up because HBase still considered it “active”, and the reason was found in HDFS :

[root@machine ~]# hdfs dfs -ls /apps/hbase/data/WALs/

drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn17.test.fr,60020,1446939183416
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn18.test.fr,60020,1446939179122
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn19.test.fr,60020,1446939182213
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn20.test.fr,60020,1446939182925
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn21.test.fr,60020,1446939185744
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn22.test.fr,60020,1446939173931
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn24.test.fr,60020,1409665198801-splitting
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn25.test.fr,60020,1446939185856
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn26.test.fr,60020,1446939178831
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn27.test.fr,60020,1446939183921
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn28.test.fr,60020,1446939179838
drwxrwx--- - hbase hdfs 0 2015-11-08 00:33 /apps/hbase/data/WALs/dn29.test.fr,60020,1446939178499


Found it? The WAL (Write-Ahead Log) directory was still in HDFS in the “splitting” state, so from HBase’s perspective the RegionServer was not dead.

I removed the dn24 WAL directory in HDFS and restarted the HBase Master (no downtime on HBase when restarting the HBase Master), and the dead RegionServer went away.
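
For the record, the cleanup boils down to removing the stale splitting directory (double-check the path first, and run it as the hbase user, with a kinit beforehand if the cluster is kerberized):

[root@machine ~]# sudo -u hbase hdfs dfs -rm -r /apps/hbase/data/WALs/dn24.test.fr,60020,1409665198801-splitting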


Kerberos client howto

If you need to debug on the client side, Kerberos doesn’t give you much output by default.

You can then set the KRB5_TRACE environment variable; pointing it to standard output should be enough for your needs!

[root@dn01 ~]# env KRB5_TRACE=/dev/stdout kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
[21232] 1444825930.913224: Getting initial credentials for hbase@REALM
[21232] 1444825930.914101: Looked up etypes in keytab: aes256-cts, aes128-cts, des3-cbc-sha1, rc4-hmac
[21232] 1444825930.914173: Sending request (198 bytes) to REALM
[21232] 1444825930.914599: Sending initial UDP request to dgram 192.168.1.88:88
[21232] 1444825930.918945: Received answer from dgram 192.168.1.88:88
[21232] 1444825930.919086: Response was from master KDC
[21232] 1444825930.919142: Received error from KDC: -1765328359/Additional pre-authentication required
[21232] 1444825930.919236: Processing preauth types: 136, 19, 2, 133
[21232] 1444825930.919257: Selected etype info: etype aes256-cts, salt "(null)", params ""
[21232] 1444825930.919265: Received cookie: MIT
[21232] 1444825930.919598: Retrieving hbase@REALM from FILE:/etc/security/keytabs/hbase.headless.keytab (vno 0, enctype aes256-cts) with result: 0/Success
[21232] 1444825930.919650: AS key obtained for encrypted timestamp: aes256-cts/0B3B
[21232] 1444825930.919754: Encrypted timestamp (for 1444825930.919656): plain 301AA011180F32303135313031343132333231305AA10502030E0868, encrypted DD77AE6EF5A9EFFA1A546BC34E964986BAFF339C5695B68A70689B84707503DB3FF2ECA23A30BFB5C4306E81EFFD445284E6328E9757501D
[21232] 1444825930.919778: Preauth module encrypted_timestamp (2) (flags=1) returned: 0/Success
[21232] 1444825930.919787: Produced preauth for next request: 133, 2
[21232] 1444825930.919817: Sending request (293 bytes) to REALM (master)
[21232] 1444825930.919977: Sending initial UDP request to dgram 192.168.1.88:88
[21232] 1444825930.927790: Received answer from dgram 192.168.1.88:88
[21232] 1444825930.927858: Processing preauth types: 19
[21232] 1444825930.927871: Selected etype info: etype aes256-cts, salt "(null)", params ""
[21232] 1444825930.927879: Produced preauth for next request: (empty)
[21232] 1444825930.927888: Salt derived from principal: REALMhbase
[21232] 1444825930.927903: AS key determined by preauth: aes256-cts/0B3B
[21232] 1444825930.928019: Decrypted AS reply; session key is: aes256-cts/4F13
[21232] 1444825930.928054: FAST negotiation: available
[21232] 1444825930.928099: Initializing FILE:/tmp/krb5cc_0 with default princ hbase@REALM
[21232] 1444825930.928497: Removing hbase@REALM -> krbtgt/REALM@REALM from FILE:/tmp/krb5cc_0
[21232] 1444825930.928516: Storing hbase@REALM -> krbtgt/REALM@REALM in FILE:/tmp/krb5cc_0
[21232] 1444825930.928687: Storing config in FILE:/tmp/krb5cc_0 for krbtgt/REALM@REALM: fast_avail: yes
[21232] 1444825930.928732: Removing hbase@REALM -> krb5_ccache_conf_data/fast_avail/krbtgt\/REALM\@REALM@X-CACHECONF: from FILE:/tmp/krb5cc_0
[21232] 1444825930.928748: Storing hbase@REALM -> krb5_ccache_conf_data/fast_avail/krbtgt\/REALM\@REALM@X-CACHECONF: in FILE:/tmp/krb5cc_0
[root@dn01 ~]#
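
The same trick works with any MIT Kerberos client tool; for instance (the service principal below is just an example), tracing a service ticket request with kvno:

[root@dn01 ~]# env KRB5_TRACE=/dev/stdout kvno hbase/dn01.example.com@REALM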