lock tables in Hive

To enable locking in Hive, you must first enable the lock manager by setting these two parameters in hive-site.xml :

<property>
  <name>hive.zookeeper.quorum</name>
  <value>sandbox.hortonworks.com:2181</value>
  <description>The list of zookeeper servers to talk to.
  This is only needed for read/write locks.</description>
</property>

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
  <description>Whether Hive supports concurrency or not.
  A Zookeeper instance must be up and running for the default
  Hive lock manager to support read-write locks.</description>
</property>

After restarting Hive, here is how to use it :

hive> lock table my_table exclusive;
OK

You can check whether there is a lock on a table :

hive> show locks my_table;
OK
default@my_table EXCLUSIVE
Time taken: 0.952 seconds, Fetched: 1 row(s)

When trying to access the table while the lock is held :

hive> select count(*) from my_table;
conflicting lock present for default@my_table mode SHARED
conflicting lock present for default@my_table mode SHARED
...

and you can release the lock with

hive> unlock table my_table;
OK
Time taken: 1.126 seconds

The pending query will then be executed. By default, a blocked query retries every 60 seconds (hive.lock.sleep.between.retries) : lock acquisitions (hive.lock.numretries) are retried 100 times, unlocks (hive.unlock.numretries) 10 times.
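If you just want to check the current values on your installation, SET with a property name simply prints its value; a quick sketch from the shell :

# print the retry-related lock settings currently in effect
hive -e "SET hive.lock.numretries; SET hive.unlock.numretries; SET hive.lock.sleep.between.retries;"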


sandbox : AppTimelineServer webUI

When using the sandbox, you may run into a port forwarding issue.

For example, if you try to access the AppTimelineServer webUI on port 8188, you’ll get a “Connection refused” error.

This is weird since the process is listening :

[root@sandbox ~]# nc -nz 127.0.0.1 8188
Connection to 127.0.0.1 8188 port [tcp/*] succeeded!

You’ll have to add a VirtualBox port forwarding rule for this port, which is missing by default :

(screenshots: VirtualBox network settings, adding a port forwarding rule for port 8188 to the Hortonworks Sandbox with HDP 2.2 VM)
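If you prefer the command line over the GUI, a rule like this should do the same thing (a sketch : the VM name is the sandbox default and may differ on your machine, and modifyvm requires the VM to be powered off) :

# add a NAT port-forwarding rule for the AppTimelineServer webUI
VBoxManage modifyvm "Hortonworks Sandbox with HDP 2.2" \
  --natpf1 "apptimeline,tcp,127.0.0.1,8188,,8188"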


Create a sample table

Let’s quickly add a new table to be able to play with Hive.

We use the /etc/passwd file on our Linux system as a Hive table :

[vagrant@gw ~]$ hadoop fs -put /etc/passwd /tmp
[vagrant@gw ~]$ sudo su hdfs
[hdfs@gw vagrant]$ hive
hive> CREATE TABLE passwd (
user STRING, dummy STRING, uid INT, gid INT, name STRING, home STRING, shell STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
STORED AS TEXTFILE;
hive> LOAD DATA INPATH '/tmp/passwd' OVERWRITE INTO TABLE passwd;
hive> select * from passwd;
OK
root x 0 0 root /root /bin/bash
bin x 1 1 bin /bin /sbin/nologin
daemon x 2 2 daemon /sbin /sbin/nologin
adm x 3 4 adm /var/adm /sbin/nologin
lp x 4 7 lp /var/spool/lpd /sbin/nologin
...

Voilà !
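From there you can run any query you like on it, for example (purely illustrative) :

# count the users per login shell
hive -e "SELECT shell, COUNT(*) AS nb FROM passwd GROUP BY shell;"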


custom Nagios alert in Ambari

The exercise here is to write a very simple Nagios plugin and integrate it into the Ambari webUI.

We’ll check whether the cluster is in safe mode, and surface that alert in Ambari.

First, let’s write the plugin. In the same directory you’ll find all the scripts used by Ambari, which you can duplicate and adapt.

[vagrant@gw ~]$ sudo vi /usr/lib64/nagios/plugins/check_safemode.sh
#!/bin/bash
# Nagios plugin: check whether HDFS is in safe mode
ret=$(hadoop dfsadmin -safemode get)
if [[ $ret == *OFF ]]; then
  # exit code 0 = OK for Nagios
  echo "OK: $ret"
  exit 0
fi
# any non-zero exit code raises an alert
echo "KO : $ret"
exit 1

Notice that you have to echo something before every exit in the plugin, otherwise Nagios will raise an alert.
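You can test the plugin by hand before wiring it into Nagios; a quick sketch (run it as a user allowed to talk to HDFS) :

# make it executable, then run it and check the exit code (0 = OK)
sudo chmod +x /usr/lib64/nagios/plugins/check_safemode.sh
sudo -u hdfs /usr/lib64/nagios/plugins/check_safemode.sh
echo "exit code: $?"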

Now define the command to execute the plugin :

[vagrant@gw ~]$ sudo vi /etc/nagios/objects/hadoop-commands.cfg

...
define command{
        command_name    check_safemode
        command_line    $USER1$/check_wrapper.sh $USER1$/check_safemode.sh -H $HOSTADDRESS$
}

Get the hostgroup name (in /etc/nagios/objects/hadoop-hostgroups.cfg) on which the plugin will be executed, for example nagios-server (a single server is enough since it’s an HDFS-level check !)

In /etc/nagios/objects/hadoop-servicegroups.cfg, get the servicegroup the alert will be attached to.
Here, we’ll put this alert in the HDFS servicegroup.

Now the alert entry :

[vagrant@gw ~]$ sudo vi /etc/nagios/objects/hadoop-services.cfg
...
# NAGIOS SERVER HDFS Checks
...
define service {
        hostgroup_name          nagios-server
        use                     hadoop-service
        service_description     HDFS::Is Cluster in Safe Mode
        servicegroups           HDFS
        check_command           check_safemode
        normal_check_interval   2
        retry_check_interval    1
        max_check_attempts      1
}

Notice that normal_check_interval is the number of minutes between checks.

Then restart Nagios :

[vagrant@gw ~]$ sudo service nagios restart

The alert will appear in Ambari :
(screenshot: the new safe mode alert showing OK in Ambari)

To test, let’s put the cluster in safe mode :

[vagrant@gw ~]$ sudo su hdfs
[hdfs@gw vagrant]$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs
[hdfs@gw vagrant]$ hadoop dfsadmin -safemode enter
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is ON

In about a minute, you’ll see that the alert is raised :

(screenshot: the safe mode alert raised in Ambari)

Then you can leave safe mode to get back to OK !

[hdfs@gw vagrant]$ hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is OFF

Note that this is just for demonstration purposes : the plugin does not implement Kerberos, for example, unlike the check_nodemanager_health plugin.
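If you wanted the plugin to work on a kerberized cluster, it would need to obtain a ticket first; a minimal sketch of what could be added at the top of the script, reusing the hdfs headless keytab shown above (adjust principal and keytab to your environment) :

# authenticate before calling dfsadmin, and fail the check if kinit fails
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs || { echo "KO : kinit failed"; exit 1; }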

You may also note that Nagios writes its status to the /var/nagios/status.dat file, which is collected and read by Ambari to display this information.

Adapted from Hortonworks documentation

 


upload a file with WebHDFS

By default, WebHDFS is enabled on your cluster, allowing you to perform any HDFS operation through its REST API.
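You can quickly confirm it is enabled by reading the property that controls it (a quick sketch) :

# prints "true" when WebHDFS is enabled
hdfs getconf -confKey dfs.webhdfs.enabled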

If you want to upload a file to HDFS, this has to be done in 2 steps :
1. create the resource

[hdfs@gw vagrant]$ curl -i --negotiate -u : -X PUT "http://nn.example.com:50070/webhdfs/v1/tmp/testfile?op=CREATE&overwrite=true"
HTTP/1.1 401 Authentication required
Date: Fri, 13 Feb 2015 11:29:54 GMT
Pragma: no-cache
Date: Fri, 13 Feb 2015 11:29:54 GMT
Pragma: no-cache
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Length: 0
Server: Jetty(6.1.26)
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Fri, 13 Feb 2015 11:29:54 GMT
Date: Fri, 13 Feb 2015 11:29:54 GMT
Pragma: no-cache
Expires: Fri, 13 Feb 2015 11:29:54 GMT
Date: Fri, 13 Feb 2015 11:29:54 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Set-Cookie: hadoop.auth=u=hdfs&p=hdfs@EXAMPLE.COM&t=kerberos&e=1423862994233&s=+gAWB/1q0QOKjK9Wf6W4Bl2B6BY=; Path=/; Expires=Fri, 13-Feb-2015 21:29:54 GMT; HttpOnly
Location: http://gw.example.com:1022/webhdfs/v1/tmp/testfile?op=CREATE&delegation=HAAEaGRmcwRoZGZzAIoBS4KzxDuKAUumwEg7CQgUs7isYeQ5F6u4cV-oSig--MQFgU8SV0VCSERGUyBkZWxlZ2F0aW9uDzI0MC4wLjAuMTE6ODAyMA&namenoderpcaddress=mycluster&overwrite=true
Content-Length: 0
Server: Jetty(6.1.26)

2. upload the file to that resource
Notice that the previous response contains a Location header pointing to the datanode where the resource will actually be created.
Now we upload our file to that URL.

[hdfs@gw vagrant]$ curl -i -X PUT -T MY_LOCAL_FILE "http://gw.example.com:1022/webhdfs/v1/tmp/testfile?op=CREATE&delegation=HAAEaGRmcwRoZGZzAIoBS4KzxDuKAUumwEg7CQgUs7isYeQ5F6u4cV-oSig--MQFgU8SV0VCSERGUyBkZWxlZ2F0aW9uDzI0MC4wLjAuMTE6ODAyMA&namenoderpcaddress=mycluster&overwrite=true"
HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Cache-Control: no-cache
Expires: Fri, 13 Feb 2015 11:30:20 GMT
Date: Fri, 13 Feb 2015 11:30:20 GMT
Pragma: no-cache
Expires: Fri, 13 Feb 2015 11:30:20 GMT
Date: Fri, 13 Feb 2015 11:30:20 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Location: webhdfs://mycluster/tmp/testfile
Content-Length: 0
Server: Jetty(6.1.26)

Test :

[hdfs@gw vagrant]$ hdfs dfs -cat /tmp/testfile
This is a test
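You can also read the file back through WebHDFS itself; a quick sketch (op=OPEN follows the same redirect-to-datanode pattern, and -L lets curl follow it) :

# fetch the file content via the REST API
curl -L --negotiate -u : "http://nn.example.com:50070/webhdfs/v1/tmp/testfile?op=OPEN"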

more info on the Hadoop WebHDFS page


My Ambari Notes

When trying to run the Enable NameNode HA wizard, I encountered this weird exception in the Install JournalNodes step.

Looking in ambari-server.log, I noticed this :

07:57:09,617 ERROR [qtp1251571107-6336] AbstractResourceProvider:244 - Caught AmbariException when creating a resource
org.apache.ambari.server.ServiceComponentNotFoundException: ServiceComponent not found, clusterName=hdp-cluster, serviceName=HDFS, serviceComponentName=JOURNALNODE


So the solution was to “manually” install the ServiceComponent with the Ambari API :

[vagrant@gw ~]$ curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/HDFS/components/JOURNALNODE

HTTP/1.1 201 Created
Set-Cookie: AMBARISESSIONID=1frjfvu8yjb5fylf4w5m449c7;Path=/
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Type: text/plain
Content-Length: 0
Server: Jetty(7.6.7.v20120910)
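
A quick sanity check before re-running the wizard (a sketch) : ask the API for the component we just created, it should now return a description instead of a 404.

curl -u admin:admin -H "X-Requested-By:ambari" \
  "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/services/HDFS/components/JOURNALNODE"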

Kerberizing my cluster

From a customer’s perspective, security is not optional.

You may hear about cryptography, firewalling, etc., but no security will be achieved without its two main components : authentication and authorization.
The former ensures that you are who you claim to be, the latter that you have the right to access a resource.

In this post we’ll talk about authentication, and the de facto way to implement it in Hadoop is Kerberos, a 30+ year old protocol from MIT.

First, a bit of vocabulary :

– Client has its secret key
– Server has its secret key
– TGS (Ticket Granting Service) has its secret key and knows the Server’s key
– KDC (Key Distribution Center) knows the secret keys of the Client and the TGS

Steps are (see the command-line sketch below) :

– Client authenticates with the KDC, which gives back a ticket (the TGT) authorizing it to request the TGS
– Client asks the TGS for a service ticket
– Client sends its id along with that ticket to the Server, which checks the ticket’s validity and authorizes the access
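
From a client shell the same flow looks like this (a sketch, alice being just an example principal) :

kinit alice@EXAMPLE.COM   # authenticate with the KDC and receive a TGT
klist                     # the krbtgt/EXAMPLE.COM ticket now sits in the cache
hdfs dfs -ls /            # the Hadoop client transparently asks the TGS for a
                          # service ticket and presents it to the NameNode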

 

Now, let’s get to work :)

On my 3-machine (Vagrant-powered) cluster, let’s first install the FreeIPA server, which is a really simple and robust way to get Kerberos on your system.
This cluster runs HDP 2.1.3 and Ambari 1.6.1.

The ipa-server package (provided by RedHat) should be in your repos.

On the KDC machine :

[vagrant@gw ~]$ sudo yum install -y ipa-server
[vagrant@gw ~]$ sudo ipa-server-install

Since my cluster is made of virtual machines without “real” IP addresses and FQDNs, I updated /usr/lib/python2.6/site-packages/ipapython/ipautil.py to get rid of the IANA IP address check.

You can now add users and groups in IPA, after doing a kinit admin. Then, on every other host, install the IPA client package :


$ sudo yum install -y ipa-client
$ sudo ipa-client-install --enable-dns-updates
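
Adding users and groups can then be done from any enrolled host; a quick sketch (the names are just examples) :

$ kinit admin
$ ipa group-add analysts --desc="Data analysts"
$ ipa user-add jdoe --first=John --last=Doe
$ ipa group-add-member analysts --users=jdoe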

Now let’s kerberize our cluster ! In Ambari, let’s go to the Security section and click Enable Security.
(screenshot: the Enable Security button in Ambari)

Let’s get started.
(screenshot: security wizard, step 1)

Click Next and provide the information requested. Basically, the only information you need to provide is the realm.
(screenshot: security wizard, step 2)

Now Ambari offers a convenient way to generate all the keytabs : download a CSV file that is fed to a script, which takes care of all that stuff.
(screenshot: security wizard, step 3)

Download that CSV file and put it on your KDC machine. To use it with IPA, we have to make a slight modification to Hortonworks’ script : add the -x ipa-setup-override-restrictions parameter after the kadmin.local -q “addprinc -randkey $principal” command.
Now let’s generate all those keytabs.

Please note that in this version no rm (ResourceManager) keytab is generated, so add the following line (assuming the host is nn.example.com) to generate_keytabs.sh before executing it :
kadmin.local -q "addprinc -randkey rm/nn.example.com@EXAMPLE.COM" -x ipa-setup-override-restrictions


[vagrant@gw ~]$ /var/lib/ambari-server/resources/scripts/keytabs.sh /vagrant/host-principal-keytab-list.csv > ./generate_keytabs.sh
[vagrant@gw ~]$ chmod +x ./generate_keytabs.sh
[vagrant@gw ~]$ sudo su -
[root@gw ~]$ cd /home/vagrant
[root@gw ~]$ kinit admin
[root@gw ~]$ ./generate_keytabs.sh

This will generate one tar archive of keytabs per host, which we then copy to the corresponding machine.
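For example, copying each archive to its host could look like this (a sketch, using the hostnames of this cluster) :

# the gw archive stays local, send the other two to their machines
scp keytabs_dn1.example.com.tar dn1.example.com:
scp keytabs_nn.example.com.tar nn.example.com: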

Then, on each machine, extract and copy the keytabs :

[vagrant@gw ~]$ sudo tar xpvf keytabs_gw.example.com.tar --strip=1
[vagrant@dn1 ~]$ sudo tar xpvf keytabs_dn1.example.com.tar --strip=1
[vagrant@nn ~]$ sudo tar xpvf keytabs_nn.example.com.tar --strip=1

Please note the p option to preserve ownership, and sudo to make that option work. The --strip=1 avoids extracting the leading ./ component, which would make the current directory unbrowsable. Now we have all the keytabs in the /etc/security/keytabs directory.
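A quick way to check that a keytab was extracted correctly is to list the principals it contains; a sketch (the keytab name follows the usual HDP convention, adjust it to your cluster) :

# on the NameNode host
sudo klist -kt /etc/security/keytabs/nn.service.keytab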

It’s now time to activate our kerberized cluster by clicking the Apply button in Ambari interface.

Note : regenerating a keytab invalidates all the previously issued keytabs for that principal in the realm !
For example, if you re-run the generate_keytabs.sh script, new keytabs are issued, so you’ll have to copy them again to all the servers.
Note 2 : if you want to enable HA on your cluster, you’ll need new keytabs for the new components. The easiest way is to re-download the CSV and regenerate all the keytabs.

