HDFS permissions swept

I’ve got this strange behaviour in my HDP 2.2 preview sandbox:

[root@sandbox ~]# kinit guest

[root@sandbox ~]# hdfs dfs -ls -d /user/laurent
drwx------ - laurent readers 0 2015-06-18 13:47 /user/laurent

[root@sandbox ~]# hdfs dfs -touchz /user/laurent/guest02

[root@sandbox ~]# hdfs dfs -ls /user/laurent
-rw-r--r-- 1 guest readers 0 2015-06-18 13:47 /user/laurent/guest02

OK. Weird: I can create a file in a directory that is theoretically unreadable.

Until suddenly:

[root@sandbox ~]# /etc/init.d/xapolicymgr stop
XAPolicyManager has been stopped.

[root@sandbox ~]# # restart HDFS service through Ambari as XA-Secure is a wrapper around HDFS processes

[root@sandbox ~]# hdfs dfs -touchz /user/laurent/guest03
touchz: Permission denied: user=guest, access=EXECUTE, inode="/user/laurent":laurent:readers:drwx------

Let the party begin !

So if you’re running into that case, check whether xasecure (a.k.a. Argus, now Ranger) is active and bypassing HDFS rights; if so, stop it, for example with /etc/init.d/xapolicymgr stop and /etc/init.d/argus-usersync stop.
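
On the sandbox, the check-and-stop sequence could look like this (stock HDP init scripts assumed):

[root@sandbox ~]# ps aux | grep -i -e xapolicymgr -e usersync
[root@sandbox ~]# /etc/init.d/xapolicymgr stop
[root@sandbox ~]# /etc/init.d/argus-usersync stop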


sticky bit

In Linux, the sticky bit is used at two levels: directories and files.

Directories

Set the sticky bit on a directory so that users can only delete files that belong to them.
For example, /tmp has 777 permissions, but you probably don’t want users to be able to delete everyone’s files in that folder (which is why /tmp has the sticky bit, by the way).

So let’s have a look: first we have a 777 directory and files owned by different users:

[root@sandbox ~]# ls -ld /test
drwxrwxrwx 2 777 root 4096 Jun 11 08:31 /test
[root@sandbox ~]# ls -l /test
-rw-rw-r-- 1 guest guest 0 Jun 11 08:32 guest
-rw-r--r-- 1 hdfs hadoop 0 Jun 11 08:32 hdfs
-rw-r--r-- 1 hdfs hadoop 0 Jun 11 08:32 hdfs2
-rw-rw-r-- 1 hue hue 0 Jun 11 08:32 hue

As expected, the hue user can delete a file owned by hdfs:

[root@sandbox ~]# su - hue -c "rm -f /test/hdfs2"

Now let’s set the sticky bit on the directory and try again to delete another user’s file:

[root@sandbox ~]# chmod +t /test
[root@sandbox ~]# su - hue -c "rm -f /test/hdfs"
rm: cannot remove `/test/hdfs': Operation not permitted

You’ll notice the “t” in the permissions list:

[root@sandbox ~]# ls -ld /test
drwxrwxrwt 2 777 root 4096 Jun 11 08:51 /test

Files

On files, the sticky bit itself is barely used (only in very specific cases or operating systems). What people usually mean is setuid, which is often called the sticky bit as a misnomer.

When a file belongs to a user, it can also be marked setuid, meaning it’ll be executed with the file owner’s rights, not the launcher’s.
As an example, passwd is setuid root (it has the setuid bit and belongs to root): that means if you launch it as “guest”, it will be executed as root so that it can write to /etc/passwd (which is writable only by root).

You can set the setuid bit with:

chmod +s /etc/passwd
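
You can spot a setuid binary by the “s” replacing the owner’s “x” in the permission string, for example (size and date below are only illustrative):

[root@sandbox ~]# ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 27832 Jan 1 2014 /usr/bin/passwd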


so your HBase is broken

HBase can be a little tricky to understand, especially when it comes to fixing it.

There are two basic ways to fix things in HBase:

HBase hbck

First try to run hbase hbck to see if there are inconsistencies.

If so, a simple

[root@sandbox ~]# sudo -u hbase hbase hbck -fix

will most of the time fix things up (region assignments).

There are a lot of options (see hbase hbck -help); useful ones include hbase hbck -repair (which bundles a lot of repair options) and hbase hbck -fixTableLocks for fixing tables that have been locked for a long time.
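
For instance, as the hbase user, a detailed check and a targeted fix could look like:

[root@sandbox ~]# sudo -u hbase hbase hbck -details
[root@sandbox ~]# sudo -u hbase hbase hbck -fixTableLocks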

Recovering .META.

There’s a class shipped with HBase which can help rebuild a lost .META. table from the filesystem alone.

To do so:

[hbase@sandbox root]$ hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -base /hadoop/hbase -details

HBase sample table

Let’s create a simple HBase table from scratch!

There are many ways of creating an HBase table and populating it: bulk load, the hbase shell, Hive with HBaseStorageHandler, etc.
Here we’re going to use the ImportTsv class, which parses a .tsv file and inserts it into an existing HBase table.

First, let’s grab some data !

Download access.tsv to any machine of your cluster: this is a 2 GB zipped file with sample tab-separated data, containing the columns rowkey, date, refer-url and http-code. Unzip it and put it on HDFS:

[root@sandbox ~]# gunzip access.tsv.gz
[root@sandbox ~]# hdfs dfs -copyFromLocal ./access.tsv /tmp/

Now we have to create the table in the HBase shell; it will contain only one column family for this example:

[root@sandbox ~]# hbase shell
hbase(main):001:0> create 'access_demo','cf1'
0 row(s) in 14.2610 seconds

Then start the import with the ad hoc class and select the columns (don’t forget HBASE_ROW_KEY, which could be any of the columns; here it happens to be the first one).
The syntax is hbase JAVA_CLASS -DPARAMETERS TABLE_NAME FILE.

Note that you can specify the tsv separator with ‘-Dimporttsv.separator=,’ and that you can of course mix different column families: cf1:field1,cf1:field2,cf2:field3,cf2:field4.
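
As an illustration only (the table, columns and file here are hypothetical), such a variant could look like:

[root@sandbox ~]# hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,cf1:field1,cf1:field2,cf2:field3,cf2:field4 my_table /tmp/my_file.csv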

[root@sandbox ~]# hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf1:date,cf1:refer-url,cf1:http-code access_demo /tmp/access.tsv

2015-05-21 19:55:38,144 INFO [main] mapreduce.Job: Job job_1432235700898_0002 running in uber mode : false
2015-05-21 19:55:38,151 INFO [main] mapreduce.Job: map 0% reduce 0%
2015-05-21 19:56:00,718 INFO [main] mapreduce.Job: map 7% reduce 0%
2015-05-21 19:56:03,742 INFO [main] mapreduce.Job: map 21% reduce 0%
2015-05-21 19:56:06,785 INFO [main] mapreduce.Job: map 65% reduce 0%
2015-05-21 19:56:10,846 INFO [main] mapreduce.Job: map 95% reduce 0%
2015-05-21 19:56:11,855 INFO [main] mapreduce.Job: map 100% reduce 0%
2015-05-21 19:56:13,948 INFO [main] mapreduce.Job: Job job_1432235700898_0002 completed successfully

Let’s check:

[root@sandbox ~]# hbase shell
hbase(main):001:0> list
TABLE
access_demo
iemployee
sales_data
3 row(s) in 9.7180 seconds

=> ["access_demo", "iemployee", "sales_data"]
hbase(main):002:0> scan 'access_demo'
ROW COLUMN+CELL
# rowkey column=cf1:date, timestamp=1432238079103, value=date
# rowkey column=cf1:http-code, timestamp=1432238079103, value=http-code
# rowkey column=cf1:refer-url, timestamp=1432238079103, value=refer-url
74.201.80.25/san-rafael-ca/events/sho column=cf1:date, timestamp=1432238079103, value=2008-01-25 16:20:50
w/80343522-eckhart-tolle
74.201.80.25/san-rafael-ca/events/sho column=cf1:http-code, timestamp=1432238079103, value=200
w/80343522-eckhart-tolle
74.201.80.25/san-rafael-ca/events/sho column=cf1:refer-url, timestamp=1432238079103, value=www.google.com/search
w/80343522-eckhart-tolle
calendar.boston.com/ column=cf1:date, timestamp=1432238079103, value=2008-01-25 19:35:50
calendar.boston.com/ column=cf1:http-code, timestamp=1432238079103, value=200
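
To fetch a single row instead of scanning the whole table, you can use get with one of the rowkeys above, for example:

hbase(main):003:0> get 'access_demo', 'calendar.boston.com/'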

This is it !


get metrics with Ambari API

[vagrant@gw ~]$ curl -u admin:admin -X GET http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com/host_components/NAMENODE?fields=metrics/jvm
{
  "href" : "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com/host_components/NAMENODE?fields=metrics/jvm",
  "HostRoles" : {
    "cluster_name" : "hdp-cluster",
    "component_name" : "NAMENODE",
    "host_name" : "nn.example.com"
  },
  "host" : {
    "href" : "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com"
  },
  "metrics" : {
    "jvm" : {
      "HeapMemoryMax" : 1052770304,
      "HeapMemoryUsed" : 56104392,
      "NonHeapMemoryMax" : 318767104,
      "NonHeapMemoryUsed" : 49148216,
      "gcCount" : 190,
      "gcTimeMillis" : 4599,
      "logError" : 0,
      "logFatal" : 0,
      "logInfo" : 16574,
      "logWarn" : 2657,
      "memHeapCommittedM" : 1004.0,
      "memHeapUsedM" : 53.473206,
      "memMaxM" : 1004.0,
      "memNonHeapCommittedM" : 133.625,
      "memNonHeapUsedM" : 46.87139,
      "threadsBlocked" : 0,
      "threadsNew" : 0,
      "threadsRunnable" : 7,
      "threadsTerminated" : 0,
      "threadsTimedWaiting" : 54,
      "threadsWaiting" : 7
    }
  }
}

The metrics you may want to watch are HeapMemoryMax and HeapMemoryUsed
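
For instance, a small sketch to turn that JSON into a heap usage percentage (same URL as above, parsed with python):

[vagrant@gw ~]$ curl -s -u admin:admin "http://gw.example.com:8080/api/v1/clusters/hdp-cluster/hosts/nn.example.com/host_components/NAMENODE?fields=metrics/jvm" | python -c 'import json,sys; m=json.load(sys.stdin)["metrics"]["jvm"]; print("%.1f%%" % (100.0 * m["HeapMemoryUsed"] / m["HeapMemoryMax"]))'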


Host is in invalid state

I’ve had some “Host is in invalid state” messages from Ambari; in that situation you cannot restart the component or do anything with it.

The last time it occurred was on a ZOOKEEPER_CLIENT, so here is how to put the component back to its original state:

To get the ZOOKEEPER_CLIENT status:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT

If it’s not in the INSTALLED state, put it back:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://$AMBARI_HOST:8080/api/v1/clusters/$MYCLUSTER/hosts/$ZOOKEEPER_CLIENT_HOST/host_components/ZOOKEEPER_CLIENT



HBase tips & tricks

Activate compression:

alter 'test', {NAME => 'mycolumnfamily', COMPRESSION => 'SNAPPY'}
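
SNAPPY has to be available on the RegionServers; a quick way to check (the path is just an example) is:

[root@sandbox ~]# hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/compression-test.txt snappy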


Data block encoding of keys/values:

alter 'test', {NAME => 'mycolumnfamily', DATA_BLOCK_ENCODING => 'FAST_DIFF'}


Change the split policy for a table (since HBase 0.94, the default split policy changed from ConstantSizeRegionSplitPolicy, based on hbase.hregion.max.filesize, to IncreasingToUpperBoundRegionSplitPolicy):

alter 'access_demo', {METHOD => 'table_att', CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}

Remember a split will occur when the data size of a column family gets bigger than the threshold defined by the policy.
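
With ConstantSizeRegionSplitPolicy, that threshold comes from hbase.hregion.max.filesize, which you can also override per table (value in bytes; 10 GB here is just an illustration):

alter 'access_demo', {METHOD => 'table_att', MAX_FILESIZE => '10737418240'}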



Hadoop HDFS commands

Leaving SafeMode:

$ bin/hadoop dfsadmin -safemode leave
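
You can check whether the NameNode is still in safe mode with:

$ bin/hadoop dfsadmin -safemode get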


Failover NameNode:

First, find the HDFS options mapping the NameNode IDs to hosts:

dfs.namenode.http-address.mycluster.nn1=nn.example.com:50070
dfs.namenode.http-address.mycluster.nn2=dn1.example.com:50070

So nn.example.com is nn1

[root@nn ~]# hdfs haadmin -getServiceState nn1
standby

Force the transition of a NameNode to active:

[root@nn ~]# hdfs haadmin -transitionToActive --forcemanual nn1
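
When automatic failover is not enabled, a gentler alternative to --forcemanual is to let HDFS coordinate the failover itself (here from nn2, assumed active, to nn1):

[root@nn ~]# hdfs haadmin -failover nn2 nn1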


Failures: what I learned

In my HA cluster, the NameNodes failed to start with the following:

2015-03-16 15:11:44,724 ERROR namenode.EditLogInputStream (EditLogFileInputStream.java:nextOpImpl(198)) - caught exception initializing http://gw.example.com:8480/getJournal?jid=mycluster&segmentTxId=88798&storageInfo=-56%3A567324971%3A0%3ACID-7958b480-2d52-49dc-8d71-e0d14429dbce
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$LogHeaderCorruptException: Unexpected version of the file system log file: -620756992. Current version = -56.
{...}
2015-03-16 15:11:45,057 FATAL namenode.NameNode (NameNode.java:main(1400)) - Exception in namenode join
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 88798

at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:193)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:665)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:272)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:891)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:638)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:480)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:695)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:680)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1329)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1395)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 88618; expected file to go up to 14352384
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:194)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:151)
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:178)
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:180)
... 12 more
2015-03-16 15:11:45,066 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-03-16 15:11:45,079 INFO namenode.NameNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dn1.example.com/240.0.0.12
************************************************************/

The procedure is to use the recovery mode:

[root@dn1 ~]# kinit -kt /etc/security/keytabs/nn.service.keytab nn/dn1.example.com

[root@dn1 ~]# /usr/bin/hadoop namenode -recover

{...}
15/03/16 15:33:01 ERROR namenode.MetaRecoveryContext: We failed to read txId 88798
15/03/16 15:33:01 INFO namenode.MetaRecoveryContext:
Enter 'c' to continue, skipping the bad section in the log
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)

c
15/03/16 15:33:13 INFO namenode.MetaRecoveryContext: Continuing
15/03/16 15:33:13 INFO namenode.FSImage: Edits file http://nn.example.com:8480/getJournal?jid=mycluster&segmentTxId=88798&storageInfo=-56%3A567324971%3A0%3ACID-7958b480-2d52-49dc-8d71-e0d14429dbce of size 1048576 edits # 0 loaded in 11 seconds
15/03/16 15:33:13 INFO namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
15/03/16 15:33:13 INFO namenode.NameCache: initialized with 9 entries 212 lookups
15/03/16 15:33:13 INFO namenode.FSNamesystem: Finished loading FSImage in 18865 msecs
15/03/16 15:33:13 INFO namenode.FSImage: Save namespace ...
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 88618
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000000000088386, cpktTxId=0000000000000088386)
15/03/16 15:33:15 INFO namenode.MetaRecoveryContext: RECOVERY COMPLETE
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for active state
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for standby state
15/03/16 15:33:15 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dn1.example.com/240.0.0.12
************************************************************/

Then we just have to start the NameNode (you can expect some missing blocks, though).
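
A quick way to list those blocks once the NameNode is back up:

[root@dn1 ~]# hdfs fsck / -list-corruptfileblocks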