Posts tagged with: namenode

Failures: what I learned

In my HA cluster, the NameNodes failed to start with the following error:

2015-03-16 15:11:44,724 ERROR namenode.EditLogInputStream ( - caught exception initializing
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$LogHeaderCorruptException: Unexpected version of the file system log file: -620756992. Current version = -56.
2015-03-16 15:11:45,057 FATAL namenode.NameNode ( - Exception in namenode join
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 88798

at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 88618; expected file to go up to 14352384
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(
at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(
at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(
... 12 more
2015-03-16 15:11:45,066 INFO util.ExitUtil ( - Exiting with status 1
2015-03-16 15:11:45,079 INFO namenode.NameNode ( - SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at
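
Before jumping into recovery, it can help to inspect the edit log files themselves with the Offline Edits Viewer. A sketch, assuming the edits live under /hadoop/hdfs/namenode/current as shown in the log above (the exact edits filename is cluster-specific):

```shell
# List the edit log segments in the NameNode storage directory
ls -l /hadoop/hdfs/namenode/current/edits_*

# Dump a suspect segment to XML with the Offline Edits Viewer (hdfs oev);
# a corrupted file will typically fail to parse or stop at the bad offset
hdfs oev -i /hadoop/hdfs/namenode/current/edits_inprogress_0000000000000088618 -o /tmp/edits.xml
```

This confirms which segment is damaged and roughly how many transactions are at risk before you commit to skipping them.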

The procedure is to use the NameNode recover mode:

[root@dn1 ~]# kinit -kt /etc/security/keytabs/nn.service.keytab nn/

[root@dn1 ~]# /usr/bin/hadoop namenode -recover

15/03/16 15:33:01 ERROR namenode.MetaRecoveryContext: We failed to read txId 88798
15/03/16 15:33:01 INFO namenode.MetaRecoveryContext:
Enter 'c' to continue, skipping the bad section in the log
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)

15/03/16 15:33:13 INFO namenode.MetaRecoveryContext: Continuing
15/03/16 15:33:13 INFO namenode.FSImage: Edits file of size 1048576 edits # 0 loaded in 11 seconds
15/03/16 15:33:13 INFO namenode.FSNamesystem: Need to save fs image? false (staleImage=true, haEnabled=true, isRollingUpgrade=false)
15/03/16 15:33:13 INFO namenode.NameCache: initialized with 9 entries 212 lookups
15/03/16 15:33:13 INFO namenode.FSNamesystem: Finished loading FSImage in 18865 msecs
15/03/16 15:33:13 INFO namenode.FSImage: Save namespace ...
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 88618
15/03/16 15:33:14 INFO namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/hadoop/hdfs/namenode/current/fsimage_0000000000000088386, cpktTxId=0000000000000088386)
15/03/16 15:33:15 INFO namenode.MetaRecoveryContext: RECOVERY COMPLETE
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for active state
15/03/16 15:33:15 INFO namenode.FSNamesystem: Stopping services started for standby state
15/03/16 15:33:15 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at

Then we just have to start the NameNode again (though you can expect some missing blocks).
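
Since recovery may have abandoned edits, it is worth checking the namespace once the NameNode is back up. A sketch using hdfs fsck (the path / is just an example; narrow it to a suspect directory if you have one):

```shell
# List any files with corrupt or missing blocks after recovery
sudo -u hdfs hdfs fsck / -list-corruptfileblocks

# For a suspect directory, show per-file block details and locations
sudo -u hdfs hdfs fsck /suspect/dir -files -blocks -locations
```

Files reported as corrupt can then be deleted or restored from a backup.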

Hadoop CLI tips & tricks

Here are some Hadoop CLI tips & tricks.

For a manual switch between the Active and Standby NameNodes, you have to take into consideration the ServiceIds, which are nn1 and nn2 by default.

If nn1 is the Active NameNode and nn2 the Standby, switch nn2 to Active with:

[vagrant@gw ~]$ sudo -u hdfs hdfs haadmin -failover nn1 nn2
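
To verify which NameNode holds which role before and after the failover, hdfs haadmin can report the state of each ServiceId:

```shell
# Check the HA state of each NameNode by ServiceId
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2
```

After the failover above, nn1 should report standby and nn2 active.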