Monthly Archives: June 2016

HDFS ls and Out of Memory (GC Overhead limit)

If an ls fails with an error like

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringBuffer.toString(
at org.apache.hadoop.fs.Path.makeQualified(

You can increase the client heap size :

HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls -R /
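
If you hit this often, a small wrapper can save some typing. This is only a sketch: with_client_heap is a hypothetical helper name, not something shipped with Hadoop, and it simply sets HADOOP_CLIENT_OPTS for one command.

```shell
# Hypothetical helper (not part of Hadoop): run one command with a given
# client heap size, without exporting HADOOP_CLIENT_OPTS in the calling shell.
with_client_heap() {
  heap="$1"   # heap size as accepted by -Xmx, e.g. 4g
  shift
  # env sets the variable only for the command that follows.
  # Note: this overrides any HADOOP_CLIENT_OPTS already set for that command.
  env HADOOP_CLIENT_OPTS="-Xmx${heap}" "$@"
}
```

Usage would then look like: with_client_heap 4g hdfs dfs -ls -R /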

Oozie 101 : your first workflow in 5 minutes

Oozie is the de facto scheduler for Hadoop, bundled in the main Hadoop distributions.

Its key concepts are not always easy to grasp, so let’s build our first Oozie workflow.

The only thing to keep in mind is that Oozie clients submit workflows to an Oozie server.


Some examples are bundled in oozie-examples.tar.gz so let’s untar that to our local directory :

[root@sandbox ~]# su - ambari-qa
[ambari-qa@sandbox ~]$ tar -xzf /usr/hdp/

We’ll submit our first workflow, a shell action example, but first we have to modify some parameters in the file.
The 2 main (and mandatory) files are and workflow.xml : the former holds the parameters, the latter the workflow definition itself.

[ambari-qa@sandbox ~]$ cat examples/apps/shell/

Here we have to modify the jobTracker to point to the ResourceManager, so it goes from localhost:8021 to

The nameNode goes from hdfs://localhost:8020 to hdfs://, the other parameters don’t need any change.
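
If you prefer to script those two edits rather than open an editor, here is a minimal sketch. set_prop is a hypothetical helper, the file path is passed as an argument, and the host names in the usage line are placeholders, not the real values for the sandbox.

```shell
# Hypothetical helper: set_prop FILE KEY VALUE
# Replaces the value of KEY in a key=value properties file.
# Assumes VALUE contains no '|' or '&' (both are special to sed here).
set_prop() {
  file="$1"; key="$2"; value="$3"
  tmp="${file}.tmp"
  # '|' is used as the sed delimiter so '/' in hdfs:// URLs needs no escaping.
  sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, with props holding the path of the parameters file: set_prop "$props" jobTracker rm.example.com:8050 and set_prop "$props" nameNode hdfs://nn.example.com:8020 (placeholder hosts and ports). Lines for other keys are left untouched.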

Now, to be submitted, workflow.xml needs to be put on HDFS, because the Oozie server only works with files on HDFS. This is an important point, since it can lead to mistakes later : for example, a custom hdfs-site.xml or hive-site.xml will also need to be put somewhere on HDFS for Oozie to find it.

In the example, that path is set to NN/user/ambari-qa/examples/apps/shell, so let’s create it :

[ambari-qa@sandbox ~]$ hdfs dfs -put examples/

Now let’s run that job !

[ambari-qa@sandbox ~]$ oozie job -oozie -config examples/apps/shell/ -run

You may have noticed that the file passed with -config is a local file : it’s submitted by the Oozie client, and each client can have its own.

A common tip is to export the OOZIE_URL environment variable so you don’t have to pass the URL on every Oozie command:

[ambari-qa@sandbox ~]$ export OOZIE_URL=

That replaces the -oozie option in every subsequent command.
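
In a script, it can help to fail early when the variable is missing instead of getting a less obvious error from the client. A minimal sketch, where require_oozie_url is a hypothetical helper name:

```shell
# Hypothetical guard: succeed only if OOZIE_URL is set and non-empty.
require_oozie_url() {
  if [ -z "${OOZIE_URL:-}" ]; then
    echo "OOZIE_URL is not set: export it or pass -oozie explicitly" >&2
    return 1
  fi
}
```

Calling require_oozie_url || exit 1 at the top of a script is enough.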

Now let's submit our workflow:

[ambari-qa@sandbox ~]$ oozie job -config examples/apps/shell/ -run
job: 0000010-160617094106166-oozie-oozi-W
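
The client prints the ID on a line starting with "job: ", as shown above. If you want to reuse it in a script, a sketch like this can capture it (parse_job_id is a hypothetical helper name):

```shell
# Read the submission output on stdin and print the workflow ID,
# i.e. whatever follows "job: " on the first such line.
parse_job_id() {
  sed -n 's/^job: *//p' | head -n 1
}
```

You would pipe the output of the oozie job ... -run command into it, for example job_id=$(... | parse_job_id).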

Now that we have the workflowID, let’s check its status:

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W


and finally :

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W
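
Rather than rerunning the command by hand, you can pull the status out of its output. The sketch below assumes the -info output contains a line of the form "Status : RUNNING" (with variable padding), which is not shown in this post, so check it against your Oozie version. workflow_status is a hypothetical helper name.

```shell
# Read `oozie job -info` output on stdin and print the workflow status.
# Assumption: the output has a line starting with "Status", then ':', then the value.
workflow_status() {
  awk -F':' '/^Status/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}
```

Usage: oozie job -info 0000010-160617094106166-oozie-oozi-W | workflow_status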


That’s all good ! You can now dig into Oozie and look at every action type (shell, Hive, etc.) and more advanced features (forks, conditions, etc.).