Oozie 101: your first workflow in 5 minutes

Oozie is the de facto scheduler for Hadoop, bundled in the main Hadoop distributions.

Its key concepts are not always easy to grasp, so let’s build our first Oozie workflow.

The only thing to keep in mind for now is that Oozie clients submit workflows to an Oozie server.

Some examples are bundled in oozie-examples.tar.gz, so let’s untar it into our local directory:

[root@sandbox ~]# su - ambari-qa
[ambari-qa@sandbox ~]$ tar -xzf /usr/hdp/2.2.4.2-2/oozie/doc/oozie-examples.tar.gz

We’ll submit our first workflow, a shell action example, but first we have to modify some parameters in the job.properties file.
The two main (and mandatory) files are job.properties and workflow.xml: the former holds the parameters, the latter the workflow definition itself.
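
To give an idea of what the definition file contains, here is a minimal sketch of a shell-action workflow.xml, written from the shell with a heredoc. It is an illustration and not necessarily identical to the file bundled in examples/apps/shell: the my-shell-app directory, the shell-wf and shell-node names and the echo command are assumptions, and the schema versions (workflow:0.4, shell-action:0.2) may differ on your Oozie release. The ${jobTracker}, ${nameNode} and ${queueName} placeholders are filled in from job.properties at submission time.

```shell
# Hypothetical scratch directory, so the bundled example is left untouched.
app_dir="my-shell-app"
mkdir -p "$app_dir"

# The quoted 'EOF' keeps ${...} literal: Oozie, not the shell, resolves them.
cat > "$app_dir/workflow.xml" <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>echo</exec>
            <argument>Hello Oozie</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
EOF
```

The workflow has a single action: on success it moves to the end node, on error to the kill node.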

[ambari-qa@sandbox ~]$ cat examples/apps/shell/job.properties
...
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell

Here we have to point jobTracker at the ResourceManager, so it changes from localhost:8021 to sandbox.hortonworks.com:8050.

The nameNode changes from hdfs://localhost:8020 to hdfs://sandbox.hortonworks.com:8020; the other parameters don’t need any change.
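
If you prefer not to edit the file by hand, the two changes can be scripted with sed. This is a small sketch: set_cluster_endpoints is a hypothetical helper (not part of Oozie), and it assumes the default ports used in this post (8020 for the NameNode, 8050 for the ResourceManager).

```shell
# Rewrite nameNode and jobTracker in a job.properties file.
# $1: path to job.properties, $2: cluster hostname.
set_cluster_endpoints() {
    # -i.bak edits the file in place and keeps a backup next to it.
    sed -i.bak \
        -e "s|^nameNode=.*|nameNode=hdfs://$2:8020|" \
        -e "s|^jobTracker=.*|jobTracker=$2:8050|" \
        "$1"
}

# Usage on the sandbox:
# set_cluster_endpoints examples/apps/shell/job.properties sandbox.hortonworks.com
```

All other lines of the file, such as queueName and examplesRoot, are left as they are.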

Before it can be submitted, workflow.xml has to be put on HDFS, because the Oozie server only works with files on HDFS. This is an important point, since forgetting it leads to further mistakes: for example, a custom hdfs-site.xml or hive-site.xml also needs to be put somewhere on HDFS for Oozie to see it.

In the job.properties example, the application path is set to ${nameNode}/user/ambari-qa/examples/apps/shell, so let’s upload the examples directory to our HDFS home directory:

[ambari-qa@sandbox ~]$ hdfs dfs -put examples/
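
Before submitting, it can be worth checking that the definition really landed where oozie.wf.application.path points. A small sketch, where workflow_is_on_hdfs is a hypothetical helper built on hdfs dfs -test -e, which returns 0 when the path exists:

```shell
# Succeeds if workflow.xml exists under the given HDFS application directory.
# $1: HDFS directory, e.g. the value of oozie.wf.application.path.
workflow_is_on_hdfs() {
    hdfs dfs -test -e "$1/workflow.xml"
}

# Usage on the sandbox:
# workflow_is_on_hdfs /user/ambari-qa/examples/apps/shell && echo "workflow.xml found"
```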

Now let’s run that job!

[ambari-qa@sandbox ~]$ oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config examples/apps/shell/job.properties -run

As you may have noticed, job.properties is a local file: it is submitted by the Oozie client, and each client can use its own job.properties file.

A common tip is to export the OOZIE_URL environment variable so you don’t have to pass the URL on every Oozie command:

[ambari-qa@sandbox ~]$ export OOZIE_URL=http://sandbox.hortonworks.com:11000/oozie

This replaces the -oozie http://sandbox.hortonworks.com:11000/oozie option in every subsequent command.

Now let's submit our workflow:

[ambari-qa@sandbox ~]$ oozie job -config examples/apps/shell/job.properties -run
job: 0000010-160617094106166-oozie-oozi-W

Now that we have the workflow ID, let’s check its status:

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W

[Screenshot: Oozie RUNNING]

And finally:

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W

[Screenshot: Oozie COMPLETE]
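
Rather than re-running the -info command by hand, the check can be wrapped in a small polling loop. The sketch below makes a few assumptions: the helper names are hypothetical, OOZIE_URL is already exported, the -info output contains a line of the form "Status        : RUNNING", and a workflow that finishes successfully ends in the SUCCEEDED state.

```shell
# Extract the job ID from the "job: <id>" line printed by `oozie job -run`.
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Print the value of the "Status" line from `oozie job -info <id>`.
job_status() {
    oozie job -info "$1" | awk '/^Status/ { print $NF; exit }'
}

# Poll until the job leaves the PREP/RUNNING states, then print its status.
# $1: job ID, $2: seconds between checks (defaults to 10).
wait_for_job() {
    job_id="$1"
    interval="${2:-10}"
    while :; do
        status=$(job_status "$job_id")
        case "$status" in
            PREP|RUNNING) sleep "$interval" ;;
            *) echo "$status"; return ;;
        esac
    done
}

# Usage on the sandbox:
# job_id=$(oozie job -config examples/apps/shell/job.properties -run | extract_job_id)
# wait_for_job "$job_id"
```

Any status other than PREP or RUNNING ends the loop, so a killed, failed or suspended job is reported as well instead of being polled forever.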

That’s all good! You can now dig into Oozie and look at every action type (shell, Hive, etc.) and at more advanced features (forks, conditions, etc.).


3 Comments

  • hoda

    I followed your tutorial but I get error JA006. The job stays in the running state with the JA006 error code. How can I fix that? I started my history server but I still get the same error. I am using hdfs 2.7.1.2.3 and oozie 4.2.0.2.3.

    • administrator

      Hi Hoda, you can check the logs to see the exact error. JA00x codes are internal Oozie error codes that don’t give you much insight on their own.
