
Oozie 101 : your first workflow in 5 minutes

Oozie is the de facto scheduler for Hadoop, bundled in the main Hadoop distributions.

Its key concepts are not always easy to grasp, so let's build our first Oozie workflow.

The main thing to keep in mind is that Oozie clients submit workflows to the Oozie server.

 

Some examples are bundled in oozie-examples.tar.gz, so let's untar it into our home directory:

[root@sandbox ~]# su - ambari-qa
[ambari-qa@sandbox ~]$ tar -xzf /usr/hdp/2.2.4.2-2/oozie/doc/oozie-examples.tar.gz

We'll submit our first workflow, a shell action example, but first we have to modify some parameters in the job.properties file.
The 2 main (and mandatory) files are job.properties and workflow.xml: the former contains the parameters, the latter the workflow definition itself.

[ambari-qa@sandbox ~]$ cat examples/apps/shell/job.properties
...
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples

oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/shell

Here we have to modify jobTracker to point to the ResourceManager, so it goes from localhost:8021 to sandbox.hortonworks.com:8050.

The nameNode goes from hdfs://localhost:8020 to hdfs://sandbox.hortonworks.com:8020; the other parameters don't need any change.
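Those two edits can also be scripted. Here's a minimal sketch using sed; the hostnames and ports are the sandbox values used in this walkthrough:

```shell
# Point the example at the sandbox services instead of localhost.
# Hostnames and ports below are the sandbox defaults from this post.
sed -i \
  -e 's|^nameNode=.*|nameNode=hdfs://sandbox.hortonworks.com:8020|' \
  -e 's|^jobTracker=.*|jobTracker=sandbox.hortonworks.com:8050|' \
  examples/apps/shell/job.properties
```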

To be submitted, the workflow.xml needs to be put on HDFS, because the Oozie server only works with files on HDFS. This is an important point that can lead to mistakes later: for example, a custom hdfs-site.xml or hive-site.xml will need to be put somewhere on HDFS for Oozie to see it.
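For reference, the workflow.xml of the shell example is a plain XML definition along these lines (a simplified sketch, not the exact file shipped in the tarball):

```xml
<workflow-app name="shell-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello Oozie</argument>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The ${jobTracker} and ${nameNode} placeholders are resolved from the job.properties parameters at submission time.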

In the job.properties example we set that path to ${nameNode}/user/ambari-qa/examples/apps/shell, so let's upload it:

[ambari-qa@sandbox ~]$ hdfs dfs -put examples/

Now let's run that job!

[ambari-qa@sandbox ~]$ oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config examples/apps/shell/job.properties -run

You may have noticed that job.properties is a local file: it's submitted by the Oozie client, and each client can have its own job.properties file.

A common tip is to export the OOZIE_URL environment variable so you don't have to pass it on every Oozie command:

[ambari-qa@sandbox ~]$ export OOZIE_URL=http://sandbox.hortonworks.com:11000/oozie

That replaces the -oozie http://sandbox.hortonworks.com:11000/oozie option in every following command.
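To avoid re-exporting it in every new shell, you could also persist it in the user's profile (a sketch; adjust the profile file to your shell):

```shell
# Persist OOZIE_URL for future sessions of this user
echo 'export OOZIE_URL=http://sandbox.hortonworks.com:11000/oozie' >> ~/.bash_profile
```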

Now let's submit our workflow:

[ambari-qa@sandbox ~]$ oozie job -config examples/apps/shell/job.properties -run
job: 0000010-160617094106166-oozie-oozi-W

Now that we have the workflowID, let’s check its status:

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W

Oozie RUNNING

and finally:

[ambari-qa@sandbox ~]$ oozie job -info 0000010-160617094106166-oozie-oozi-W

Oozie COMPLETE
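Instead of re-running -info by hand, you could poll until the workflow leaves the RUNNING state (a sketch; the job ID is the one from above, and OOZIE_URL is assumed to be exported):

```shell
# Poll the workflow status every 10 seconds until it is no longer RUNNING
JOB_ID=0000010-160617094106166-oozie-oozi-W
while oozie job -info "$JOB_ID" | grep -q 'RUNNING'; do
    sleep 10
done
oozie job -info "$JOB_ID"
```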

That's all good! You can now dig into Oozie and look at every action type (shell, Hive, etc.) and more advanced features (forks, conditions, etc.)


install a new ShareLib for Oozie

The latest versions of Oozie handle the ShareLib with a new layout.
The ShareLib is used by Oozie to pick up the JARs the jobs need in their containers. To let old jobs finish while the ShareLib is being upgraded, Oozie now uses timestamped versions of the ShareLib on HDFS.

[root@localhost ~]# su - oozie
[oozie@localhost ~]$ export OOZIE_URL=http://OOZIE_SERVER_FQDN:11000/oozie
[oozie@localhost ~]$ /usr/hdp/current/oozie-server/bin/oozie-setup.sh sharelib create -fs hdfs://MY_DEFAULT_FS_NAME

You'll get the new ShareLib path:

–> the destination path for sharelib is: /user/oozie/share/lib/lib_20151207105251

Let's have a look at what we got:

[root@localhost ~]# hdfs dfs -ls /user/oozie/share/lib
Found 11 items
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/distcp
drwxr-xr-x - oozie hadoop 0 2015-10-05 19:05 /user/oozie/share/lib/hbase
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/hcatalog
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/hive
drwxr-xr-x - oozie hadoop 0 2015-11-26 19:31 /user/oozie/share/lib/lib_20151126193054
drwxrwx--- - oozie hadoop 0 2015-12-07 10:53 /user/oozie/share/lib/lib_20151207105251
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/mapreduce-streaming
drwxr-xr-x - oozie hadoop 0 2015-10-19 14:41 /user/oozie/share/lib/oozie
drwxr-xr-x - oozie hadoop 0 2015-10-01 12:54 /user/oozie/share/lib/pig
-rwxr-xr-x 3 oozie hadoop 1393 2015-10-01 12:54 /user/oozie/share/lib/sharelib.properties
drwxr-xr-x - oozie hadoop 0 2015-10-12 15:28 /user/oozie/share/lib/sqoop

So we find our old ShareLib (lib_20151126193054) and our new ShareLib (lib_20151207105251), populated with the current HDP version.

Remember that modifications made in the previous ShareLib are not automagically copied to the new one, so take a close look at the JARs.
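One way to spot JARs that were not carried over is to diff the file lists of the two directories (a sketch using the timestamps from the listing above; requires bash for the process substitution):

```shell
# Compare file names between the old and new ShareLib, stripping the
# timestamped prefixes, to spot custom jars missing from the new one
OLD=/user/oozie/share/lib/lib_20151126193054
NEW=/user/oozie/share/lib/lib_20151207105251
diff \
  <(hdfs dfs -ls -R "$OLD" | awk '{print $NF}' | sed "s|^$OLD/||" | sort) \
  <(hdfs dfs -ls -R "$NEW" | awk '{print $NF}' | sed "s|^$NEW/||" | sort) \
  || true   # diff exits non-zero when the lists differ
```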

Finally, you can tell Oozie to switch to the latest ShareLib (it picks the latest timestamp):

[oozie@localhost ~]$ oozie admin -sharelibupdate

You can then verify that the path has been updated by listing a JAR to see its path:

[oozie@localhost ~]$ oozie admin -shareliblist hive | head
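As a final note, a job only sees the ShareLib if it asks for it in its job.properties. These are standard Oozie properties (the hive override is just an example):

```
oozie.use.system.libpath=true
# optionally override which ShareLib sections an action type picks up
oozie.action.sharelib.for.hive=hive,hcatalog
```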