Problem with -libjars in hadoop

Solution 1:

It is also worth noting a subtle but important point: the way to make additional JARs available to the JVMs running the distributed map and reduce tasks is very different from the way to make them available to the JVM running the job client.

  • -libjars makes the JARs available only to the JVMs running the remote map and reduce tasks

  • To make these same JARs available to the client JVM (the JVM that is created when you run the hadoop jar command), you need to set the HADOOP_CLASSPATH environment variable:

$ export LIBJARS=/path/jar1,/path/jar2
$ export HADOOP_CLASSPATH=/path/jar1:/path/jar2
$ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value

See: http://grepalex.com/2013/02/25/hadoop-libjars/

Another cause of incorrect -libjars behaviour can be a wrong implementation and initialization of your custom job class:

  • The job class must implement the Tool interface
  • The Configuration instance must be obtained by calling getConf() instead of creating a new instance — see the sketch after the link below

See: http://kickstarthadoop.blogspot.ca/2012/05/libjars-not-working-in-custom-mapreduce.html
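
For a concrete picture of both points, here is a minimal diagnostic driver (only a sketch; the class name LibJarsCheck is a placeholder, not from the original answers). It implements Tool, reads its Configuration via getConf(), and prints the tmpjars property, which GenericOptionsParser populates from -libjars, so you can verify the option was actually applied:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class LibJarsCheck extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new LibJarsCheck(), args));
      }

      @Override
      public int run(String[] args) throws Exception {
        // getConf() returns the Configuration already modified by GenericOptionsParser
        Configuration conf = getConf();
        // -libjars entries end up in the "tmpjars" property; null means the option never reached the parser
        System.out.println("tmpjars = " + conf.get("tmpjars"));
        return 0;
      }
    }

Run it as hadoop jar my-example.jar LibJarsCheck -libjars ${LIBJARS} and check the printed value.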

Solution 2:

When you specify -libjars with the hadoop jar command, first make sure that you edit your driver class as shown below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class myDriverClass extends Configured implements Tool {

      public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new myDriverClass(), args);
        System.exit(res);
      }

      public int run(String[] args) throws Exception {

        // Configuration processed by ToolRunner
        Configuration conf = getConf();
        Job job = new Job(conf, "My Job");

        ...
        ...

        return job.waitForCompletion(true) ? 0 : 1;
      }
    }

Now edit your "hadoop jar" command as shown below. Note that the generic options handled by GenericOptionsParser, such as -libjars, must come before your application's own arguments:

hadoop jar YourApplication.jar [myDriverClass] -libjars path/to/jar/file args

Now let's understand what happens underneath. Basically, we handle the new command-line arguments by implementing the Tool interface. ToolRunner is used to run classes implementing the Tool interface. It works in conjunction with GenericOptionsParser to parse the generic Hadoop command-line arguments and modify the Configuration of the Tool.
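
To make this concrete, here is a simplified sketch of what ToolRunner.run(conf, tool, args) does internally (the class name MiniToolRunner is only for illustration; the real implementation has a few extra details):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;
    import org.apache.hadoop.util.Tool;

    public class MiniToolRunner {
      public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
        if (conf == null) {
          conf = new Configuration();   // build one if none was given
        }
        // Parse the generic options (-libjars, -files, -archives, -D ...) and apply them to conf
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        // Hand the possibly modified Configuration to the Tool - this is what getConf() returns later
        tool.setConf(conf);
        // Only the arguments left after stripping the generic options reach Tool.run()
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
      }
    }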

Within our main() we call ToolRunner.run(new Configuration(), new myDriverClass(), args). This runs the given Tool via Tool.run(String[]) after parsing the generic arguments. It uses the given Configuration, or builds one if it is null, and then sets the Tool's configuration to the possibly modified version of that conf.

Now, within the run method, calling getConf() gives us that modified version of the Configuration. So make sure that the line below is in your code. If you implement everything else but still use Configuration conf = new Configuration(), nothing will work.

Configuration conf = getConf();
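
To make the contrast explicit, here is a sketch of the start of run() with the wrong line shown as a comment (assuming the driver class from above):

    public int run(String[] args) throws Exception {
      // Wrong: a brand-new Configuration has never seen the parsed -libjars option
      // Configuration conf = new Configuration();

      // Right: the Configuration that ToolRunner/GenericOptionsParser already populated
      Configuration conf = getConf();

      Job job = new Job(conf, "My Job");
      // ... configure mapper, reducer, input and output paths ...
      return job.waitForCompletion(true) ? 0 : 1;
    }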