Saturday, April 9, 2011

Debugging Hadoop Streaming using Eclipse

This post describes how I set up Eclipse to debug what happens when I run a Hadoop Streaming job with the "-conf" option.

It is based on:
  1. Cloudera's screencast on setting up Eclipse
  2. Hadoop Wiki on Setting Up Eclipse

I'm not a Java or Eclipse expert, so if you see a better way to do something, please let me know.

I'm using the Helios release of Eclipse and the Cloudera 3 Beta 4 (CDH3B4) release of Hadoop.

Setting Up Eclipse
  1. Download a tarball containing the Cloudera source from the Cloudera 3 archive
  2. Unpack the tarball
  3. In Eclipse, create a new Java project
  4. In the "Java Settings" dialog, set the default output folder to something other than "bin", e.g.
    "build/eclipse-classes"
      We do this because the "bin" directory is used to hold the Hadoop shell scripts
  5. Close the project in Eclipse
  6. Move the contents of the unpacked Cloudera tarball into the directory for your new Eclipse project, e.g.
    mv some-dir/hadoop-0.20.2-CDH3B4/*  ~/workspace/your-project/
    
  7. Open the project and refresh it (select the project and press F5)
  8. At this point, the Eclipse Package Explorer should show src listed under your project
      Unfortunately, src itself isn't a usable source folder, because it doesn't line up with the package names
  9. Set up the source folders in Eclipse
    1. Right click your project and select Properties
    2. Select Java Build Path -> Source tab
    3. Remove the entry project/src
    4. Click Add Folder and add the following entries
      1. src/core
      2. src/contrib/streaming/src/java
      3. src/mapred
    5. Set the output folder to project/eclipse-build
    6. Click on the Libraries tab and add all the jars in
      • project/lib
  10. Open the Project menu and uncheck "Build Automatically"
  11. Create a new Ant builder
    1. Right click your project and select Properties
    2. Select Builders
    3. Click New and select Ant Builder
    4. For the buildfile, click Browse Workspace and then select build.xml in your project
    5. Click on the Targets tab and set the targets
      1. For After a "Clean", set the targets to compile-core-classes, compile-core
      2. For Manual Build, set the targets to compile, compile-core-classes, compile-mapred-classes, compile-contrib
  12. You should now be able to build the project by clicking Project -> Build Project
  13. To debug a file, e.g. "HadoopStreaming.java", right click it and select Debug As -> Java Application
    • If you get a warning about errors in the project, try clicking Proceed
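
Steps 2 and 6 can also be scripted. Here is a minimal sketch, assuming the tarball is gzip-compressed and unpacks into a single top-level directory (as with hadoop-0.20.2-CDH3B4 above). The function name populate_project and both of its arguments are my own placeholders, not part of Hadoop or Eclipse. Run it while the project is closed in Eclipse.

```shell
#!/bin/sh
# Sketch of steps 2 and 6: unpack the source tarball and move its contents
# into the directory of the (closed) Eclipse project.
# populate_project is a hypothetical helper name used only for illustration.
populate_project() {
    tarball=$1       # path to the downloaded source tarball
    project_dir=$2   # e.g. ~/workspace/your-project

    # Unpack into a scratch directory so the download location stays clean.
    unpack_dir=$(mktemp -d) || return 1
    # Assumption: gzip-compressed tarball with exactly one top-level directory.
    tar -xzf "$tarball" -C "$unpack_dir" || return 1

    mkdir -p "$project_dir" || return 1
    # Same idea as the mv in step 6: move everything below the top-level
    # directory into the project directory.
    mv "$unpack_dir"/*/* "$project_dir"/ || return 1

    rm -rf "$unpack_dir"
}
```

You would call it as populate_project some-dir/your-tarball.tar.gz ~/workspace/your-project, then continue with step 7. Like the mv in step 6, the glob skips hidden files at the top level of the tarball.
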
Final Notes
  • The wiki describes an Ant task that automatically sets up Eclipse. Unfortunately, the Eclipse templates don't appear to be included in the Cloudera CDH3B4 release, although you could try downloading them from the Apache repository.
  • On another system I had some trouble getting Eclipse to stop at breakpoints. This seems to be a Java + Eclipse issue rather than anything specific to Hadoop.
