Thursday, April 14, 2011

Problems setting up LinuxTaskController with Hadoop (Cloudera release 3)

This post describes some of the problems I ran into trying to set up the LinuxTaskController using Cloudera (CDH3u0). I wanted to set up the LinuxTaskController so that MapReduce jobs would run as the user who submitted them.

I started by following the instructions for setting up security in CDH3. (Note: I skipped all the steps except installing the secure packages and setting up secure MapReduce.)

Most of the problems I had were caused by whitespace issues in the taskcontroller.cfg file.
  1. I needed to add at least one newline after the final line of taskcontroller.cfg (which for me sets the value of mapred.tasktracker.group). This was a known bug in CDH3B4, but since it's no longer described in the latest docs, I assume it's supposed to be fixed. It's possible that it was fixed and that something went wrong in my upgrade from CDH3B4 to CDH3u0.
  2. Extra spaces at the end of lines.
    • I had extra spaces at the end of the line "mapred.local.dir=/somedir"
    • This caused the task controller to fail to start any task attempts
    • I discovered this by looking at my task controller log file, where I saw an exception:
      • Failed to create directory "/somedir /tasktracker/jlewilocal" (notice the space after "/somedir")
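
Both problems above are invisible in most editors, so a quick check of the file can save some time. Below is a minimal sketch in Python that flags trailing whitespace and a missing final newline. The function name check_cfg and the way the path is passed on the command line are my own choices for illustration and are not part of CDH3; the script only inspects the text and does not try to reproduce how the task controller actually parses the file.

```python
import sys


def check_cfg(text):
    """Return a list of (line_number, message) problems found in config text.

    Hypothetical helper: it only looks for the two whitespace problems
    described in this post.
    """
    problems = []
    # A file that ends with "\n" splits into a final empty string, which
    # never has trailing whitespace, so it is harmless to scan it too.
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        if line != line.rstrip(" \t\r"):
            problems.append((number, "trailing whitespace"))
    if text and not text.endswith("\n"):
        problems.append((len(lines), "file does not end with a newline"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # newline="" keeps any "\r" characters so they are reported as well.
    with open(sys.argv[1], newline="") as handle:
        for number, message in check_cfg(handle.read()):
            print(f"line {number}: {message}")
```

Saved under any name, the script takes the path to taskcontroller.cfg as its only argument and prints one line per problem found; no output means neither of the two issues was detected.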