python - Hadoop Streaming - Unable to find file error
I am trying to run a Hadoop Streaming Python job:

bin/hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar -D stream.non.zero.exit.is.failure=true -input /ixml -output /oxml -mapper scripts/mapper.py -file scripts/mapper.py -inputreader "StreamXmlRecordReader,begin=channel,end=/channel" -jobconf mapred.reduce.tasks=0

I made sure mapper.py has the necessary permissions. The job errors out saying:

Caused by: java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 19 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)

I also tried copying mapper.py to HDFS and passing the hdfs://localhost/mapper.py link instead, but that does not work either. Any thoughts on how to fix this?
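Looking at the example on the HadoopStreaming wiki page, it seems that you should change

-mapper scripts/mapper.py -file scripts/mapper.py

to

-mapper mapper.py -file scripts/mapper.py

since "shipped files go to the working directory". The -file option keeps its local path (scripts/mapper.py) because it is resolved on the machine that submits the job, while -mapper is resolved inside the task's working directory, where only the base name exists. You might also need to specify the Python interpreter directly, quoting the value so that it is passed as a single argument:

-mapper "/path/to/python mapper.py" -file scripts/mapper.py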
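The base-name rule above can be sketched in Python as a small helper that derives the -mapper value from the path given to -file. This is only an illustration, not part of Hadoop: the function name streaming_mapper_args and the interpreter path /usr/bin/python are assumptions made for the example.

```python
import os
import shlex


def streaming_mapper_args(local_script, interpreter=None):
    """Build the -mapper/-file arguments for a script shipped with -file.

    local_script: path to the script on the submitting machine.
    interpreter:  optional interpreter path to prefix (hypothetical value).
    """
    # Shipped files land in the task's working directory, so the task
    # only sees the base name, not the local "scripts/" prefix.
    shipped_name = os.path.basename(local_script)
    if interpreter is None:
        mapper_cmd = shipped_name
    else:
        mapper_cmd = f"{interpreter} {shipped_name}"
    # -file keeps the local path because it is read at submission time.
    return ["-mapper", mapper_cmd, "-file", local_script]


if __name__ == "__main__":
    args = streaming_mapper_args("scripts/mapper.py", interpreter="/usr/bin/python")
    # Quote for display so the multi-word -mapper value stays one argument.
    print(" ".join(shlex.quote(a) for a in args))
```

When the list is passed to something like subprocess without a shell, no extra quoting is needed; the quoting only matters when the command is typed or pasted into a shell.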