python - Hadoop Streaming - Unable to find file error


I am trying to run a Hadoop Streaming Python job.

bin/hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar \
  -D stream.non.zero.exit.is.failure=true \
  -input /ixml \
  -output /oxml \
  -mapper scripts/mapper.py \
  -file scripts/mapper.py \
  -inputreader "StreamXmlRecordReader,begin=channel,end=/channel" \
  -jobconf mapred.reduce.tasks=0

I made sure mapper.py has the necessary permissions. The job errors out saying:

Caused by: java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 19 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)

I also tried copying mapper.py to HDFS and passing the hdfs://localhost/mapper.py link instead, but that does not work either. Any thoughts on how to fix this?

Looking at the example on the HadoopStreaming wiki page, it seems that you should change

-mapper scripts/mapper.py  -file scripts/mapper.py  

to

-mapper mapper.py  -file scripts/mapper.py  

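since "shipped files go to the working directory": the file passed with -file lands in the task's working directory under its base name, so the mapper command must not include the scripts/ prefix. You might also need to specify the Python interpreter directly: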

-mapper "/path/to/python mapper.py"  -file scripts/mapper.py  
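
If you would rather keep the plain `-mapper mapper.py` form, the script itself has to be executable and start with an interpreter line, because the task runs it as a program. The real mapper.py is not shown in the question, so the following is only a minimal sketch of a hypothetical stand-in: the `map_line` helper, its "line plus length" output, and the `python3` interpreter line are all assumptions made for illustration.

```python
#!/usr/bin/env python3
# Hypothetical stand-in for scripts/mapper.py; the real mapper is not shown.
# The interpreter line above is an assumption: adjust it to your nodes.
import sys


def map_line(line):
    """Return one tab-separated key/value pair for a single input line."""
    text = line.rstrip("\n")
    return "%s\t%d" % (text, len(text))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming feeds records on stdin and reads results from stdout.
    for line in stdin:
        stdout.write(map_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Run `chmod +x scripts/mapper.py` before shipping the script with -file. Piping a small sample file into the script locally is a quick way to check that the interpreter line resolves before submitting the job.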
