Compiling, Packaging, and Running a MapReduce Program from the Command Line

By Wu Yudong on May 21, 2015

Original article. When reposting, please credit the source: 工学1号馆

Permalink: http://wuyudong.com/archives/40

For older Hadoop releases such as 0.20, the usual way to compile WordCount.java is well known (the example here uses a 1.0.1 installation):

 javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java

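In the newer 2.x releases, however, the hadoop-core*.jar file no longer exists, so compiling and packaging your own MapReduce program works differently than in older versions.

This article uses the WordCount example on Hadoop 2.6 to show how to compile your own MapReduce program on 2.x.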

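Dependency jars in Hadoop 2.x

In Hadoop 2.x the classes are no longer bundled in a single hadoop-core*.jar but are split across several jars. Compiling and running the WordCount example requires the following three: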

  • $HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar
  • $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar
  • $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
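
The three paths above can also be joined into a single classpath string by a small script. The following is only a minimal sketch: it assumes the installation root is passed in explicitly and that the version is 2.6.0, and the names `build_hadoop_classpath` and `check_jars` are hypothetical helpers, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints a colon-separated
# classpath made of the three jars that the WordCount example needs.
# $1 = Hadoop installation root, $2 = Hadoop version (2.6.0 in this article).
build_hadoop_classpath() {
    home="$1"
    version="$2"
    printf '%s:%s:%s\n' \
        "$home/share/hadoop/common/hadoop-common-$version.jar" \
        "$home/share/hadoop/mapreduce/hadoop-mapreduce-client-core-$version.jar" \
        "$home/share/hadoop/common/lib/commons-cli-1.2.jar"
}

# Hypothetical helper: warns on stderr about every jar in a
# colon-separated list that does not exist on disk.
check_jars() {
    old_ifs="$IFS"
    IFS=':'
    for jar in $1; do
        [ -f "$jar" ] || echo "missing: $jar" >&2
    done
    IFS="$old_ifs"
}
```

A possible use would be `export CLASSPATH="$(build_hadoop_classpath "$HADOOP_HOME" 2.6.0):$CLASSPATH"`, followed by `check_jars "$CLASSPATH"` to catch a mistyped path before compiling.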

Compiling and packaging a Hadoop MapReduce program

Add the jars listed above to the classpath (HADOOP_HOME is assumed to point at the installation root, here /home/hadoop/opt/hadoop-2.6.0):

hadoop@ubuntu:~$ export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.6.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"

WordCount.java can now be compiled. This article uses the WordCount.java shipped with the 2.6.0 source,

located in /hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples:

javac WordCount.java

The compiler prints a warning, which can be ignored. After compilation, several .class files appear in the directory:

/home/hadoop/opt/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning
hadoop@ubuntu:~/opt/code$ ls
WordCount.class WordCount.java WordCount$MapClass.class WordCount$Reduce.class
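
Because the source file sits under org/apache/hadoop/examples, the class is addressed by a fully qualified name that includes its package. The sketch below reads that name out of a source file; `fqcn_of` is a hypothetical helper, and it assumes the `package` declaration sits on a single line of its own.

```shell
#!/bin/sh
# Hypothetical helper: prints the fully qualified class name for a Java
# source file by combining its `package` declaration with the file name.
fqcn_of() {
    src="$1"
    class=$(basename "$src" .java)
    # Take the first line of the form "package a.b.c;" and keep only a.b.c
    pkg=$(sed -n 's/^[[:space:]]*package[[:space:]][[:space:]]*\([A-Za-z0-9_.]*\)[[:space:]]*;.*$/\1/p' "$src" | head -n 1)
    if [ -n "$pkg" ]; then
        echo "$pkg.$class"
    else
        # No package declaration: the class lives in the default package
        echo "$class"
    fi
}
```

For example, `fqcn_of WordCount.java` prints the name to pass to `hadoop jar`. As a general Java rule, a class in a package is loaded from a jar only when its .class file is stored under the matching directories, a layout that `javac -d .` produces.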

Next, package the .class files into a jar so that Hadoop can run them:

hadoop@ubuntu:~/opt/code$ jar -cvf WordCount.jar ./WordCount*.class

added manifest
adding: WordCount.class(in = 3363) (out= 1687)(deflated 49%)
adding: WordCount$MapClass.class(in = 1978) (out= 800)(deflated 59%)
adding: WordCount$Reduce.class(in = 1641) (out= 645)(deflated 60%)

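Create the input directory and sample files for the job. The commands below create them on the local filesystem, which suits a Hadoop installation running in local (standalone) mode: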

hadoop@ubuntu:~/opt/code$ mkdir input
hadoop@ubuntu:~/opt/code$ echo "Hello Hadoop Goodbye Hadoop" > ./input/file1
hadoop@ubuntu:~/opt/code$ echo "Hello World Bye World" > ./input/file2
hadoop@ubuntu:~/opt/code$ ls ./input
file1 file2

Run the WordCount program:

hadoop@ubuntu:~$ cd ~/opt/code

hadoop@ubuntu:~/opt/code$ ~/opt/hadoop-2.6.0/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
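
Hadoop refuses to start a job whose output directory already exists, so a second run of the command above fails until `output` is removed. The following is a minimal sketch of a cleanup step, assuming the output lives on the local filesystem as in this walkthrough; `reset_output_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: removes a local output directory left by a previous
# run so that the job can create it again. Refuses to act on an empty
# argument or on "/" as a basic safety check.
reset_output_dir() {
    dir="$1"
    case "$dir" in
        ""|"/") echo "refusing to remove '$dir'" >&2; return 1 ;;
    esac
    if [ -d "$dir" ]; then
        rm -r "$dir"
    fi
}
```

Calling `reset_output_dir output` before re-running the job clears the previous results; when the directory does not exist, the function does nothing.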

When the job finishes, check the output:

hadoop@ubuntu:~/opt/code$ ls ./output

part-r-00000 _SUCCESS

hadoop@ubuntu:~/opt/code$ cat ./output/part-r-00000

Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
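
The counts above can be cross-checked without Hadoop. The following is a minimal sketch using standard Unix tools; `local_wordcount` is a hypothetical helper, and it assumes that words are separated by whitespace and that the job writes each line as word, tab, count (the default text output format).

```shell
#!/bin/sh
# Hypothetical helper: counts whitespace-separated words in the given files
# without Hadoop and prints "word<TAB>count", sorted by word in byte order.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running `local_wordcount input/file1 input/file2` and comparing the result with `output/part-r-00000` is a quick way to confirm that the job processed both files.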
