Wu Yudong's Blog

Original article. If you repost it, please credit: reposted from 工学1号馆

This article continues the discussion of Hadoop job submission and focuses on what happens internally when the shell command is executed.

Suppose a user has written a MapReduce program in Java, packaged it as example.jar, and submits the job with the following command:

$HADOOP_HOME/bin/hadoop jar example.jar \
-D mapred.job.name="example" \
-D mapred.reduce.tasks=2 \
-files=blacklist.txt,whitelist.txt \
-libjars=third-party.jar \
-archives=dictionary.zip \
-input /test/input \
-output /test/output

When the user enters the command above, the bin/hadoop script sees the "jar" command and hands the job over to the RunJar class. The relevant code is:

...
elif [ "$COMMAND" = "jar" ] ; then
CLASS=org.apache.hadoop.util.RunJar
...
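To see exactly what RunJar receives: bin/hadoop reads the first word of the command line into COMMAND, shifts it off, and finally execs java with $CLASS as the main class and the remaining words ("$@") as its arguments. The small Java model below mimics only that hand-off for the jar command. HadoopCommandSketch, mainClassFor and argsFor are hypothetical names for illustration; the real script is shell code that also assembles the class path and JVM options and handles many other commands.

```java
import java.util.Arrays;

// Hypothetical model of the "jar" branch of bin/hadoop (the real script is shell code).
public class HadoopCommandSketch {

    /** Main class the script would launch for a command; only "jar" is modeled here. */
    static String mainClassFor(String command) {
        if ("jar".equals(command)) {
            return "org.apache.hadoop.util.RunJar";
        }
        throw new IllegalArgumentException("not modeled in this sketch: " + command);
    }

    /** Arguments the launched class receives: every word after the command ("$@" after shift). */
    static String[] argsFor(String[] words) {
        return Arrays.copyOfRange(words, 1, words.length);
    }

    public static void main(String[] unused) {
        // The shell has already stripped the quotes around "example" at this point.
        String[] words = {"jar", "example.jar", "-D", "mapred.job.name=example",
                          "-input", "/test/input", "-output", "/test/output"};
        System.out.println(mainClassFor(words[0]));
        System.out.println(String.join(" ", argsFor(words)));
    }
}
```

So RunJar.main sees example.jar as args[0], followed by every option the user typed, unchanged.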

Now let's examine the source code of the RunJar class in detail.

Besides main, the RunJar class defines one utility method, unJar, which extracts a jar file into a directory:

public static void unJar(File jarFile, File toDir) throws IOException {
    JarFile jar = new JarFile(jarFile);
    try {
      Enumeration entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = (JarEntry)entries.nextElement();
        if (!entry.isDirectory()) {
          InputStream in = jar.getInputStream(entry);
          try {
            File file = new File(toDir, entry.getName());
            if (!file.getParentFile().mkdirs()) {
              if (!file.getParentFile().isDirectory()) {
                throw new IOException("Mkdirs failed to create " + 
                                      file.getParentFile().toString());
              }
            }
            OutputStream out = new FileOutputStream(file);
            try {
              byte[] buffer = new byte[8192];
              int i;
              while ((i = in.read(buffer)) != -1) {
                out.write(buffer, 0, i);
              }
            } finally {
              out.close();
            }
          } finally {
            in.close();
          }
        }
      }
    } finally {
      jar.close();
    }
}
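The nested try/finally blocks exist only to guarantee that the JarFile, each entry's InputStream and the FileOutputStream are closed even when an exception is thrown. On Java 7 or later the same logic can be written more compactly with try-with-resources. The sketch below is an illustrative rewrite, not Hadoop's code: UnJarModern is a hypothetical class name, and Files.copy with REPLACE_EXISTING stands in for the manual 8 KB buffer loop while keeping the overwrite behavior of FileOutputStream.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical try-with-resources version of the unJar logic shown above.
public class UnJarModern {
    public static void unJar(File jarFile, File toDir) throws IOException {
        try (JarFile jar = new JarFile(jarFile)) {          // closed automatically
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;                                // parents are created on demand below
                }
                File file = new File(toDir, entry.getName());
                File parent = file.getParentFile();
                // mkdirs() returns false when the directory already exists,
                // so only fail if the parent is still not a directory.
                if (!parent.mkdirs() && !parent.isDirectory()) {
                    throw new IOException("Mkdirs failed to create " + parent);
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    // Overwrite an existing file, as new FileOutputStream(file) would.
                    Files.copy(in, file.toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```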

Background:

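JarFile class: reads the contents of a jar file from any file that can be opened with java.io.RandomAccessFile. It extends java.util.zip.ZipFile with support for reading an optional Manifest entry, which can specify meta-information about the jar file and its entries.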

public Enumeration<JarEntry> entries(): returns an enumeration of the zip file entries.

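JarEntry class: represents an entry in a JAR file.

Enough theory; let's try an experiment by creating a Java class in Eclipse that mimics unJar. Because it uses only classes from the Java standard library, there is no need to set up a Hadoop runtime environment, which makes this very convenient.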

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class UnjarTest {
    public static void unJar(File jarFile, File toDir) throws IOException {
        JarFile jar = new JarFile(jarFile); // open the jar file
        int count = 0;                      // number of files extracted
        try {
            Enumeration<JarEntry> entries = jar.entries(); // enumeration of the zip file entries
            while (entries.hasMoreElements()) {            // loop while entries remain
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {                // directories are created on demand below
                    InputStream in = jar.getInputStream(entry);
                    try {
                        File file = new File(toDir, entry.getName());
                        // mkdirs() returns false if the directory already exists,
                        // so only fail when the parent is still not a directory
                        if (!file.getParentFile().mkdirs()) {
                            if (!file.getParentFile().isDirectory()) {
                                throw new IOException("Mkdirs failed to create " +
                                                      file.getParentFile().toString());
                            }
                        }
                        OutputStream out = new FileOutputStream(file);
                        try {
                            byte[] buffer = new byte[8192];
                            int i;
                            while ((i = in.read(buffer)) != -1) { // copy in chunks of up to 8 KB
                                out.write(buffer, 0, i);
                            }
                        } finally {
                            out.close();
                        }
                        System.out.println("extracted: " + entry.getName());
                        count++;
                    } finally {
                        in.close();
                    }
                }
            }
        } finally {
            System.out.println("files extracted: " + count);
            jar.close();
        }
    }

    public static void main(String[] args) {
        File fromFile = new File("F:\\workspace\\jar\\hadoop-examples-1.0.1.jar"); // jar to extract
        File toFile = new File("F:\\workspace\\jar");                              // target directory
        try {
            unJar(fromFile, toFile);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

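For this experiment I used the examples jar that ships with Hadoop. After setting the source jar path and the target directory, run the program; as expected, hadoop-examples-1.0.1.jar is extracted into the specified directory.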
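Rather than checking the target directory by eye, the same JarFile.entries() and JarEntry API from the background section can confirm that every file entry now exists under the target directory with the expected size. This is a minimal sketch: VerifyUnjar and mismatches are hypothetical names, and since JarEntry.getSize() may return -1 when the uncompressed size is unknown, only existence is checked in that case.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper to verify an extraction done by unJar.
public class VerifyUnjar {

    /** Names of file entries that are missing under toDir or have a different size. */
    static List<String> mismatches(File jarFile, File toDir) throws IOException {
        List<String> bad = new ArrayList<>();
        try (JarFile jar = new JarFile(jarFile)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;                       // unJar skips directory entries too
                }
                File f = new File(toDir, entry.getName());
                long expected = entry.getSize();    // -1 if unknown
                if (!f.isFile() || (expected >= 0 && f.length() != expected)) {
                    bad.add(entry.getName());
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        File jar = new File(args[0]);
        File dir = new File(args[1]);
        List<String> bad = mismatches(jar, dir);
        System.out.println(bad.isEmpty() ? "all entries extracted" : "problems: " + bad);
    }
}
```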

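The main method of RunJar unpacks the jar, sets up the runtime environment (a class loader whose class path covers the unpacked files), then passes the remaining command-line arguments to the MapReduce program's main class and runs it.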
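The steps of main can be sketched as follows. This is a simplified reconstruction, not the Hadoop source: RunJarSketch, Target, resolve, classPath and run are hypothetical names, and the sketch leaves out creating a temporary working directory under hadoop.tmp.dir, calling unJar into it, and registering a shutdown hook to delete it, which Hadoop 1.x's RunJar.main also does. What it keeps is the core sequence: take the main class from the manifest's Main-Class attribute (or from the argument after the jar when the manifest has none), build a URLClassLoader over the unpacked directory, the jar itself, classes/ and every file in lib/, then call the class's static main(String[]) by reflection with the remaining arguments.

```java
import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Simplified, hypothetical sketch of the steps performed by RunJar.main.
public class RunJarSketch {

    /** Which class to run and the arguments it receives. */
    static final class Target {
        final String mainClass;
        final String[] programArgs;
        Target(String mainClass, String[] programArgs) {
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    /** args = { jarFile, [mainClass,] programArgs... }; mainClass is read only without Main-Class. */
    static Target resolve(Manifest manifest, String[] args) {
        int first = 1;                                   // args[0] is the jar file
        String mainClass = (manifest == null) ? null
                : manifest.getMainAttributes().getValue("Main-Class");
        if (mainClass == null) {
            if (args.length < 2) {
                throw new IllegalArgumentException("usage: RunJar jarFile [mainClass] args...");
            }
            mainClass = args[first++];
        }
        return new Target(mainClass.replace('/', '.'),
                Arrays.copyOfRange(args, first, args.length));
    }

    /** Class path: unpacked dir, the jar, classes/, and every file in lib/. */
    static URL[] classPath(File jar, File workDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        urls.add(workDir.toURI().toURL());
        urls.add(jar.toURI().toURL());
        urls.add(new File(workDir, "classes").toURI().toURL());
        File[] libs = new File(workDir, "lib").listFiles();
        if (libs != null) {
            for (File lib : libs) {
                urls.add(lib.toURI().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Loads the target class through a new class loader and invokes its main(String[]). */
    static void run(File jar, File workDir, String[] args) throws Throwable {
        Manifest manifest;
        try (JarFile jarFile = new JarFile(jar)) {
            manifest = jarFile.getManifest();            // null if the jar has no manifest
        }
        Target target = resolve(manifest, args);
        // RunJar would call unJar(jar, workDir) at this point; omitted in this sketch.
        ClassLoader loader = new URLClassLoader(classPath(jar, workDir));
        Thread.currentThread().setContextClassLoader(loader);
        Class<?> mainClass = Class.forName(target.mainClass, true, loader);
        Method main = mainClass.getMethod("main", String[].class);
        try {
            main.invoke(null, (Object) target.programArgs);
        } catch (InvocationTargetException e) {
            throw e.getTargetException();                // surface the program's own exception
        }
    }
}
```

Because the program runs in the same JVM through reflection, options such as -D or -files are not interpreted by RunJar itself; they are passed through to the user's main method, which typically parses them with GenericOptionsParser (for example via ToolRunner).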

Hadoop Job Submission In Depth (2): Executing the Shell Command
