Setting Up a Hadoop Pseudo-Distributed Mode Environment (2): Running a Map/Reduce Job

In the previous article we covered how to set up the pseudo-distributed environment itself; now it is time to actually run a MapReduce job on it.

For the MapReduce program, we can reuse the code posted here, with a few changes:

Change the input and output paths to hdfs://localhost/…

//Coupon11LogJobMain 
String inputFile = "hdfs://localhost/home/kent/coupon11/coupon11.log";  
String outDir = "hdfs://localhost/home/kent/coupon11/output" + System.currentTimeMillis();
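
The two strings above follow a simple pattern: an hdfs:// URI on localhost, plus a timestamp suffix so that every run writes to a fresh output directory (Hadoop refuses to start a job whose output directory already exists). Below is a minimal sketch of that pattern in plain Java with no Hadoop dependency. The class and method names (HdfsPaths, toHdfsUri, uniqueOutputDir) are hypothetical and are not part of the original program.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only; not part of Coupon11LogJobMain.
public class HdfsPaths {

    // Builds an hdfs:// URI string for an absolute path on the given host.
    static String toHdfsUri(String host, String absolutePath) {
        try {
            return new URI("hdfs", host, absolutePath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid HDFS path: " + absolutePath, e);
        }
    }

    // Appends a timestamp so that each run gets its own output directory.
    static String uniqueOutputDir(String baseUri, long timestampMillis) {
        return baseUri + timestampMillis;
    }

    public static void main(String[] args) {
        String inputFile = toHdfsUri("localhost", "/home/kent/coupon11/coupon11.log");
        String outDir = uniqueOutputDir(
                toHdfsUri("localhost", "/home/kent/coupon11/output"),
                System.currentTimeMillis());
        System.out.println(inputFile);
        System.out.println(outDir);
    }
}
```

Passing the timestamp in as a parameter, rather than reading the clock inside the method, keeps the helper deterministic and easy to check.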

Of course, you also need to copy the input file into HDFS:

$hadoop dfs -copyFromLocal /home/kent/coupon11/coupon11.log hdfs://localhost/home/kent/coupon11/coupon11.log

Configure the maven assembly plugin so that the program and all of its libraries are packaged into a single jar file:

<!-- pom.xml -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-assembly-plugin</artifactId>
				<version>2.3</version>
				<configuration>
					<descriptorRefs>
						<descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
				</configuration>
			</plugin>

Compile and package:

$mvn assembly:assembly

Submit the jar file to MapReduce for execution:

$hadoop jar coupon11logs-1.0-SNAPSHOT-jar-with-dependencies.jar coupon11log.Coupon11LogJobMain
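
The jar name in the command above is what the assembly plugin produces by default: the artifactId, the version, then the descriptor id jar-with-dependencies. Maven writes the file to the target directory unless configured otherwise. The sketch below spells out that naming and builds the matching command line. The names AssemblyJar, jarName and hadoopJarCommand are hypothetical, and the artifactId coupon11logs and version 1.0-SNAPSHOT are inferred from the file name, not taken from a pom shown in this article.

```java
// Hypothetical helper, for illustration only.
public class AssemblyJar {

    // Default assembly file name: <artifactId>-<version>-<descriptorId>.jar
    static String jarName(String artifactId, String version, String descriptorId) {
        return artifactId + "-" + version + "-" + descriptorId + ".jar";
    }

    // Command line that submits the jar, naming the main class explicitly.
    static String hadoopJarCommand(String jar, String mainClass) {
        return "hadoop jar " + jar + " " + mainClass;
    }

    public static void main(String[] args) {
        // artifactId and version are assumptions inferred from the jar name.
        String jar = jarName("coupon11logs", "1.0-SNAPSHOT", "jar-with-dependencies");
        System.out.println(hadoopJarCommand(jar, "coupon11log.Coupon11LogJobMain"));
    }
}
```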

Finally, monitor the status of the job and its tasks from the web console:
http://localhost:50030/
