Setting Up Hadoop in Pseudo-Distributed Mode (Part 2) — Running a Map/Reduce Job


In the previous post we covered how to set up the pseudo-distributed environment itself; now it's time to actually run a MapReduce job on it.

For the MapReduce program itself, we can reuse the code posted here, with a few changes:

Point the input and output paths at hdfs://localhost/… (the output directory is suffixed with a timestamp because MapReduce refuses to start if the output directory already exists):

//Coupon11LogJobMain 
String inputFile = "hdfs://localhost/home/kent/coupon11/coupon11.log";  
String outDir = "hdfs://localhost/home/kent/coupon11/output" + System.currentTimeMillis(); 
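The job's actual map/reduce classes aren't reproduced in this post. As a self-contained illustration only (plain Java, no Hadoop dependency; the space-delimited layout of coupon11.log is an assumption, not its real format), the core "count by key" work a simple log job performs looks roughly like this:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count occurrences of the first whitespace-delimited
// field of each log line. This mimics what a mapper (emit (key, 1)) plus a
// summing reducer produce for a basic "count by key" job.
public class LogFieldCounter {
    public static Map<String, Integer> countFirstField(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;          // skip blank lines
            String key = trimmed.split("\\s+")[0];    // "map": take first field as key
            counts.merge(key, 1, Integer::sum);       // "reduce": sum the 1s per key
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {"GET /coupon a", "POST /x b", "GET /y c"};
        // GET -> 2, POST -> 1
        System.out.println(countFirstField(sample));
    }
}
```

In the real job this logic is split across a Mapper and a Reducer so Hadoop can distribute it; the sequential version above only shows the data flow.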

Of course, you first need to copy the input file into HDFS:

$hadoop dfs -copyFromLocal /home/kent/coupon11/coupon11.log hdfs://localhost/home/kent/coupon11/coupon11.log

Configure the Maven assembly plugin so that the program and all of its libs are packaged into one single jar file:

<!-- pom.xml -->
		<build>
			<plugins>
				<plugin>
					<groupId>org.apache.maven.plugins</groupId>
					<artifactId>maven-assembly-plugin</artifactId>
					<version>2.3</version>
					<configuration>
						<descriptorRefs>
							<descriptorRef>jar-with-dependencies</descriptorRef>
						</descriptorRefs>
					</configuration>
				</plugin>
			</plugins>
		</build>

Build and package:

$mvn assembly:assembly

Submit the jar to MapReduce for execution:

$hadoop jar coupon11logs-1.0-SNAPSHOT-jar-with-dependencies.jar coupon11log.Coupon11LogJobMain

Finally, monitor the job/task status from the web console:
http://localhost:50030/
