Compiling the Hadoop version of Hello, World
Views: 6456
Published: 2019-06-23


```shell
cd ~/src
mkdir classes
javac -classpath ~/hadoop-0.20.2/hadoop-0.20.2-core.jar WordCount.java -d classes
jar -cvf WordCount.jar -C classes/ .
hadoop jar WordCount.jar com.codestyle.hadoop.WordCount input output
hadoop fs -ls output
hadoop fs -cat output/part-00000
```

Key points:

When compiling WordCount.java, the Hadoop core jar must be on the classpath, and -d directs the compiled classes into the classes directory.

Package the class files into a jar file.

Run the MapReduce job by handing the jar to hadoop; results are written to the output directory (if that directory already exists, it must be deleted first). The command-line argument must give the fully qualified class name (package + class).
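For the "delete the output directory first" point, the Hadoop 0.20-era shell offers a recursive remove. A minimal sketch, assuming the same `output` path as above (adjust to your own directory):

```shell
# Remove a stale output directory before re-running the job;
# Hadoop refuses to start a job whose output path already exists.
hadoop fs -rmr output

# Then the job can be launched again:
hadoop jar WordCount.jar com.codestyle.hadoop.WordCount input output
```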

WordCount.java

```java
package com.codestyle.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```
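To see what the job computes without a cluster, the same tokenize-then-sum logic can be sketched in plain Java. This is only a local illustration (the class name LocalWordCount is made up here, not part of the original code): the "map" step emits (word, 1) pairs via the same StringTokenizer, and the "reduce" step's per-word sum is done inline with a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// A plain-Java sketch of the MapReduce job above, with no Hadoop dependency.
public class LocalWordCount {
    static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // "Map" phase: split the text into tokens, as the mapper does.
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // "Reduce" phase, folded in: sum the 1s emitted per word.
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("Hello World Hello Hadoop"));
    }
}
```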

View the results of the run:

```shell
lishujun@lishujun-virtual-machine:~/src$ hadoop fs -cat output/part-00000
Hadoop    1
Hello    2
World    1
```


Reposted from: https://www.cnblogs.com/code-style/p/3737035.html
