[Hadoop] Multiple Inputs and Multiple Outputs

Excerpted from the elephant book (Hadoop: The Definitive Guide).

A single job can read from several input sources, whether or not they share a format, and run a separate Mapper for each one:

MultipleInputs.addInputPath(conf, ncdcInputPath,
    TextInputFormat.class, MaxTemperatureMapper.class);
MultipleInputs.addInputPath(conf, metOfficeInputPath,
    TextInputFormat.class, MetOfficeMaxTemperatureMapper.class);
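
To make the idea concrete without a cluster, here is a minimal plain-Java sketch (no Hadoop classes involved). Each source gets its own parsing function, playing the role of a per-path Mapper, and both feed the same max-temperature "reduce" step. The class name MultiSourceSketch and the two line layouts ("year,temperature" and "temperature year") are hypothetical, made up for illustration; they are not the NCDC or Met Office formats.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java illustration only; no Hadoop classes are used here.
class MultiSourceSketch {

    // Hypothetical source A: comma-separated "year,temperature".
    static Map.Entry<String, Integer> parseCsv(String line) {
        String[] parts = line.split(",");
        return new SimpleEntry<>(parts[0].trim(), Integer.valueOf(parts[1].trim()));
    }

    // Hypothetical source B: whitespace-separated "temperature year".
    static Map.Entry<String, Integer> parseSpaced(String line) {
        String[] parts = line.trim().split("\\s+");
        return new SimpleEntry<>(parts[1], Integer.valueOf(parts[0]));
    }

    // "Map" step: apply the parser that belongs to this source.
    // "Reduce" step: fold each pair into the running per-year maximum.
    static void consume(List<String> lines,
                        Function<String, Map.Entry<String, Integer>> mapper,
                        Map<String, Integer> maxByYear) {
        for (String line : lines) {
            Map.Entry<String, Integer> kv = mapper.apply(line);
            maxByYear.merge(kv.getKey(), kv.getValue(), Integer::max);
        }
    }

    // Both sources end up in one result, keyed by year.
    static Map<String, Integer> maxTemperatures(List<String> csvLines,
                                                List<String> spacedLines) {
        Map<String, Integer> maxByYear = new TreeMap<>();
        consume(csvLines, MultiSourceSketch::parseCsv, maxByYear);
        consume(spacedLines, MultiSourceSketch::parseSpaced, maxByYear);
        return maxByYear;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = maxTemperatures(
            Arrays.asList("1950,22", "1949,111"),
            Arrays.asList("78 1949", "30 1950"));
        System.out.println(result); // prints {1949=111, 1950=30}
    }
}
```

The point of the design is that the differences between sources are confined to the parsing step, so everything downstream sees one uniform key/value shape.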

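MultipleOutputFormat lets you control the names of the reduce output files and split the output across them according to a rule of your own, for example: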

...
static class StationNameMultipleTextOutputFormat
    extends MultipleTextOutputFormat<NullWritable, Text> {

  private NcdcRecordParser parser = new NcdcRecordParser();

  // Name the output file after the station ID parsed from each record.
  @Override
  protected String generateFileNameForKeyValue(NullWritable key, Text value,
      String name) {
    parser.parse(value);
    return parser.getStationId();
  }
}
...
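
The effect of deriving the file name from each record can be simulated in plain Java (again, no Hadoop classes). In this rough sketch, fileNameFor stands in for generateFileNameForKeyValue, and an in-memory map stands in for the output directory. The class name StationRoutingSketch and the "stationId,rest" record layout are hypothetical; the real NCDC record format is different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are used here.
class StationRoutingSketch {

    // Stand-in for generateFileNameForKeyValue: derive the output name
    // from the record itself. The "stationId,rest" layout is hypothetical.
    static String fileNameFor(String record) {
        return record.substring(0, record.indexOf(','));
    }

    // Group records under the file name chosen for each of them,
    // preserving arrival order within a file.
    static Map<String, List<String>> route(List<String> records) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String record : records) {
            files.computeIfAbsent(fileNameFor(record), k -> new ArrayList<>())
                 .add(record);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "010010,1950,22", "029070,1950,-11", "010010,1951,17");
        System.out.println(route(records).keySet()); // prints [010010, 029070]
    }
}
```

Each distinct station ID yields one "file", so the number of outputs follows the data rather than the number of reducers.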

There is also a MultipleOutputs class, which is not covered here.
