MapReduce output files: part-r-* and part-*

Problem description


I have some questions about MapReduce output part files.

    1> What is the difference between part-r-* files and part-* files in MapReduce output? Is part-r-* output from the mapper and part-* from the reducer?
    2> If the reducer doesn't produce any results, is the mapper output kept or deleted?

Solution

Normally, part-r-* files come from the reducer. MultipleOutputs allows you to use a different naming convention. If there is no reduce step, the output will be part-m-*. As I understand it, if a reducer is defined, the mapper outputs are deleted regardless of whether the reducers produce anything. The reducer output files are usually still created even if they are empty, unless you use LazyOutputFormat. Where did you find part-* files that did not end with either m-nnnnn or r-nnnnn?
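
To make the naming convention concrete, here is a minimal sketch in plain Java that sorts output file names into the categories described above. It does not use Hadoop's own API; the class name `PartFileNames` and the method `classify` are hypothetical and exist only for illustration. It assumes the default `part-m-nnnnn` / `part-r-nnnnn` pattern and reports anything else as "other".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: classifies job output file names.
public class PartFileNames {
    // Assumed pattern: part-m-00000 or part-r-00012, optionally followed by an extension.
    private static final Pattern PART =
            Pattern.compile("^part-([mr])-(\\d{5,})(\\..*)?$");

    /** Returns "mapper", "reducer", or "other" for a given output file name. */
    public static String classify(String fileName) {
        Matcher m = PART.matcher(fileName);
        if (!m.matches()) {
            // Names without an m/r marker (the case the answer asks about) land here.
            return "other";
        }
        return m.group(1).equals("m") ? "mapper" : "reducer";
    }

    public static void main(String[] args) {
        String[] names = {"part-m-00000", "part-r-00003", "part-00000", "_SUCCESS"};
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Running `classify` over a listing of a job's output directory is a quick way to check whether the files came from a map-only job or from reducers, and to spot names that follow neither pattern.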

