Why does a map-only operation in Hive produce a single output file?

Problem description

When I execute the following query, I get only one output file, even though the job runs 8 mappers and 0 reducers.

create table table_2 as select * from table_1;

8 mappers are invoked and there is no reducer phase, yet the location of table_2 contains just one file. Shouldn't there be 8 files, since we have 8 mappers and 0 reducers?

Recommended answer

From the Hive documentation, Configuration Properties...

hive.merge.tezfiles
  Default Value: false
  Merge small files at the end of a Tez DAG

hive.merge.smallfiles.avgsize
  Default Value: 16000000
  When the average output file size of a job is less than this number,
  Hive will start an additional map-reduce job to merge the output files into bigger files...
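
To see which of these settings apply in a given session, the current values can be printed from the Hive CLI or Beeline. This is a minimal sketch; hive.merge.mapfiles, hive.merge.size.per.task and hive.execution.engine are not quoted above but are standard Hive properties, and the values printed depend on your installation.

```sql
-- SET with a property name and no value prints the current setting.
SET hive.execution.engine;          -- mr, tez or spark
SET hive.merge.mapfiles;            -- merge after a map-only job (default: true)
SET hive.merge.mapredfiles;         -- merge after a map-reduce job (default: false)
SET hive.merge.tezfiles;            -- merge at the end of a Tez DAG (default: false)
SET hive.merge.smallfiles.avgsize;  -- average-size threshold in bytes (default: 16000000)
SET hive.merge.size.per.task;       -- target size of merged files in bytes
```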

So, if (a) your test dataset is very small and (b) you are not using Tez but plain old MapReduce, then by default Hive runs a post-Map step just to merge the (intermediate) results. For map-only jobs this merge is governed by hive.merge.mapfiles, which defaults to true.

By contrast, this merge does not happen after a Reduce step unless you force hive.merge.mapredfiles to true.
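
If the goal is to keep one output file per mapper, the merge step can be switched off for the session before running the query. The sketch below assumes the MapReduce engine; table_2_unmerged is a hypothetical table name, used so that the existing table_2 is left alone.

```sql
-- Assumption: the job runs on MapReduce (hive.execution.engine=mr).
-- Turn off the post-map merge for this session only.
SET hive.merge.mapfiles=false;

-- Same CTAS as in the question, written to a hypothetical new table.
CREATE TABLE table_2_unmerged AS SELECT * FROM table_1;

-- The "Location" row shows the directory where the output files were written.
DESCRIBE FORMATTED table_2_unmerged;
```

With the merge disabled, that directory should hold one file per mapper that produced output, rather than a single merged file.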

