Difference between CLUSTER BY and CLUSTERED BY in Hive?

Question

I would like to know the major difference between Cluster By and CLUSTERED BY in Hive.

As I understand it, Cluster By is used for bucketing the table, and it uses a hash function.

CLUSTERED BY is used for ordering by value within the reducer.

Is there any other difference between the two?

Please let me know.

Thanks,
venkatbala.

Recommended answer

"clustered by" only distributes your keys into different buckets. "cluster by" ensures that each of N reducers gets non-overlapping ranges of keys, then sorts by those ranges at the reducers. The major difference is the sorting.

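To make the distinction concrete, here is a minimal HiveQL sketch. CLUSTERED BY appears in a table definition and CLUSTER BY appears in a query. The table and column names (page_views, page_views_bucketed, user_id, url) are hypothetical and only for illustration, and the bucket count of 4 is an arbitrary choice.

```sql
-- CLUSTERED BY is part of CREATE TABLE: it declares a bucketed table.
-- Rows are assigned to one of 4 buckets by hashing user_id.
-- No sort order is implied unless SORTED BY is added as well.
CREATE TABLE page_views_bucketed (
  user_id BIGINT,
  url     STRING
)
CLUSTERED BY (user_id) INTO 4 BUCKETS;

-- CLUSTER BY is part of a SELECT: rows with the same user_id go to
-- the same reducer, and each reducer sorts its rows by user_id.
SELECT user_id, url
FROM page_views
CLUSTER BY user_id;

-- CLUSTER BY x is shorthand for DISTRIBUTE BY x plus SORT BY x,
-- so this query should behave the same as the one above.
SELECT user_id, url
FROM page_views
DISTRIBUTE BY user_id
SORT BY user_id;
```

Note that the output of CLUSTER BY is sorted within each reducer, not globally. A total order across all rows would need ORDER BY instead.
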

11-01 08:28