This post covers how GROUP BY on a partition column behaves in Impala.




Let's say I have a table with four columns: A, B, C, D. The values of A and D are equal, and the table is partitioned by column A.


Performance-wise, would it make any difference if I issue this query: SELECT SUM(B) GROUP BY A; or this one: SELECT SUM(B) GROUP BY D;


In other words, is there any performance gain from using GROUP BY on the partition column?




Usually the performance gains come from using the partition columns in a filter (the WHERE clause of your SQL).


Since both queries perform a full table scan, there should not be much difference between them. You might see a difference if there are a lot of partitions (around 50K, say), which tends to degrade query performance, but that is not usually the case.
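
One way to check this on your own data is to compare the query plans. The sketch below is only an illustration: the table name sales_demo, the column types, and the Parquet storage format are assumptions, not details from the question. In the scan node of the EXPLAIN output, Impala normally reports how many partitions are read (a figure of the form partitions=X/Y), so the two GROUP BY queries should show the same count, while the filtered query should show fewer.

```sql
-- Hypothetical table shaped like the one in the question:
-- partitioned by a, with d holding the same values as a.
CREATE TABLE sales_demo (b BIGINT, c STRING, d INT)
PARTITIONED BY (a INT)
STORED AS PARQUET;

-- Neither aggregation has a filter, so both read every partition.
EXPLAIN SELECT a, SUM(b) FROM sales_demo GROUP BY a;
EXPLAIN SELECT d, SUM(b) FROM sales_demo GROUP BY d;

-- A predicate on the partition column lets the planner skip
-- partitions that cannot match (partition pruning).
EXPLAIN SELECT SUM(b) FROM sales_demo WHERE a = 1;
```

If the first two plans report the same number of partitions, any remaining difference between them comes from the aggregation step rather than from how much data is read.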

