Problem description

How does Impala support the concept of partitioning? If it does, what are the differences between Hive partitioning and Impala partitioning?

Solution

By default, all the data files for a table are located in a single directory.

Partitioning is a technique for physically dividing the data during loading, based on values from one or more columns, to speed up queries that test those columns.

For example, with a school_records table partitioned on a year column, there is a separate data directory for each distinct year value, and all the data for that year is stored in data files in that directory. A query that includes a WHERE condition such as YEAR=1966, YEAR IN (1989,1999), or YEAR BETWEEN 1984 AND 1989 needs to examine only the data files from the appropriate directory or directories, greatly reducing the amount of data to read and test.
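As a rough illustration of the school_records example, the following Impala SQL sketches a table partitioned on year and a few queries that filter on the partition column. Only the table name and the year column come from the description above; the other columns (student_id, student_name) are assumptions made up for this sketch.

```sql
-- Hypothetical schema: only school_records and year come from the text above.
CREATE TABLE school_records (
  student_id BIGINT,
  student_name STRING
)
PARTITIONED BY (year INT)
STORED AS PARQUET;

-- Each query filters on the partition key, so Impala only needs to read
-- the data directories for the matching year values.
SELECT COUNT(*) FROM school_records WHERE year = 1966;
SELECT COUNT(*) FROM school_records WHERE year IN (1989, 1999);
SELECT COUNT(*) FROM school_records WHERE year BETWEEN 1984 AND 1989;

-- List the partitions that currently exist for the table.
SHOW PARTITIONS school_records;
```

Prefixing one of the queries with EXPLAIN is one way to check how many partitions the plan intends to scan.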

Static and Dynamic Partitioning

Specifying values for all the partition columns in a SQL statement is called "static partitioning", because the statement affects a single predictable partition. For example, you use static partitioning with an ALTER TABLE statement that affects only one partition, or with an INSERT statement that inserts all values into the same partition.
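A minimal sketch of both static forms, assuming a hypothetical weather table with one data column and year, month, and day as partition keys (the table and its column names are illustrative, not taken from the original answer):

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS weather (conditions STRING)
PARTITIONED BY (year INT, month INT, day INT);

-- ALTER TABLE gives a value for every partition key,
-- so exactly one partition is added.
ALTER TABLE weather ADD PARTITION (year=2014, month=4, day=20);

-- INSERT gives a value for every partition key,
-- so all inserted rows land in that one partition.
INSERT INTO weather PARTITION (year=2014, month=4, day=20)
  SELECT 'cloudy';
```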

When you name partition key columns in the PARTITION clause of an INSERT statement but leave out some or all of their values, Impala determines which partition to insert into. This technique is called "dynamic partitioning". Two typical cases:

With year, month, and day all left variable, the statement creates a new partition if necessary and inserts a single value.

With year and month specified but day left variable, the statement creates a new partition if necessary and inserts a single value.

The more key columns you specify in the PARTITION clause, the fewer columns you need in the SELECT list. The trailing columns in the SELECT list are substituted in order for the partition key columns with no specified value.
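The two dynamic cases and the substitution rule can be sketched as follows, again assuming a hypothetical weather table with a single conditions column partitioned by year, month, and day (names chosen for illustration only):

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS weather (conditions STRING)
PARTITIONED BY (year INT, month INT, day INT);

-- year, month, and day are all left variable: the three trailing SELECT
-- values (2014, 4, 21) are substituted for them, in order.
INSERT INTO weather PARTITION (year, month, day)
  SELECT 'cloudy', 2014, 4, 21;

-- year and month are fixed, day is variable: only one trailing value (22)
-- is needed in the SELECT list.
INSERT INTO weather PARTITION (year=2014, month=4, day)
  SELECT 'sunny', 22;
```

The second statement needs a shorter SELECT list than the first because two of the three partition keys already have values in the PARTITION clause.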

You may refer to this link for further reading.

Hope that helps!


09-11 06:54