
Problem description

How does Impala support the concept of partitioning and, if it does, what are the differences between Hive partitioning and Impala partitioning?

Recommended answer

By default, all the data files for a table are located in a single directory.

Partitioning is a technique for physically dividing the data during loading, based on the values of one or more columns, to speed up queries that test those columns.

For example, with a school_records table partitioned on a year column, there is a separate data directory for each distinct year value, and all the data for that year is stored in data files in that directory. A query that includes a WHERE condition such as YEAR=1966, YEAR IN (1989,1999), or YEAR BETWEEN 1984 AND 1989 needs to examine only the data files from the matching directory or directories, greatly reducing the amount of data to read and test.
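As a rough illustration of the school_records example, the following Impala SQL sketch creates a partitioned table and runs the three kinds of filters mentioned above. Only the table name and the year column come from the answer; the other columns, the column types, and the Parquet file format are assumptions made for the example.

```sql
-- Hypothetical schema: only the table name and the year partition column
-- come from the text; student_id, name and grade are made up.
CREATE TABLE school_records (
  student_id BIGINT,
  name       STRING,
  grade      STRING
)
PARTITIONED BY (year SMALLINT)
STORED AS PARQUET;

-- Each distinct year value is stored in its own subdirectory of the
-- table directory, named like year=1966.

-- These filters reference only the partition key, so Impala can skip
-- the directories of all other years (partition pruning).
SELECT COUNT(*) FROM school_records WHERE year = 1966;
SELECT COUNT(*) FROM school_records WHERE year IN (1989, 1999);
SELECT COUNT(*) FROM school_records WHERE year BETWEEN 1984 AND 1989;

-- List the partitions that currently exist for the table.
SHOW PARTITIONS school_records;
```

Running EXPLAIN on one of the SELECT statements should show how many partitions the scan touches, which is a quick way to check that pruning is taking effect.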

Static and dynamic partitioning

Specifying all the partition columns, with their values, in a SQL statement is called "static partitioning", because the statement affects a single predictable partition. For example, you use static partitioning with an ALTER TABLE statement that affects only one partition, or with an INSERT statement that inserts all values into the same partition.
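The statements that originally accompanied this paragraph are not preserved here, so the following is only a sketch of what static partitioning can look like. The weather table, its single conditions column, and the literal values are assumptions chosen for illustration.

```sql
-- Hypothetical table: one data column, three partition key columns.
CREATE TABLE IF NOT EXISTS weather (conditions STRING)
PARTITIONED BY (year INT, month INT, day INT);

-- ALTER TABLE with every partition key given a value: exactly one
-- partition is created.
ALTER TABLE weather ADD PARTITION (year=2014, month=4, day=23);

-- INSERT with every partition key given a value: all inserted rows
-- go into that same partition.
INSERT INTO weather PARTITION (year=2014, month=4, day=23)
VALUES ('sunny'), ('windy');

-- Dropping is also static: the statement names one specific partition.
ALTER TABLE weather DROP PARTITION (year=2014, month=4, day=23);
```

In each statement the target partition is known from the statement text alone, before any data is read.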

When you specify some partition key columns in an INSERT statement but leave out their values, Impala determines which partition each row is inserted into. This technique is called "dynamic partitioning".

For example, an INSERT statement can create a new partition, if necessary, based on variable year, month, and day values, and insert a single value into it. Alternatively, it can create a new partition, if necessary, for a specified year and month but a variable day, again inserting a single value.
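The two cases just described can be sketched as follows. The original statements are not preserved here, so the weather table, its conditions column, and the literal values are assumptions; the table is assumed to be partitioned by (year, month, day).

```sql
-- Assumed schema (hypothetical):
--   CREATE TABLE weather (conditions STRING)
--   PARTITIONED BY (year INT, month INT, day INT);

-- Year, month and day are all left without values in the PARTITION
-- clause, so they are taken from the last three SELECT expressions.
-- The partition year=2014/month=4/day=21 is created if it is missing.
INSERT INTO weather PARTITION (year, month, day)
SELECT 'cloudy', 2014, 4, 21;

-- Year and month are fixed in the PARTITION clause; only day is
-- dynamic and comes from the last SELECT expression (22).
INSERT INTO weather PARTITION (year=2014, month=4, day)
SELECT 'sunny', 22;
```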

The more key columns you give values for in the PARTITION clause, the fewer columns you need in the SELECT list. The trailing columns in the SELECT list are substituted, in order, for the partition key columns that have no specified value.
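To make the substitution rule concrete, here is a sketch that loads the same hypothetical weather table from a hypothetical staging table. Both tables and all column names (conditions, y, m, d) are assumptions for illustration, not part of the original answer.

```sql
-- Assumed tables (hypothetical):
--   weather(conditions STRING) PARTITIONED BY (year INT, month INT, day INT)
--   staging_weather(conditions STRING, y INT, m INT, d INT)

-- No key has a value: the three trailing SELECT columns fill
-- year, month, day in that order.
INSERT INTO weather PARTITION (year, month, day)
SELECT conditions, y, m, d FROM staging_weather;

-- One key has a value: two trailing SELECT columns fill month, day.
INSERT INTO weather PARTITION (year=2014, month, day)
SELECT conditions, m, d FROM staging_weather WHERE y = 2014;

-- Two keys have values: one trailing SELECT column fills day.
INSERT INTO weather PARTITION (year=2014, month=4, day)
SELECT conditions, d FROM staging_weather WHERE y = 2014 AND m = 4;

-- All keys have values (static partitioning): the SELECT list holds
-- only the regular data column.
INSERT INTO weather PARTITION (year=2014, month=4, day=23)
SELECT conditions FROM staging_weather WHERE y = 2014 AND m = 4 AND d = 23;
```

Each additional value in the PARTITION clause removes one expression from the end of the SELECT list, while the non-partition columns at the front stay the same.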

You may refer to this link for further reading.

Hope that helps!


09-11 06:54