Problem description
I am trying to create a Hive table with nested collection items. Suppose I have an array of structs.
CREATE TABLE SAMPLE(
record array<struct<col1:string,col2:string>>
)row format delimited
fields terminated by ','
collection items terminated by '|';
At the first level, the separator ',' overrides the default field delimiter '^A'.
At the second level, the separator '|' overrides the default delimiter '^B' and separates the elements of the outermost structure (i.e. the array).
At the third level, Hive uses the default delimiter '^C' to separate the fields of the struct.
Now my question is: how can I define a separator for the third level (i.e. the struct)? The '^C' character is hard to read and hard to generate.
Is there any way to explicitly define a separator to use instead of '^C'?
Thanks in advance.
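Try something like this. The "map keys terminated by" clause sets the third-level delimiter, which Hive also uses to separate the fields of a struct nested inside an array: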
CREATE TABLE SAMPLE(
id BIGINT,
record array<struct<col1:string,col2:string>>
)row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by ':';
Now your data in the text file will look like this:
1345653,110909316904:1341894546|221065796761:1341887508
You can then query it like this:
select record.col1 from SAMPLE;
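To see how the three delimiters carve up the sample row, here is a minimal Python sketch. It only mimics the splitting order (fields, then collection items, then struct fields) and is not how Hive parses the file internally. The helper name parse_row and the dictionary layout are assumptions made for illustration.

```python
# Illustrative only: mimics how the three delimiter levels split one text row.
# This is not Hive's parser; parse_row is a hypothetical helper.

FIELD_DELIM = ","    # fields terminated by ','
ITEM_DELIM = "|"     # collection items terminated by '|'
STRUCT_DELIM = ":"   # map keys terminated by ':' (third level, struct fields)


def parse_row(line):
    """Split one row of SAMPLE into id and a list of {col1, col2} structs."""
    # Level 1: separate the id column from the record column.
    id_text, record_text = line.split(FIELD_DELIM, 1)
    record = []
    # Level 2: separate the array elements.
    for item in record_text.split(ITEM_DELIM):
        # Level 3: separate the two fields of each struct.
        col1, col2 = item.split(STRUCT_DELIM, 1)
        record.append({"col1": col1, "col2": col2})
    return {"id": int(id_text), "record": record}


row = parse_row("1345653,110909316904:1341894546|221065796761:1341887508")
print(row["id"])
print([s["col1"] for s in row["record"]])
```

The last line collects col1 from every struct in the array, which is roughly what the record.col1 projection in the query above is expected to return for this row.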