How to define nested collection items in Hive

Problem description

I am trying to create a Hive table with nested collection items. Suppose I have an array of structs:
CREATE TABLE SAMPLE (
  record array<struct<col1:string,col2:string>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|';
At the first level, the separator ',' overrides the default delimiter '^A'.
At the second level, the separator '|' overrides the default second-level delimiter '^B' and separates the elements of the outermost structure (i.e. the array).
At the third level, Hive uses the default third-level delimiter '^C' as the separator between the fields of the struct.
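To see why '^C' is awkward, here is a minimal Python sketch that builds one text-file line for the question's table. It uses the three separators by nesting level: ',' and '|' from the DDL, and Hive's default '\x03' (^C) for the struct fields. The function name `serialize_row` and the constant `QUESTION_DELIMS` are hypothetical and only for illustration. The sketch ignores NULLs and escaping, so it is not Hive's actual SerDe code.

```python
# Separators indexed by nesting level for the question's table
# SAMPLE(record array<struct<col1:string,col2:string>>):
#   level 1 -> ','    (FIELDS TERMINATED BY, overrides ^A)
#   level 2 -> '|'    (COLLECTION ITEMS TERMINATED BY, overrides ^B)
#   level 3 -> '\x03' (^C, Hive's default, not overridden in this DDL)
QUESTION_DELIMS = (",", "|", "\x03")


def serialize_row(records, delims=QUESTION_DELIMS):
    """Build one text-file line for the single `record` column (hypothetical helper).

    `records` is a list of (col1, col2) tuples. The level-1 separator is
    not used because the table has only one column.
    """
    _field_sep, item_sep, struct_sep = delims
    # Level 3 joins the fields inside each struct;
    # level 2 joins the structs that make up the array.
    return item_sep.join(struct_sep.join(fields) for fields in records)


if __name__ == "__main__":
    line = serialize_row([("a", "b"), ("c", "d")])
    # repr() makes the otherwise invisible ^C bytes visible.
    print(repr(line))  # 'a\x03b|c\x03d'
```

Any program writing this file has to emit the raw 0x03 byte between col1 and col2, which is exactly what the question wants to avoid.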
Now my question is: how can I define a separator for the third level (i.e. the struct), given that the '^C' character is hard to read as well as to generate?
Is there any way to explicitly define a separator instead of ^C?
Thanks in advance.
Solution
Try something like this:
CREATE TABLE SAMPLE (
  id BIGINT,
  record array<struct<col1:string,col2:string>>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';
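With these delimiters, ',' separates the columns, '|' separates the array elements, and ':' separates the fields inside each struct. MAP KEYS TERMINATED BY sets the third-level separator, which Hive also applies to a struct nested in an array, even though the table has no map column. Your data in the text file will then look like this: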
1345653,110909316904:1341894546|221065796761:1341887508
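As a rough check of how that line maps onto the table, the Python sketch below splits it level by level with the answer's separators: ',' for the columns, '|' for the array elements, and ':' for the struct fields. The name `parse_row` is hypothetical. The sketch only mirrors the splitting order; it does not handle NULL markers, escaping, or empty arrays the way Hive's SerDe would.

```python
def parse_row(line, field_sep=",", item_sep="|", struct_sep=":"):
    """Split one line for SAMPLE(id BIGINT, record array<struct<col1:string,col2:string>>).

    Hypothetical, simplified sketch: no handling of NULL markers,
    escaping, or empty arrays.
    """
    # Level 1: separate the two table columns.
    id_text, record_text = line.split(field_sep)
    records = []
    # Level 2: separate the elements of the array.
    for item in record_text.split(item_sep):
        # Level 3: separate the fields of each struct.
        col1, col2 = item.split(struct_sep)
        records.append({"col1": col1, "col2": col2})
    return int(id_text), records


if __name__ == "__main__":
    print(parse_row("1345653,110909316904:1341894546|221065796761:1341887508"))
```

The sample line yields id 1345653 and two structs, which matches the `id BIGINT` column plus a two-element `record` array.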
You can then query it like this:
select record.col1 from SAMPLE;
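Assuming Hive's dot notation on an array of structs returns, per row, an array holding that field from every element (one array per row, not one row per element), the Python sketch below mimics the result shape for the sample row. The name `project_col1` is hypothetical. The sketch only illustrates the expected shape and is not how Hive evaluates the query.

```python
def project_col1(records):
    """Mimic `record.col1` on an array<struct<col1,col2>> column (hypothetical helper):
    collect the col1 field of every element, keeping array order."""
    return [struct["col1"] for struct in records]


if __name__ == "__main__":
    # The record column of the sample row, already split into structs.
    record = [
        {"col1": "110909316904", "col2": "1341894546"},
        {"col1": "221065796761", "col2": "1341887508"},
    ]
    print(project_col1(record))  # ['110909316904', '221065796761']
```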

这篇关于如何在Hive中定义嵌套的收集项目的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！