How to define nested collection items in Hive

Problem description

I am trying to create a Hive table with nested collection items. Suppose I have an array of structs.

    CREATE TABLE SAMPLE(
    record array<struct<col1:string,col2:string>>
    )row format delimited
    fields terminated by ','
    collection items terminated by '|';

At the first level, the separator ',' overrides the default delimiter '^A'.

At the second level, the separator '|' overrides the default second-level delimiter '^B' and separates the elements of the outermost structure (i.e. the array).

At the third level, Hive uses the default third-level delimiter '^C' as the separator for the fields of the struct.

Now my question is: how can I define a separator for the third level (i.e. the struct)? The '^C' character is hard to read and hard to generate.

Is there any way to explicitly define a separator to use instead of '^C'?

Thanks in advance.

Solution

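Try something like this. For an array of structs, the `map keys terminated by` character acts as the third-level delimiter, so it separates the fields of each struct: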

    CREATE TABLE SAMPLE(
    id BIGINT,
    record array<struct<col1:string,col2:string>>
    )row format delimited
    fields terminated by ','
    collection items terminated by '|'
    map keys terminated by ':';

Now your data in the text file will look like this:

    1345653,110909316904:1341894546|221065796761:1341887508
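
One way to get a file with such rows into the table is Hive's `LOAD DATA` statement. The sketch below assumes the row is saved in a local file at the hypothetical path `/tmp/sample.txt`, which does not come from the original answer; adjust it to your environment.

```sql
-- Assumption: /tmp/sample.txt is a hypothetical local file
-- containing rows in the format shown above.
LOAD DATA LOCAL INPATH '/tmp/sample.txt' INTO TABLE SAMPLE;
```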

You can then query it like this:

    select record.col1 from SAMPLE;
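
Note that `record.col1` on an array of structs should return an array of the `col1` values, one per struct. If you need individual values instead, the following sketch shows two other common ways to read the nested data. The aliases `t` and `r` are arbitrary names chosen for illustration.

```sql
-- col1 of the first struct in the array (array indexes start at 0)
SELECT id, record[0].col1 FROM SAMPLE;

-- One output row per struct in the array
SELECT id, r.col1, r.col2
FROM SAMPLE
LATERAL VIEW explode(record) t AS r;
```

The `LATERAL VIEW explode(...)` form is usually the more convenient one when the number of structs per row varies.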


10-23 01:18