Problem description
I am trying to load data from a CSV file in which the values are enclosed in double quotes ('"') and separated by tabs ('\t'). When I load it into Hive, no error is thrown and the load completes, but all the data appears to end up in a single column, and most of the values show as NULL. Below is my CREATE TABLE statement.
CREATE TABLE example
(
organization STRING,
order BIGINT,
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
ESCAPED BY '"'
STORED AS TEXTFILE;
Sample input file:
"Organization" "Order" "Created on" "issue_date" "qty"
"GB" "111223" "2015/02/06 00:00:00" "2015/05/15 00:00:00" "5"
"UK" "1110" "2015/05/06 00:00:00" "2015/06/1 00:00:00" "51"
And the LOAD statement used to push the data into the Hive table:
LOAD DATA INPATH '/user/example.csv' OVERWRITE INTO TABLE example
What could be the issue, and how can I ignore the file's header row? If I remove ESCAPED BY '"' from the CREATE statement, the data loads into the respective columns, but every value is still enclosed in double quotes. How can I strip the double quotes from the values and skip the header?
Recommended answer
You can now use OpenCSVSerde, which lets you define the separator character and the quote character, so the surrounding double quotes are stripped easily:
CREATE EXTERNAL TABLE example (
organization STRING,
`order` BIGINT,
created_on TIMESTAMP,
issue_date TIMESTAMP,
qty INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "\""
)
LOCATION '/your/folder/location/';
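The question also asks how to skip the header row, and OpenCSVSerde exposes every column as STRING regardless of the declared types. The sketch below shows one possible way to handle both. The names `example_raw` and `example_typed` are hypothetical, the LOCATION path is the placeholder from the answer, and the date pattern is an assumption based on the sample rows (a value such as "2015/06/1" may not parse on every Hive version, so check the result on your data).

```sql
-- Sketch only: example_raw and example_typed are illustrative names,
-- not part of the original answer.
CREATE EXTERNAL TABLE example_raw (
  organization STRING,
  `order`      STRING,  -- backticks, since ORDER is a reserved word in newer Hive versions
  created_on   STRING,
  issue_date   STRING,
  qty          STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION '/your/folder/location/'
-- Skip the first line of each file (the header row).
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Cast the string columns to the types the question wanted.
-- The pattern 'yyyy/MM/dd HH:mm:ss' is assumed from the sample input.
CREATE VIEW example_typed AS
SELECT
  organization,
  CAST(`order` AS BIGINT) AS `order`,
  CAST(from_unixtime(unix_timestamp(created_on, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS created_on,
  CAST(from_unixtime(unix_timestamp(issue_date, 'yyyy/MM/dd HH:mm:ss')) AS TIMESTAMP) AS issue_date,
  CAST(qty AS INT) AS qty
FROM example_raw;
```

Keeping the raw table as all STRING and casting in a view leaves the file untouched and makes it easy to spot rows whose values fail to convert, since those show up as NULL in the view only.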