If you are reading this post, then you probably are considering using BigData or started BigData ecosystem for your huge data processing. When you say huge data, that means you may get all different kind of structured, unstructured and semi-structured data. Hive is just like your regular data warehouse appliances and you may receive files with single or double quoted values. In this article, we will see Apache Hive load quoted values CSV files and see some examples for the same.
Apache Hive Load Quoted Values CSV File
Let us say you are processing data that is generated by machine for example, you are loading SS7 switch data. The file you receive will have quoted (single or double quotes) values. Now the question is, how do you handle those single or double quoted values when you load that data to Hive table? The good news is, Hive version 0.14 and later supports open-CSV SerDes. You can use this to define the properties of your data values in flat file.
Below is the usage of Hive Open-CSV SerDes:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\" )
Use above syntax while creating your table in Hive and load different types of quoted values flat files.
Apache Hive Load Quoted Values CSV File Examples
First create Hive table with open-CSV SerDe option as ROW FORMAT:
create table test_quoted_value (a string,b string,c string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar"="\"", "escapeChar"="\\" ) STORED AS TEXTFILE;
Below is the sample content of the file that we are going to the Hive table that is created in previous steps:
[cloudera@quickstart ~]$ more /home/cloudera/csv_embeded_comma.csv "1","abc,bcd","aaa,fff,rrr" "2","frwf,asf","grg,fgr,htf" "3","tgfd,asfr","fregf,sasdf" Now load quoted values CSV file into Hive table using Hive LOAD command: hive> LOAD DATA LOCAL INPATH '/home/cloudera/csv_embeded_comma.csv' OVERWRITE INTO TABLE test_quoted_value; Loading data to table default.test_quoted_value Table default.test_quoted_value stats: [numFiles=1, numRows=0, totalSize=87, rawDataSize=0] OK Time taken: 1.38 seconds
Verify Hive Table for Loaded Data
You can select data from table to verify loaded data:
hive> select * from test_quoted_value; OK 1 abc,bcd aaa,fff,rrr 2 frwf,asf grg,fgr,htf 3 tgfd,asfr fregf,sasdf Time taken: 0.82 seconds, Fetched: 3 row(s)
Feel free to let me know any other option to load quoted CSV values.
Also read my other post on Apache Hive
- Hadoop – Export Hive Data with Quoted Values into Flat File and Example
- Hadoop Hive WITH Clause Syntax and Examples
- Hadoop Hive Date Functions and Examples
- Commonly used Hadoop Hive Commands
- Hadoop Hive Dynamic Partition and Examples
- Hive String Functions and Examples
- Hive Join Types and Examples
- Apache Hive Insert from Select Statement and Examples
what if one of the column value has double quotes and newline characters
Hi,
OpenCSVSerde does not support embedded new line characters as of now.
Thanks