Apache Hive Load Quoted Values CSV File and Examples

  • Post author:
  • Post last modified:September 25, 2019
  • Post category:BigData
  • Reading time:4 mins read

If you are reading this post, then you probably are considering using BigData or started BigData ecosystem for your huge data processing. When you say huge data, that means you may get all different kind of structured, unstructured and semi-structured data. Hive is just like your regular data warehouse appliances and you may receive files with single or double quoted values. In this article, we will see Apache Hive load quoted values CSV files and see some examples for the same.

Apache Hive Load Quoted Values CSV File

Apache Hive Load Quoted Values CSV File

Let us say you are processing data that is generated by machine for example, you are loading SS7 switch data. The file you receive will have quoted (single or double quotes) values. Now the question is, how do you handle those single or double quoted values when you load that data to Hive table? The good news is, Hive version 0.14 and later supports open-CSV SerDes. You can use this to define the properties of your data values in flat file.

Below is the usage of Hive Open-CSV SerDes:

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\" 
)

Use above syntax while creating your table in Hive and load different types of quoted values flat files.

Apache Hive Load Quoted Values CSV File Examples

First create Hive table with open-CSV SerDe option as ROW FORMAT:

create table test_quoted_value
(a string,b string,c string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = ",",
"quoteChar"="\"",
"escapeChar"="\\"
)
STORED AS TEXTFILE;

Below is the sample content of the file that we are going to the Hive table that is created in previous steps:

[cloudera@quickstart ~]$ more /home/cloudera/csv_embeded_comma.csv
"1","abc,bcd","aaa,fff,rrr"
"2","frwf,asf","grg,fgr,htf"
"3","tgfd,asfr","fregf,sasdf"
Now load quoted values CSV file into Hive table using Hive LOAD command:

hive> LOAD DATA LOCAL INPATH '/home/cloudera/csv_embeded_comma.csv' OVERWRITE INTO TABLE test_quoted_value;
Loading data to table default.test_quoted_value
Table default.test_quoted_value stats: [numFiles=1, numRows=0, totalSize=87, rawDataSize=0]
OK
Time taken: 1.38 seconds

Verify Hive Table for Loaded Data

You can select data from table to verify loaded data:

hive> select * from test_quoted_value;
OK
1 abc,bcd aaa,fff,rrr
2 frwf,asf grg,fgr,htf
3 tgfd,asfr fregf,sasdf
Time taken: 0.82 seconds, Fetched: 3 row(s)

Feel free to let me know any other option to load quoted CSV values.

Also read my other post on Apache Hive

This Post Has 2 Comments

  1. shikhar

    what if one of the column value has double quotes and newline characters

    1. Vithal S

      Hi,

      OpenCSVSerde does not support embedded new line characters as of now.

      Thanks

Comments are closed.