Apache Hive Data Types and Best Practices

  • Post last modified: July 17, 2019
  • Post category: BigData

A data type is an attribute that specifies the kind of data a column can store. In SQL and HiveQL, every column, variable, and expression has an associated data type. Data type names, however, are not consistent across database systems. Hive supports most of the data types that relational databases support. In this article, we will look at Apache Hive data types and best practices.

When you issue a Hive CREATE TABLE command in a Hadoop environment, each column in the table definition must have a name and a data type associated with it.
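As a minimal sketch of that rule, the statement below pairs each column name with a data type. The table and column names (employee, emp_id, and so on) are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: every column is declared as "name data_type".
CREATE TABLE IF NOT EXISTS employee (
  emp_id     INT,             -- column name followed by its data type
  emp_name   STRING,
  salary     DECIMAL(10, 2),  -- precision 10, scale 2
  hire_date  DATE
);

-- List the column names and the data types Hive recorded for them.
DESCRIBE employee;
```

Omitting the data type for any column makes the statement fail to parse, so DESCRIBE is a quick way to confirm what was actually stored in the metastore.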

Apache Hive Data Types

Below is the list of data types available in Hive.

Here is a Hive CREATE TABLE example that uses all the supported Apache Hive data types:

https://gist.github.com/75e73b7e7836e236fa69992fd150fcf8
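The contents of the linked gist are not reproduced here. As a rough sketch of what such a statement might look like, the example below declares one column per commonly used primitive type. It assumes a reasonably recent Hive release (0.13 or later, where CHAR and DECIMAL with precision are available); the table and column names are hypothetical.

```sql
-- Hypothetical table with one column per primitive data type.
CREATE TABLE IF NOT EXISTS primitive_types_demo (
  col_tinyint    TINYINT,          -- 1-byte signed integer
  col_smallint   SMALLINT,         -- 2-byte signed integer
  col_int        INT,              -- 4-byte signed integer
  col_bigint     BIGINT,           -- 8-byte signed integer
  col_float      FLOAT,            -- single-precision floating point
  col_double     DOUBLE,           -- double-precision floating point
  col_decimal    DECIMAL(38, 10),  -- exact numeric with precision and scale
  col_string     STRING,           -- variable-length text, no declared length
  col_varchar    VARCHAR(100),     -- variable-length text, at most 100 characters
  col_char       CHAR(10),         -- fixed-length text of 10 characters
  col_boolean    BOOLEAN,          -- TRUE or FALSE
  col_binary     BINARY,           -- raw bytes
  col_timestamp  TIMESTAMP,        -- date and time
  col_date       DATE              -- calendar date without a time part
);
```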

In addition to the data types above, Hive also supports the following complex and interval data types:

  • array type: ARRAY<data_type>
  • map type: MAP<primitive_type, data_type>
  • struct type: STRUCT<col_name : data_type [COMMENT col_comment], …>
  • union type: UNIONTYPE<data_type, data_type, …>
  • Interval Data Types
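To make the syntax above concrete, here is a small sketch that declares complex-typed columns and reads values back out of them. The table and column names are hypothetical, and the interval expression assumes Hive 1.2.0 or later, where interval literals and CURRENT_DATE are available.

```sql
-- Hypothetical table using the complex types listed above.
CREATE TABLE IF NOT EXISTS employee_profile (
  emp_id   INT,
  skills   ARRAY<STRING>,                     -- ordered list of values of one type
  phones   MAP<STRING, STRING>,               -- key/value pairs; keys must be primitive
  address  STRUCT<street: STRING, city: STRING>,  -- named fields
  misc     UNIONTYPE<INT, STRING>             -- holds exactly one of the listed types
);

-- Accessing elements: index for arrays, key for maps, dot notation for structs.
SELECT emp_id,
       skills[0]       AS first_skill,
       phones['home']  AS home_phone,
       address.city    AS city
FROM employee_profile;

-- Interval literals are used in date arithmetic rather than as column values here.
SELECT CURRENT_DATE + INTERVAL '1' DAY;
```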

Restrictions: Hive Data Type Maximum Lengths

Below are some of the limits on data types in Hive:

  • VARCHAR types are created with a length specifier (between 1 and 65535), which defines the maximum number of characters allowed in the character string.
  • CHAR types are fixed-length; the maximum length is 255.
  • The range of values supported for the DATE type is 0000-01-01 to 9999-12-31.
  • The precision of the DECIMAL and NUMERIC types is limited to 38 digits.
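The limits above can be exercised with a short sketch like the following, which declares each type at its maximum size. The table and column names are hypothetical; the CAST at the end relies on Hive truncating a string that is longer than the declared VARCHAR length.

```sql
-- Hypothetical table declaring each type at the documented maximum.
CREATE TABLE IF NOT EXISTS type_limits_demo (
  short_code  CHAR(255),        -- 255 is the largest length CHAR accepts
  long_text   VARCHAR(65535),   -- length specifier must be between 1 and 65535
  amount      DECIMAL(38, 18),  -- precision cannot exceed 38 digits
  event_date  DATE              -- values from 0000-01-01 to 9999-12-31
);

-- A value longer than the declared length is truncated rather than rejected,
-- so this is expected to keep only the first 6 characters.
SELECT CAST('Apache Hive' AS VARCHAR(6));
```

Because over-length values are truncated rather than rejected, it is worth sizing VARCHAR and CHAR columns from the actual data rather than guessing.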