Hive Pivot Table-Transpose Rows to Column and Example

  • Post author:
  • Post last modified:September 25, 2020
  • Post category:BigData
  • Reading time:6 mins read

Many relational databases such as Oracle, Snowflake support PIVOT function which you can use to convert row to column. But, Apache Hive does not support Pivot function yet. As an alternative method, you can use CASE and DECODE statements to convert table rows to column, or columns to rows as per your requirements. In this article, we will check different methods to transpose Hive table using a Pivot function alternative method with some examples.

Hive Pivot Table-Transpose Rows to Column and Example, Hive UNPIVOT Table - Transpose Columns to Rows Example

Hive Pivot Table

Creating pivot table is a relatively common task in a data warehouse. You may have to convert rows to column, or column to rows before loading into the target table. Microsoft excel is a popular tool that allows you to pivot columns as per your requirement.Apache Hive does not support Pivot or unpivot function yet. Implementing same thing in the Hive is not an easy task. You have to use a workaround to transpose rows to column and vice versa.

However, you can use CASE or DECODE statements in the Hive to transpose rows to column.

Transpose Rows to Column Example

As mentioned earlier, transpose rows to column is common pre-processing requirement before loading into the target table.

Test Data

We will use the following student table for this example.

> select class,section,pass_fail,count from students;
+---------+----------+------------+--------+--+
|  class  | section  | pass_fail  | count  |
+---------+----------+------------+--------+--+
| CLASS2  | B        | FAIL       | 333    |
| CLASS2  | B        | PASS       | 543    |
| CLASS1  | A        | FAIL       | 123    |
| CLASS1  | A        | PASS       | 321    |
+---------+----------+------------+--------+--+

Consider following example to convert the PASS and FAIL rows as a column.

SELECT class, 
       section, 
       Sum(CASE 
             WHEN pass_fail = 'PASS' THEN count 
             ELSE 0 
           END) AS PASS, 
       Sum(CASE 
             WHEN pass_fail = 'FAIL' THEN count 
             ELSE 0 
           END) AS FAIL 
FROM   students 
GROUP  BY class, 
          section; 

+---------+----------+-------+-------+--+
|  class  | section  | pass  | fail  |
+---------+----------+-------+-------+--+
| CLASS1  | A        | 321   | 123   |
| CLASS2  | B        | 543   | 333   |
+---------+----------+-------+-------+--+

Hive UNPIVOT Table – Transpose Columns to Rows Example

Just like the previous example, Pivot or transpose columns to rows is also a very common requirement in a data warehouse environment.

For example, consider below an example of sample sales table. The table contains monthly sales quantity.

> select month_id,sale_qty from sales_by_month;
+-----------+-----------+--+
| month_id  | sale_qty  |
+-----------+-----------+--+
| 201601    | 897456    |
| 201603    | 267156    |
| 201602    | 232253    |
| 201604    | 265646    |
| 201606    | 265654    |
| 201606    | 123457    |
+-----------+-----------+--+

Now, let us transpose month values as columns and quantity as a row.

SELECT Sum(CASE 
             WHEN month_id = 201601 THEN sale_qty 
             ELSE NULL 
           END) AS m_201601, 
       Sum(CASE 
             WHEN month_id = 201602 THEN sale_qty 
             ELSE NULL 
           END) AS m_201602, 
       Sum(CASE 
             WHEN month_id = 201603 THEN sale_qty 
             ELSE NULL 
           END) AS m_201603, 
       Sum(CASE 
             WHEN month_id = 201604 THEN sale_qty 
             ELSE NULL 
           END) AS m_201604, 
       Sum(CASE 
             WHEN month_id = 201605 THEN sale_qty 
             ELSE NULL 
           END) AS m_201605, 
       Sum(CASE 
             WHEN month_id = 201606 THEN sale_qty 
             ELSE NULL 
           END) AS m_201606 
FROM   sales_by_month; 

+---------+---------+---------+---------+-------+---------+--+
|  m_01   |  m_02   |  m_03   |  m_04   | m_05  |  m_06   |
+---------+---------+---------+---------+-------+---------+--+
| 897456  | 232253  | 267156  | 265646  | NULL  | 389111  |
+---------+---------+---------+---------+-------+---------+--+

Oracle PIVOT FOR Alternative in Apache Hive

The relational databases such as Oracle and Snowflake provide PIVOT function with FOR clause.

You can use a CASE statement as an alternative method.

Following is the example of an Oracle PIVOT function with FOR clause.

SELECT * FROM order_stats
PIVOT(
    COUNT(order_id)
    FOR category_name
    IN ( 
        'CPU',
        'VC',
        'MB',
        'Storage'
    )
)
ORDER BY status;

+------------+------+-----+-----+----------+--+
|   status   | cpu  | vc  | mb  | storage  |
+------------+------+-----+-----+----------+--+
| cancelled  | 2    | 0   | 0   | 0        |
| pending    | 0    | 1   | 0   | 0        |
| shipped    | 1    | 1   | 1   | 1        |
+------------+------+-----+-----+----------+--+

And following is the Apache Hive equivalent example.

select status,
count( case when category_name = 'CPU' then order_id else null end ) as CPU,
count( case when category_name = 'VC' then order_id else null end ) as VC,
count( case when category_name = 'MB' then order_id else null end ) as MB,
count( case when category_name = 'Storage' then order_id else null end ) as Storage
from order_stats
group by status
order by status
;

+------------+------+-----+-----+----------+--+
|   status   | cpu  | vc  | mb  | storage  |
+------------+------+-----+-----+----------+--+
| cancelled  | 2    | 0   | 0   | 0        |
| pending    | 0    | 1   | 0   | 0        |
| shipped    | 1    | 1   | 1   | 1        |
+------------+------+-----+-----+----------+--+

Hope this helps 🙂