Many relational databases such as Oracle, Snowflake support PIVOT function which you can use to convert row to column. But, Apache Hive does not support Pivot function yet. As an alternative method, you can use CASE and DECODE statements to convert table rows to column, or columns to rows as per your requirements. In this article, we will check different methods to transpose Hive table using a Pivot function alternative method with some examples.
Hive Pivot Table
Creating pivot table is a relatively common task in a data warehouse. You may have to convert rows to column, or column to rows before loading into the target table. Microsoft excel is a popular tool that allows you to pivot columns as per your requirement.Apache Hive does not support Pivot or unpivot function yet. Implementing same thing in the Hive is not an easy task. You have to use a workaround to transpose rows to column and vice versa.
However, you can use CASE or DECODE statements in the Hive to transpose rows to column.
Transpose Rows to Column Example
As mentioned earlier, transpose rows to column is common pre-processing requirement before loading into the target table.
Test Data
We will use the following student table for this example.
> select class,section,pass_fail,count from students;
+---------+----------+------------+--------+--+
| class | section | pass_fail | count |
+---------+----------+------------+--------+--+
| CLASS2 | B | FAIL | 333 |
| CLASS2 | B | PASS | 543 |
| CLASS1 | A | FAIL | 123 |
| CLASS1 | A | PASS | 321 |
+---------+----------+------------+--------+--+
Consider following example to convert the PASS and FAIL rows as a column.
SELECT class,
section,
Sum(CASE
WHEN pass_fail = 'PASS' THEN count
ELSE 0
END) AS PASS,
Sum(CASE
WHEN pass_fail = 'FAIL' THEN count
ELSE 0
END) AS FAIL
FROM students
GROUP BY class,
section;
+---------+----------+-------+-------+--+
| class | section | pass | fail |
+---------+----------+-------+-------+--+
| CLASS1 | A | 321 | 123 |
| CLASS2 | B | 543 | 333 |
+---------+----------+-------+-------+--+
Hive UNPIVOT Table – Transpose Columns to Rows Example
Just like the previous example, Pivot or transpose columns to rows is also a very common requirement in a data warehouse environment.
For example, consider below an example of sample sales table. The table contains monthly sales quantity.
> select month_id,sale_qty from sales_by_month;
+-----------+-----------+--+
| month_id | sale_qty |
+-----------+-----------+--+
| 201601 | 897456 |
| 201603 | 267156 |
| 201602 | 232253 |
| 201604 | 265646 |
| 201606 | 265654 |
| 201606 | 123457 |
+-----------+-----------+--+
Now, let us transpose month values as columns and quantity as a row.
SELECT Sum(CASE
WHEN month_id = 201601 THEN sale_qty
ELSE NULL
END) AS m_201601,
Sum(CASE
WHEN month_id = 201602 THEN sale_qty
ELSE NULL
END) AS m_201602,
Sum(CASE
WHEN month_id = 201603 THEN sale_qty
ELSE NULL
END) AS m_201603,
Sum(CASE
WHEN month_id = 201604 THEN sale_qty
ELSE NULL
END) AS m_201604,
Sum(CASE
WHEN month_id = 201605 THEN sale_qty
ELSE NULL
END) AS m_201605,
Sum(CASE
WHEN month_id = 201606 THEN sale_qty
ELSE NULL
END) AS m_201606
FROM sales_by_month;
+---------+---------+---------+---------+-------+---------+--+
| m_01 | m_02 | m_03 | m_04 | m_05 | m_06 |
+---------+---------+---------+---------+-------+---------+--+
| 897456 | 232253 | 267156 | 265646 | NULL | 389111 |
+---------+---------+---------+---------+-------+---------+--+
Oracle PIVOT FOR Alternative in Apache Hive
The relational databases such as Oracle and Snowflake provide PIVOT function with FOR clause.
You can use a CASE statement as an alternative method.
Following is the example of an Oracle PIVOT function with FOR clause.
SELECT * FROM order_stats
PIVOT(
COUNT(order_id)
FOR category_name
IN (
'CPU',
'VC',
'MB',
'Storage'
)
)
ORDER BY status;
+------------+------+-----+-----+----------+--+
| status | cpu | vc | mb | storage |
+------------+------+-----+-----+----------+--+
| cancelled | 2 | 0 | 0 | 0 |
| pending | 0 | 1 | 0 | 0 |
| shipped | 1 | 1 | 1 | 1 |
+------------+------+-----+-----+----------+--+
And following is the Apache Hive equivalent example.
select status,
count( case when category_name = 'CPU' then order_id else null end ) as CPU,
count( case when category_name = 'VC' then order_id else null end ) as VC,
count( case when category_name = 'MB' then order_id else null end ) as MB,
count( case when category_name = 'Storage' then order_id else null end ) as Storage
from order_stats
group by status
order by status
;
+------------+------+-----+-----+----------+--+
| status | cpu | vc | mb | storage |
+------------+------+-----+-----+----------+--+
| cancelled | 2 | 0 | 0 | 0 |
| pending | 0 | 1 | 0 | 0 |
| shipped | 1 | 1 | 1 | 1 |
+------------+------+-----+-----+----------+--+
Hope this helps 🙂