Apache Hive ROWNUM Pseudo Column Equivalent

  • Post author:
  • Post last modified:February 28, 2018
  • Post category:BigData
  • Reading time:2 mins read

Hive is batch processing engine, you cannot use it as a transaction system. Sometimes you may need to generate sequence row number for document use. ROWNUM is sometime useful when you are working with multi-level SQL queries. There is no ROWNUM pseudo column in Apache Hive. In this article, we will check Hive ROWNUM pseudo column equivalent.

Apache Hive ROWNUM Pseudo Column Equivalent

If you are coming from Oracle or traditional database background, you will find it difficult in Hive without ROWNUM pseudo column. The one possible solution to this is ROW_NUMBER() analytical function as Hive ROWNUM pseudo column equivalent.

Hive also support a LIMIT clause to restrict the output. There is no identity or sequence available in Apache Hive.

ROW_NUMBER as a Apache HIve ROWNUM Pseudo Column Equivalent

You can use the Hive ROW_NUMBER analytical function to produce the sequential row numbers:

Read:

For Example;

hive> SELECT ROW_NUMBER() OVER( ORDER BY ID) as ROWNUM, ID, NAME FROM sample_test_tab; 
rownum id name
1 1 AAA
2 2 BBB
3 3 CCC
4 4 DDD
5 5 EEE
6 6 FFF
Time taken: 4.047 seconds, Fetched: 6 row(s)

Do not provide any PARTITION BY clause as you will be considering all records as single partition for ROW_NUMBER function.

The above solution will work perfectly but the only issue is ROWNUM order is determined by the ORDER BY clause provide in OVER clause so try to give a unique value.