If you are migrating from traditional database to Cloudera Impala then you might have noticed there is not sequence number function. In the process of Cloudera Impala Generate Sequence Numbers without UDF, you can use analytical function that are available in Cloudera Impala.
If you want generate sequential sequences that automatically keep in sync with your table sequence number, you can do so with the help of Cloudera impala supported ROW_NUMBER analytical function.
Related reading:
- Impala Conditional Functions
- An Introduction to Cloudera Hadoop Impala Architecture
- Commonly used Impala shell Command Line Options
Cloudera Impala Generate Sequence Numbers Example
Here is the example that will help you to generate Sequence number without using any UDF:
[quickstart.cloudera:21000:21000] > select tab2.max_seq + 1 + row_number() over (order by tab1.id) seq_num, tab1.* from test tab1 inner join (select max(id) max_seq from test) tab2; Query: select tab2.max_seq + 1 + row_number() over (order by tab1.id) seq_num, tab1.* from test tab1 inner join (select max(id) max_seq from test) tab2 +---------+----+------+ | seq_num | id | name | +---------+----+------+ | 10 | 1 | abc | | 11 | 2 | bcd | | 12 | 3 | cde | | 13 | 4 | def | | 14 | 5 | aaa | | 15 | 6 | bbb | | 16 | 7 | ccc | | 17 | 8 | ddd | +---------+----+------+ Fetched 8 row(s) in 1.08s [quickstart.cloudera:21000:21000] >
Returns an ascending sequence of integers stating from the max of the id loaded in the table. The ORDER BY clause is required. The PARTITION BY clause is optional. The window clause is not allowed. We have not used the PARTITION BY clause here as we are partitioning the all records from table in single partition.
Please be advised that this method of sequence number generation using Cloudera Impala analytics function wouldn’t be safe if you are running multiple insert concurrently on the table. You may end up generating duplicate sequence numbers if multiple insert statements on same table.
This is not ideal as in distributed environment it will create duplicate values