The lateral column alias in Databricks allows users to reuse an expression specified earlier in the same SELECT list, eliminating the need to use nested subqueries and Common Table Expressions (CTEs) in many cases. This blog post discusses the use cases of the feature and the benefits it brings to Spark and Databricks users.
Page Contents
Introduction
Databricks has introduced a much needed lateral column alias reference feature. This feature enhancement allows you to reference a column alias within the same query immediately after it is created. This can significantly improve the readability and efficiency of your complex SQL queries.
What is Lateral Column Alias in Databricks?
Lateral Column Alias in Databricks provides users the capability to reuse an expression specified earlier within the same SELECT list.
Let’s explore the concept of the lateral column alias through an example.
select clicks / impressions as probability,
round(100 * probability, 1) as percentage
from some_data;
With the Lateral Coolumn alias (LCA) feature, you can define the alias probability
and use it within the same select statement: round(100 * probability, 1)
.
Benefits of using Lateral Column Alias in Databricks SQL
Here are some of the benefits of lateral column aliases in Databricks:
- Enhanced Readability: By using lateral column aliases, you can make your queries more readable and easier to understand. This is particularly useful in complex queries where the same expression might be used multiple times.
- Increased Efficiency: It reduces the need to repeat expressions, which can simplify query writing and maintenance. This optimizes the query performance in Databricks.
Alternative Methods to Lateral Column Alias in Databricks SQL
Before the introduction of lateral column alias, users had to deal with multiple subqueries and CTEs when trying to reference any column defined by a previous alias. This increased the complexity and readability of SQL queries, making them hard to read, write and maintain.
There are two methods that you can use as an alternative methods to lateral column alias in Databricks SQL.
- Use a Common Table Expressions (CTE) to Reference Previous Alias
- Use a Subquery to Derive Column to Reference Previous Alias
Use a Common Table Expressions (CTE) to Reference Previous Alias
Using common table expression is one of the easiest method that you can use to reference previous alias in the same SELECT.
For example, consider below query that compute derive column.
WITH cte1
AS (SELECT col1,
col1 * 10 AS der_col2
FROM sample_table1)
SELECT a.col1,
b.der_col2,
b.der_col2 * 100 AS der_col3
FROM sample_table1 a
JOIN cte1 b
ON( a.col1 = b.col1 );
col1 | der_col2 | der_col3
------+----------+----------
1 | 10 | 1000
10 | 100 | 10000
20 | 200 | 20000
(3 rows)
As you can see, der_col2 is derived from the col1 and der_col3 is derived by referencing der_col2 .
Use a Subquery to Derive Column to Reference Previous Alias
Another simple method is to use subquery to reference a column and reuse that column in upper SELECT statement.
For example, consider below query with subquery.
SELECT sub.col1,
sub.der_col2,
sub.der_col2 * 100 AS der_col3
FROM
(
SELECT col1,
col1 * 10 AS der_col2
FROM sample_table1
) as sub;
col1 | der_col2 | der_col3
------+----------+----------
1 | 10 | 1000
10 | 100 | 10000
20 | 200 | 20000
(3 rows)
As you can see, der_col2 is derived from the col1 in subquery and der_col3 is derived by referring der_col2 in upper SELECT clause.
Conclusion
The new lateral column aliasing feature in Databricks makes SQL queries more efficient and easier to handle. It helps you write cleaner, more maintainable and readable code, saving time and reducing mistakes. If you can’t use lateral column aliasing, you can still use CTEs (Common Table Expressions) and databricks SQL subqueries to achieve similar results.
Hope this helps 🙂