BigQuery lateral Column alias are columns that are derived from the previously computed columns in same SELECT statement. Derived columns or lateral Column alias are virtual columns that are not physically stored in the table. Their values are re-calculated every time they are referenced in a query. Many PostgreSQL relational databases such as Netezza supports reuse of the column aliases within the same SELECT statement but GCP BigQuery does not support reuse of calculated derived columns. In this article, we will identify the alternate methods reuse column aliases in Google BigQuery
Post Content
Introduction
BigQuery is a fully-managed, cloud data warehouse offered by Google Cloud Platform (GCP). It allows users to store, analyze, and query large datasets in a scalable and cost-effective manner. With its powerful SQL-like query language and built-in machine learning capabilities, BigQuery enables organizations to derive insights from their data quickly and efficiently. Additionally, it integrates seamlessly with other GCP services, making it easy to build end-to-end data pipelines and applications. Whether you’re a data analyst, data scientist, or developer, BigQuery provides a flexible and robust platform for working with big data.
Here are some important features of BigQuery:
- Serverless architecture: BigQuery is a fully-managed, serverless data warehouse.
- Integration with other GCP services: BigQuery integrates seamlessly with other Google Cloud Platform services, including Cloud Storage, Cloud Dataproc, Cloud Dataflow, and Cloud Machine Learning Engine.
- Powerful SQL-like query language: BigQuery’s query language supports standard SQL and procedural language.
- Fast query performance: BigQuery uses a distributed columnar storage format and parallel processing to execute queries quickly, even on massive datasets.
- Built-in machine learning capabilities: BigQuery offers built-in machine learning models.
- Secure data processing: BigQuery provides multiple layers of security, including encryption of data at rest and in transit, role-based access control, and integration with Cloud Key Management Service for managing encryption keys.
- Cost-effective pricing: BigQuery offers a pay-as-you-go pricing model based on data storage and query usage, with no up-front costs or minimum usage requirements.
Understanding Lateral Column Aliases
Lateral column aliases are a powerful feature which is available in many databases such as Netezza, Redshift, Oracle, Teradata, etc. The Lateral column aliases can help optimize query performance and reduce query complexity. With lateral column aliases, you can reuse column aliases within the same query without having to repeat the underlying expressions. By reusing column aliases, you can simplify your query code and avoid duplicating complex expressions, which can result in faster query execution times. Additionally, lateral column aliases can improve the readability of your queries, making them easier to understand and debug.
Here is the Teradata example to understand lateral column aliases:
SELECT name,
CASE
WHEN name = 'Product A' THEN 2
WHEN name = 'Product B' THEN 3
WHEN name = 'Product C' THEN 2
WHEN name = 'Product D' THEN 2
WHEN name = 'Product E' THEN 3
END AS new_quantity,
new_quantity * 2 AS bonus_qty,
count(*) AS total
FROM
ds_tbl_db.demo_table
WHERE new_quantity IS NOT NULL
GROUP BY
name,
new_quantity * 2;
As you can see, query reuses the new_quantity
column alias. But same syntax is not allowed in BigQuery.
Lateral Column alias in BigQuery to Reuse Column
GCP BigQuery does not currently allow you to write queries that refer to a column alias within the same query immediately after it is declared.
Following are some alternate methods that you can use to reuse column aliases in BigQuery:
- Reuse Column Aliases using BigQuery UNNEST
- Derive Column Aliases using Subquery
- Reuse Column Aliases using BigQuery WITH clause
Now, let us check these method with some examples:
Reuse Column Aliases using BigQuery UNNEST
The UNNEST()
function in BigQuery is used to flatten an array or a struct field in a table into individual rows. It is a powerful function for exploring and analyzing complex data structures stored in BigQuery tables.
When you use the UNNEST()
function, it takes an array or struct field as input and produces a new table output that has one row for each element in the input array or struct. This new table will have additional columns that correspond to the fields of the input array or struct.
We can utilize BigQuery’s UNNEST function along with the STRUCT function to handle the new_quantity
column alias in the previous example.
Following is an example:
SELECT
name,
new_quantity * 2 AS bonus_qty,
count(*) AS total
FROM
myDataset.demo_table,
UNNEST([STRUCT(CASE
WHEN name = 'Product A' THEN 2
WHEN name = 'Product B' THEN 3
WHEN name = 'Product C' THEN 2
WHEN name = 'Product D' THEN 2
WHEN name = 'Product E' THEN 3
END AS new_quantity)])
WHERE new_quantity IS NOT NULL
GROUP BY
name,
new_quantity * 2;
The query uses a cross join to combine rows from demo_table
with the output of the UNNEST()
function, which generates a table with one row per array element. The UNNEST()
function takes a hardcoded array of values and transforms it into a table with a single column named new_quantity
.
Derive Column Aliases using Subquery
In this method, you can use a subquery to calculate the new_quantity
column, and then reference it in the outer query.
For examples,
SELECT name,
new_quantity,
new_quantity * 2 AS bonus_qty,
COUNT(*) AS total
FROM (
SELECT name,
CASE
WHEN name = 'Product A' THEN 2
WHEN name = 'Product B' THEN 3
WHEN name = 'Product C' THEN 2
WHEN name = 'Product D' THEN 2
WHEN name = 'Product E' THEN 3
END AS new_quantity
FROM ds_tbl_db.demo_table
)
WHERE new_quantity IS NOT NULL
GROUP BY name, new_quantity, bonus_qty;
In this query, the inner subquery calculates the new_quantity
column using a CASE
statement. Then the outer query uses the new_quantity
columns from the subquery, and calculates the bonus_qty
and total
columns. The calculated column is also used in WHERE and GROUP BY clause.
Reuse Column Aliases using BigQuery WITH clause
In this method, you can use a WITH clause (CTE) to calculate the new_quantity
column, and then reference it in the outer query.
Here is an example:
WITH
cte AS (
SELECT name,
CASE
WHEN name = 'Product A' THEN 2
WHEN name = 'Product B' THEN 3
WHEN name = 'Product C' THEN 2
WHEN name = 'Product D' THEN 2
WHEN name = 'Product E' THEN 3
END AS new_quantity
FROM `ds_tbl_db.demo_table`
)
SELECT name,
new_quantity,
new_quantity * 2 AS bonus_qty,
COUNT(*) AS total
FROM cte
WHERE new_quantity IS NOT NULL
GROUP BY name, new_quantity, bonus_qty;
In this example, Common Table Expression (CTE) is used to calculates the new_quantity
column using a CASE
statement. Then the select statement of the query uses the new_quantity
columns from the subquery, and calculates the bonus_qty
and total
columns. The calculated column is also used in WHERE and GROUP BY clause.
Benefits of using Lateral Column Aliases
In general, the benefits of using lateral column aliases include:
- Improved Readability: By using lateral column aliases, you can create more readable and understandable queries.
- Simplified Maintenance: By reusing column aliases, you can reduce the amount of duplicated code in your queries. This can make your queries more maintainable, as you can make changes to column calculations or filters in one place and have those changes apply to all instances of the column alias.
- Performance Optimization: By reusing column aliases, you can reduce the number of calculations that need to be performed, which can improve query performance.
Conclusion
In summary, lateral column aliases allow you to reuse column aliases in a query, which can improve readability, simplify maintenance, and optimize performance. Unfortunately, BigQuery does not support lateral column aliases. However, there are alternative methods available, such as using UNNEST function, Common Table Expressions (CTEs) and Subquery to define column aliases and reuse them throughout the query. By using these techniques, you can achieve similar benefits as you would with lateral column aliases, making your queries more readable, maintainable, and performant.
Related Articles,
- BigQuery Recursive Query Alternative – Example
- What is SQL Cursor Alternative in BigQuery?
- Google BigQuery GROUP BY CUBE Alternative and Example
- Google BigQuery Grouping Sets Alternative and Example
Hope this helps 🙂