Lateral Column Alias in Databricks – Example

The lateral column alias in Databricks allows users to reuse an expression defined earlier in the same SELECT list, eliminating the need for nested subqueries and common table expressions (CTEs) in many cases. This blog post discusses the use cases of the feature and the benefits it brings to Spark and Databricks users. Page Contents: Introduction; What is Lateral Column Alias Support?; Benefits of Using Lateral Column Alias in Databricks SQL; Alternative Methods to Lateral Column Alias in Databricks SQL; Conclusion. Introduction: Databricks has introduced a much-needed lateral…
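
As a quick illustration of the feature, here is a minimal PySpark sketch that should run in a Databricks notebook on a recent runtime (older runtimes will reject the alias reuse); the employees data and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` is already defined; this line just
# makes the sketch self-contained elsewhere.
spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for a real employees table.
spark.createDataFrame(
    [("Alice", 1000.0), ("Bob", 2000.0)], ["name", "salary"]
).createOrReplaceTempView("employees")

# `bonus` is defined and then reused within the same SELECT list --
# the lateral column alias removes the need for a subquery or CTE.
spark.sql("""
    SELECT name,
           salary * 0.10  AS bonus,
           salary + bonus AS total_pay
    FROM   employees
""").show()
```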

Database Migration to Snowflake: Best Practices and Tips

The Snowflake cloud data warehouse has become widely recognized as a flexible, high-performing, and scalable solution for both data warehousing and analytics needs. This article explores how to migrate a database to the Snowflake cloud data warehouse and also provides insights into some best practices for the migration. Page Contents: Introduction; Preparing for Migration; Migrating to Snowflake; Best Practices for Database Migration to Snowflake; Best Practices for File Sizing and Format; Best Practices for Data Transfer; Best Practices for Running Source and Snowflake Databases; Best Practices for Temporary and Transient…
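
As one concrete step from such a migration, here is a minimal sketch of the load phase using COPY INTO, assuming the snowflake-connector-python package; the account, credentials, stage, and table names are all hypothetical placeholders:

```python
import snowflake.connector

# Hypothetical connection parameters -- replace with your own account.
conn = snowflake.connector.connect(
    account="xy12345",
    user="MIGRATION_USER",
    password="...",
    warehouse="LOAD_WH",
    database="SALES_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# Load compressed, pre-staged files (sized around 100-250 MB each, a
# common file-sizing guideline) from a named stage into the target.
cur.execute("""
    COPY INTO orders
    FROM @migration_stage/orders/
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
""")
cur.close()
conn.close()
```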

Reuse Column Aliases in BigQuery – Lateral Column alias

BigQuery lateral column aliases are columns derived from previously computed columns in the same SELECT statement. Derived columns, or lateral column aliases, are virtual columns that are not physically stored in the table; their values are recalculated every time they are referenced in a query. Many relational databases, such as the PostgreSQL-derived Netezza, support reuse of column aliases within the same SELECT statement, but GCP BigQuery does not support reuse of calculated derived columns. In this article, we will identify the alternate methods to reuse column aliases in Google…
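
As a preview of one such workaround, here is a sketch using the google-cloud-bigquery client: since BigQuery will not let the SELECT list reference the `bonus` alias directly, the expression is computed in a subquery instead; the project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials are configured

# BigQuery rejects `salary + bonus` when `bonus` is an alias defined in
# the same SELECT list, so the alias is computed in a subquery (a CTE
# works equally well) and reused in the outer query.
sql = """
    SELECT name,
           bonus,
           salary + bonus AS total_pay
    FROM (
        SELECT name,
               salary,
               salary * 0.10 AS bonus
        FROM   `my_project.hr.employees`
    )
"""
for row in client.query(sql).result():
    print(row.name, row.total_pay)
```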

How to use Amazon Redshift Replace Function?

The Amazon Redshift REPLACE function is one of the important string functions. The REPLACE function allows you to manipulate strings in Amazon Redshift. This string function is similar to the TRANSLATE and REGEXP_REPLACE functions. Page Contents: Introduction to Amazon Redshift; Syntax of the Redshift Replace Function; Redshift Replace Function Example; Usage of the Redshift Replace Function; Best Practices for Using the Redshift Replace Function; Conclusion. Introduction to Amazon Redshift: Amazon Redshift is a fully managed, cloud-based data warehouse service offered by Amazon…
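
For a flavor of the function before diving in, here is a minimal sketch run through psycopg2 (Redshift speaks the PostgreSQL wire protocol); the cluster endpoint and credentials are placeholders:

```python
import psycopg2

# Hypothetical cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# REPLACE(source, old_chars, new_chars) swaps every occurrence of
# old_chars; here it rewrites the separators in a phone number.
cur.execute("SELECT REPLACE('555-123-4567', '-', '.')")
print(cur.fetchone()[0])  # 555.123.4567
cur.close()
conn.close()
```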

How to Optimize Query Performance on Redshift?

In most cases, we pay a lot of attention to improving the performance of the web application but ignore back-end SQL performance tuning. Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. Amazon Redshift can run a data model such as a third-normal-form production transaction system, star and snowflake schemas, a data vault, or simple flat tables. This article takes you through the most common performance-related opportunities when writing queries in Amazon Redshift and gives you concrete guidance on how to optimize…
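
As a taste of the tuning the article covers, the sketch below applies two common levers, a DISTKEY to co-locate join keys and a SORTKEY to prune range scans, then inspects the plan with EXPLAIN; the connection details and table are hypothetical:

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
cur = conn.cursor()

# DISTKEY co-locates rows that join on cust_id on the same slice;
# SORTKEY lets range-restricted scans on sale_date skip blocks.
cur.execute("""
    CREATE TABLE IF NOT EXISTS sales (
        sale_id   BIGINT,
        cust_id   BIGINT,
        sale_date DATE,
        amount    DECIMAL(12,2)
    )
    DISTKEY (cust_id)
    SORTKEY (sale_date)
""")
conn.commit()

# EXPLAIN reveals expensive redistribution steps (DS_BCAST/DS_DIST).
cur.execute("""
    EXPLAIN
    SELECT cust_id, SUM(amount)
    FROM sales
    WHERE sale_date >= '2023-01-01'
    GROUP BY cust_id
""")
for (line,) in cur.fetchall():
    print(line)
cur.close()
conn.close()
```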

Redshift RSQL Control Statements – IF-ELSE-GOTO-LABEL

Amazon Redshift is a data warehousing service provided by Amazon Web Services (AWS). It allows users to store and analyze large amounts of data in a scalable and cost-effective manner. Amazon AWS Redshift RSQL is a command-line client for interacting with Amazon Redshift clusters and databases. Redshift RSQL is similar to Teradata BTEQ and is used to interact with the data stored in a Redshift cluster. In this article, we will check Amazon Redshift RSQL control statements such as IF, ELSE, GOTO, LABEL, etc. Redshift RSQL Control Statements: Amazon Redshift RSQL…
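
As a small preview, the sketch below generates and runs an RSQL script that branches with \if, \goto, and \label; it assumes the rsql client is installed with an ODBC DSN named my_redshift_dsn, the staging table is hypothetical, and the exact meta-command spellings should be checked against the AWS RSQL documentation:

```python
import subprocess
import textwrap

# RSQL control flow, BTEQ-style: :ACTIVITYCOUNT holds the row count of
# the previous statement, and \goto jumps forward to a \label.
script = textwrap.dedent(r"""
    SELECT * FROM staging.orders LIMIT 1;
    \if :ACTIVITYCOUNT = 0
        \goto NO_DATA
    \endif
    \echo 'staged rows found, continuing with the load'
    \exit 0
    \label NO_DATA
    \echo 'no rows staged, skipping the load'
    \exit 1
""")

with open("control.sql", "w") as f:
    f.write(script)

# -D names the ODBC DSN; -f runs the script file.
subprocess.run(["rsql", "-D", "my_redshift_dsn", "-f", "control.sql"])
```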

How to Connect to Databricks SQL Endpoint from Azure Data Factory?

A Databricks SQL endpoint is a compute cluster, quite similar to the clusters we already know in Databricks, that allows you to execute SQL commands on data objects within the Databricks environment. Databricks allows you to connect using various tools such as dbt, or to connect to a notebook using Azure Data Factory, but there is no direct method to connect to a Databricks SQL endpoint warehouse. In this article, we will check how to connect to a Databricks SQL endpoint from Azure Data Factory (ADF). Connect to Databricks SQL Endpoint from Azure Data…
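
Whatever client sits on the other side, a Databricks SQL endpoint is addressed by the same three coordinates: a server hostname, an HTTP path, and an access token. The sketch below shows them using the databricks-sql-connector package; all values are hypothetical placeholders, and the ADF-specific wiring is what the full article covers:

```python
from databricks import sql  # pip install databricks-sql-connector

# Hypothetical workspace values -- the same three coordinates feed
# whatever linked service configuration ADF ends up using.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abcdef1234567890",
    access_token="dapiXXXXXXXXXXXXXXXX",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_catalog(), current_schema()")
        print(cur.fetchone())
```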

How to Export SQL Server Table to S3 using Spark?

Apache Spark is one of the emerging big data technologies. Due to its in-memory, distributed, and fast computation, you can use it to perform heavy jobs such as analyzing petabytes of data or exporting millions or billions of records from any relational database to cloud storage such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. In this article, we will check how to export a SQL Server table to an Amazon S3 bucket using Spark. We will use PySpark to demonstrate the method. In my other article, we have…
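
To set expectations, here is a minimal sketch of the whole pipeline: read the table over JDBC, then write Parquet to S3. The server, credentials, and bucket are placeholders, and the SQL Server JDBC driver plus hadoop-aws jars must be on the Spark classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlserver-to-s3").getOrCreate()

# Read the source table over JDBC; all connection values are hypothetical.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver:1433;databaseName=sales")
    .option("dbtable", "dbo.orders")
    .option("user", "spark_reader")
    .option("password", "...")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# Write to S3 as Parquet; s3a:// is the Hadoop S3 connector scheme, and
# AWS credentials come from the environment or an instance role.
df.write.mode("overwrite").parquet("s3a://my-bucket/exports/orders/")
```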

Connect to SQL Server From Spark – PySpark

Due to its in-memory, distributed, and fast computation, Apache Spark is one of the emerging big data technologies. Apache Spark's in-memory distributed computation allows you to analyze petabytes of data without any performance issues. In this article, we will check one of the methods to connect to a SQL Server database from a Spark program. Preferably, we will use PySpark to read a SQL Server table. The connection method is similar to the ones we have already discussed for Oracle, Netezza, Snowflake, Teradata, etc. Steps to Connect SQL Server From Spark: To access SQL Server from Apache…
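
As a preview of those steps, the sketch below reads from SQL Server over JDBC using a pushed-down query rather than a full table read; the server, database, and credentials are placeholders, and the Microsoft JDBC driver jar must be supplied via --jars or spark.jars:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read-sqlserver").getOrCreate()

# The `query` option (Spark 2.4+) pushes the SELECT down to SQL Server
# instead of pulling the whole table; all values are hypothetical.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver:1433;databaseName=hr")
    .option("query", "SELECT id, name, hire_date FROM dbo.employees")
    .option("user", "spark_reader")
    .option("password", "...")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

df.printSchema()
df.show(5)
```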

How to Create Synonym in Snowflake?

Synonyms in relational databases allow you to easily create short names for long table or view names, or for any other objects such as sequences, procedures, functions, and materialized views. Databases such as Netezza and Oracle support creating and managing synonyms. Synonyms provide an alternate way of referencing tables or views present in the current or other databases. The Snowflake database does not support creating synonyms yet. In this article, we will check an alternate method similar to creating a synonym in Snowflake. How to Create Synonym in Snowflake? The Snowflake cloud database does…
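
To preview the workaround, the sketch below uses a thin view as the synonym stand-in, created through the snowflake-connector-python package; the account, credentials, and object names are hypothetical:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="ANALYST", password="...",
    warehouse="QUERY_WH", database="REPORTING", schema="PUBLIC",
)
cur = conn.cursor()

# Snowflake has no CREATE SYNONYM, so a view over the fully qualified
# object provides the alternate short name instead.
cur.execute("""
    CREATE OR REPLACE VIEW orders AS
    SELECT * FROM SALES_DB.RAW_SCHEMA.CUSTOMER_ORDER_TRANSACTIONS
""")

# Query through the short alias exactly as you would a synonym.
cur.execute("SELECT COUNT(*) FROM orders")
print(cur.fetchone()[0])
cur.close()
conn.close()
```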
