Database Migration to Snowflake: Best Practices and Tips

The Snowflake cloud data warehouse has become widely recognized as a flexible, high-performing, and scalable solution for both data warehousing and analytics needs. This article explores how to migrate a database to the Snowflake cloud data warehouse and provides best practices for the migration, covering preparation, the migration itself, file sizing and format, data transfer, running the source and Snowflake databases in parallel, and temporary and transient tables…
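As a minimal sketch of the staged bulk-load step such a migration typically ends with (not necessarily the article's exact method), the following SnowSQL commands stage an extracted file and load it with COPY INTO; the stage, file, and table names are hypothetical:

    -- Create a named stage with a CSV file format (hypothetical names)
    CREATE OR REPLACE STAGE my_stage
      FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- Upload the extracted file from the local machine via the SnowSQL client
    PUT file:///tmp/customers.csv.gz @my_stage;

    -- Bulk-load the staged file into the target table
    COPY INTO customers
      FROM @my_stage/customers.csv.gz
      ON_ERROR = 'ABORT_STATEMENT';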

Continue Reading: Database Migration to Snowflake: Best Practices and Tips

Reuse Column Aliases in BigQuery – Lateral Column alias

BigQuery lateral column aliases are columns derived from previously computed columns in the same SELECT statement. These derived columns are virtual: they are not physically stored in the table, and their values are recalculated every time they are referenced in a query. Many relational databases, including PostgreSQL derivatives such as Netezza, support reuse of column aliases within the same SELECT statement, but GCP BigQuery does not support reuse of calculated derived columns. In this article, we will identify alternate methods to reuse column aliases in Google BigQuery.
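As a hedged sketch of one common workaround, computing the expression once in a derived table lets the outer query reuse the alias; the dataset, table, and columns below are hypothetical:

    -- BigQuery rejects reuse of total_price within the same SELECT,
    -- so compute it in a subquery and reference it one level up.
    SELECT
      order_id,
      total_price,
      total_price * 0.07 AS sales_tax
    FROM (
      SELECT
        order_id,
        quantity * unit_price AS total_price
      FROM my_dataset.orders
    ) AS t;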

Continue Reading: Reuse Column Aliases in BigQuery – Lateral Column alias

How to use Amazon Redshift Replace Function?

The Amazon Redshift REPLACE function is one of the important string functions. The REPLACE function allows you to manipulate strings in Amazon Redshift, and it is similar to the TRANSLATE and REGEXP_REPLACE functions. This article introduces Amazon Redshift, then covers the syntax of the REPLACE function, example usage, and best practices for using it. Amazon Redshift is a fully managed, cloud-based data warehouse service offered by Amazon Web Services (AWS)…
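For reference, the function takes the source string, the characters to find, and the replacement; the table and column in the second query are hypothetical:

    -- REPLACE(source_string, old_chars, new_chars)
    SELECT REPLACE('Amazon Redshift', 'Redshift', 'RDS');  -- returns 'Amazon RDS'

    -- Strip dashes from phone numbers in a hypothetical contacts table
    SELECT REPLACE(phone_number, '-', '') FROM contacts;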

Continue Reading: How to use Amazon Redshift Replace Function?

How to Optimize Query Performance on Redshift?

In most cases, we pay a lot of attention to improving the performance of the web application but ignore back-end SQL performance tuning. Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. Amazon Redshift can run data models such as a production transaction system's third-normal-form model, star and snowflake schemas, data vault, or simple flat tables. This article takes you through the most common performance-related opportunities when writing queries in Amazon Redshift and gives you concrete guidance on how to optimize…
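As one illustrative example of such an opportunity (a sketch, not the article's full checklist), choosing distribution and sort keys up front and checking plans with EXPLAIN often pays off; the table below is hypothetical:

    -- Co-locate joins on customer_id and prune date-range scans
    CREATE TABLE sales (
      sale_id     BIGINT,
      customer_id BIGINT,
      sale_date   DATE,
      amount      DECIMAL(12,2)
    )
    DISTKEY (customer_id)
    SORTKEY (sale_date);

    -- Inspect the query plan before tuning further
    EXPLAIN
    SELECT customer_id, SUM(amount)
    FROM sales
    WHERE sale_date >= '2023-01-01'
    GROUP BY customer_id;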

Continue Reading: How to Optimize Query Performance on Redshift?

Redshift RSQL Control Statements – IF-ELSE-GOTO-LABEL

Amazon Redshift is a data warehousing service provided by Amazon Web Services (AWS). It allows users to store and analyze large amounts of data in a scalable and cost-effective manner. Amazon Redshift RSQL is a command-line client for interacting with Amazon Redshift clusters and databases. Redshift RSQL is similar to Teradata BTEQ and is used to interact with the data stored in a Redshift cluster. In this article, we will check Amazon Redshift RSQL control statements such as IF, ELSE, GOTO, and LABEL.
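As a minimal sketch of that control-flow style (table names are hypothetical; consult the RSQL documentation for the exact command syntax), a script can branch on :ACTIVITYCOUNT and jump to a label:

    SELECT 1 FROM staging_table LIMIT 1;

    \if :ACTIVITYCOUNT = 0
        \echo 'Nothing staged; skipping the load'
        \goto SKIP
    \endif

    INSERT INTO target_table SELECT * FROM staging_table;

    \label SKIP
    \echo 'Done'
    \exit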

Continue Reading: Redshift RSQL Control Statements – IF-ELSE-GOTO-LABEL

How to Connect to Databricks SQL Endpoint from Azure Data Factory?

A Databricks SQL endpoint is a compute cluster, quite similar to the clusters we already know in Databricks, that allows you to execute SQL commands on data objects within the Databricks environment. Databricks lets you connect using various tools such as dbt, or run notebooks from Azure Data Factory, but there is no direct method to connect to a Databricks SQL endpoint warehouse. In this article, we will check how to connect to a Databricks SQL endpoint from Azure Data Factory (ADF).
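The article's method is truncated here; as one hedged possibility (not necessarily the approach the article takes), an ADF Web activity can call the Databricks SQL Statement Execution REST API against the endpoint. The host, token, and warehouse ID below are placeholders:

    POST https://<workspace-host>/api/2.0/sql/statements
    Authorization: Bearer <databricks-personal-access-token>
    Content-Type: application/json

    {
      "warehouse_id": "<sql-endpoint-id>",
      "statement": "SELECT COUNT(*) FROM samples.nyctaxi.trips",
      "wait_timeout": "30s"
    }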

Continue Reading: How to Connect to Databricks SQL Endpoint from Azure Data Factory?

How to Export SQL Server Table to S3 using Spark?

Apache Spark is one of the emerging big data technologies. Thanks to its in-memory, distributed, and fast computation, you can use it for heavy jobs such as analyzing petabytes of data or exporting millions or billions of records from any relational database to cloud storage such as Amazon S3, Azure Blob Storage, or Google Cloud Storage. In this article, we will check how to export a SQL Server table to an Amazon S3 bucket using Spark. We will use PySpark to demonstrate the method. In my other article, we have…
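As a minimal PySpark sketch of the export (host, credentials, table, and bucket names are placeholders), read the table over JDBC and write Parquet to S3:

    # Read a SQL Server table over JDBC and write it to S3 as Parquet.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sqlserver-to-s3").getOrCreate()

    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://myhost:1433;databaseName=sales")
        .option("dbtable", "dbo.orders")
        .option("user", "spark_user")
        .option("password", "********")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

    # Requires the hadoop-aws package and S3 credentials configured on the cluster
    df.write.mode("overwrite").parquet("s3a://my-bucket/exports/orders/")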

Continue Reading: How to Export SQL Server Table to S3 using Spark?

Connect to SQL Server From Spark – PySpark

Thanks to its in-memory, distributed, and fast computation, Apache Spark is one of the emerging big data technologies; its in-memory distributed computation allows you to analyze petabytes of data without performance issues. In this article, we will check one of the methods to connect to a SQL Server database from a Spark program, preferably using PySpark to read a SQL Server table. The connection method is similar to the ones already discussed for Oracle, Netezza, Snowflake, Teradata, etc. To access SQL Server from Apache…
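As a minimal sketch of the JDBC read (connection details are placeholders, and the Microsoft JDBC driver must be on the classpath, e.g. via spark-submit --jars):

    # Read a SQL Server table into a DataFrame over JDBC.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-sqlserver").getOrCreate()

    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://myhost:1433;databaseName=testdb")
        .option("dbtable", "dbo.employees")
        .option("user", "spark_user")
        .option("password", "********")
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
        .load()
    )

    df.show(5)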

Continue Reading: Connect to SQL Server From Spark – PySpark

How to Create Synonym in Snowflake?

Synonyms in relational databases allow you to create easy names for long table names, view names, or other objects such as sequences, procedures, functions, and materialized views. Databases such as Netezza and Oracle support creating and managing synonyms. Synonyms provide an alternate way of referencing tables or views that exist in the current database or in other databases. The Snowflake database does not support creating synonyms yet. In this article, we will check an alternate method that is similar to creating a synonym in Snowflake.
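As a hedged sketch of the usual view-based workaround (database, schema, and table names are hypothetical), a view over the fully qualified object acts as a short alias:

    -- Snowflake has no CREATE SYNONYM; a view gives a similar alias.
    CREATE OR REPLACE VIEW analytics.public.cust AS
    SELECT * FROM raw_db.sales_schema.customer_master_table;

    -- Query through the "synonym"
    SELECT * FROM analytics.public.cust LIMIT 10;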

Continue Reading: How to Create Synonym in Snowflake?

Spark SQL Count Distinct Window Function

Window functions are commonly used analytical functions in a Spark SQL query, and COUNT is one such window function that allows you to count rows over a certain window. Many relational databases, such as Oracle, support the COUNT window function with the DISTINCT keyword. However, Spark SQL does not support the COUNT window function with DISTINCT as of now. Following is an example of an Oracle COUNT window function with DISTINCT…
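Since the Oracle example is cut off above, here is a hedged sketch of a common Spark SQL workaround instead, using a hypothetical employees table: collecting the distinct values into a set and taking its size matches a COUNT DISTINCT window:

    -- Spark SQL rejects COUNT(DISTINCT salary) OVER (...), but this works:
    SELECT
      dept,
      salary,
      SIZE(COLLECT_SET(salary) OVER (PARTITION BY dept)) AS distinct_salaries
    FROM employees;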

Continue Reading: Spark SQL Count Distinct Window Function