Spark SQL Create Temporary Tables, Syntax and Examples

Temporary tables are tables that are available only within the current session; they are automatically dropped at the end of the session. In this article, we will check how to create Spark SQL temporary tables, their syntax, and some examples. Spark SQL Create Temporary Tables Temporary tables, or temp tables, in Spark are available within the current Spark session. Spark temp tables are useful, for example, when you want to join a DataFrame column with other tables. Spark DataFrame Methods or Functions to Create Temp Tables Depends on the…


Spark SQL CASE WHEN on DataFrame – Examples

In general, the CASE expression is a conditional expression, similar to the if-then-else statements found in other languages. Spark SQL supports almost all features that are available in Apache Hive, and one such feature is the CASE statement. In this article, we will check how to use the CASE WHEN and OTHERWISE statement on a Spark SQL DataFrame. Spark SQL CASE WHEN on DataFrame The CASE WHEN and OTHERWISE statement tests whether any of a sequence of expressions is true, and returns the corresponding result for the first true expression. Spark…


Import CSV file to Pyspark DataFrame – Example

Many organizations use a flat file format such as CSV or TSV to offload their tables. Flat files are easy to manage and can be transported by any electronic medium. In this article, we will check how to import a CSV file into a PySpark DataFrame with some examples. Import CSV file to Pyspark DataFrame There are many methods that you can use to import a CSV file into a PySpark or Spark DataFrame, but the following methods are easy to use: Read Local CSV using com.databricks.spark.csv Format; Run Spark SQL Query to Create Spark DataFrame…


Spark SQL Date and Timestamp Functions and Examples

Spark SQL provides many built-in functions. Date and time functions are useful when you are working with DataFrames that store date and time type values. The built-in functions also include type conversion functions that you can use to format the date or time type. In this article, we will check what the Spark SQL date and timestamp functions are, with some examples. Spark SQL Date and Timestamp Functions Spark SQL supports almost all date and time functions that are supported in Apache Hive. You can use these…


Rename PySpark DataFrame Column – Methods and Examples

A DataFrame in Spark is a dataset organized into named columns. A Spark DataFrame is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations. When you work with DataFrames, you may get a requirement to rename a column. In this article, we will check how to rename a PySpark DataFrame column, the methods to rename a DataFrame column, and some examples. Rename PySpark DataFrame Column As mentioned earlier, we often need to rename one column or multiple columns on a PySpark (or Spark) DataFrame. Note…


SQL Merge Operation Using Pyspark – UPSERT Example

In relational databases such as Snowflake, Netezza, Oracle, etc., the MERGE statement is used to manipulate the data stored in a table. In this article, we will check how to simulate the SQL MERGE operation using PySpark. The method is the same in Scala, with little modification. SQL Merge Statement The MERGE command in relational databases allows you to update old records and insert new records simultaneously. This command is sometimes called UPSERT (UPdate and inSERT). Following is a sample merge statement available in an RDBMS: merge into merge_test using merge_test2 on…


Spark DataFrame Column Type Conversion using CAST

In my other post, we discussed how to check if a Spark DataFrame column is of integer type. Some applications expect a column to be of a specific type; for example, machine learning models may accept only integer types. In this article, we will check how to perform Spark DataFrame column type conversion using the Spark DataFrame CAST method. Spark DataFrame Column Type Conversion You can use the Spark CAST method to convert a DataFrame column's data type to the required format. Test Data Frame Following is the test data frame (df) that…


Spark DataFrame Integer Type Check and Example

Apache Spark is one of the easiest frameworks for dealing with different data sources. You can combine heterogeneous data sources with the help of DataFrames. Some applications, for example machine learning models, require only integer values. You should check the data type of the DataFrame before feeding it to an ML model, or type cast it to an integer type. In this article, we will check how to perform a Spark DataFrame integer type check and how to convert the column using the CAST function in Spark. Spark DataFrame Integer Type Check Requirement As mentioned…


How to Create Spark SQL User Defined Functions? Example

A user defined function (UDF) is a function written to perform specific tasks when a built-in function is not available for the same. In a Hadoop environment, you can write user defined functions using Java, Python, R, etc. In this article, we will check how to create Spark SQL user defined functions, with a Python user defined function example. Spark SQL User-defined Functions When you migrate your relational database warehouse to Hive and use Spark as an execution engine, you may miss some of the built-in function support. Some user defined functions…


Spark SQL isnumeric Function Alternative and Example

Many organizations are moving their data warehouses to Hive and using Spark as an execution engine. Spark as an execution engine will boost performance. In SQL, there are many options that you can use to deal with non-numeric values; for example, you can create user defined functions to filter out unwanted data. In this article, we will check the Spark SQL isnumeric function alternatives, with examples. Spark SQL isnumeric Function Neither Spark SQL nor Apache Hive provides support for an isnumeric function. You have to write…
