Vithal S, Author at DWgeek.com

Different Vertica Join Types and Examples

Joins in SQL are used to combine specific columns from two or more tables based on common data columns available. Multiples tables are integrated using joins. In general, joins are used to combine multiple tables. In this article, we will check different Vertica join types with an example. Test Data Below are the tables and associated data that we will be using to demonstrate different joins available in Vertica database: Stud: VMart=> select * from testdb.stud; Id | name | city ----+-------+----------- 1 | Bill | London 2 | Ram…

Comments Off

April 1, 2019

BigData

What are SQL Features Missing in Hive?

Apache Hive syntax looks similar to SQL-92 standards but does not fully compatible to SQL-92. Storage and querying underlying table’s closes resembles traditional databases available in industry. HiveQL provides some of the extensions that are not present in traditional databases. There are some features gap between traditional SQL and Apache Hive. In this article, we will check some basic and import SQL features missing in Hive. SQL features Missing in Hive Below are some of important yet basic SQL features missing in Hive: Online Transaction Processing (OLTP)Correlated Sub-queriesMaterialized ViewsTruncate TableIndexes…

Comments Off

March 31, 2019

Vertica

Run SQL Script File using Vertica vsql Variable Substitution

Variable substitution is nothing but passing value to a variable referred in Vertical SQL queries. Let me explain this using an example. Imagine you have written a general SQL script to generate monthly aggregated data for a particular report, you have to provide a current month values to a filter variable referred in SQL script. Another instance could be executing Vertica SQL queries from Linux shell script and passing shell variable value to a variable referred in SQL script. You can accomplish this task using Vertica vsql variable substitution method.…

Comments Off

March 29, 2019

Vertica

Vertica WITH Clause, Syntax, Usage and Examples

Let us say you have a requirement to use the results of a piece of code in your complex query. Instead of writing that piece of code everywhere, you can create a temporary table of use Vertica WITH clause. This WITH clause can be used to improve your complex SQL queries that have complex sub-queries, thus improving overall execution speed of your queries. SQL WITH clause will allow you to name your complex, repeat sub-query. You can use that name anywhere in your query just like normal database table. Vertica…

Comments Off

March 28, 2019

Data Warehouse

Data Vault Modeling Methodology Architecture

Based on what you are working and expected results, you have to use different methodologies and best practices. A data warehouse is no different, you have to use different modeling methodologies based on the type of source data and integration. Big data is a hot cake now, everybody wants to move their data to bigdata world. Traditional methods such as Kibmal’s Star schema and Inmon’s relational 3NF may not work. You have to choose a different approach based on your ecosystem and data. In this article, we will check new…

Comments Off

March 28, 2019

BigData

Hive DELETE FROM Table Alternative– Easy Steps

By definition, Data Warehouse is mechanism to store historical data in an easy accessible manner. Data may be updated to keep tables with up-to date records. This performance critical operation holds good when you plan to migrate your data warehouse to bigdata world. In this article, we will check one of the method to remove outdated records from Hive table i.e. Hive DELETE FROM table Alternative. Hive DELETE FROM Table Alternative Apache Hive is not designed for online transaction processing and does not offer real-time queries and row level…

Comments Off

March 27, 2019

Apache Spark

Spark Dataset Join Operators using Pyspark – Examples

Joining two different tables results in different dataset. You can join two different datasets to perform specific task, such as getting common rows. Relational databases like Netezza, Teradata supports different join types. Just like RDBMS, Apache Hive also supports different join types. In this article, we will check Spark Dataset Join Operators using Pyspark and some examples to demonstrate different join types. Before going into Spark SQL dataframe join types, let us check what is join in SQL? “A query that accesses multiple rows of the same or different table…

Comments Off

March 18, 2019

Apache Spark

Spark SQL Cumulative Average Function and Examples

Spark SQL supports Analytics or window functions. You can use Spark SQL to calculate certain results based on the range of values. Result might be dependent of previous or next row values, in that case you can use cumulative sum or average functions. Databases like Netezza, Teradata, Oracle, even latest version of Apache Hive supports analytic or window functions. In this article, we will check Spark SQL cumulative Average function and how to use it with an example. Spark SQL Cumulative Average Function There are two methods to calculate cumulative…

Comments Off

March 16, 2019

Apache Spark

Spark SQL Cumulative Sum Function and Examples

Spark SQL supports Analytics or window function. You can use Spark SQL to calculate certain results based on the range of values. Most of the databases like Netezza, Teradata, Oracle, even latest version of Apache Hive supports analytic or window functions. In this article, we will check Spark SQL cumulative sum function and how to use it with an example. Spark SQL Cumulative Sum Function Before going deep into calculating cumulative sum, first, let is check what is running total or cumulative sum? “A running total or cumulative sum refers…

Comments Off

March 16, 2019

Apache Spark

Spark SQL Analytic Functions and Examples

Spark SQL analytic functions sometimes called as Spark SQL windows function compute an aggregate value that is based on groups of rows. These functions optionally partition among rows based on partition column in the windows spec. Like other analytic functions such as Hive Analytics functions, Netezza analytics functions and Teradata Analytics functions, Spark SQL analytic functions works on groups of rows. These functions optionally ignore NULL values in the data. Spark SQL Analytic Functions There are two types of Spark SQL windows functions: Ranking functions and Analytic functions Related Articles:…

Comments Off

March 15, 2019