How to Export Spark-SQL Results to CSV?

Data plays an important role in today's decision-making process. Online bookstores, e-commerce websites, and online food delivery applications all use customer data to provide better service. Many organizations share data with decision-making systems, either as flat files or through direct access to the source system. Many companies use Spark as an execution engine. In this article, we will check how to export Spark-SQL results to a CSV flat file. The created flat files or CSV files can then be transported using…
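
A minimal PySpark sketch of the export step follows. It assumes you already have a `SparkSession`. The helper name `export_sql_to_csv`, the query, and the output path are hypothetical. The calls it makes (`spark.sql`, `DataFrame.coalesce`, and `DataFrameWriter.option` / `mode` / `csv`) are standard PySpark. Spark writes a directory of `part-*` files rather than a single named file.

```python
from typing import Any


def export_sql_to_csv(spark: Any, query: str, output_dir: str,
                      single_file: bool = True) -> None:
    """Run a Spark SQL query and write its result as CSV under output_dir.

    `spark` is expected to be a pyspark.sql.SparkSession; it is typed as Any
    so this sketch can be imported without PySpark installed.
    """
    df = spark.sql(query)
    if single_file:
        # Collapse to one partition so Spark writes a single part-*.csv file.
        # This routes every row through one task, so keep it for small results.
        df = df.coalesce(1)
    (df.write
       .option("header", "true")  # first line of the file holds column names
       .mode("overwrite")         # replace output_dir if it already exists
       .csv(output_dir))          # output_dir becomes a directory of part files
```

With a real session, a call would look like `export_sql_to_csv(spark, "SELECT * FROM sales", "/tmp/sales_csv")`, where the `sales` table is only an example. If you skip `coalesce(1)`, the write stays parallel, but Spark produces one file per partition.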

Amazon Released PartiQL – An Open Source SQL-Compatible Query Language

Today, most business-critical decisions are driven by data. As data grows, it is typically spread across a combination of relational databases, non-relational data stores, and data lakes. According to Harvard Business Review, on average, less than half of an organization's structured data is used to make business-critical decisions. Structured data is typically data stored in relational databases. However, relational databases do not have facilities to query other data sources such as the local file system, NoSQL databases, or data lakes. In this article, we will check newly…

How to use Impala Replace Function and Examples

The latest version of Cloudera supports the Impala replace function. A string replace function is very useful when you are manipulating strings and need to replace a particular value. The value could be a junk value or any other value, depending on your requirements. In this article, we will check the Impala replace function and alternative methods that you can use whenever required.

Impala Replace Function

As mentioned earlier, the latest version of Cloudera Impala does provide support for the replace function. The syntax and usage…

Best SQL Query Format Tools Online

Who does not like formatted SQL code? It is easy to read and understand, and it makes complex logic easier to follow. Sometimes, when you export SQL code from a database, the code becomes messy and hard to understand. In this article, we will check the best SQL query format tools available for free on the internet. Some of them provide an API so that you can integrate them into your application.

Best SQL Query Format Tools

Before jumping into the tools, let us understand why SQL code formatting is important. Why do…

Export Hive Table DDL, Syntax and Shell Script Example

There are many situations where you are required to export DDLs. For example, you may be migrating some of your Hive tables to an RDBMS for reporting. If you are working as a Hadoop administrator, you should know how to export table DDL. In this article, we will check how to export Hive table DDL to a text file using a shell script and a Beeline connection string.

Export Hive Table DDL

As mentioned earlier, it is good to have a utility that allows you to generate DDL in Hive.…
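
The article uses a shell script. The same idea can be sketched in Python: call Beeline once per table with `SHOW CREATE TABLE` and append the output to a text file. The JDBC URL, table names, and helper names below are hypothetical. Check the Beeline options used here (`-u`, `-e`, `--silent=true`, `--showHeader=false`, `--outputformat=tsv2`) against your Beeline version, because output formatting can differ between releases.

```python
import subprocess
from typing import Callable, Iterable, List


def build_beeline_cmd(jdbc_url: str, table: str) -> List[str]:
    """Build a Beeline command that prints the DDL of one table."""
    return [
        "beeline",
        "-u", jdbc_url,          # HiveServer2 JDBC connection string
        "--silent=true",         # suppress connection/progress messages
        "--showHeader=false",    # do not print the column header row
        "--outputformat=tsv2",   # plain rows, no ASCII table borders
        "-e", f"SHOW CREATE TABLE {table};",
    ]


def run_beeline(cmd: List[str]) -> str:
    """Run Beeline and return its standard output (raises on failure)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def export_ddls(jdbc_url: str, tables: Iterable[str], out_path: str,
                run: Callable[[List[str]], str] = run_beeline) -> None:
    """Write the DDL of every table in `tables` to out_path, one after another."""
    with open(out_path, "w", encoding="utf-8") as out:
        for table in tables:
            ddl = run(build_beeline_cmd(jdbc_url, table)).strip().rstrip(";")
            # Terminate each statement exactly once so the file can be replayed.
            out.write(ddl + ";\n\n")
```

A typical call would be `export_ddls("jdbc:hive2://localhost:10000/default", ["default.customers", "default.orders"], "hive_ddl.sql")`; the host and table names are placeholders. The `run` parameter exists so that the Beeline call can be swapped out, for example in tests.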

Database Table Denormalization Example

Big data technologies such as Hive, HBase, and NoSQL databases are taking over the industry, thanks to their fast, distributed processing. Hadoop runs on commodity hardware, so it is cheap too. Every organization wants to move its data to the big data world. If you are reading this article, your organization may be planning to migrate its relational database to Hadoop. Hadoop works best with denormalized tables. In this article, we will check how database table denormalization works with an example.

What is Table Denormalization?

Before jumping into the denormalization process, let us first understand what is…
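
To make the idea concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for Hive. The `customers` and `orders` tables and their rows are made-up sample data. The normalized design stores customer details once. The denormalized table copies those details onto every order row, so reads no longer need a join.

```python
import sqlite3
from typing import List, Tuple


def build_denormalized(conn: sqlite3.Connection) -> List[Tuple]:
    """Create two normalized tables, flatten them into one, and return its rows."""
    conn.executescript("""
        -- Normalized design: customer attributes live in exactly one place.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            city        TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount      REAL
        );
        INSERT INTO customers VALUES (1, 'Alice', 'Pune'), (2, 'Bob', 'Delhi');
        INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 75.5), (103, 2, 120.0);

        -- Denormalized design: pre-join once and store the customer columns
        -- redundantly on every order row.
        CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.amount, c.customer_id,
               c.name AS customer_name, c.city AS customer_city
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id;
    """)
    return conn.execute(
        "SELECT * FROM orders_denorm ORDER BY order_id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for row in build_denormalized(conn):
        print(row)
    conn.close()
```

The trade-off is visible in the output. Alice's city now appears on two order rows, so a change to her city must be applied to both, but a query such as "orders by customer city" no longer needs a join.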

Impala Interval Data Type and Conversion Examples

The Cloudera Impala interval type is slightly different from Apache Hive interval data types. The only difference is that Impala accepts the interval value as an integer, whereas Hive accepts it as a string. The interval type in Impala works the same way as in other relational databases such as Netezza, Vertica, Greenplum, and Oracle. In this article, we will check the Impala interval data type and how to convert it.

Impala Interval Data Type

The Impala interval syntax accepts unit specifications. The unit could be SECOND, HOUR, DAY, MONTH, YEAR, and so on. You have…
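
Impala takes the interval quantity as an integer (for example, `INTERVAL 3 DAYS`). For fixed-length units, that arithmetic can be mimicked with Python's `datetime.timedelta`. This only illustrates what `timestamp + INTERVAL n unit` computes; it is not Impala code. The helper `add_interval` is hypothetical. MONTH and YEAR are deliberately left out because their length depends on the calendar.

```python
from datetime import datetime, timedelta

# Fixed-length units mapped to timedelta keyword arguments.
# MONTH and YEAR are omitted: their length varies with the calendar.
_FIXED_UNITS = {
    "WEEK": "weeks",
    "DAY": "days",
    "HOUR": "hours",
    "MINUTE": "minutes",
    "SECOND": "seconds",
}


def add_interval(ts: datetime, n: int, unit: str) -> datetime:
    """Return ts + INTERVAL n unit for a fixed-length unit."""
    if isinstance(n, bool) or not isinstance(n, int):
        # Impala style: the quantity is an integer, not a quoted string.
        raise TypeError(f"interval quantity must be an int, got {n!r}")
    key = _FIXED_UNITS.get(unit.upper())
    if key is None:
        raise ValueError(f"unsupported unit in this sketch: {unit}")
    return ts + timedelta(**{key: n})
```

For example, `add_interval(datetime(2019, 1, 31, 22, 0), 3, "HOUR")` rolls over to `datetime(2019, 2, 1, 1, 0)`. A negative quantity moves the timestamp backwards.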

Apache Hive Replace Function and Examples

Older Apache Hive releases do not provide a built-in replace function. A string replace function is very useful when you are manipulating strings and need to replace a particular value, such as a junk value. In this article, we will check alternative methods to the Hive replace function that you can use whenever required.

Hive Replace Function

As mentioned earlier, these Hive releases do not provide support for a replace function. However, they do support regular expression functions and the translate function. You can use any…
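
The behaviour of the two alternatives can be sketched in Python. `hive_translate` is a hypothetical helper that mimics Hive's `translate(input, from, to)`. Each character of `from` is replaced by the character at the same position in `to`, and characters with no counterpart are deleted. How repeated characters in `from` are handled is an assumption of this sketch. `hive_regexp_replace` is a rough analogue of Hive's `regexp_replace(input, pattern, replacement)` built on Python's `re.sub`. Hive uses Java regular expressions, so some syntax differs; for example, backreferences are written `$1` in Java and `\1` in Python.

```python
import re


def hive_translate(text: str, from_chars: str, to_chars: str) -> str:
    """Mimic Hive translate(): map from_chars[i] -> to_chars[i], delete extras."""
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            continue  # assumption: the first mapping of a repeated character wins
        # Characters beyond the length of to_chars map to None, i.e. are deleted.
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return text.translate(table)


def hive_regexp_replace(text: str, pattern: str, replacement: str) -> str:
    """Rough Python analogue of Hive regexp_replace(): replace every match."""
    return re.sub(pattern, replacement, text)
```

For example, `hive_translate("2019/01/31", "/", "-")` gives `"2019-01-31"`, the same result as `translate('2019/01/31', '/', '-')` in Hive. Because `translate` works one character at a time, replacing a multi-character value needs `regexp_replace` instead.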

Hive Interval Data Types and Conversion Examples

Hive supports interval types in the same way as other relational databases such as Netezza, Vertica, and Oracle. It accepts interval syntax with unit specifications: you have to specify the unit along with the interval value. For example, INTERVAL '1' DAY refers to an interval of one day. In this article, we will check Hive interval data types and their conversion examples.

Hive Interval Data Types

Hive version 1.2 and above supports interval types. Intervals of time units, year-to-month intervals, and day-to-second intervals are available in Hive version 1.2 and…
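
Hive writes the interval value as a string literal, for example `INTERVAL '1-2' YEAR TO MONTH` (one year and two months) or `INTERVAL '1 2:3:4.000005' DAY TO SECOND`. The rough Python sketch below shows how those two literal formats break down. The parser names are hypothetical, and the code handles only the simple formats shown (with an optional leading minus sign). Python's `timedelta` stores microseconds, whereas Hive allows nanosecond precision, so any extra fractional digits are truncated.

```python
import re
from datetime import timedelta

# 'Y-M' for YEAR TO MONTH, 'D H:M:S[.fraction]' for DAY TO SECOND.
_YEAR_MONTH = re.compile(r"^(-)?(\d+)-(\d+)$")
_DAY_SECOND = re.compile(r"^(-)?(\d+) (\d+):(\d+):(\d+)(?:\.(\d{1,9}))?$")


def parse_year_to_month(literal: str) -> int:
    """Convert a YEAR TO MONTH literal such as '1-2' to total months."""
    m = _YEAR_MONTH.match(literal.strip())
    if not m:
        raise ValueError(f"not a YEAR TO MONTH literal: {literal!r}")
    sign, years, months = m.groups()
    total = int(years) * 12 + int(months)
    return -total if sign else total


def parse_day_to_second(literal: str) -> timedelta:
    """Convert a DAY TO SECOND literal such as '1 2:3:4.000005' to a timedelta."""
    m = _DAY_SECOND.match(literal.strip())
    if not m:
        raise ValueError(f"not a DAY TO SECOND literal: {literal!r}")
    sign, days, hours, minutes, seconds, frac = m.groups()
    # Pad the fraction to 9 digits (nanoseconds), then keep the first 6 (microseconds).
    micros = int((frac or "0").ljust(9, "0")[:6])
    delta = timedelta(days=int(days), hours=int(hours), minutes=int(minutes),
                      seconds=int(seconds), microseconds=micros)
    return -delta if sign else delta
```

For example, `parse_year_to_month("1-2")` returns `14`, and `parse_day_to_second("1 2:3:4.000005")` equals `timedelta(days=1, hours=2, minutes=3, seconds=4, microseconds=5)`.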

Spark Modes of Operation and Deployment

Apache Spark's mode of operation, or deployment mode, refers to how Spark will run. Spark can run either in local mode or in cluster mode. Local mode is used to test your application, and cluster mode is used for production deployment. In this article, we will check the Spark modes of operation and deployment.

Spark Mode of Operation

Apache Spark runs in local mode by default. Usually, local mode is used for developing applications and unit testing. Spark can be configured to run in cluster mode using the YARN cluster manager. Currently, Spark supports three cluster…
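
The mode is usually chosen with the `--master` option of `spark-submit` and, on YARN, with `--deploy-mode`. The small Python sketch below builds those command lines. The names `spark_submit_cmd` and `my_app.py` are hypothetical, while `local[*]`, `local[N]`, `yarn`, and `--deploy-mode client|cluster` are standard `spark-submit` values.

```python
from typing import List


def spark_submit_cmd(app: str, mode: str, threads: int = 0) -> List[str]:
    """Build a spark-submit command line for the given mode of operation.

    mode: "local"        -> run on this machine only (development, unit tests)
          "yarn-client"  -> YARN cluster, driver runs on the submitting machine
          "yarn-cluster" -> YARN cluster, driver runs inside the cluster
    """
    if mode == "local":
        # local[*] uses one worker thread per logical core; local[N] uses N threads.
        master = f"local[{threads}]" if threads > 0 else "local[*]"
        return ["spark-submit", "--master", master, app]
    if mode in ("yarn-client", "yarn-cluster"):
        deploy_mode = mode.split("-", 1)[1]  # "client" or "cluster"
        return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app]
    raise ValueError(f"unknown mode: {mode}")
```

Submitting to YARN also requires `HADOOP_CONF_DIR` or `YARN_CONF_DIR` to point at the cluster's client-side configuration files.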
