Apache Hive Table Design Best Practices and Considerations

As you plan your database or data warehouse migration to the Hadoop ecosystem, there are key table design decisions that will heavily influence overall Hive query performance. In this article, we will check Apache Hive table design best practices.

Apache Hive Table Design Best Practices

Table design plays a very important role in Hive query performance. These design choices also have a significant effect on storage requirements, which in turn affect query performance by reducing the number of I/O operations and minimizing the memory required to process Hive queries. Read: Apache Hive…

Amazon Redshift WITH Clause Syntax, Usage and Examples

The Redshift WITH clause is an optional clause that always precedes the SELECT clause in a query statement. The WITH clause contains one or more subqueries, each defined as a temporary table, similar to a view definition. Each subquery in the WITH clause specifies a table name, an optional list of column names, and a query expression that evaluates to a table (usually a SELECT statement). In SQL, WITH clause subqueries are commonly referred to as Common Table Expressions (CTEs). A CTE or WITH clause is syntactic sugar for a subquery. Where you can use…
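
The structure described in this excerpt (a table name, an optional column list, and a query expression) can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the sales table, its columns, and the region_totals name are all hypothetical.

```python
import sqlite3

# SQLite stands in for Redshift here; the table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 250), ('west', 75);
""")

# The WITH clause precedes SELECT and names a temporary result set
# (region_totals) with an explicit column list (region, total).
query = """
    WITH region_totals (region, total) AS (
        SELECT region, SUM(amount) FROM sales GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 100
    ORDER BY region
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The same query could be written with an inline subquery in the FROM clause; the WITH form only gives that subquery a name so the main SELECT reads more clearly.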

Apache Hive EXPLAIN Command and Example

The latest version of Hive uses a Cost Based Optimizer (CBO) to increase Hive query performance. Hive uses the cost-based optimizer to determine the best method for scan and join operations, join order, and aggregate operations. You can use the Apache Hive EXPLAIN command to display the execution plan that the Hive query engine generates and uses while executing any query in the Hadoop ecosystem. Read: Hive ANALYZE TABLE Command; Hive Performance Tuning Best Practices

Apache Hive Cost Based Optimizer

The latest version of Apache Hive uses the cost-based optimizer to…

HiveServer2 Beeline Command Line Shell Options and Examples

HiveServer2 supports Beeline, a command shell that works with HiveServer2. It is a JDBC client based on the SQLLine CLI. The Beeline shell works in both embedded mode and remote mode. In embedded mode, it runs an embedded Hive (similar to the Hive command line), whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. In this article, we will check commonly used HiveServer2 Beeline command line shell options with examples. You can run all Hive command line and interactive options from Beeline…
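
As a rough sketch of a remote-mode invocation, the snippet below assembles a Beeline command line in Python. The -u (JDBC URL), -n (username), and -e (query) options are standard Beeline options; the host, port, database, username, and the build_beeline_command helper itself are placeholders chosen for illustration, not values taken from the article.

```python
import shlex


def build_beeline_command(jdbc_url, username, query):
    """Return the argument list for a one-shot Beeline query.

    -u gives the JDBC URL of the HiveServer2 instance, -n the username,
    and -e the query string to execute before exiting.
    """
    return ["beeline", "-u", jdbc_url, "-n", username, "-e", query]


# Hypothetical connection details for a HiveServer2 process on localhost.
cmd = build_beeline_command(
    "jdbc:hive2://localhost:10000/default",
    "hive_user",
    "SHOW TABLES;",
)

# Print a shell-quoted version that could be pasted into a terminal.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps the query text from being split on spaces if it is later handed to subprocess.run.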

Easy Methods to Integrate Netezza and Amazon S3 – Steps

Amazon AWS is gaining popularity as a cloud-based web service. It takes just a few clicks to get your system or storage up and running. Amazon Web Services (AWS) provides on-demand cloud computing platforms and storage services (S3). Amazon S3 is fast, reliable cloud storage, which is why most organizations use it to store their data. In this article, we will check easy methods to integrate Netezza and Amazon S3 storage for data transfer between them. You may have to connect to Amazon S3 to pull data…

Different Methods to Load Data from Amazon S3 into Netezza Table

Amazon is gaining popularity as a cloud-based service. Amazon Web Services (AWS) provides on-demand cloud computing platforms and storage services. Amazon S3 is fast, reliable cloud storage, which is why most organizations use it to store their data. In this article, we will check how to load data from Amazon S3 into Netezza tables. You may also be interested in loading Netezza data into an S3 bucket: Export Netezza Data into Amazon S3 Bucket. We will be using the Amazon AWS CLI to load data from Amazon S3 into Netezza…

Different Methods to Export Netezza Data into Amazon S3 Bucket

Nowadays, Amazon AWS is gaining popularity as a cloud-based web service. It takes just a few clicks to get your system or storage up and running. In this article, we will check how to integrate Netezza and Amazon S3. We will also check how to export Netezza data into an Amazon S3 bucket using the Amazon Web Services command line interface (aws cli), with an example. You may also be interested in loading data from Amazon S3 into a Netezza table: Different Methods to Load Data from Amazon S3 into Netezza Table…

Apache Hive Performance Tuning Best Practices – Steps

When it comes to building a data warehouse on the Hadoop ecosystem, there are a handful of open source frameworks available. Hive and Impala are the most widely used to build a data warehouse on the Hadoop framework. Hive was developed by Facebook and Impala by Cloudera. In this article, we will explain Apache Hive performance tuning best practices and the steps to follow to achieve high performance.

Apache Hive Performance Tuning Best Practices

You can adopt a number of steps to tune performance in Hive, including better schema design, the right file format, using proper execution engines etc.…

Commonly used Apache Hive Interactive Shell Command Options and Examples

You can use the Hive interactive shell command options to add JAR or resource files, set variables, display the list of resource files, and delete them when they are no longer required. The Hive interactive shell provides various options. You can even execute shell or Linux commands from the Hive interactive shell without leaving it. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can add UDF JAR files to Hive using the Apache Hive interactive shell command options. Read: Steps to Connect to Hive…

Commonly used Apache Hive Command Line Options and Examples

You can use the Hive shell interactive tool (hive) to set up databases and tables, insert data, and issue queries. If you have worked with Netezza or Oracle, this tool is similar to nzsql or SQLPlus. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can also write queries in a script file and execute them using the Hive shell command line options. Read: Steps to Connect to Hive Using Beeline CLI; HiveServer2 Beeline Command Line Shell Options and Examples; Commonly used Hive…
