Hadoop Hive Regular Expression Functions and Examples

Hadoop Hive regular expression functions match precise character patterns in a string. They are useful for extracting substrings from data and for validating existing data, for example validating dates, performing range checks, checking for particular characters, and extracting specific characters. In this article, we will check some commonly used Hadoop Hive regular expression functions with examples.

Types of Hadoop Hive regular expression functions

Apart from the RLIKE (REGEXP) operator, Hive supports two regular expression functions:

REGEXP_REPLACE
REGEXP_EXTRACT

Hive REGEXP_REPLACE Function

Searches a string for a…

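To get a feel for what REGEXP_REPLACE and REGEXP_EXTRACT return, you can approximate them with Python's re module. This is only a sketch. Hive evaluates patterns with Java regular expressions, and the two syntaxes differ in places: Hive replacement strings refer to groups Java-style as `$1`, while Python uses `\1`. The function names below are hypothetical. The NULL handling, the default group index of 1, and the empty-string result when nothing matches are assumptions modelled on Hive's behaviour. The sample calls reuse the examples from the Hive language manual.

```python
import re

# Hypothetical helpers that approximate Hive's REGEXP_REPLACE and
# REGEXP_EXTRACT with Python's re module (Hive itself uses Java regex).


def regexp_replace(initial_string, pattern, replacement):
    """Replace every substring of initial_string that matches pattern."""
    if initial_string is None or pattern is None or replacement is None:
        return None  # assumed: NULL input gives NULL output, as in SQL
    return re.sub(pattern, replacement, initial_string)


def regexp_extract(subject, pattern, index=1):
    """Return capture group `index` of the first match of pattern in subject."""
    if subject is None or pattern is None:
        return None
    match = re.search(pattern, subject)
    if match is None:
        return ""  # assumption: no match yields an empty string
    return match.group(index)


print(regexp_replace("foobar", "oo|ar", ""))              # fb
print(regexp_extract("foothebar", "foo(.*?)(bar)", 2))    # bar
# Validation-style use: pull the month out of a yyyy-mm-dd date.
print(regexp_extract("2020-01-15", r"^(\d{4})-(\d{2})-(\d{2})$", 2))  # 01
```

The date pattern shows the validation use case from the excerpt. A value that does not match the full yyyy-mm-dd shape returns an empty string instead of a month.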

Apache Hive Table Design Best Practices and Considerations

As you plan your database or data warehouse migration to the Hadoop ecosystem, there are key table design decisions that will heavily influence overall Hive query performance. In this article, we will check Apache Hive table design best practices.

Apache Hive Table Design Best Practices

Table design plays a very important role in Hive query performance. Good design choices also reduce storage requirements, which in turn improves query performance by reducing the number of I/O operations and the memory required to process Hive queries. Read: Apache Hive…


Apache Hive EXPLAIN Command and Example

Recent versions of Hive use a Cost Based Optimizer (CBO) to improve query performance. The CBO determines the best method for scan and join operations, the join order, and aggregate operations. You can use the Apache Hive EXPLAIN command to display the execution plan that the Hive query engine generates and uses to execute a query. Read: Hive ANALYZE TABLE Command, Hive Performance Tuning Best Practices

Apache Hive Cost Based Optimizer

Recent versions of Apache Hive use the cost based optimizer to…

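The CBO bases its decisions on table and column statistics, so EXPLAIN output is usually checked after gathering them with ANALYZE TABLE. The Python helper below is a sketch that builds such a HiveQL script. The function name and the sample table `sales` are hypothetical. The three SET properties are standard Hive configuration settings, but check their defaults on your own cluster. You could save the output to a file and run it with `hive -f` or `beeline -f`.

```python
def explain_script(table, query, cbo=True):
    """Build a HiveQL script that gathers statistics and shows a query plan.

    Hypothetical helper: `table` must be the table that `query` reads.
    """
    statements = []
    if cbo:
        statements += [
            "SET hive.cbo.enable=true",                 # turn on the CBO
            "SET hive.compute.query.using.stats=true",  # answer simple aggregates from stats
            "SET hive.stats.fetch.column.stats=true",   # let the planner read column stats
        ]
    statements += [
        f"ANALYZE TABLE {table} COMPUTE STATISTICS",              # table-level stats
        f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR COLUMNS",  # column-level stats
        f"EXPLAIN {query.strip().rstrip(';')}",                   # print the plan only
    ]
    # Terminate every statement with a semicolon, one per line.
    return ";\n".join(statements) + ";\n"


print(explain_script("sales", "SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

EXPLAIN only prints the plan and does not run the query, so it is cheap to compare plans with `cbo=True` and `cbo=False`.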

HiveServer2 Beeline Command Line Shell Options and Examples

HiveServer2 supports Beeline, a command shell that works with HiveServer2. It is a JDBC client based on the SQLLine CLI. Beeline works in both embedded mode and remote mode. In embedded mode, it runs an embedded Hive (similar to the Hive command line), whereas remote mode connects to a separate HiveServer2 process over Thrift. In this article, we will check commonly used HiveServer2 Beeline command line shell options with examples. You can run all Hive command line and interactive options from Beeline…

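The embedded/remote distinction comes down to the JDBC URL passed to Beeline's `-u` option. Embedded mode uses the bare `jdbc:hive2://`, and remote mode names the HiveServer2 host, port (10000 by default), and database. The Python sketch below assembles a Beeline command line from the standard `-u`, `-n`, `-e`, `-f`, and `--hivevar` options. The function name, the host `hs2.example.com`, and the user `etl` are hypothetical placeholders.

```python
import shlex


def beeline_command(host=None, port=10000, database="default",
                    user=None, query=None, script=None, hivevars=None):
    """Build a Beeline argument list; host=None selects embedded mode."""
    if host is None:
        url = "jdbc:hive2://"  # embedded mode: no host or port
    else:
        url = f"jdbc:hive2://{host}:{port}/{database}"  # remote HiveServer2
    argv = ["beeline", "-u", url]
    if user:
        argv += ["-n", user]  # user name for the connection
    for name, value in (hivevars or {}).items():
        argv += ["--hivevar", f"{name}={value}"]  # variable substitution
    if query is not None:
        argv += ["-e", query]   # run one query string and exit
    elif script is not None:
        argv += ["-f", script]  # run the statements in a script file
    return argv


# Print a shell-quoted command; the list itself could go to subprocess.run().
print(shlex.join(beeline_command("hs2.example.com", user="etl", query="SHOW DATABASES")))
```

Returning a list instead of a single string avoids quoting problems when the command is handed to `subprocess.run`.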

Apache Hive Performance Tuning Best Practices – Steps

When it comes to building a data warehouse on the Hadoop ecosystem, there are a handful of open source frameworks available. Hive and Impala are the most widely used frameworks for building a data warehouse on Hadoop. Hive was developed at Facebook and Impala at Cloudera. In this article, we will explain Apache Hive performance tuning best practices and the steps to follow to achieve high performance.

Apache Hive Performance Tuning Best Practices

You can adopt a number of steps to tune performance in Hive, including better schema design, the right file format, a proper execution engine, etc.…


Commonly used Apache Hive Interactive Shell Command Options and Examples

You can use the Hive interactive shell command options to add JAR or other resource files, set variables, list resource files, and delete them when they are no longer required. You can even execute shell (Linux) commands from the Hive interactive shell without leaving it. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can also add UDF JAR files to Hive using the interactive shell command options. Read: Steps to Connect to Hive…


Commonly used Apache Hive Command Line Options and Examples

You can use the Hive shell interactive tool (hive) to set up databases and tables, insert data, and issue queries. If you have worked on Netezza or Oracle, this tool is similar to nzsql or SQL*Plus. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can also write queries in a script file and execute them using the Hive shell command line options. Read: Steps to Connect to Hive Using Beeline CLI, HiveServer2 Beeline Command Line Shell Options and Examples, Commonly used Hive…

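As a sketch of the script-file workflow, the Python helper below writes a list of queries to a `.hql` file and returns the `hive` command line that runs it. It uses the real Hive CLI options `-f` (run a script file), `-S` (silent mode), and `--hiveconf` (set a configuration property). The helper name, the file name `queries.hql`, and the temporary directory are assumptions for illustration only.

```python
import tempfile
from pathlib import Path


def hive_script_command(queries, silent=True, hiveconf=None, workdir=None):
    """Write queries to a .hql file; return (hive argv, script path).

    Hypothetical helper: the file is named queries.hql inside `workdir`,
    or inside a fresh temporary directory when `workdir` is None.
    """
    directory = Path(workdir or tempfile.mkdtemp())
    script = directory / "queries.hql"
    # One statement per line, each terminated by exactly one semicolon.
    script.write_text("".join(q.strip().rstrip(";") + ";\n" for q in queries))
    argv = ["hive"]
    if silent:
        argv.append("-S")  # silent mode: suppress informational messages
    for key, value in (hiveconf or {}).items():
        argv += ["--hiveconf", f"{key}={value}"]
    argv += ["-f", str(script)]
    return argv, script


argv, script = hive_script_command(
    ["SHOW DATABASES", "SHOW TABLES;"],
    hiveconf={"hive.execution.engine": "tez"},
)
print(argv)
print(script.read_text())
```

For a single query, `hive -e "<query>"` avoids the file entirely. The script file is more convenient when the same set of statements is rerun.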

Hive CREATE INDEX to Optimize and Improve Query Performance

The main goal of creating an INDEX on a Hive table is to improve data retrieval speed and optimize query performance. For example, suppose you execute a Hive query with the filter condition WHERE col1 = 100. Without an index, Hive loads the entire table or partition to process the records. With an index on col1, it loads only the relevant part of the HDFS file. Be aware, however, that indexes on Hive tables are not recommended, and index support was removed entirely in Hive 3.0. CREATE INDEX can help if you are migrating your existing data warehouse to Hive and…

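To make the WHERE col1 = 100 example concrete, here is a toy Python model of a full scan compared with an index lookup. It is a conceptual sketch, not Hive's implementation. The file names and rows are made up, and the index simply maps each col1 value to the (file, offset) positions where it occurs. That is roughly analogous to Hive's compact index, which records file names and offsets for each indexed value.

```python
from collections import defaultdict

# Hypothetical table data: each "file" holds (col1, payload) rows.
files = {
    "part-00000": [(100, "a"), (200, "b"), (300, "c")],
    "part-00001": [(400, "d"), (100, "e"), (500, "f")],
}


def full_scan(files, value):
    """Without an index: read every row of every file, then filter."""
    rows_read, matches = 0, []
    for name in sorted(files):
        for row in files[name]:
            rows_read += 1
            if row[0] == value:
                matches.append(row)
    return matches, rows_read


def build_index(files):
    """Map each col1 value to the (file, offset) positions where it occurs."""
    index = defaultdict(list)
    for name in sorted(files):
        for offset, row in enumerate(files[name]):
            index[row[0]].append((name, offset))
    return index


def index_lookup(files, index, value):
    """With an index: read only the rows the index points to."""
    positions = index.get(value, [])
    matches = [files[name][offset] for name, offset in positions]
    return matches, len(positions)


index = build_index(files)
print(full_scan(files, 100))            # ([(100, 'a'), (100, 'e')], 6)
print(index_lookup(files, index, 100))  # ([(100, 'a'), (100, 'e')], 2)
```

Both approaches return the same rows, but the lookup reads 2 rows instead of 6. The cost is building the index and keeping it up to date as the data changes, which is part of why Hive indexes fell out of favour.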

Hive Create View Syntax and Examples

You can use Hive CREATE VIEW to create a virtual table based on the result set of a complex SQL statement, which may include multiple table joins. The CREATE VIEW statement lets you create a shorthand for a more complicated query. An Apache Hive view is purely a logical construct (an alias for a complex query) with no physical data behind it. Note that a Hive view is different from a lateral view. Read: Hive CREATE INDEX to Optimize and Improve Query Performance, Hadoop Hive Bucket Concept and Bucketing Examples, Hive…

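To see that a view is only a stored query with no data of its own, here is a small demonstration using Python's built-in sqlite3 module. SQLite stands in for Hive here as an assumption, chosen because it runs anywhere. The basic `CREATE VIEW ... AS SELECT` form reads the same in both systems. The tables `orders` and `customers` and their rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive; the CREATE VIEW ... AS SELECT form is the same.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.0);

    -- The view stores only this query text, not the joined rows.
    CREATE VIEW customer_totals AS
    SELECT c.name, SUM(o.amount) AS total
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.name;
""")


def totals():
    """Query the view; the join and aggregation run at query time."""
    return conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()


print(totals())  # [('alice', 30.0), ('bob', 40.0)]

# New base-table rows appear in the view without any refresh step.
conn.execute("INSERT INTO orders VALUES (13, 2, 10.0)")
print(totals())  # [('alice', 30.0), ('bob', 50.0)]
```

Because the view is re-evaluated on every query, it always reflects the current base tables. That is also why it does not speed up the underlying join.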

Cloudera Impala SQL Join Types and Examples

An Impala SQL join is a clause used to combine fields from two or more tables based on common columns. Joins in Impala are similar to joins in standard SQL and Hive: they combine rows from multiple tables. In this article, we will learn about the different Impala SQL join types with examples.

Different Impala Join Types

Following are the different Impala join types:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
SEMI JOIN
ANTI JOIN
CROSS JOIN

Below are the tables…

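The join types listed above differ mainly in which unmatched rows they keep. The plain-Python sketch below models that on two tiny hypothetical tables of (id, value) rows joined on id, with None standing in for NULL. It illustrates join semantics only, not how Impala executes joins. In Impala SQL the semi and anti joins are written with a direction, for example LEFT SEMI JOIN and LEFT ANTI JOIN, and the functions here model the left-hand versions.

```python
from itertools import product

# Hypothetical sample tables: (id, value) rows, joined on id.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]


def inner_join(l, r):
    """Only pairs whose ids match."""
    return [(a, b) for a, b in product(l, r) if a[0] == b[0]]


def left_outer_join(l, r):
    """Every left row; None (NULL) on the right when nothing matches."""
    out = []
    for a in l:
        matches = [b for b in r if b[0] == a[0]]
        out += [(a, b) for b in matches] or [(a, None)]
    return out


def right_outer_join(l, r):
    """Every right row: a left outer join with the tables swapped back."""
    return [(a, b) for b, a in left_outer_join(r, l)]


def full_outer_join(l, r):
    """Left outer join plus right rows that matched nothing."""
    left_keys = {a[0] for a in l}
    return left_outer_join(l, r) + [(None, b) for b in r if b[0] not in left_keys]


def semi_join(l, r):
    """Left rows that have a match, each returned at most once."""
    keys = {b[0] for b in r}
    return [a for a in l if a[0] in keys]


def anti_join(l, r):
    """Left rows that have no match."""
    keys = {b[0] for b in r}
    return [a for a in l if a[0] not in keys]


def cross_join(l, r):
    """Every combination of rows (Cartesian product)."""
    return list(product(l, r))


print(semi_join(left, right))  # [(2, 'b'), (3, 'c')]
print(anti_join(left, right))  # [(1, 'a')]
```

Note that the semi and anti joins return only left-table columns. That is the key difference from an inner join or a NOT IN style filter written as an outer join.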