Apache Hive Performance Tuning Best Practices – Steps

When it comes to building a data warehouse on the Hadoop ecosystem, there are a handful of open source frameworks available. Hive and Impala are the most widely used for building a data warehouse on the Hadoop framework; Hive was developed by Facebook and Impala by Cloudera. In this article, we will explain Apache Hive performance tuning best practices and the steps to be followed to achieve high performance. You can adopt a number of steps to tune performance in Hive, including better schema design, the right file format, using proper execution engines, etc.…
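As a rough illustration of the kind of steps the article covers, here is a minimal sketch; the session properties are standard Hive settings, while the table name and columns are hypothetical:

    -- pick the execution engine and enable common optimizations for the session
    SET hive.execution.engine=tez;              -- Tez instead of plain MapReduce
    SET hive.vectorized.execution.enabled=true; -- vectorized query execution
    SET hive.cbo.enable=true;                   -- cost-based optimizer

    -- store data in a splittable, columnar format and partition it
    CREATE TABLE sales_orc (
        order_id BIGINT,
        amount   DECIMAL(10,2)
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC;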


Commonly used Apache Hive Interactive Shell Command Options and Examples

You can use the Hive interactive shell command options to add JAR or resource files, set variables, display the list of resource files, and delete them when they are no longer required. The Hive interactive shell provides various options. You can even execute shell or Linux commands from the Hive interactive shell without actually leaving it. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can add UDF JAR files to Hive using the Apache Hive interactive shell command options. Read: Steps to Connect to Hive…
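For context, here is a minimal sketch of the kinds of commands the interactive shell accepts at the hive> prompt; the JAR path and variable name are hypothetical:

    ADD JAR /tmp/my_udf.jar;           -- register a UDF JAR for this session
    LIST JARS;                         -- display resources added to the session
    SET hivevar:run_date=2021-01-01;   -- set a substitution variable
    DELETE JAR /tmp/my_udf.jar;        -- remove the resource when no longer required
    !pwd;                              -- run a Linux command without leaving the shell
    dfs -ls /user/hive/warehouse;      -- run an HDFS command from the shell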


Commonly used Apache Hive Command Line Options and Examples

You can use the Hive shell interactive tool (hive) to set up databases and tables, insert data, and issue queries. If you have worked on Netezza or Oracle, this tool is similar to nzsql or SQL*Plus. For ad hoc queries and data exploration, you can submit SQL statements in an interactive session. You can also write queries in a script file and execute them using the Hive shell command line options. Read: Steps to Connect to Hive Using Beeline CLI, HiveServer2 Beeline Command Line Shell Options and Examples, Commonly used Hive…
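As a minimal sketch of the script-file approach, the query below could be saved to a file and run non-interactively; the script name, database, table, and variable are hypothetical:

    -- sample_report.hql; run it from the operating system shell with, for example:
    --   hive -f sample_report.hql --hivevar run_date=2021-01-01
    -- (a single statement can also be run inline with: hive -e 'SELECT ...')
    SELECT col1,
           COUNT(*) AS row_cnt
    FROM   my_db.my_table
    WHERE  load_date = '${hivevar:run_date}'
    GROUP  BY col1;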


Apache HBase Writing Data Best Practices

For writing data into HBase, you use methods of the HTableInterface class. You can use the Java API directly, or use the HBase shell commands. When you issue an HBase shell put command, the coordinates of the data are the row, the column, and the timestamp. The timestamp is unique per version of a cell; it can be generated automatically or specified programmatically by your application, and it must be a long integer. In this article, we will check Apache HBase writing data best practices to tune the performance…
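To make those coordinates concrete, here is a minimal HBase shell sketch; the table name 'sensor_data' and column family 'cf' are hypothetical:

    # write one cell: table, row key, column (family:qualifier), value
    put 'sensor_data', 'row-001', 'cf:temperature', '21.5'
    # the same cell with an explicit timestamp, supplied as a long integer
    put 'sensor_data', 'row-001', 'cf:temperature', '22.1', 1609459200000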


Hive CREATE INDEX to Optimize and Improve Query Performance

The main goal of creating an INDEX on a Hive table is to improve data retrieval speed and optimize query performance. For example, say you are executing a Hive query with the filter condition WHERE col1 = 100. Without an index, Hive will load the entire table or partition to process the records, whereas with an index on col1 it only needs to read part of the HDFS file. Be aware, however, that indexes on Hive tables are generally not recommended. CREATE INDEX will help if you are migrating your existing data warehouse to Hive and…
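As a minimal sketch of the syntax on Hive versions that still support indexes (they were removed in Hive 3.0), with a hypothetical table sales and column col1:

    -- build a compact index on the filter column, deferring the build step
    CREATE INDEX idx_sales_col1
    ON TABLE sales (col1)
    AS 'COMPACT'
    WITH DEFERRED REBUILD;

    -- populate (or refresh) the index data
    ALTER INDEX idx_sales_col1 ON sales REBUILD;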


Hive Create View Syntax and Examples

You can use Hive CREATE VIEW to create a virtual table based on the result set of a complex SQL statement that may have multiple table joins. The CREATE VIEW statement lets you create a shorthand abbreviation for a more complicated query. An Apache Hive view is purely a logical construct (an alias for a complex query) with no physical data behind it. Note that a Hive view is different from a lateral view. Read: Hive CREATE INDEX to Optimize and Improve Query Performance, Hadoop Hive Bucket Concept and Bucketing Examples, Hive…
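As a minimal sketch of the idea, with hypothetical customers and orders tables:

    -- the view stores only the query definition, not the joined data
    CREATE VIEW IF NOT EXISTS sales_summary AS
    SELECT c.customer_id,
           c.customer_name,
           SUM(o.amount) AS total_amount
    FROM   customers c
    JOIN   orders    o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.customer_name;

    -- query the view like a regular table
    SELECT * FROM sales_summary WHERE total_amount > 1000;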


Teradata Split Delimited Fields into Table Records and Examples

If you are working with a large number of different source systems, you may come across the requirement to split delimited fields into table records in Teradata. You can split a delimited string in various ways using Teradata built-in string functions or Teradata regular expressions. You can use any of the following methods as per your requirements. Teradata split delimited fields using the STRTOK_SPLIT_TO_TABLE function: since TD14 there is a STRTOK_SPLIT_TO_TABLE function, which you can use to split your string or delimited field into table rows. Teradata…
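As a minimal sketch of the STRTOK_SPLIT_TO_TABLE approach, assuming a hypothetical table src_table with an integer key id and a comma-delimited column csv_col:

    -- one output row per token; outkey carries the source key, tokennum the position
    SELECT d.outkey,
           d.tokennum,
           d.token
    FROM TABLE (
           STRTOK_SPLIT_TO_TABLE(src_table.id, src_table.csv_col, ',')
           RETURNS (outkey INTEGER, tokennum INTEGER, token VARCHAR(100) CHARACTER SET UNICODE)
         ) AS d
    ORDER BY d.outkey, d.tokennum;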


Teradata Regular Expressions and Examples

The Teradata regular expression functions identify precise patterns of characters and are useful for extracting strings from the data and validating existing data, for example validating dates, performing range checks, checking for characters, and extracting specific characters from the data. In this article, we will check some of the commonly used Teradata regular expressions. Read: Teradata String Functions and Examples, Commonly used Teradata Date Functions and Examples. Teradata substring regular expression - REGEXP_SUBSTR: this function extracts a substring from source_string that matches a regular expression specified by…
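For context, a minimal REGEXP_SUBSTR sketch; the literal string below stands in for any source_string column:

    -- extract the first run of digits from the string
    SELECT REGEXP_SUBSTR('Order id ORD-4521 shipped', '[0-9]+') AS order_number;
    -- returns '4521'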


Teradata String Functions and Examples

Teradata string functions are primarily used for various string manipulations. Teradata also supports most of the standard string functions along with its own extensions to those functions. Below are the commonly used Teradata string functions. Read: Teradata Regular Expressions and Examples, Teradata Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT/MINUS, Commonly used Teradata Analytics Functions and Examples, Teradata Date Functions and Examples.
concat(string1, ..., stringN) - Returns the concatenation of two or more string values; it provides the same functionality as the SQL-standard concatenation operator (||).
length(string) - Returns…
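As a minimal sketch of the two functions listed above (the literal values are placeholders; CHARACTER_LENGTH is the ANSI spelling Teradata uses for the length function):

    SELECT 'Tera' || 'data'             AS product_name,  -- concatenation, same as concat()
           CHARACTER_LENGTH('Teradata') AS name_len;      -- returns 8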


Access HBase Tables from Impala working Examples

As you know, Hadoop Hive and Impala do not properly support transactional data. HBase is best suited for tables that require a lot of deletes, updates, inserts, and so on. You may want to explore the data stored in an HBase table. This article helps you understand how to access HBase tables from Impala, and we will check out the process with an example. Read the related article on loading an HBase table from Hive: Loading HBase Table from Apache Hive. Why would you want to access HBase tables from Impala? This is obvious…
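As a minimal sketch of the usual approach, assuming an existing HBase table named customer with a column family cf; the mapping table is defined through Hive and then queried from Impala:

    -- run in Hive: map the HBase table as an external table
    CREATE EXTERNAL TABLE hbase_customer (
        row_key STRING,
        name    STRING,
        city    STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name,cf:city")
    TBLPROPERTIES ("hbase.table.name" = "customer");

    -- run in impala-shell: pick up the new table and query it
    INVALIDATE METADATA hbase_customer;
    SELECT row_key, name, city FROM hbase_customer WHERE name = 'John';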
