Articles about big data

Splitting HBase Tables, Examples and Best Practices

Apache HBase distributes its load through region splitting. HBase stores rows in tables, and each table is split into ‘regions’. Those regions are distributed across the cluster, hosted, and made available to client processes by the RegionServer process. All rows in a table are sorted between the region start and end keys. Every row belongs to exactly one region, and a region is served by a single region server at any given point in time. In this article, we will check Splitting HBase Tables, Examples and…
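As a rough sketch of how splitting is controlled from the HBase shell, the commands below pre-split a table at creation time and then request a manual split. The table name 'mytable', column family 'cf', and split keys are placeholders, not names from the article:

```
# Pre-split a table into four regions at creation time
create 'mytable', 'cf', SPLITS => ['g', 'm', 't']

# Request a manual split of an existing table at a given row key
split 'mytable', 'p'
```

Pre-splitting avoids the hot-spotting that occurs when all initial writes land in a single region.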


HBase Exit Code – Capture Last Executed Command Status

You can use the HBase exit code to check for the success or failure of the last executed command in a script. These exit codes help you decide whether to continue script execution or abort it when a command fails. In this article, we have discussed the HBase exit code – we have also discussed how to capture the last executed command's status. HBase Exit Code You can use $? to return the status of the last executed command on the HBase shell. Just like other relational databases like Netezza, HBase exit code…
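A minimal sketch of the `$?` pattern in a wrapper script. In a real script the command would be something like `echo "exists 'mytable'" | hbase shell -n`; since that needs a running cluster, `true` below is a stand-in for a successful HBase shell invocation:

```shell
# Capture the exit status of the last executed command with $?.
# Stand-in for: echo "exists 'mytable'" | hbase shell -n
true
status=$?

if [ "$status" -eq 0 ]; then
  echo "HBase command succeeded"
else
  echo "HBase command failed with exit code $status" >&2
  exit "$status"
fi
```

Note that `$?` must be read immediately after the command of interest; any intervening command overwrites it.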


Working with HBase Table Variables – Assign Table Name to jruby Variable

Apache HBase 0.95 allows you to assign a table name to a jruby variable. This feature saves a lot of time when working on table operations such as inserting, reading, and deleting data. In this article, we will discuss working with HBase table variables – assigning a table name to a jruby variable – with some examples. Working with HBase Table Variables – Assign Table Name to jruby Variable In earlier versions, HBase shell commands take the table name as an argument. The 0.95 version of HBase adds a facility to…
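As an illustrative sketch in the HBase shell (table and row names are placeholders), `create` and `get_table` both return a table reference that can be held in a jruby variable:

```
# Assign the table returned by create to a jruby variable
t = create 'mytable', 'cf'

# Subsequent operations go through the variable instead of the table name
t.put 'row1', 'cf:a', 'value1'
t.get 'row1'
t.scan

# For an existing table, obtain a reference with get_table
t = get_table 'mytable'
```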


How to Rename HBase Table? – Examples

Earlier HBase versions shipped a simple script, ‘rename_table.rb’, that would rename the HBase HDFS table directory and then edit the hbase:meta table, replacing all details of the old table name with the new one. The script was deprecated and removed as it was unmaintained. In this article, we will check how to rename an HBase table using a snapshot, with some examples. How to Rename HBase Table? You can use the HBase snapshot facility to rename tables. Here is how you would do it using the HBase shell: Related reading: Steps to Migrate HBase…
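The snapshot-based rename can be sketched in the HBase shell as follows (table and snapshot names are placeholders):

```
disable 'old_table'
snapshot 'old_table', 'old_table_snapshot'
clone_snapshot 'old_table_snapshot', 'new_table'
delete_snapshot 'old_table_snapshot'
drop 'old_table'
```

The clone references the snapshot's data files rather than copying them, so the rename is cheap even for large tables.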


Steps to Migrate HBase Tables from Default to another Namespace

In HBase, you can create different namespaces as per your requirements. You can think of a namespace as a schema in a relational database. When you create HBase tables without specifying a namespace, the tables are created in the “default” namespace. In this article, we will check the steps to migrate HBase tables from the default to another namespace, with some examples. HBase Snapshots You can INSERT (a single value at a time) data into tables present in another namespace using the HBase PUT command. But you can’t use the HBase put command to copy an entire table…
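A sketch of the snapshot-based migration in the HBase shell; the namespace 'prod_ns' and table 'mytable' are placeholder names:

```
create_namespace 'prod_ns'
disable 'mytable'
snapshot 'mytable', 'mytable_snap'
clone_snapshot 'mytable_snap', 'prod_ns:mytable'
delete_snapshot 'mytable_snap'
drop 'mytable'
```

The target table name is simply qualified as `namespace:table`; the same clone mechanism used for renaming moves the table out of the default namespace.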


Export Hive Query Output into Local Directory using INSERT OVERWRITE

INSERT OVERWRITE statements to HDFS or LOCAL directories are the best way to extract large amounts of data from a Hive table or query output. Hive can write to HDFS directories in parallel from within a MapReduce job. In this article, we will check Export Hive Query Output into Local Directory using INSERT OVERWRITE and some examples. Export Hive Query Output into Local Directory using INSERT OVERWRITE Query results can be inserted into filesystem directories by using the Hive INSERT OVERWRITE statement. You can insert data into either HDFS or LOCAL…
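A minimal HiveQL sketch of both variants; the directory paths and table name are placeholders, and the `ROW FORMAT` clause on directory writes assumes Hive 0.11 or later:

```sql
-- Write query results to a local directory as comma-delimited text
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_output'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM mytable;

-- Drop the LOCAL keyword to write to an HDFS directory instead
INSERT OVERWRITE DIRECTORY '/user/hive/output'
SELECT * FROM mytable;
```

Note that OVERWRITE deletes any existing contents of the target directory before writing.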


Apache Hive ALTER TABLE Command and Examples

You can use the Apache Hive ALTER TABLE command to change the structure of an existing table. You can add new columns and modify existing columns in Hive tables. Uses of Hive ALTER TABLE Command Below are the most common uses of the ALTER TABLE command: rename an existing Hive table, add a new column to a table, rename a table column, add or drop a table partition, and add the Hadoop archive option to a Hive table. Related reading: Apache Hive Data Types and Best Practices Apache Hive CREATE TABLE…
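The common uses listed above can be sketched in HiveQL as follows (table, column, and partition names are placeholders):

```sql
-- Rename a table
ALTER TABLE mytable RENAME TO mytable_new;

-- Add a new column
ALTER TABLE mytable ADD COLUMNS (load_date STRING COMMENT 'date row was loaded');

-- Rename a column (and optionally change its type)
ALTER TABLE mytable CHANGE old_col new_col INT;

-- Add or drop a partition
ALTER TABLE mytable ADD PARTITION (dt='2020-01-01');
ALTER TABLE mytable DROP PARTITION (dt='2020-01-01');
```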


Improve Hive Memory Usage using Hadoop Archive

Hadoop HDFS is designed in such a way that the number of HDFS files directly affects memory consumption on the NameNode, as it must keep track of every file in the HDFS environment. This is not a concern on a small cluster, but memory usage can become a problem when the file count crosses 50 to 100 million. The Hadoop ecosystem performs best with fewer files. Now, let us check how to Improve Hive Memory Usage using Hadoop Archive. Related reading: Hadoop HDFS Architecture Improve Hive Memory Usage using Hadoop Archive You can…
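As a sketch of the archive approach in HiveQL, the settings below enable Hive's built-in partition archiving, which packs a partition's many small files into a single HAR file. The table name and partition spec are placeholders:

```sql
-- Settings required before ARCHIVE can be used
SET hive.archive.enabled=true;
SET hive.archive.har.parentdir.settable=true;
SET har.partfile.size=1099511627776;

-- Archive a partition into a HAR file, reducing the NameNode file count
ALTER TABLE mytable ARCHIVE PARTITION (dt='2020-01-01');

-- Reverse the operation if the partition needs to be rewritten
ALTER TABLE mytable UNARCHIVE PARTITION (dt='2020-01-01');
```

Archived partitions remain queryable, though reads can be slightly slower and the partition cannot be overwritten until it is unarchived.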


Apache Hive Data Types and Best Practices

In general, a data type is an attribute that specifies the type of data that is going to be stored in a specific column. Each column, variable, and expression has a related data type in SQL and HiveQL. However, data type names are not consistent across all databases. Hive supports almost all the data types that relational databases support. In this article, we will check Apache Hive data types and best practices. When you issue the Apache Hive create table command in the Hadoop environment, each column in the table structure…
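As an illustrative sketch (table and column names are placeholders), a Hive CREATE TABLE statement mixing common primitive and complex types:

```sql
CREATE TABLE employee (
  id         INT,
  name       STRING,
  salary     DECIMAL(10,2),
  hire_date  DATE,
  updated_at TIMESTAMP,
  is_active  BOOLEAN,
  skills     ARRAY<STRING>,
  attributes MAP<STRING,STRING>
);
```

The ARRAY and MAP columns are Hive complex types with no direct equivalent in most relational databases.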


Apache Hive Fixed-Width File Loading Options and Examples

In general, fixed-width text files are special types of text files where the row format is specified by column widths, a pad character, and either left or right alignment. In the fixed-width file format, column width is given in units of characters. Fixed-width format files are usually generated by machines such as switches, SS7 systems, etc. In this article, we will learn about Apache Hive fixed-width file loading options and some examples. Fixed-Width File Overview In general, fixed-length format files use ordinal positions, which are offsets that identify where…
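Two common loading options can be sketched in HiveQL. The table names, column names, and widths (10 and 5 characters) below are illustrative assumptions, not values from the article:

```sql
-- Option 1: stage the raw lines, then slice columns by offset with substr (1-based)
CREATE TABLE fixed_raw (line STRING);

SELECT substr(line, 1, 10) AS emp_name,
       substr(line, 11, 5) AS emp_id
FROM fixed_raw;

-- Option 2: let RegexSerDe split each line into columns at read time
CREATE TABLE fixed_width (emp_name STRING, emp_id STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{5}).*");
```

The substr approach keeps the raw file intact for reloading, while the SerDe approach bakes the column widths into the table definition.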
