Import data using Apache Sqoop


Sqoop allows easy import of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision data from an external system onto HDFS and populate tables in Hive and HBase. Sqoop also integrates with Oozie, allowing you to schedule and automate import tasks.

Sqoop uses a connector-based architecture that supports plugins providing connectivity to new external systems.
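
For example, where the connector supports it, you can sanity-check connectivity with the list-databases tool (a sketch; the connection string and credentials are placeholders):

$ sqoop list-databases --connect jdbc:netezza://localhost/MYDB \
--username test --password ****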

The import is done in two steps, as shown in Figure 1. In the first step, Sqoop introspects the database to gather the metadata necessary for the data being imported. The second step is a map-only Hadoop job that Sqoop submits to the Hadoop cluster; this job performs the actual data transfer using the metadata captured in the previous step.

Figure 1: Sqoop import process

By default, the imported data is written to an HDFS directory named after the source table. You can specify an alternative HDFS directory with the --target-dir option.
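
A minimal sketch of an import with an explicit target directory (the connection string, credentials, and path are placeholders, consistent with the examples in this post):

$ sqoop import --connect jdbc:netezza://localhost/MYDB \
--table ORDERS --username test --password **** \
--target-dir /user/sqoop/orders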

By default, imported files use a comma as the field delimiter and a newline as the record separator. You can easily override this format by explicitly specifying the field separator and record terminator characters.
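
For example, to produce tab-separated fields with newline record terminators (a sketch reusing the placeholder connection):

$ sqoop import --connect jdbc:netezza://localhost/MYDB \
--table ORDERS --username test --password **** \
--fields-terminated-by '\t' --lines-terminated-by '\n'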

Sqoop supports several data formats for import. For example, specifying the --as-avrodatafile option with the import command imports the data as Avro data files.
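
For instance, to import the same placeholder table as Avro data files:

$ sqoop import --connect jdbc:netezza://localhost/MYDB \
--table ORDERS --username test --password **** \
--as-avrodatafile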

Hive Imports

Importing data into the Hive metastore is equivalent to running a separate task that loads the data into HDFS and then creates a Hive table or partition on top of it.

Sqoop takes care of populating the Hive metastore with the appropriate metadata for the table, and it also invokes the necessary commands to load the table or partition, as the case may be. You simply specify the --hive-import option with the import command, and Sqoop takes care of the rest.

$ sqoop import --connect jdbc:netezza://localhost/MYDB --table ORDERS --username test --password **** --hive-import --target-dir /hive/warehouse/path

When you run a Hive import, Sqoop converts the data from the native data types of the external data store into the corresponding Hive types. Sqoop automatically chooses the native delimiter set used by Hive. If the data being imported contains newlines or other Hive delimiter characters, Sqoop lets you remove or replace those characters so the data is populated correctly for consumption in Hive.
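
For example, the --hive-drop-import-delims option strips newline and Hive delimiter characters from string fields during the import (a sketch reusing the placeholder connection above):

$ sqoop import --connect jdbc:netezza://localhost/MYDB \
--table ORDERS --username test --password **** \
--hive-import --hive-drop-import-delims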

If you are using Cloudera Impala, you should then run the INVALIDATE METADATA command so that the newly imported records become visible in Impala tables.
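
For example, from impala-shell (the database and table names are placeholders):

$ impala-shell -q "INVALIDATE METADATA mydb.orders"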

Importing to HBase

Sqoop can be used to import data into a particular column family within an HBase table. This is done by specifying options that identify the HBase table and the column family being populated. All data imported into HBase is converted to strings and inserted in UTF-8 format.

$ sqoop import --connect jdbc:netezza://localhost/MYDB \
--table ORDERS --username Vithal \
--password **** \
--hbase-create-table --hbase-table ORDERS \
--column-family netezza

Sqoop import-all-tables

The import-all-tables tool imports a set of tables from an RDBMS into HDFS. Data from each table is stored in a separate directory in HDFS: Sqoop creates a separate directory for each table under the given parent directory.

For the import-all-tables tool to be useful, the following conditions must be met:

1. Each table must have a single-column primary key.

2. You must intend to import all columns of each table.

3. You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.

An example command is shown below:

$ sqoop import-all-tables --connect jdbc:netezza://localhost/MYDB \
--warehouse-dir /path/to/warehouse \
--direct-split-size 2 \
--username test --password ****
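
After the job completes, each table should appear as its own subdirectory under the warehouse directory, which you can verify with the placeholder path used above:

$ hdfs dfs -ls /path/to/warehouse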

Using a password file with Sqoop

Sqoop can also read the password from a password file, which can reside on HDFS.
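
As a sketch (the file name and HDFS path are placeholders that match the command below), you might create and stage the password file like this; note echo -n, since a trailing newline would otherwise become part of the password:

$ echo -n "mypassword" > mysql-pass.txt
$ hdfs dfs -mkdir -p /etc/sqoop/conf/passwords
$ hdfs dfs -put mysql-pass.txt /etc/sqoop/conf/passwords/mysql-pass.txt
$ hdfs dfs -chmod 400 /etc/sqoop/conf/passwords/mysql-pass.txt

The import command then references the file with --password-file: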

$ sqoop import --connect jdbc:netezza://localhost/MYDB --username test --table ORDERS --password-file /etc/sqoop/conf/passwords/mysql-pass.txt
 
