Netezza Hadoop Integration and different types of Ingestion

  • Post last modified:February 28, 2018
  • Post category:Netezza

Big Data and Netezza are two terms you hear a lot about when you work with large volumes of data. You want to process that data and run analytics on it. Sometimes the data is raw: you may be asked to run analytics on semi-structured or unstructured data. This is where Netezza Hadoop integration comes into the picture.

So the question is: how can you perform low-latency analytics on these data sets? The answer is Netezza Hadoop integration. Process the semi-structured or unstructured data in Hadoop and ingest it into Netezza, where you can perform serious analytics with the help of built-in functions.

Semi-structured data ingest using Hadoop

Semi-structured data is data that does not conform to the formal structure of the RDBMS data model. Examples include weblogs and other server logs.

Using Hadoop as the data ingestion engine is a very common practice in Netezza Hadoop integration. If you are working on a project such as digital media analytics, weblog data is the primary data source. Weblog data, which includes page views, clicks, impressions and so on, is semi-structured.

Semi-structured data is loaded, parsed and sometimes aggregated in the Hadoop cluster, and then loaded into a Netezza server for high-performance analytics and faster reporting. You can use the Netezza Hadoop connector to connect to the Netezza server and transfer the processed semi-structured data.
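
The parse-and-aggregate step described above can be sketched in a few lines of Python. This is only an illustration, not the connector's own logic: it assumes the weblogs are in Apache Common Log Format, that only successful (2xx) requests count as page views, and that the target is a pipe-delimited flat file. The function names and sample lines are hypothetical, and in a real cluster the same logic would typically run as a distributed job rather than a single script.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: Common Log Format
# host ident user [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (date, path, status) for a well-formed line, or None otherwise."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return ts.date().isoformat(), match.group("path"), int(match.group("status"))


def aggregate_page_views(lines):
    """Count successful (2xx) requests per (date, path); skip malformed lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        day, path, status = parsed
        if 200 <= status < 300:
            counts[(day, path)] += 1
    return counts


def to_delimited_rows(counts, delimiter="|"):
    """Render the aggregates as delimited text rows, sorted for stable output."""
    return [
        delimiter.join((day, path, str(views)))
        for (day, path), views in sorted(counts.items())
    ]


# Hypothetical sample data for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [28/Feb/2018:10:15:32 +0000] "GET /home HTTP/1.1" 200 5120',
    '10.0.0.2 - - [28/Feb/2018:10:16:01 +0000] "GET /home HTTP/1.1" 200 5120',
    '10.0.0.1 - - [28/Feb/2018:10:17:45 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.3 - - [28/Feb/2018:10:18:10 +0000] "GET /missing HTTP/1.1" 404 312',
    'this line is not a valid log entry',
]

if __name__ == "__main__":
    for row in to_delimited_rows(aggregate_page_views(SAMPLE_LINES)):
        print(row)
```

The output is one row per day and page, which maps naturally onto a narrow fact table on the Netezza side.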

Unstructured data ingest using Hadoop

Unstructured data refers to information that is not organized in a predefined manner. Unstructured data is typically textual or contextualized data.

Unstructured data in this pattern is contextualized (classified, mined, keyworded and indexed) in Hadoop and then moved into a Netezza data warehouse appliance for low-latency, high-performance analytics that drive business decisions.
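
To make "contextualized" a little more concrete, here is a minimal Python sketch of classifying, keywording and indexing free text. Everything in it is an assumption for illustration: the stopword list, the category rules, the sample documents and the function names are hypothetical, and a production pipeline would use far richer text mining than simple term matching.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list and category rules, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "was", "it", "my", "for"}
CATEGORY_TERMS = {
    "billing": {"invoice", "refund", "charge"},
    "delivery": {"shipping", "late", "package"},
}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(text, limit=3):
    """Return the most frequent tokens; ties are broken alphabetically."""
    counts = Counter(tokenize(text))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [token for token, _ in ranked[:limit]]


def classify(text):
    """Pick the category with the most matching terms, or 'other' if none match."""
    tokens = set(tokenize(text))
    scores = {name: len(tokens & terms) for name, terms in CATEGORY_TERMS.items()}
    best = max(sorted(scores), key=scores.get)
    return best if scores[best] > 0 else "other"


def build_index(documents):
    """Build an inverted index mapping each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Hypothetical sample documents.
DOCUMENTS = {
    1: "The package was late and the shipping box was damaged.",
    2: "Refund my invoice, the charge is wrong.",
    3: "Great service.",
}

if __name__ == "__main__":
    for doc_id, text in DOCUMENTS.items():
        print(doc_id, classify(text), top_keywords(text))
```

Once each document carries a category and a set of keywords, the result is structured enough to be stored as ordinary rows and queried in the warehouse.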

Hadoop provides a well-suited, scalable mechanism to process the data described above, and Netezza Hadoop connectors are available to move data from the Hadoop cluster to the Netezza data warehouse appliance.