DWgeek.com

Amazon Redshift Array Support and Alternatives – Example

It is common to store values in the form of array in the database. Many databases like Netezza, PostgreSQL supports array functions to manipulate the array types. However, Amazon Redshift does not support array types. In this article, we will check alternative method that you can use. Amazon Redshift Array Support As mentioned in the previous section, Amazon Redshift does not support array types or functions. Though, Redshift uses PostgreSQL, but they yet to provide support to arrays. You can check unsupported features in the official documentation. Best part is,…

Comments Off

September 20, 2019

Redshift

Amazon Redshift Load CSV File using COPY and Example

Many organizations use flat files such as CSV or TSV files to offload tables, managing flat files is easy and can be transported by any electronic medium. In this article, we will check how to load or import CSV file into Amazon Redshift using COPY command with some examples. Page Content Introduction Preparing for Redshift Load CSV Load CSV File using Redshift COPY Command Redshift COPY command with Column Names Redshift COPY command to ignore First Line from CSV Redshift COPY Command with Custom Delimiter Best Practices for Loading CSV…

Comments Off

September 19, 2019

Redshift

How to Create Redshift Table from DataFrame using Python

As you know, Python is one of the widely used Programming languages for the data analysis, data science and machine learning. When analyzing data using Python, you will use Numpy and Pandas extensively. In this article, we will check how to create Redshift table from DataFrame in Python. Here DataFrame is actually referred to pandas not Spark. I will write another article on how to create tables out of Spark DataFrame, but for now let us stick to pandas df. Python Pandas DataFrame The pandas DataFrame's are really very useful…

Comments Off

September 18, 2019

BigData

How to Exclude Hive Partition Column From SELECT Query

Apache Hive is a data warehouse framework on top of Hadoop HDFS. Hive is a high level language to store and analyse large volumes of data. Apache Hive support most of the relational database features such as partitioning large tables and store values according to partition column. But, Hive stores partition column as a virtual column and is visible when you perform 'select * from table'. In this article, we will check method to exclude Hive partition column from a SELECT query. Hive Table Partition Partition in Hive table is…

Comments Off

September 17, 2019

Redshift

How to Change Redshift Table Sort Key and Example

Just like using a right distribution style, selecting right sort key is important to boost your Redshift database performance. How data is stored onto the disk blocks defines the performance of the queries that you are running against those tables. It is not that easy to change the sort key once defined. In this article, we will check how to change Redshift table sort key with an example. Page Content Introduction Understanding the Redshift Table Sort Key Why to Change Redshift Table Sort Key? How to Change Redshift Table Sort…

Comments Off

September 13, 2019

Redshift

How to Change Redshift Table Distribution Style and Example

Choosing right distribution style is one of the important factors to improve the performance of Redshift Database. Distribution style will directly affect the performance of your query. Table with wrong distribution style might hamper your Redshift cluster because of workload. In this article, we will check how to change Redshift table distribution style with an example. Why to Change Redshift Table Distribution style? There are three types of distribution style available in Redshift: EVEN distributionKEY distributionALL distribution You can choose any of the style based on your data, size and…

Comments Off

September 13, 2019

Redshift

Redshift Show and Describe Table Command Alternative

When you work on relatively big Enterprise data warehouse (EDW), you will have a large number of tables with different structures. The table structure includes, column name, type of data type, distribution style or sort key use. Redshift does not provide show (list) or describe SQL command. Though, you can use psql command line options. In this article, we will check what are Redshift show and describe table command alternative with an examples. Redshift Show and Describe Table Command Alternative As mentioned earlier, the Redshift SQL reference does not provide…

Comments Off

September 12, 2019

BigData

Identify and Remove Duplicate Records from Hive Table

Apache Hive being batch processing engine, does not support primary, foreign or unique key constraints. You can insert the duplicate records in the Hive table. There are no constraints to ensure uniqueness or primary key, but if you have a table and have loaded data twice, then you can de-duplicate in several ways. Below methods explain you how to identify and Remove duplicate records or rows from Hive table. Remove Duplicate Records from Hive Table Apache Hive does not provide support to many functions or internal columns that are supported…

Comments Off

September 12, 2019

Redshift

Identify and Remove Duplicate Records from Redshift Table

Redshift do not have a primary or unique key. You can define primary, Foreign or unique key constraints, but Redshift will not enforce them. You can insert the duplicate records in the Redshift table. There are no constraints to ensure uniqueness or primary key, but if you have a table and have loaded data twice, then you can de-duplicate in several ways. Below methods explain you how to identify and Remove duplicate records from Redshift table. Remove Duplicate Records from Redshift Table There are many methods that you can use…

Comments Off

September 11, 2019

Redshift

Redshift Cumulative SUM, AVERAGE and Examples

The cumulative sum or running total is one of the interesting problems where you have to calculate the sum or average using current result and previous row value. Most of the modern analytical database line Netezza, Teradata, Oracle, Vertica provides supports to analytical functions. You can make use of those analytical functions along with window specification to calculate the cumulative sum and average. In this article, we will check how to calculate Redshift Cumulative Sum (running total) or cumulative average with some examples. Redshift Cumulative Sum As explained earlier, cumulative…

Comments Off

September 11, 2019