Steps to Generate and Load TPC-DS Data into Netezza Server

  • Post author:
  • Post last modified:July 19, 2021
  • Post category:Netezza
  • Reading time:6 mins read

The TPC-DS benchmark model decision support system of a retail product supplier. This includes various queries and data maintenance. Database schema, data population, queries, data maintenance model and implementation rules have been designed to be broadly representative of new or modern decision support systems. In this post, we will discuss steps to generate and load TPC-DS data into Netezza or puredata systems server.

What is TPC-DS?

The Transaction Processing Performance Council (TPC) is a benchmark model decision support system. Several major firms are member of TPC. You can get more information about TPC and members in the official website. 

I have personally used TPC-DS to build one of machine learning model and compare results across different environments.

Steps to Generate and Load TPC-DS Data into Netezza Server

Below are the steps to generate and load TPC-DS data into Netezza server:

You can download TPC-DS tool from official website.

  • Download TPC-DS tool

You can download TPC-DS tool () from official website.

  • Copy downloaded * -tpc-ds-tool.zip tool to any of edge node or host machine and unzip it
unzip <>-tpc-ds-tool.zip
  • Generate Binary file

Navigate to tools directory in unzipped folder and make file to generate binary files required to generate TPC-DS data

make -f Makefile.suite
  • Generate TPC-DS data

Now you have required tools and binary files. Execute dsdgen tool to generate data.

$ ./dsdgen -scale 1 -dir /export/home/nz/tpcdata
dsdgen Population Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: This scale factor is valid for QUALIFICATION ONLY

Note -scale parmeter define the size of data to be generated.

  • -scale 1 : Generate 1GB data
  • -scale 100: Generate 100GB data
  • Create Database in Netezza or Puredata systems

You have generated retail data. You need database on Netezza system to create DDLs and load data, create database on Netezza or puredata systems for analytics

SYSTEM.ADMIN(ADMIN)=> create database NEWTPCDS;
  • Create tables in Netezza system

Next step is to connect to newly created database and execute tpcds.sql script.

NEWTPCDS.ADMIN(ADMIN)=> \c NEWTPCDS
You are now connected to database NEWTPCDS.
NEWTPCDS.ADMIN(ADMIN)=> \i tpcds.sql;

Note that, you need to enable random table distribution otherwise Netezza system will distribute data on first column.

Set enable_random_table_distribute = true;
  • Load TPC-DS data into Netezza Server

Now you have data, database and required tables. Next step is to load data using nzload utility.

For example,

nzload  -db NEWTPCDS -t STORE -df store.dat -delim ‘|’

Read: Netezza nzload command and its usage

  • Generate TPC-DS queries

You can even generate queries for loaded TPC-DS data

# Query Generate
[nz@netezza tools]$ ./dsqgen \
> -DIRECTORY ../query_templates \
> -INPUT ../query_templates/templates.lst \
> -VERBOSE Y \
> -QUALIFY Y \
> -SCALE 100 \
> -DIALECT netezza \
> -OUTPUT_DIR /export/home/nz/querytpcds
qgen2 Query Generator (Version 2.10.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
Warning: Selected scale factor is NOT valid for result publication
Parsed 99 templates
[nz@netezza tools]$

You should have query_0.sql file will all 100 queries copied to it.

ERROR: Substitution’_END’ is used before being initialized

If you ever get error like one mentioned above, add define _END = “”; at the beginning of each template files available in query_templates folder.

Hope this helps 🙂