This article is about how Teradata data distribution works on the various AMP’s present in the Teradata system.
What is AMP?
Before going in detail in the Teradata data distribution, lets check what AMP is? AMP, acronym for Access Module Processor, is the Virtual Processor (vproc) used to manage the database, handle file tasks and and manipulate the disk subsystem in the multi-tasking and possibly parallel-processing environment of the Teradata Database.
Each AMP then contained its own microprocessor, disk drive, file system, database software (Database Manager), Teradata Operating System (TOS), and YNET interface. There may be multiple AMPs on one node, and the communication among the AMPs is controlled by the BYNET.
You can read about Teradata Architecture here – Teradata Architecture – Components of Teradata
Read:
- Commonly used Teradata Date Functions and Examples
- Teradata Set Operators: UNION, UNION ALL, INTERSECT, EXCEPT/MINUS
- Teradata Architecture – Components of Teradata
How Teradata Data Distribution Works?
An index is a physical mechanism used to store and access the rows of a table. Indexes on tables in a relational database function much like indexes in books, they speed up information retrieval. The Teradata distributes the data based on the primary index (PI) that you create during table creation. Unique Primary Indexes (UPIs) guarantee uniform distribution of table rows across all AMP’s.
When the client runs queries to insert records, Parsing engine sends the records to BYNET. The data is hashed using hash algorithm on Primary Index (PI). The hash algorithm produces hash bucket value and row hash value. Hash bucket value will be having AMP’s numbers. The BYNET sends the records to respective AMP’s based on Hashed values and AMP will intern store that in it’s associated disks.
Below picture depicts the actual data distribution in Teradata:
Note that, you should create unique promar index on the tables otherwise your Teradata tables will be skewed that is, data will be accumulated on the single AMP and that could impact the system performance. Follow the standard rules to define primary index on Teradata tables.
How can you view Data Distribution in Teradata ?
Teradata uses HASH values to store data in various AMPs in the Teradata system. To view the table data distribution you have to use Hash functions. These Hash functions are usually used over primary index columns to find out actual data distribution .
SELECT HASHAMP(HASHBUCKET(HASHROW(<PRIMARY INDEX>))) AS "AMP",COUNT(*) FROM <your_table> GROUP BY 1 ORDER BY 2 DESC;
If you have unique primary index defined then the data will be even distribution across the AMP’s.
Read:
- Teradata Analytics Functions and Examples
- Netezza and Teradata Comparison: Netezza vs Teradata
- Teradata Architecture – Components of Teradata