Introduction to Teradata: 2014

CHAPTER 1
Teradata architecture.
Teradata's design is "massively parallel". There is a lot of redundancy designed into the system as we shall see.
Teradata is designed to run off of many computers simultaneously and these are referred to as Nodes, . If one node goes down the other nodes can take over and run the job of the downed node until it is brought back up (more on this later). This is one level of redundancy. Teradata has many levels of redundancy.
Each node has several processing units that work in parallel and these are called Access Module Processors or AMPs. The AMP's perform the I/O, they read in the data from disk and write out the data to disk. They can also sort data. They do the logical processing of the data such as calculations like summations, averages, maximum and minimum amounts.
The tables in a database are distributed equally across each AMP. For example, if a table has 8000 rows and we have a two node system, with each node having four AMP's, or a total of eight AMPs, then each AMP contains 1000 rows of the table. This means that each AMP only has to process 1000 rows, which is a lot faster than the 8000 rows of the table. The work is divided between AMP's making processing very fast. There is another layer of redundancy here as well. If an AMP fails the other AMP's can continue to do the reads, writes and processing for the downed AMP until it is brought back up.
The AMP's communicate with each other using a network called the BYNET. There are two BYNETS for redundancy, BYNET 0 and BYNET 1. If one goes down the other can continue to do the work until the downed BYNET is brought back up.
This is a very brief overview of a massively parallel system. To understand this conceptually we will use a deck of cards as an example.
A deck of cards has a total of 52 cards consisting of four suits and 13 cards for each suit. If we give one person the deck of cards and ask him to find the Ace of Diamonds he must look through the 52 cards to find it. If we instead divide the cards into suits and give four people the 13 cards from each suit and ask them to find the Ace of Diamonds, then only the person holding the Diamonds suit needs to look and only needs to search through 13 cards. If we instead give 52 people one card each and ask for the Ace of Diamonds then each person only has to look at one card. This is much faster than a single person searching through the 52 cards. This is a good example of why Teradata is designed the way it is and what is meant by "massively parallel" architecture.

AMPS working in parallel in a system.

AMPs working in parallel in a Teradata system

The AMPs communicate with each other via the BYNET

Shared Nothing Architecture
AMPS can not see each others data and can't access each others disk space. Each AMP has its own disk space and works separately from the other AMPs. This is referred to as shared nothing architecture. There is no overlap of processing by the AMP's.

What is an AMP?
What is meant by Massively Parallel?
What is the BYNET?

Sunday, February 9, 2014