Search This Blog

Wednesday, April 28, 2010

Product Join and Merge Join and Hash Join

A product join, every row of the left table to every row of the right table. Typically it is chosen as the join method, when one of the tables is small. Since it basically needs to read every row of the left table once, and every row of the right table n times, where n is the number of rows in the left table, it can be expensive to execute, if the number of rows in the left table is large.

The typical alternative is merge join. The extra step, however, that is needed to run a merge join is to sort the two source tables. If the tables are already in sorted order this step is not necessary. A merge join executes very fast, since it only reads both tables once. In addition, it is possible, depending on the join selectivities, that not all the datablocks of both tables need to be read.

The hash join does not require that both tables are sorted. The smaller table/spool is "hashed" into memory (sometimes using multiple hash partitions). Then, the larger table is scanned and for each row, it looks up the row from the smaller table in the hashed table that was created in memory. If the smaller table must be broken into partitions to fit into memory, the larger table must also be broken into the same partitions prior to the join.

So, the main advantage of the hash join is that the big table does not have to be sorted and the small table can be much larger than for a product join.


1 comment:

  1. Thanks for Information Teradata is a Relational Database Management System (RDBMS) for the world’s largest commercial databases. Teradata can store data upto Teradata bytes in size. This makes the Teradata as a market leader in data warehousing applications. Teradata Online Training

    ReplyDelete