Wednesday, November 2, 2011

Merge Join

SQL Server supports three physical join operators: Nested Loops, Merge and Hash.

The merge join requires that each table be sorted on the join keys. The merge join works by simultaneously reading and comparing the two sorted tables one row at a time. At each row, the merge join compares the next row from each table. If the rows match, the merge join outputs the joined row and continues on. If the rows do not match, the merge join discards the lesser of the two rows from the tables and continues. Because the tables are sorted, the merge join knows that it is discarding a row that is less than any remaining rows in either table.

In pseudo-code, it shall look something like the following.

get first row Row1 from Table1
get first row Row2 from Table2
while not at the end of either Table
     begin
          if Row1 joins with Row2
          begin
                    return (Row1, Row2)
                    get next row Row2 from Table2
          end
          else if Row1 < Row2
               get next row Row1 from Table1
          else
               get next row Row2 from Table2
          end

Below is an example of a select statement in which the optimizer should use a merge join operator.

select * from AdventureWorks.Person.Contact e left join AdventureWorks.HumanResources.Employee c on c.ContactID = e.ContactID


Performance of the merge join is associated to the number of rows in each table. A merge join more than likely is a better choice for larger inputs compared to a nested loops join. Each table in the merge join is read only once.

No comments:

Post a Comment