SQL Server supports three physical join operators: Nested Loops, Merge and Hash.
The merge join requires that each table be sorted on the join keys. It works by reading the two sorted tables simultaneously and comparing them one row at a time.
At each step, the merge join compares the current row from each table. If the rows match, it outputs the joined row and continues. If the rows do not match, it
discards the lesser of the two rows and continues. Because the tables are sorted, the merge join knows that the discarded row is less than every remaining row in
the other table and therefore cannot join with any of them.
In pseudo-code, the algorithm looks something like the following. This simple version assumes that the join keys in Table1 are unique (a one-to-many join).
get first row Row1 from Table1
get first row Row2 from Table2
while not at the end of either Table
begin
    if Row1 joins with Row2
    begin
        return (Row1, Row2)
        get next row Row2 from Table2
    end
    else if Row1 < Row2
        get next row Row1 from Table1
    else
        get next row Row2 from Table2
end
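As a rough illustration, the pseudo-code can be sketched in Python. This is only a minimal sketch, not how SQL Server implements the operator: the function name merge_join, the key1/key2 accessors and the sample rows are all made up for the example, and it assumes both inputs are already sorted on the join key and that the keys in the first input are unique.

```python
_END = object()  # sentinel marking the end of an input


def merge_join(table1, table2, key1, key2):
    """Inner merge join of two inputs already sorted on their join keys.

    Hypothetical helper for illustration. Assumes join keys in table1
    are unique (one-to-many), exactly like the pseudo-code it mirrors.
    """
    it1, it2 = iter(table1), iter(table2)
    row1 = next(it1, _END)  # get first row Row1 from Table1
    row2 = next(it2, _END)  # get first row Row2 from Table2
    result = []
    while row1 is not _END and row2 is not _END:
        k1, k2 = key1(row1), key2(row2)
        if k1 == k2:
            # Rows join: output the pair and advance the "many" side.
            result.append((row1, row2))
            row2 = next(it2, _END)
        elif k1 < k2:
            # Row1 is smaller than every remaining row in table2: discard it.
            row1 = next(it1, _END)
        else:
            # Row2 is smaller than every remaining row in table1: discard it.
            row2 = next(it2, _END)
    return result


# Made-up sample data: (ContactID, name) and (ContactID, title), both sorted.
contacts = [(1, "Ann"), (2, "Bob"), (4, "Dee")]
employees = [(1, "Engineer"), (2, "Buyer"), (2, "Planner"), (3, "Clerk")]

for pair in merge_join(contacts, employees,
                       key1=lambda r: r[0], key2=lambda r: r[0]):
    print(pair)
```

Note that the sketch advances only the second input after a match, so duplicate keys in the first input would be missed; a many-to-many join needs extra bookkeeping that is left out here.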
Below is an example of a select statement for which the optimizer is likely to choose a merge join operator.
select *
from AdventureWorks.Person.Contact c
left join AdventureWorks.HumanResources.Employee e on e.ContactID = c.ContactID
The cost of a merge join depends on the number of rows in each table: because each table in the merge join is read only once, the work grows roughly with the sum
of the two row counts. For larger inputs, a merge join is therefore often a better choice than a nested loops join.
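To make the difference concrete, the following toy comparison counts key comparisons for the two strategies. It is a simplified model and not SQL Server's cost model: the function names and the generated key lists are invented for the example, the nested loops version assumes no index on the inner input (with an index seek the picture changes), and the merge version assumes sorted inputs with unique keys in the first list.

```python
def nested_loops_comparisons(keys1, keys2):
    """Naive nested loops: every outer key is compared with every inner key."""
    count = 0
    for _k1 in keys1:
        for _k2 in keys2:
            count += 1
    return count


def merge_join_comparisons(keys1, keys2):
    """Merge join over sorted keys: each comparison advances one of the inputs."""
    i = j = count = 0
    while i < len(keys1) and j < len(keys2):
        count += 1
        if keys1[i] == keys2[j]:
            j += 1      # match: move on in the second input
        elif keys1[i] < keys2[j]:
            i += 1      # discard the smaller key from the first input
        else:
            j += 1      # discard the smaller key from the second input
    return count


# Made-up inputs: 1000 keys on each side, already sorted.
keys1 = list(range(1000))
keys2 = list(range(0, 2000, 2))

print("nested loops:", nested_loops_comparisons(keys1, keys2))
print("merge join:  ", merge_join_comparisons(keys1, keys2))
```

In this model the nested loops count is the product of the two input sizes, while the merge count can never exceed their sum, since every comparison moves one input forward by a row.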