Showing posts with label Hash Join. Show all posts
Showing posts with label Hash Join. Show all posts

Friday, November 4, 2011

Hash Join comparison for larger data sets

Building upon my example from my Hash Join post, the following code outputs 3 execution plans that illustrate the efficiency of hash joins when working with large data sets.

I was able to get the query optimizer to use each kind of physical join operator by supplying a hint.

select * from AdventureWorks.Sales.Customer c
left join AdventureWorks.Sales.SalesOrderHeader soh on c.CustomerID = soh.CustomerID

select * from AdventureWorks.Sales.Customer c
left join AdventureWorks.Sales.SalesOrderHeader soh on c.CustomerID = soh.CustomerID
option (merge join)

select * from AdventureWorks.Sales.Customer c
left join AdventureWorks.Sales.SalesOrderHeader soh on c.CustomerID = soh.CustomerID
option (loop join)

Thursday, November 3, 2011

Hash Join

SQL Server supports three physical join operators: Nested Loops, Merge and Hash.

The hash join works well for large sets of data. The hash join has two phases, the build and probe. First in the build phase, the rows are read from the first table and hashes the rows based on the join keys and creates has table in memory. The second phase, the probe phase, the hash join reads all the rows from the second table and hashes these rows based on the same join keys. The hash join then returns the matching rows.

In pseudo-code, it shall look something like the following.

for each row Row1 in the build Table
     begin
          perform hash value calc on Row1 row join key
          insert Row1 row into hash bucket
          end
          for every row Row2 in the probe table
          begin
               calc the hash value on row Row2 join key
               for every row Row1 in the hash bucket
               if row Row1 joins with row Row2
                    return (row Row1, row Row2)
          end

Below is an example of a select statement in which the optimizer should use a merge join operator.

select * from AdventureWorks.Sales.Customer c
left join AdventureWorks.Sales.SalesOrderHeader soh on c.CustomerID = soh.CustomerID


SQL Server tries to use the smaller of the two tables as the build table. SQL Server does this to try and reserve precious memory. While SQL Server attempts a best guess at guessing the amount of memory needed, if SQL has guessed to low it will spill out to tempdb to continue joining for the hash join operation.