Joins Explained: Nested Loop Joins

Understanding the way your SQL statement is executed is a fundamental step in improving the resource usage and response time. Accessing more than one table in a query almost certainly leads to a join operation. This is the first article in a series to look into the different types of table joins. It shows how the nested loop join works and elaborates its advantages and disadvantages.

A join takes two sets of data and generates one result set by choosing matching pairs according to a given join condition. A simple way to do that is by looping (hence the name nested loop join) through one of the sets, apply the join condition and use that to access the second set. When a match is found it is returned as part of the result.

Let’s take a visit to a CD retailer as an example. You want to buy a present for a friend and so you brought a wishlist with you. Normally you have a small number of entries on your list and a huge number of articles in the store. You start by taking a look at the first entry on your list, walk to the appropriate shelf and check the availability of the CD. If you don’t find it there or it does not meet other selection criteria (maybe it is to expensive after all) then you look at the next entry on your list and walk to the corresponding shelf. While you are in the shop you get a call on your mobile from someone who has an additional suggestion. No problem, you simply add that CD to your list and check it as well. You are done when you reach the last item of your list. Basically you use your list as a driver for your search.

Analysis

Finding the correct shelf for an item is essential for the time needed to complete your shopping. If the shop has arranged the CDs by artist and you only have an album title then you will have a hard time unless there is some kind of index where you can look up all artists for a given album name.

The time you need mostly depends on the number of items on your list. The time to find ten items is roughly ten times the time needed for one item.

It is not necessary to complete the whole list if you are looking for a limited number of items. You can easily stop shopping after you have found the number of CDs you want to give away.

It doesn’t matter if your list is incomplete when you start the search. You can still add items to your list after you started shopping.

The items on your list might be unordered and if you blindly follow the order on your list then you may have to return to a shelf where you have been before.

Conclusion

The nested loop join works very well when your driving data set is small and the second data set has a supporting index for the join condition. In this case the size of the second set is almost negligible.

The join operation can start when the first entries in the driving set are available, so there is no need to wait for preceding operations to complete. Also the first results are delivered right away and this makes the nested loop a perfect candidate for online processing where a user is waiting for the result to show up on the screen.

The nested loop join has one drawback. It tends to hit the same items over and over again if the driving set gets large and the lookup set is relatively small. In this case the nested loop is rather inefficient and other join methods are prefered. More about this in the upcoming articles of the series.