SQL joins visualized in a surprising way

August 22nd, 2013

delphix_logo_color      VDBs      delphix_logo_color

Saw a good posting on SQL joins today that echoes a classic image of SQL joins:


I loved this graphic when I first saw it. Seeing the graphic made me think “wow, I can actually wrap my mind around these crazy SQL joins.”

But there is more to SQL joins than meets the eye, at least in these pictures. These pictures leave out the effects of projection and amplification. For example, just taking the simplest case of a two table join (an inner join):


The intersection in the above graphic makes sense and looks simple, but to get a simple intersection requires that the two sets, the two tables,  be related by one-to-one relationships. Let’s take a simple query to illustrate the point:


In the query we join tables A and B. In blue I’ve added a “predicate filter” which limits the rows that are joined.


In a one to one relationship, for every value 1 in table A.field there will be one and only one value in table B.field. This is what the join diagram is showing, but how often do data models have one-to-one relationships? Sure it happens once in a while but the main relationship is one-to-many, which actually causes projections and not intersections.



Most often tables are related by a one to many relationship, a parent to child relationship. For example one customer can have many orders, but each order pertains to one and only one customer. With one to many we no longer get the neat intersections of two circles but a “projection” of one set onto the other. The number of rows is limited by the maximum rows returned by the predicate filter in table A and B.

The most surprising case is amplification or multiplying of rows returned due to many-to-many relationships and thus illustrating part of the reason why many-to-many relationships are problematic



With many to many relationships, the maximum rows returned is the number of rows returned in A with the predicate filter multiplied by the number of rows in B returned  after the predicate filter  all divided  by the minimum of the number of distinct values (NDV) returned on A or B after the predicate filter is applied.

In the above example that is (4*2)/ min(1,1) = 8

With more rows returned by either or both of table A and B, the effect can flood a query with rows to process.

All of this illustrates to me that SQL can be complex, more complex than the useful graphics at the top of the page would


  1. Trackbacks

  2. No trackbacks yet.

  2. | #1

    Hi Kyle,
    Thanks for taking the time to elaborate on the SQL Joins visual (projection, amplification) with your graphics! Brings me back to my earliest SQL learning days circa 1990 :-). On the SQL Joins graphic, the bottom use-cases can also be explained (and coded) with intersection and union concepts of shaded regions. Of course, from an execution perspective, this has implications of performance depending on where those tables lie and/or pre-computed.

    From a completeness perspective, it may be useful to update this graphic with greater drill-down of your 1-to-Many and Many-to-Many scenarios, along verbal (English) statements on each of the 7 basic cases (“exclusively A”, etc…) to set the example in everyday terms.


  3. khailey
    | #2

    Hi Sri, Thanks for your feedback. Yes, plenty more to do that I’d like to do. Unfortunately right now I’m doing other work thaN SQL but would like to get back to the SQL at some point
    Best Wishes, Kyle

× five = 20