Graphics for SQL Optimization

January 4th, 2017

Dan Tow, in his book SQL Tuning, lays out a simple method of tuning SQL queries. The method is

  • Draw a diagram of each table in the query with Children above Parents
  • Draw join lines between each join (many-to-many, one-to-many)
  • Mark each table with a predicate filter and calculate the amount of table filtered out

Then to find a great optimal optimization path candidate

  1. Start at the table with the strongest predicate filter (the filter that returns the fewest % of the table)
  2. join down to children (if multiple children join to child with strongest predicate filter)
  3. If you can’t join to children, join up to parent

The basics are pretty simple and powerful. Of course there are many cases that get more complex and Dan goes into these complex cases in his book.

What about indexes? Well the method will point out joins that should happen and if those joins are missing indexes then it indicates that indexes should be created.

What about join type? I generally leave this to the optimizer. The join type can be important but generally order of joins and indexes are more important. I look at join type as the final optimization.

Let’s take an example query:

SELECT COUNT (*)
FROM   a,
       b,
       c
WHERE
       b.val2 = 100 AND
       a.val1 = b.id AND
       b.val1 = c.id;

There are  indexes on b.id and c.id.  Diagramming the query in DB Optimizer gives

sql1

The red lines with crows feet mean that as far as the definitions go, the relations could be many to many.

Question is “what is the optimal execution path for this query?”

One of  the best execution plans is to

  1. start at the most selective filter table
  2. join to children  if possible
  3. else join to parent

There is one filter in the diagram, represented by the green F on table B. Table B has a filter criteria in the query “b.val2=100″.

Ok, table B is where we start the query. Now where do we go from B? Who is the parent and who is the child? It’s not defined in the constraints nor indexes on these tables so it’s hard for us to know. Guess what ? It’s also hard for Oracle to figure it out. Well, what does Oracle decide to do? This is where the cool part of DB Optimizer  comes in.

The super cool thing with DB Optimizer is we can overlay the diagram with the actual execution path (I think this is awesome)

sql2

For the digram we can see Oracle starts with B and joins to A. The result if this is joined to C. Is this the optimal path?

Well, let’s keep the same indexes and just add some constraints:

alter table c add constraint c_pk_con unique (id);
alter table b add constraint b_pk_con unique (id);

Now let’s diagram the query with DB Optimizer:

sql3

We can now see who the parent and child is, so we can determine the optimal query path which is to start at B, the only filter and  join to the child C then to the parent A.  Now what does Oracle do with the added constraint info:

sql4

Guess what? The execution plan has now changed with the addition of constraints and now Oracle’s execution path goes from a suboptimal plan to  the optimal path. Moral of the story is to make sure and define constraint information because it helps the optimizer, but what I wanted to show here was the explain plan overlay on the diagram which makes comparing execution plans much easier. Putting the queries VST diagrams side by side along with the overlay of execution path we can clearly and quickly see the differences:

sql5sql6

I plan to blog more about this awesome feature. It’s really cool.

Here is an example from an article by Jonathan Lewis

http://www.simple-talk.com/sql/performance/designing-efficient-sql-a-visual-approach/

The query Jonathan discusses is

SELECT order_line_data
FROM
         customers cus
         INNER JOIN
         orders ord
         ON ord.id_customer = cus.id
         INNER JOIN
         order_lines orl
         ON orl.id_order = ord.id
         INNER JOIN
         products prd1
         ON prd1.id = orl.id_product
         INNER JOIN
         suppliers sup1
         ON sup1.id = prd1.id_supplier
   WHERE
         cus.location = 'LONDON' AND
         ord.date_placed BETWEEN '04-JUN-10' AND '11-JUN-10' AND
         sup1.location = 'LEEDS' AND
    EXISTS (SELECT NULL
            FROM
                 alternatives alt
                 INNER JOIN
                 products prd2
                 ON prd2.id = alt.id_product_sub
                 INNER JOIN
                 suppliers sup2
                 ON sup2.id = prd2.id_supplier
           WHERE
                  alt.id_product = prd1.id AND
                  sup2.location != 'LEEDS')

which diagrammed looks like

sql7

There are multiple filters, so we need to know which one is the most selective to know where to start, so we ask DB Optimizer to display the statistics as well  (blue below a table is the filter %, green above is # of rows in table and numbers on join lines are rows returned by a join of just those two tables)

sql8

Now that we can determine a candidate for best optimization path, does Oracle take it?

sql9

Can you find the optimization error?

Dark green is where execution starts. There are two starts: one for the main query body and one for the subquery.

The red is where query execution ends.

PS a big part of this work is by the lead developer Matt Vegh. Many thanks to Matt for this awesome work.

 

PPS another example from Karl Arao

sql10

The dark green nodes are starts, so there are 4 separate starts. We can see how the result sets from each start are joined with each successive table join set. The red is the final step.


Uncategorized

  1. Trackbacks

  2. No trackbacks yet.
  1. Comments

  2. No comments yet.


9 × = thirty six