Archive

Archive for the ‘graphics’ Category

Right Deep, Left Deep and Bushy Joins in SQL

May 5th, 2014

What is right deep versus left deep? Good question. In join trees (not VST diagrams) the object on the left is acted upon first, then the object on the right. Below are left deep and right deep examples of the same query, showing

  • query text
  • join tree
  • join tree  modified to more clearly show actions
  • VST showing the same actions

[Figure: left deep and right deep versions of the same query, showing the query text, join trees and VST diagrams]

All of this boils down to the point that a right deep HJ can return rows earlier than a left deep HJ. A left deep HJ has to wait for each join to finish completely so that the result set can be hashed before the next step can probe it. In a right deep HJ, on the other hand, it's not the result sets that are being hashed but a table at each level, so each table can be hashed without waiting for intermediate results, and once these hashes are complete a probed row can flow all the way through the join tree, from bottom to top, similar to how a nested loop can start returning results early. A left deep HJ only has two work areas open at a time, whereas a right deep HJ can have multiple work areas open. One of the key things to keep in mind is how much data we have to hash: if the intermediate result sets are large (and/or grow at each level), that represents more work at each step of the way.

Normally I don’t think in terms of left deep versus right deep because it doesn’t change the path through the join tree. It does, however, change whether we are hashing a result set or hashing a table, and if we are hashing a table then the results can, in theory, be returned more quickly.
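To make the difference concrete, here is a minimal sketch (the tables a, b, c and their join columns are hypothetical) using Oracle's SWAP_JOIN_INPUTS and NO_SWAP_JOIN_INPUTS hints, which control whether each hash join builds its hash on a table or on the intermediate result, i.e. exactly the difference between the two shapes:

-- left deep: the first join hashes a and probes it with b, then the
-- (a,b) result set is hashed before c can probe it
select /*+ leading(a b c) use_hash(b c)
           no_swap_join_inputs(b) no_swap_join_inputs(c) */ *
from   a, b, c
where  a.id = b.a_id
and    b.id = c.b_id;

-- right deep: b and c are each hashed up front, and rows read from a
-- flow through both probes all the way up the join tree
select /*+ leading(a b c) use_hash(b c)
           swap_join_inputs(b) swap_join_inputs(c) */ *
from   a, b, c
where  a.id = b.a_id
and    b.id = c.b_id;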

For NL joins there are only left deep join trees. The object on the left always probes the object on the right, and the object on the left is always accessed first (with NL there is no need to modify the object on the left first and then probe it from the right).

[Figure: nested loops (left deep) join example]

Besides left deep and right deep, there are also bushy joins, which Oracle doesn’t do unless forced to through sub-queries or views, as in the following:

[Figure: bushy join forced through sub-queries/views]
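A hedged sketch of the kind of SQL that forces a bushy shape (tables a, b, c and d are hypothetical): the NO_MERGE hints keep each inline view as its own join, so a is joined to b and c is joined to d independently, and only then are the two intermediate results joined to each other.

select /*+ leading(v1 v2) use_hash(v2) no_merge(v1) no_merge(v2) */ *
from  (select /*+ use_hash(b) */ a.id, b.x   -- one side of the bushy tree
       from   a, b
       where  a.id = b.a_id) v1,
      (select /*+ use_hash(d) */ c.id, d.y   -- the other side
       from   c, d
       where  c.id = d.c_id) v2
where  v1.id = v2.id;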

Finally, the above VST diagrams were modified to be more easily compared to the join tree diagrams. Below are the VST diagrams as displayed by default in Embarcadero’s DB Optimizer. DB Optimizer shows the tables all in a line because the relationships between the tables are one-to-one. Keeping the VST diagram layout constant makes it easier to see the differences in execution paths:

[Figure: VST diagrams as displayed by default in DB Optimizer]

Note on hinting and order of tables in explain plan:

For NESTED LOOPS and HASH JOINS the hint takes one argument, which is the second table that shows up in the explain plan.

For NESTED LOOPS the second table is the table probed into, i.e. the second in order of access.

For HASH JOINS the second table is the table doing the probing into the hash result set. Hashing is the first operation, which creates the hash result set that the second table probes into.

For NESTED LOOPS, if the order is “LEADING(X,Y)” then the nested loops hint can only be on Y, i.e. USE_NL(Y).

For HASH JOINS, if the order is “LEADING(X,Y)” then the hash join hint can only be on Y, i.e. USE_HASH(Y).
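A minimal sketch of that hint placement, using hypothetical tables x and y:

-- nested loops: x is accessed first and probes into y,
-- so the hint names y, the second table in the plan
select /*+ leading(x y) use_nl(y) */ *
from   x, y
where  x.id = y.x_id;

-- hash join: x is hashed first and y probes into that hash,
-- so the hint again names y, the second table in the plan
select /*+ leading(x y) use_hash(y) */ *
from   x, y
where  x.id = y.x_id;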

Screen Shot 2014-05-06 at 6.51.02 PM

A couple of good references on USE_NL and USE_HASH

For more information see:

graphics, sql

Critical importance of data visualization

January 3rd, 2014

 

 


photo by David Blackwell.

Not sure if you can imagine, or have ever experienced, a meeting where you bring in your Statspack or AWR report, all 30 pages of it, point out some glaring issues that anybody could see, and propose some precise solutions, only to have the management team’s eyes glaze over. Then, after you finish your pitch, they all start arguing about what the problem might be despite your clear presentation of the problem and solution.
Have you ever had that same meeting with a printout of Top Activity from Oracle Enterprise Manager, with its load graph of average active sessions, its breakdown of where the load comes from in terms of CPU and waits, and its top SQL and sessions, and then you explain the problem and solution and they all nod their heads?
Clear presentation of data using graphics is critical to how fast people can understand the information and how comfortable they are interpreting it.

 

Screen Shot 2013-12-30 at 12.04.52 PM

 

Edward Tufte wrote a seminal analysis of the decision to launch the space shuttle on January 28, 1986. Some have been critical of the analysis, but for reasons that are orthogonal to what I find important. What I find important is the shockingly huge impact the presentation format of data can have on the viewer’s interpretation.

On the night before the shuttle launch, the engineers who designed the solid rocket boosters were concerned that it would be too cold to launch. Cold was an issue because the joints in the solid rocket boosters were sealed with a type of rubber which becomes stiffer the colder it is. As the rubber became stiffer, its ability to seal the joints declined, which increased the danger of solid rocket fuel burning through.

Screen Shot 2013-12-30 at 12.05.04 PM

 

The engineers stayed up late putting together information and faxing it to launch control in Florida, trying to prevent the launch the next day. They had information about the damage to the solid rocket boosters from previous flights: after each launch the boosters fell back into the ocean, were collected, and were analyzed for damage. The engineers used this data to show how, in past launches, damage to the solid rocket boosters had been related to temperature.

Here is a fax showing the “History of O-Ring damage on SRM field joints”:

Screen Shot 2013-12-30 at 12.05.13 PM

The first problem, as Tufte points out, is that this fax uses three different naming conventions for the data from previous launches, which is confusing. Circled in red are the three different naming conventions: date, flight # and SRM #.

Screen Shot 2013-12-30 at 12.05.22 PM

The fax gives overwhelmingly detailed information on the damage but no information on the temperatures, yet the goal was to show a correlation between temperature and damage.

The next fax shows temperatures but is missing many of the damaged flights, and it includes damage from test firings in the desert, which were fired horizontally, not vertically, and not under the same stresses as an actual flight.

Screen Shot 2013-12-30 at 12.06.07 PM

 

Finally, the inclusion of tangential data and the exclusion of other data led to the comment that there was damage on both the hottest flight and the coldest flight.

Screen Shot 2013-12-30 at 12.10.14 PM

 

But the conclusions in the faxes were clear: the estimated temperature at launch was to be 29-38 degrees, and the shuttle should not be launched below 53 degrees.

Screen Shot 2013-12-30 at 12.10.35 PM

If we take the data that was faxed and plot the number of damage incidents against the temperature at which they occurred, we get a graph like this:

Screen Shot 2013-12-30 at 12.11.43 PM

 

Based on this information do you think there is a correlation between temperature and damage? Would you have launched the shuttle the next day? Remember that there was tremendous pressure to launch the next day.

Well, they did launch, and the rest is history. As seen in the picture below, there is a white flame coming from one of the O-rings in the solid rocket booster. This flame burned into the liquid fuel and the space shuttle exploded.

Screen Shot 2013-12-30 at 12.11.10 PM

 

It was a national tragedy, which led to a congressional investigation. As part of the investigation, the information was drawn up into graphics. The graphics were actually worse than the original faxes because they introduced so much chart junk.

Screen Shot 2013-12-30 at 12.11.18 PM

Screen Shot 2013-12-30 at 12.11.36 PM

 

OK, let’s look back at the original data:

Screen Shot 2013-12-30 at 12.11.43 PM

Now let’s take that data and change the y-axis to represent not a simple count of damage but a scale of how bad the damage was, and we get

Screen Shot 2013-12-30 at 12.11.51 PM

 

Now include the flights that had no damage, a major piece of information, which already makes a huge difference:

Screen Shot 2013-12-30 at 12.11.57 PM

 

Now mark the damage of a different type in a different color; the only incident of that type occurred at 75 degrees:

Screen Shot 2013-12-30 at 12.12.04 PM

Now at 70 degrees there were both successes and failures, so normalize (average) the damage there

Screen Shot 2013-12-30 at 12.12.11 PM

 

Now we are starting to see some important information.

 

Screen Shot 2013-12-30 at 12.13.16 PM

 

We are also starting to see a stronger indicator of correlation.

Screen Shot 2013-12-30 at 12.13.24 PM

 

But probably the most important piece of information is still missing – the temperature at which the launch the next day would take place:

Screen Shot 2013-12-30 at 12.13.30 PM

 

X marks the spot of the predicted launch temperature for the next day, January 28, 1986. The launch the next day was well outside the known world. It was so far out that the leap away from the known world of data was almost as big as the known world of data itself.

In summary:

  • NASA engineers, the guys that blew us away by putting a man on the moon, can still fail at communicating data clearly.
  • Congressional investigators, some of the top lawyers in the country, can still fail at communicating data clearly.
  • Data visualization seems obvious, but it is difficult.

But lack of clarity can be devastating.

Further reading

 


photo by Steve Jurvetson

graphics

Latency Heat Maps in SQL*Plus

August 23rd, 2013


This is so cool!

Screen Shot 2013-05-10 at 1.13.15 PM

The above is so cool.

The graphic shows the latency heatmap of “log file sync” on Oracle displayed in SQL*Plus! SQL*Plus?! Yes, the age-old text interface to Oracle showing colored graphics.

How did I do this? All I did was type

sqlplus / as sysdba
@OraLatencyMap_event 3 "log file sync"

The script OraLatencyMap_event was created by Luca Canali; see http://externaltable.blogspot.com/2013/05/latency-heat-map-in-sqlplus-with.htm

Now if we combine Luca’s monitoring with the I/O throttling documented by Frits Hoogland here https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/ , we can really have some fun and even draw latency words:

[Figure: latency heatmaps drawing out words]

 
In the graphic at the top of the page I put lgwr in an I/O write throttle group and played with the I/O throttle. I was running a Swingbench load and at the same time throttling I/O so that latencies started off good, then got worse, and then went back to normal.
 
The full steps are:
 

Run an auto-refreshing, color-coded heatmap on "log file sync" in SQL*Plus by typing:

sqlplus / as sysdba
@OraLatencyMap_event 3 "log file sync"

where OraLatencyMap_event.sql and OraLatencyMap_internal.sql are in your current directory or SQLPATH.

Now, to play with LGWR latency using cgroup throttles, see:

https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/

# install the cgroup tools (libcgroup) on Linux 2.6.24 or higher
yum install libcgroup

# setup /cgroup/blkio
grep blkio /proc/mounts || mkdir -p /cgroup/blkio ; mount -t cgroup -o blkio none /cgroup/blkio
cgcreate -g blkio:/iothrottle

# find the device you want
df -k
# my Oracle log file device was
ls -l /dev/mapper/vg_source-lv_home
lrwxrwxrwx. 1 root root 7 May  1 21:42 /dev/mapper/vg_source-lv_home -> ../dm-2

# my device points to /dev/dm-2
ls -l /dev/dm-2
brw-rw----. 1 root disk 253, 2 May  1 21:42 /dev/dm-2

# my device  major and minor numbers are "253, 2"
# create a write throttle on this device (for reads just replace "write" with "read")
# this limits it to 10 writes per second
cgset -r blkio.throttle.write_iops_device="253:2 10" iothrottle

# look for lgwr
ps -ef | grep lgwr
oracle   23165     1  0 13:35 ?        00:00:19 ora_lgwr_o1123

# put lgwr pid into throttle group
echo 23165     >  /cgroup/blkio/iothrottle/tasks

# now play with different throttles
cgset -r blkio.throttle.write_iops_device="253:2 1" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 10" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 100" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 1000" iothrottle

# if you are finished then delete the throttle control group
cgdelete  blkio:/iothrottle

graphics, Uncategorized

Latency heatmaps in D3 and Highcharts

August 20th, 2013


See Brendan Gregg’s blog on how important and cool heatmaps can be for showing latency information and how average latency hides what is really going on:

Now, if we want to create heatmap graphics, how can we do it? Two popular web methods for displaying graphics are Highcharts and D3. Two colleagues of mine whipped up some quick examples in both Highcharts and D3 to show latency heatmaps, and those two examples are shown below. The data in the charts is random, just for the purpose of showing these graphics in action.
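For real data rather than random numbers, one natural Oracle-side source for this kind of chart is the wait event latency histogram. A hedged sketch of the sort of query that could feed either chart (take snapshots on an interval; the deltas of wait_count between snapshots give one column of the heatmap):

-- cumulative latency histogram for "log file sync"
select wait_time_milli,   -- upper bound of the latency bucket, in ms
       wait_count         -- cumulative number of waits in that bucket
from   v$event_histogram
where  event = 'log file sync'
order by wait_time_milli;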

Highcharts Heatmap


see code at http://jsfiddle.net/vladiweb/GNd3G/


 

D3 Heatmap


see code at
http://jsfiddle.net/eyalkaspi/YH8sw

graphics