Archive

Posts Tagged ‘sql’

CURSOR_SHARING : a picture is worth a 1000 words

August 28th, 2013


Anyone who has been around Oracle performance over the years knows the grief that hard parsing SQL queries can cause on highly concurrent applications. The number one reason for hard parsing has been applications that don’t use bind variables. Without bind variables queries that would otherwise be shared get recompiled because their text is different and Oracle treats them as different queries. Oracle addressed this issue with a parameter called cursor_sharing. The parameter cursor_sharing has three values

  1. exact – the default
  2. similar – replace literals with bind variables, if a histogram keep literal in place
  3. force – replace literals with bind variables and use existing plan if it exists

Here is what the load looks like going from the default, exact, to the value force on a load of the same query but a query that doesn’t use bind variables:

looks like a significant load savings – impressive!
Now many people tell me that they think there are bugs with “force” and that you should use “similar”. The value similar does a similar thing but if there are histograms on the column, then Oracle will attempt, in certain cases, to have different plans based on different values. Sounds cool huh? Well their are bugs. Here is the same load with similar:
If we look at the different child cursors for this statement we find that Oracle, instead of sharing the children creates a different one for each execution:
This bug still seems to exist on 11gR2 :
Here is the code for the examples I (run by 8 users on 10g and 12 users on 11g)
--alter session set cursor_sharing=exact;
--alter session set cursor_sharing=force;
--alter session set cursor_sharing=similar;
declare
 l_cursor integer default 0;
 stmt varchar2(400);
 ret number;
BEGIN
 select hparse.nextval into ret  from dual;
 dbms_random.seed(ret);
 FOR i IN 1..1000  LOOP
   l_cursor:=dbms_sql.open_cursor;
   stmt:='SELECT  count(*) FROM t1 where c1  < '|| 
     dbms_random.value()||' and c2  < '||dbms_random.value();
   execute immediate stmt into ret;
   dbms_sql.close_cursor(l_cursor);
 END LOOP;
END;
/
Table t1 has no histograms. In the case above it had one row, but results were similar with no rows:
create table t1 (c1 number, c2 number);
insert into t1 values (0,0);
commit;
The issue should be addressed in 11g with a combination of cursor_sharing and adaptive cursor sharing
Also see Charles Hooper’s blog post on this topic at

Oracle, performance, sql, wait events , , ,

SQL joins visualized in a surprising way

August 22nd, 2013

delphix_logo_color      VDBs      delphix_logo_color


Saw a good posting on SQL joins today that echoes a classic image of SQL joins:

Visual_SQL_JOINS_orig

I loved this graphic when I first saw it. Seeing the graphic made me think “wow, I can actually wrap my mind around these crazy SQL joins.”

But there is more to SQL joins than meets the eye, at least in these pictures. These pictures leave out the effects of projection and amplification. For example, just taking the simplest case of a two table join (an inner join):

Screen-Shot-2013-06-25-at-10.16.39-PM

The intersection in the above graphic makes sense and looks simple, but to get a simple intersection requires that the two sets, the two tables,  be related by one-to-one relationships. Let’s take a simple query to illustrate the point:

Screen-Shot-2013-06-26-at-1.00.06-PM

In the query we join tables A and B. In blue I’ve added a “predicate filter” which limits the rows that are joined.


Screen-Shot-2013-06-25-at-10.16.48-PM

In a one to one relationship, for every value 1 in table A.field there will be one and only one value in table B.field. This is what the join diagram is showing, but how often do data models have one-to-one relationships? Sure it happens once in a while but the main relationship is one-to-many, which actually causes projections and not intersections.


Screen-Shot-2013-06-25-at-10.04.35-PM

Screen-Shot-2013-06-25-at-10.04.29-PM

Most often tables are related by a one to many relationship, a parent to child relationship. For example one customer can have many orders, but each order pertains to one and only one customer. With one to many we no longer get the neat intersections of two circles but a “projection” of one set onto the other. The number of rows is limited by the maximum rows returned by the predicate filter in table A and B.

The most surprising case is amplification or multiplying of rows returned due to many-to-many relationships and thus illustrating part of the reason why many-to-many relationships are problematic


Screen-Shot-2013-06-25-at-10.04.15-PM

Screen-Shot-2013-06-25-at-10.04.24-PM

With many to many relationships, the maximum rows returned is the number of rows returned in A with the predicate filter multiplied by the number of rows in B returned  after the predicate filter  all divided  by the minimum of the number of distinct values (NDV) returned on A or B after the predicate filter is applied.

In the above example that is (4*2)/ min(1,1) = 8

With more rows returned by either or both of table A and B, the effect can flood a query with rows to process.

All of this illustrates to me that SQL can be complex, more complex than the useful graphics at the top of the page would

Uncategorized

Facebook Schema and Performance

August 21st, 2013


From the article “Facebook shares some secrets on making MySql scale

“800 million users and handling more than 60 million queries per second” …”4 million row changes per second.”

and that was almost a two years ago. Think what it’s like now!

Ever wonder why Facebook limits your friends to 5000?  Does Facebook want to stop people from using it to promote themselves?

Ever see this message “There are no more posts to show right now” on Facebook?

Notice it says “There are no more posts to show right now.”

I got this message when scrolled back in “friends” status updates. After scrolling back a few days I hit this message. The message is strange since  thousands of more status updates that Facebook could have shown me. Why did I run into this message?

Is Facebook just being capricious or dictatorial in how it is used? I don’t know but I think the more likely answer is much more mundane and possibly quite interesting. The reason may be just simple technical limitations.

How could would/should/could status updates be stored on Facebook?

The first thing that comes to mind is something like these tables in a relational database:In the above design there are 3 tables

  • Facebook accounts
    • ID
    • email
    • other fields
  • Friends
    • user id
    • friend’s id (contact id or c_id)
  • Status Updates
    • ID of the account making the update
    • status update
    • date of status update

So if sue@y.com logs onto Facebook, then Facebook needs to go and get the status updates of her friends/contacts. First step is to get a list of friends and second step is to get a list of updates from those friends. In SQL this might look like:

    Select  id, status
    From updates
    where id in (select c_id from contacts where id=2)
    order by date

As the number of friends and status updates increases, then this query is going to take longer and longer. Maybe this is the reason why Facebook limits the number of friends and the history.  How can the response time for  the retreval of updates of friends be kept at constant time ?

First, the home page only has to show, at least initially, something like 20 updates. The above query can be wrapped with a top 20 s0mething like

   select * from (
      Select  id,status
      From updates
      where id in (select c_id from contacts where id=2)
      order by date)
   where rownum < 20;

But really, that’s not going to do much good because the query still has to create the result set before sorting it by date then limiting the output to 20 rows. You could add a date limiter on the updates:

   select * from (
      Select  id,status
      From updates
      where id in (select c_id from contacts where id=2) and
      date <= current_date - 2_days
      order by date)
   where rownum < 20;

Seems facebook has a limit on the number of days returned and the number of friends, but there isn’t AFAIK, a limit on the number of updates that friends can do, so as they do more updates, the query takes longer and longer.

What kind of other design could be used? To speed up the query data could be denormalized a lot or a little. For a small change in the data, the date could be added to the list of friends meaning we can limit updates by the date field in  friends instead of all the updates themselves  as in:
Now the query becomes something like

   Select  status
   From updates
   where id in  (  select c_id from
                    (select c_id from contacts where id=2  order by date)
               where rownum < 20 )
   order by date

Instead of having to select status updates from all the friends, the query just selects the 20 (or less) friends who have had the most recent updates.

Or one could go a step farther such that when you post a status update,  a row gets inserted for each of your friends,  such that every friend has your update associeted with them and then all that has to be done is select the top 20 updates from that list. No joining. And if  indexed, then the rows returned can be precisely limited to those 20 rows. On the other hand this creates an enormous amount of insert data and data redundancy. Maybe have two tables, 1 status updates with a unique id and 2  a table with all friends updates. The second table would have every user and for each user a line that contains the status update ids of all their friends and a timestamp.    So if I wanted status updates for my friends, I just get the last 20 status update ids from this table for me and then get the actual content for 20 status updates. Still this keeps a lot of unnecessary information. On the other hand I don’t need to keep the data for that long – maybe the last couple days and beyond that the system could fall back to some of the join query above.

What other kinds of optimizations could they do ?  What would the pros be of a other methods? What are the cons?

This has already been solved a number of times at a number of places.  I haven’t been involved in any nor am I involved in any of these architectural questions right now, but it’s interesting to think about.

Why does Facebook want to know who your close friends are? Is it because they care or because it helps prioritize what status up dates to denormalize? Why do the limit friends  to 5000? Is it because they really care or is scaling issue?

 

Related Reading:

Twitter

id generation

http://engineering.twitter.com/2010/06/announcing-snowflake.html

http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html

Facebook schema

http://www.flickr.com/photos/ikhnaton2/533233247/

Facebook lamp stack

http://itc.conversationsnetwork.org/shows/detail4746.html

how does Facebook do it

http://ask.metafilter.com/82769/How-is-Facebook-doing-its-queries

ebay

http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf

high scalability

http://highscalability.com/

http://memcached.org/

scaling

http://danga.com/words/2007_06_usenix/usenix.pdf

Flickr

http://radar.oreilly.com/archives/2006/04/database-war-stories-3-flickr.html

Myspace

http://www.baselinemag.com/c/a/Projects-Networks-and-Storage/Inside-MySpacecom/

dealing with stale data

http://www.mnot.net/blog/2007/12/12/stale

Facebook schema

http://upload.wikimedia.org/wikipedia/commons/9/95/Metamodel_of_Facebook.jpg

Uncategorized ,