Archive for March, 2014

cache buffer chains latch

March 6th, 2014

After a Tanel Poder pointed out to me that someone had blogged my article verbatim, which I don’t really care about, but they had copied the article verbatim without crediting me, which I do care about, I decided I’d just re-blog this. The article is best formatted at link:


Waits on the cache buffer chains latch, ie the wait event “latch: cache buffers chains” happen when there is extremely high and concurrent access to the same block in a database. Access to a block is normally a fast operation but if concurrent users access a block fast enough, repeatedly then simple access to the block can become an bottleneck. The most common occurance of cbc (cache buffer chains) latch contention happens when multiple users are running nest loop joins on a table and accessing the table driven into via an index. Since the NL  join is basically a

  For all rows in i
     look up a value in j  where j.field1 = i.val
  end loop
then table j’s index on field1 will get hit for every row returned from i. Now if the lookup on i returns a lot of rows and if multiple users are running this same query then the index root block is going to get hammered on the index j(field1).
In order to solve a CBC latch bottleneck we need to know what SQL is causing the bottleneck and what table or index that the SQL statement is using is causing the bottleneck.
From ASH data this is fairly easy:

      nvl(o.object_name,ash.current_obj#) objn,
      substr(o.object_type,0,10) otype,
      CURRENT_FILE# fn,
      CURRENT_BLOCK# blockn
from  v$active_session_history ash
    , all_objects o
where event like 'latch: cache buffers chains'
  and o.object_id (+)= ash.CURRENT_OBJ#
group by sql_id, current_obj#, current_file#,
               current_block#, o.object_name,o.object_type
order by count(*)
From the out put it looks like we have both the SQL (at least the id, we can get the text with the id) and the block:

---- ------------- -------- ------ --- ------
  84 a09r4dwjpv01q MYDUAL   TABLE    1  93170
But the block actually is probably left over from a recent IO and not actually the CBC hot block though it might be.
We can investigate further to get more information by looking at P1, P2 and P3 for the CBC latch wait. How can we find out what P1, P2 and P3 mean? by looking them up in V$EVENT_NAME:
select * from v$event_name

where name = 'latch: cache buffers chains'
EVENT#     NAME                         PARAMETER1 PARAMETER2 PARAMETER3 
---------- ---------------------------- ---------- ---------- ----------
        58 latch: cache buffers chains     address     number      tries 

So  P1 is the address of the latch for the cbc latch wait.

Now we can group the CBC latch waits by the address and find out what address had the most waits:

    lpad(replace(to_char(p1,'XXXXXXXXX'),' ','0'),16,0) laddr
from v$active_session_history
where event='latch: cache buffers chains'
group by p1
order by count(*);   

---------- ----------------
      4933 00000004D8108330   
In this case, there is only one address that we had waits for, so now we can look up what blocks (headers actually) were at that address
select, bh.dbarfil, bh.dbablk, bh.tch

from x$bh bh, obj$ o
where tch > 5
  and hladdr='00000004D8108330'
  and o.obj#=bh.obj
order by tch

----------- ------- ------ ----
EMP_CLUSTER       4    394  120        
We look for the block with the highest “TCH” or “touch count”. Touch count is a count of the times the block has been accesses. The count has some restrictions. The count is only incremented once every 3 seconds, so even if I access the block 1 million times a second, the count will only go up once every 3 seconds. Also, and unfortunately, the count gets zeroed out if the block cycles through the buffer cache, but probably the most unfortunate is that  this analysis only works when the problem is currently happening. Once the problem is over then the blocks will usually get pushed out of the buffer cache.
In the case where the CBC latch contention is happening right now we can run all of this analysis in one query

        name, file#, dbablk, obj, tch, hladdr 
from x$bh bh
    , obj$ o
       o.obj#(+)=bh.obj and
      hladdr in 
    select ltrim(to_char(p1,'XXXXXXXXXX') )
    from v$active_session_history 
    where event like 'latch: cache buffers chains'
    group by p1 
    having count(*) > 5
   and tch > 5
order by tch   
example output

------------- ----- ------ ------ --- --------
BBW_INDEX         1 110997  66051  17 6BD91180
IDL_UB1$          1  54837     73  18 6BDB8A80
VIEW$             1   6885     63  20 6BD91180
VIEW$             1   6886     63  24 6BDB8A80
DUAL              1   2082    258  32 6BDB8A80
DUAL              1   2081    258  32 6BD91180
MGMT_EMD_PING     3  26479  50312 272 6BDB8A80
This can be misleading, as TCH gets set to 0 every rap around the LRU and it only gets updated once every 3 seconds, so in this case DUAL was my problem table not MGMT_EMD_PING
Deeper Analysis from Tanel Poder
Using Tanel’s ideas here’s a script to get the objects that we have the most cbc latch waits on
col object_name for a35
col cnt for 99999
  cnt, object_name, object_type,file#, dbablk, obj, tch, hladdr
  select count(*) cnt, rfile, block from (
    SELECT /*+ ORDERED USE_NL(l.x$ksuprlat) */
      –l.laddr, u.laddr, u.laddrx, u.laddrr,
      dbms_utility.data_block_address_file(to_number(object,’XXXXXXXX’)) rfile,
      dbms_utility.data_block_address_block(to_number(object,’XXXXXXXX’)) block
       (SELECT /*+ NO_MERGE */ 1 FROM DUAL CONNECT BY LEVEL <= 100000) s,
       (SELECT ksuprlnm LNAME, ksuprsid sid, ksuprlat laddr,
       TO_CHAR(ksulawhy,’XXXXXXXXXXXXXXXX’) object
        FROM x$ksuprlat) l,
       (select  indx, kslednam from x$ksled ) e,
                  , ksusesqh     sqlhash
  , ksuseopc
  , ksusep1r laddr
             FROM x$ksuse) u
    WHERE LOWER(l.Lname) LIKE LOWER(‘%cache buffers chains%’)
     AND  u.laddr=l.laddr
     AND  u.ksuseopc=e.indx
     AND  e.kslednam like ‘%cache buffers chains%’
   group by rfile, block
   ) objs,
     x$bh bh,
     dba_objects o
 and  bh.dbablk=objs.block
 and  o.object_id=bh.obj
order by cnt

---- ----------------- ----- ----- ------- ------ ----- --------
   1 WB_RETROPAY_EARNS TABLE     4   18427  52701  1129 335F7C00
   1 WB_RETROPAY_EARNS TABLE     4   18194  52701  1130 335F7C00
   3 PS_RETROPAY_RQST  TABLE     4   13253  52689  1143 33656D00
   3 PS_RETROPAY_RQST  INDEX     4   13486  52692   997 33656D00
   3 WB_JOB            TABLE     4   14443  52698   338 335B9080
   5 PS_RETROPAY_RQST  TABLE     4   13020  52689   997 33656D00
   5 WB_JOB            TABLE     4   14676  52698   338 335B9080
   5 WB_JOB            TABLE     4   13856  52698   338 335F7C00
   6 WB_JOB            TABLE     4   13623  52698   338 335F7C00
   7 WB_JOB            TABLE     4   14909  52698   338 335B9080
 141 WB_JOB            TABLE     4   15142  52698   338 335B9080
2513 WB_JOB            INDEX     4   13719  52699   997 33656D00
Why do we get cache buffers chains latch contention?
In order to understand why we get CBC latch contention we have to understand what the CBC latch protects. The CBC latch protects information controlling the buffer cache. Here is a schematic of computer memory and the Oracle processes, SGA and the main components of the SGA:
The buffer cache holds in memory versions of datablocks for faster access. Can you imagine though how we find a block we want in the buffer cache? The buffer cache doesn’t have a index of blocks it contains and we certainly don’t scan the whole cache looking for the block we want (though I have heard that as a concern when people increase the size of there buffer cache). The way we find a block in the buffer cache is by taking the block’s address, ie it’s file and block number and hashing it. What’s hashing? A simple example of hashing is  the “Modulo” function
1 mod 4 = 1

2 mod 4 = 2
3 mod 4 = 3
4 mod 4 = 0
5 mod 4 = 1
6 mod 4 = 2
7 mod 4 = 3
8 mod 4 = 0
Using “mod 4″ as a hash funtion creates 4 possible results. These results are used by Oracle as “buckets” or identifiers of locations to store things. The things in this case will be block headers.
Block headers are meta data about data block including pointers to the actual datablock as well as pointers to the other headers in the same bucket.
The block headers in the hash buckets are connected via a doubly linked list. One link points forward the other points backwards
The resulting layout looks like
the steps to find a block in the cache are
If there are a lot of sessions concurrently accessing the same buffer header (or buffer headers in the same bucket) then the latch that protects that bucket will get hot and users will have to wait getting “latch: cache buffers chains” wait.
Two ways this can happen (among probably several others)
For the nested loops example, Oracle will in some (most?) cases try and pin the root block of the index because Oracle knows we will be using it over and over. When a block is pinned we don’t have to use the cbc latch. There seem to be cases (some I think might be bugs) where the root block doesn’t get pinned. (I want to look into this more – let me know if you have more info)
One thing that can make CBC latch contention worse is if a session is modifying the data block that users are reading because readers will clone a block with uncommitted changes and roll back the changes in the cloned block:
all these clone copies will go in the same bucket and be protected by the same latch:

How many copies of a block are in the cache?


     , name
     , file#
     , dbablk
     , hladdr 
from   x$bh bh
          , obj$ o
      o.obj#(+)=bh.obj and
      hladdr in 
    select ltrim(to_char(p1,'XXXXXXXXXX') )
    from v$active_session_history 
    where event like 'latch: cache%'
    group by p1 
group by name,file#, dbablk, hladdr
having count(*) > 1
order by count(*);

--- ---------- ------ ------- --------
 14 MYDUAL          1   93170 2C9F4B20
Notice that the number of copies, 14, is higher the the max number of copies allowed set by “_db_block_max_cr_dba = 6″ in 10g. The reason is this value is just a directive not a restriction. Oracle tries to limit the  number of copies.

Find SQL ( Why is application hitting the block so hard? )

Possibly change application logic

Eliminate hot spots

Nested loops, possibly

Hash Partition the index with hot block

Use Hash Join instead of Nested loop join
Use Hash clusters

Look up tables (“select language from lang_table where …”)

Change application
Use plsql function
Spread data out to reduce contention, like set PCTFREE to 0 and recreate the table so that there is only one row per block

Select from dual

Possibly use x$dual
Note starting in 10g Oracle uses the “fast dual” table (ie x$dual) automatically when executing a query on dual as long as the column “dummy” is not accessed. Accessing dummy would be cases like
    select count(*) from dual;
    select * from dual;
    select dummy from dual;
an example of not accessing “dummy” would be:
    select 1 from dual;
    select sysdate from dual;

Updates, inserts , select for update on blocks while reading those blocks

Cause multiple copies and make things worse

What would OEM do?

In DB Optimizer:


Outer Joins : which side does the (+) go on

March 5th, 2014

Bobby Durrett just put together a great little post on “outer joins : where do I put the (+)“.

I also have hard time remembering, and below is my cheat sheet using graphics.

If English and French both have a unique key on the “ordinal_id” then it’s basically one-to-one relationship

We add an arrow in the middle of the line to denote “outer join”. The arrow points from the table that drives the join, ie all the rows in the table pointed from are returned even if a match isn’t found in the table pointed to.


above graphic originally on

type ANSI ANSI 89 (Oracle) type type
inner join english INNER JOIN french
using (ordinal_id)
english e, french f 
where e.ordinal_id=f.ordinal_id
left outer join english LEFT JOIN french
using (ordinal_id)
english e, french f 
where e.ordinal_id=f.ordinal_id(+)
right outer join english RIGHT JOIN french
using (ordinal_id)
english e, french f 
where e.ordinal_id(+)=f.ordinal_id
full join english FULL JOIN french
using (ordinal_id)
english e, french f
where e.ordinal_id=f.ordinal_id(+)
english e, french f
where e.ordinal_id(+)=f.ordinal_id


Managing database dependent development

March 5th, 2014

Application developers frequently experiment to achieve higher quality builds, they try things out, make mistakes and find fixes. Database design can be like that too, but rapidly changing the schema in traditional shared environment tends to break the application.

In this webinar, Kyle Hailey will explore how different types of cloning technology work and their benefits and limitations. The webinar will also demonstrate how you can setup and coordinate risk-free database experiments using Delphix and Red Gate’s Oracle tools.

Kyle Hailey and James Murtagh


Pure Storage and Delphix

March 5th, 2014

Joint webinar:

Super Charge Database Deployments



pure_storage  delphix_logo_color

Delphix and Pure Storage, a combination like chocolate and peanut butter.
Within the Enterprise, database performance can be a massive bottleneck. For database developers, the large and complex landscape of databases within an enterprise can cripple the productivity of the very developers who can solve these problems.Pure Storage’s cost-effective, high-performance all-flash storage arrays coupled with Delphix’s database virtualization technology make it possible for enterprises to move their database landscape to flash at one-tenth the cost of disk-based solutions, with no performance tradeoff. Join Kyle Hailey, Technical Evangelist at Delphix and Chas. Dye, Data Center Architect from Pure Storage to hear how Delphix and Pure Storage a supercharging database deployments.In this webinar, we will cover how you can:

  • Accelerate your virtual database projects by up to 50%
  • Modernize your infrastructure to make it more flexible for the future
  • Vaporize your costs: all-flash for 1/10th the cost of disk storage

Invest 30 mins and hear how this joint partnership is revolutionizing the database and storage world.

Read more…