Archive

Archive for August, 2013

Enqueue – PK, FK or Bitmap Index problem?

August 30th, 2013



MP900302987If one is seeing waits for enq: TX – row lock contention  then there could be a lot of reasons. One distinguishing factor is the lock mode. If the lock mode is exclusive (mode 6) then it’s most likely a classic row lock where two sessions are trying to modify the same row. On the other hand if the lock mode is share (mode 4)  it’s typically going to be

  • Primary/Unique Key: inserting a unique key when  someone else has already inserted that key but not committed
  • Foreign Key: Inserting a foreign when then parent value has been inserted but not committed or deleted and not commited (not to be confused with locks due to un-indexed foreign key which cause a “enq: TM – contention”  wait not a TX wait)
  • Bitmap Index: bitmap index chunk contention

Now how to tell which of these is happening? Well here is a query on ASH (I’ve commented out some of the useful fields to limit the output)  that will generate output to help us distinguish between the 3 different cases

col object for A15
col otype for A10
select
       substr(event,0,20)    lock_name,
       --ash.session_id        waiter,
       --mod(ash.p1,16)        lmode,
       --ash.p2                p2,
       --ash.p3                p3,
       o.object_name           object,
       o.object_type           otype,
       CURRENT_FILE#           filen,
       CURRENT_BLOCK#          blockn,
       --ash.SQL_ID            waiting_sql,
       BLOCKING_SESSION        blocker
       --,ash.xid
from
         v$active_session_history ash,
         all_objects o
where
           event like 'enq: TX%'
   and   mod(ash.p1,16)=4
   and o.object_id (+)= ash.CURRENT_OBJ#
/

Here is the output from the 3 different cases

Unique Index

  OBJECT   OTYPE FILEN BLOCKN BLOCKER
  ------  ------ ----- ------ -------
      -1             0      0     158
      -1             0      0     158
      -1             0      0     158
      -1             0      0     158

Foreign Key

  OBJECT   OTYPE FILEN BLOCKN BLOCKER
  ------  ------ ----- ------ -------
   CHILD   TABLE     1  60954       1
   CHILD   TABLE     1  60954       1
   CHILD   TABLE     1  60954       1

Bitmap Index

  OBJECT   OTYPE FILEN BLOCKN BLOCKER
  ------  ------ ----- ------ -------
      I1   INDEX     0      0     144
      I1   INDEX     0      0     144
      I1   INDEX     0      0     144
      I1   INDEX     0      0     144

Each case has a different footprint.

  • Unique key index issue object of “-1″
  • Foreign key case has a blocker of “1″
  • Bitmap index case as filen and blockn “0″

These cases were run on 10.2.0.3 thus the “footprint” could change on other versions.

The above ASH query and many other useful ASH queries are maintained on GitHub at

https://github.com/khailey/ashmasters

MP900302924

Oracle, performance, wait events , ,

Oracle I/O latency monitoring

August 29th, 2013


stopwatchOne thing that I have found sorely missing in the performance pages of Enterprise Manager is latency values for various types of I/O. The performance page or top activity may show high I/O waits but it won’t indicated if the latency of I/O is unusually high or not. Thus I put together a shell script that shows latency for the main I/O waits

  • db file sequential read
  • db file scattered read
  • log file parallel write
  • direct path reads
  • direct path reads temp

Of course it would be nice to add a few others like direct path writes, direct path writes temp and log file sync but there is only so much room in the screen width.

oramon.sh

 

The script is called oramon.sh and is available on github at

https://github.com/khailey/oramon/blob/master/oramon.sh

Example:

$  oramon.sh
Usage: oramon.sh [username] [password] [host] [sid] <port=1521> <runtime=3600>

$ ./oramon.sh system sys 172.16.100.81 vsol
RUN_TIME=-1
COLLECT_LIST=
FAST_SAMPLE=iolatency
TARGET=172.16.100.81:vsol
DEBUG=0
Connected, starting collect at Wed Apr 18 18:41:13 UTC 2012
starting stats collecting

   single block       logfile write       multi block      direct read   direct read temp
   ms      IOP/s        ms    IOP/s       ms    IOP/s       ms    IOP/s       ms    IOP/s
   20.76    27.55    32.55      .71     3.50      .00      .00      .01      .00      .00
     .00      .20               .00               .00               .00               .00
   34.93   369.64   116.79     3.55               .00               .00               .00
   31.43   640.33    92.40     8.33               .00               .00               .00
   39.39   692.33   111.69     8.00               .00               .00               .00

The first line of output is the average since the database started up.
The subsequent lines are the averages since the last line which is 5 seconds by default.
One should be able to see immediately how much activity there is on the database and the latency for the basic types of database I/O.

Reads
Single block reads are the typical I/O from a database which would happen for example when reading a row in a table with indexes in place.
Multi block reads are common as well is which would happen when for example summing the values over all rows in a table.
Direct reads are less common but quite normal and happen almost exclusively for parallel query though may be used for other activities especially in newer version of Oracle such as 11.2. Direct reads are multiblock reads that by pass the Oracle buffer cache. The size varies from a datablock, such as 8k to 1MB.
Direct read temp happens when a sort has overflowed memory limits and been written to disk. Direct reads temp are multiblock reads that by pass the Oracle buffer cache. The size varies from a datablock, such as 8k to 1MB.

Writes
Logfile writes are the only writes that database users wait for in general. Actually users only wait when the commit, which then is a wait for a signal from the log writer that their particular redo data is on disk which could have already happened. Typically the user wait time is a bit slower than the logwrite time but in general it’s close, ie within a few milliseconds. The farther apart the user wait time is from the log write time the more likely there is a CPU, paging or other concurrency problem on the VDB host slowing down the users signalling and wake up time.

Rising profits

oramon.sql : Oracle Latency Query

If for some reason the shell script isn’t able to connect to the database, then the same data can be collected manually by running the SQL query in SQL*Plus by hand.
The following two SQL queries, oramon_setup.sql and oramon.sql are available on github at

https://github.com/khailey/oramon

If you want to see the latencies over periods shorter than 60s, then you have to collect the values of the cumulative counters at time A, then again at time B and take the difference. The following two queries, oramon.sql and oramon_setup.sql, are available on ftp site

Run oramon_setup.sql *once*
  column seq_ms for 9999.99
   column seq_ct for 9999.99
   column lfpw_ms for 9999.99
   column lfpw_ct for 9999.99
   column seq_ms for 9999.99
   column scat_ct for 9999.99
   column dpr_ms for 9999.99
   column dpr_ct for 9999.99
   column dprt_ms for 9999.99
   column dprt_ct for 9999.99
   column prevdprt_ct new_value prevdprt_ct_var
   column prevdprt_tm new_value prevdprt_tm_var
   column prevdpwt_ct new_value prevdpwt_ct_var
   column prevdpwt_tm new_value prevdpwt_tm_var
   column prevdpr_ct new_value prevdpr_ct_var
   column prevdpr_tm new_value prevdpr_tm_var
   column prevdpw_ct new_value prevdpw_ct_var
   column prevdpw_tm new_value prevdpw_tm_var
   column prevseq_ct new_value prevseq_ct_var
   column prevseq_tm new_value prevseq_tm_var
   column prevscat_ct new_value prevscat_ct_var
   column prevscat_tm new_value prevscat_tm_var
   column prevlfpw_ct new_value prevlfpw_ct_var
   column prevlfpw_tm new_value prevlfpw_tm_var
   column prevsec new_value prevsec_var
   select 0 prevsec from dual;
   select 0 prevseq_tm from dual;
   select 0 prevseq_ct from dual;
   select 0 prevscat_ct from dual;
   select 0 prevscat_tm from dual;
   select 0 prevlfpw_ct from dual;
   select 0 prevlfpw_tm from dual;
   select 0 prevdprt_ct from dual;
   select 0 prevdprt_tm from dual;
   select 0 prevdpwt_ct from dual;
   select 0 prevdpwt_tm from dual;
   select 0 prevdpr_ct from dual;
   select 0 prevdpr_tm from dual;
   select 0 prevdpw_ct from dual;
   select 0 prevdpw_tm from dual;
   column prevdprt_ct noprint
   column prevdprt_tm noprint
   column prevdpwt_ct noprint
   column prevdpwt_tm noprint
   column prevdpr_ct noprint
   column prevdpr_tm noprint
   column prevdpw_ct noprint
   column prevdpw_tm noprint
   column prevseq_ct noprint
   column prevseq_tm noprint
   column prevscat_ct noprint
   column prevscat_tm noprint
   column prevlfpw_ct noprint
   column prevlfpw_tm noprint
   column prevsec noprint

Run following query to see the current latency for

  • single block read
  • log file parallel write
  • multi-block read

oramon.sql

select
        round(seqtm/nullif(seqct,0),2) seq_ms,
        round(seqct/nullif(delta,0),2) seq_ct,
        round(lfpwtm/nullif(lfpwct,0),2) lfpw_ms,
        round(lfpwct/nullif(delta,0),2) lfpw_ct,
        round(scattm/nullif(scatct,0),2) scat_ms,
        round(scatct/nullif(delta,0),0) scat_ct,
        round(dprtm/nullif(dprct,0),2) dpr_ms,
        round(dprct/nullif(delta,0),2) dpr_ct,
        round(dprttm/nullif(dprtct,0),2) dprt_ms,
        round(dprtct/nullif(delta,0),2) dprt_ct,
        prevseq_ct, prevscat_ct, prevseq_tm, prevscat_tm, prevsec,prevlfpw_tm,prevlfpw_ct
        , prevdpr_ct, prevdpr_tm , prevdprt_ct, prevdprt_tm , prevdpw_ct, prevdpw_tm
        , prevdpwt_ct, prevdpwt_tm
from
(select
       sum(decode(event,'db file sequential read', round(time_waited_micro/1000) -  &prevseq_tm_var,0)) seqtm,
       sum(decode(event,'db file scattered read',  round(time_waited_micro/1000) - &prevscat_tm_var,0)) scattm,
       sum(decode(event,'log file parallel write',  round(time_waited_micro/1000) - &prevlfpw_tm_var,0)) lfpwtm,
       sum(decode(event,'db file sequential read', round(time_waited_micro/1000) ,0)) prevseq_tm,
       sum(decode(event,'db file scattered read',  round(time_waited_micro/1000) ,0)) prevscat_tm,
       sum(decode(event,'log file parallel write',  round(time_waited_micro/1000) ,0)) prevlfpw_tm,
       sum(decode(event,'db file sequential read', total_waits - &prevseq_ct_var,0)) seqct,
       sum(decode(event,'db file scattered read',  total_waits - &prevscat_ct_var,0)) scatct,
       sum(decode(event,'log file parallel write',  total_waits - &prevlfpw_ct_var,0)) lfpwct,
       sum(decode(event,'db file sequential read', total_waits ,0)) prevseq_ct,
       sum(decode(event,'db file scattered read',  total_waits ,0)) prevscat_ct,
       sum(decode(event,'log file parallel write',  total_waits ,0)) prevlfpw_ct,
       sum(decode(event,'direct path read',  round(time_waited_micro/1000) - &prevdpr_tm_var,0)) dprtm,
       sum(decode(event,'direct path read',  round(time_waited_micro/1000) ,0)) prevdpr_tm,
       sum(decode(event,'direct path read',  total_waits - &prevdpr_ct_var,0)) dprct,
       sum(decode(event,'direct path read',  total_waits ,0)) prevdpr_ct,
       sum(decode(event,'direct path write',  round(time_waited_micro/1000) - &prevdpw_tm_var,0)) dpwtm,
       sum(decode(event,'direct path write',  round(time_waited_micro/1000) ,0)) prevdpw_tm,
       sum(decode(event,'direct path write',  total_waits - &prevdpw_ct_var,0)) dpwct,
       sum(decode(event,'direct path write',  total_waits ,0)) prevdpw_ct,
       sum(decode(event,'direct path write temp',  round(time_waited_micro/1000) - &prevdpwt_tm_var,0)) dpwttm,
       sum(decode(event,'direct path write temp',  round(time_waited_micro/1000) ,0)) prevdpwt_tm,
       sum(decode(event,'direct path write temp',  total_waits - &prevdpwt_ct_var,0)) dpwtct,
       sum(decode(event,'direct path write temp',  total_waits ,0)) prevdpwt_ct,
       sum(decode(event,'direct path read temp',  round(time_waited_micro/1000) - &prevdprt_tm_var,0)) dprttm,
       sum(decode(event,'direct path read temp',  round(time_waited_micro/1000) ,0)) prevdprt_tm,
       sum(decode(event,'direct path read temp',  total_waits - &prevdprt_ct_var,0)) dprtct,
       sum(decode(event,'direct path read temp',  total_waits ,0)) prevdprt_ct,
       to_char(sysdate,'SSSSS')-&prevsec_var delta,
       to_char(sysdate,'SSSSS') prevsec
from
     v$system_event
where
     event in ('db file sequential read',
               'db file scattered read',
               'direct path read temp',
               'direct path write temp',
               'direct path read',
               'direct path write',
               'log file parallel write')
) ;

Output looks like

  SEQ_MS   SEQ_CT  LFPW_MS  LFPW_CT   SEQ_MS  SCAT_CT   DPR_MS   DPR_CT  DPRT_MS  DPRT_CT
-------- -------- -------- -------- -------- -------- -------- -------- -------- --------
  115.71   422.67    76.17    12.00               .00               .00               .00

The first execution of the query is I/O since database startup, so should most likely be ignored.
Subsequent executions are the I/O since the last execution

The columns are

  1. SEQ_MS: single block latency
  2. SEQ_CT: single block reads per second
  3. LFPW_MS: log file parallel write latency
  4. LFPW_CT: log file parallel write count per second
  5. SCAT_MS: multi-block latency
  6. SCAT_CT: multi-block reads per second
  7. DPR_MS: direct path read latency
  8. DPR_CT: direct path read count
  9. DPRT_MS: direct path read temp latency
  10. DPRT_CT: direct path read temp count
Instead of running the query by hand the script “oramon.sh” available at https://github.com/khailey/oramon/blob/master/oramon.sh (see top of page) will collect this info ever 5 seconds in a loop and output to standard out at the UNIX prompt

Simple but only once a minute

NOTE: the following is a simpler query but the data only updates once a minute
select
       n.name event,
       m.wait_count  cnt,
       10*m.time_waited ms,
       nvl(round(10*m.time_waited/nullif(m.wait_count,0),3) ,0) avg_ms
  from v$eventmetric m,
       v$event_name n
  where m.event_id=n.event_id
        and (
              wait_class_id= 1740759767 --  User I/O 
                   or
              wait_class_id= 4108307767 --  System I/O  
             )
        and m.wait_count > 0 ;

.

io, Oracle, performance, wait events , ,

CURSOR_SHARING : a picture is worth a 1000 words

August 28th, 2013


Anyone who has been around Oracle performance over the years knows the grief that hard parsing SQL queries can cause on highly concurrent applications. The number one reason for hard parsing has been applications that don’t use bind variables. Without bind variables queries that would otherwise be shared get recompiled because their text is different and Oracle treats them as different queries. Oracle addressed this issue with a parameter called cursor_sharing. The parameter cursor_sharing has three values

  1. exact – the default
  2. similar – replace literals with bind variables, if a histogram keep literal in place
  3. force – replace literals with bind variables and use existing plan if it exists

Here is what the load looks like going from the default, exact, to the value force on a load of the same query but a query that doesn’t use bind variables:

looks like a significant load savings – impressive!
Now many people tell me that they think there are bugs with “force” and that you should use “similar”. The value similar does a similar thing but if there are histograms on the column, then Oracle will attempt, in certain cases, to have different plans based on different values. Sounds cool huh? Well their are bugs. Here is the same load with similar:
If we look at the different child cursors for this statement we find that Oracle, instead of sharing the children creates a different one for each execution:
This bug still seems to exist on 11gR2 :
Here is the code for the examples I (run by 8 users on 10g and 12 users on 11g)
--alter session set cursor_sharing=exact;
--alter session set cursor_sharing=force;
--alter session set cursor_sharing=similar;
declare
 l_cursor integer default 0;
 stmt varchar2(400);
 ret number;
BEGIN
 select hparse.nextval into ret  from dual;
 dbms_random.seed(ret);
 FOR i IN 1..1000  LOOP
   l_cursor:=dbms_sql.open_cursor;
   stmt:='SELECT  count(*) FROM t1 where c1  < '|| 
     dbms_random.value()||' and c2  < '||dbms_random.value();
   execute immediate stmt into ret;
   dbms_sql.close_cursor(l_cursor);
 END LOOP;
END;
/
Table t1 has no histograms. In the case above it had one row, but results were similar with no rows:
create table t1 (c1 number, c2 number);
insert into t1 values (0,0);
commit;
The issue should be addressed in 11g with a combination of cursor_sharing and adaptive cursor sharing
Also see Charles Hooper’s blog post on this topic at

Oracle, performance, sql, wait events , , ,

Finding the slowest SQL execution of the same query

August 28th, 2013


When running the same query multiple times, several questions come to mind:

  • Does the query always execute in the same amount of  time?
  • If some executions are slower, what is the slowest execution time?
  • When did the slowest exectution happen?
  • What more can I find out about the slowest exectution?

All of this can be answered from data in Active Session History or ASH.   The following query finds the maximum, minimum and average execution times in seconds as well as the time of the slowest execution which is given by start time and end time of the  slowest exectuiton:

col av for 9999999
col mx for 9999999
col mn for 9999999

select
   sql_id,
   count(*),
   round(min(delta),0) mn,
   round(avg(delta),0) av,
   round(max(delta),0) mx,
   substr(max(times),12) max_run_time
from (
   select
        sql_id,
        sql_exec_id,
        max(delta) delta ,
        -- lpad sets a fixed width on delta so it can be
        -- stripped off above with substr
        -- delta in "times" is just for sorting not displaying
        lpad(round(max(delta),0),10) || ' ' ||
        to_char(min(start_time),'YY-MM-DD HH24:MI:SS')  || ' ' ||
        to_char(max(end_time),'YY-MM-DD HH24:MI:SS')  times
   from ( select
                                            sql_id,
                                            sql_exec_id,
              cast(sample_time as date)     end_time,
              cast(sql_exec_start as date)  start_time,
              ((cast(sample_time    as date)) -
               (cast(sql_exec_start as date))) * (3600*24) delta

           from
              dba_hist_active_sess_history
           where sql_exec_id is not null
        )
   group by sql_id,sql_exec_id

)
group by sql_id
having count(*) > 10
order by mx
/

SQL_ID          COUNT(*)       MN       AV       MX MAX_RUN_TIME
------------- ---------- -------- -------- -------- -----------------------------------
155vjqzjqp58f         27        0      115      786 11-04-11 07:39:47 11-04-11 07:41:20
7nf5917x3tq5x        113        0      129      791 11-04-11 01:03:19 11-03-02 04:16:09
asg774pz0kqga         11       11      225      799 11-04-14 12:20:10 11-03-02 12:18:58
1kf1jsykb1nk6        529        0      107      800 11-04-09 18:41:28 11-03-01 23:13:56
8rua4c9agcqkb        978        0       73      820 11-03-01 23:13:50 11-03-01 23:13:56
fdagtr1sdk8tn        313        0      242      841 11-04-08 21:30:31 11-04-08 21:30:38
7b5w34atn8q1h        124        0       25      898 11-03-07 20:53:17 11-03-07 20:53:23
9w7mbuv734tfy         11       30      509      954 11-04-14 09:00:55 11-03-02 04:27:33
88qxmg220db9s         23        4      357     1040 11-04-13 22:08:20 11-04-13 02:20:09
83ngzw213v0cs        302        1      207     1122 11-04-11 16:26:32 11-03-07 16:12:11
34cgtc9xkgxny         61      978     1163     1272 11-03-02 10:06:24 11-03-02 10:06:30
g5u58zpg0tuk8         97        1       62     1359 11-04-13 02:49:33 11-04-12 02:23:39
6tkqndn1tzf23         29        7      187     1370 11-04-13 06:56:43 11-03-02 06:56:54
g4wrnvdmf23b1         11       19      458     1397 11-04-14 12:22:28 11-03-02 12:21:19
bpkg6y9wvf717         31        0      189     1427 11-04-13 21:19:51 11-03-02 21:08:00
4d6m2q3ngjcv9        320        3      485     1701 11-04-10 18:00:54 11-03-01 23:10:55
882tp5sbq1tq5         17       39      367     1795 11-04-14 14:00:30 11-03-01 23:04:33
5k7vccwjr5ahd       2653        0       33     1963 11-04-12 09:06:08 11-03-01 23:10:15
44034jty9vnrg         19        2      188     1968 11-03-03 09:17:34 11-03-03 09:17:43
g9n69a23s3wxz         40        6      280     2009 11-04-08 13:47:23 11-03-03 09:37:38
ds8cz0fb8w147        161       13      274     2531 11-04-12 16:10:21 11-03-01 23:11:55
bzyny95313u12        114        0       47     2599 11-03-03 03:06:18 11-03-03 03:06:25
0fvrpk7476b7y         26        0      133     3068 11-03-07 07:57:56 11-03-07 07:58:00
1pjp66rxcj6tg         15       57      768     3106 11-04-09 17:25:21 11-03-07 08:58:26
8r5wuxk1dprhr         39       24      841     3510 11-04-14 14:00:47 11-03-02 09:54:37
0w5uu5kngyyty         21        0      442     3652 11-04-11 08:00:21 11-04-11 09:01:13
0hbv80w9ypy0n        161        0     1184     4089 11-04-12 22:26:09 11-03-02 04:36:15
71fwb4n6a92fv         49       30      677     4481 11-04-10 17:01:17 11-03-01 23:40:45
0bujgc94rg3fj        604        0       25     4929 11-04-08 10:53:34 11-04-08 12:15:43
64dqhdkkw63fd       1083        0        7     7147 11-03-07 04:01:01 11-03-07 06:00:08
990m08w8xav7s        591        0       52     7681 11-04-13 00:39:27 11-04-13 00:39:37
2n5369dsuvn5a         16      303     5727    10472 11-04-14 11:18:11 11-04-14 14:12:43
2spgk3k0f7quz        251        0      546    29607 11-04-12 12:11:47 11-04-12 12:11:48
36pd759xym9tc         12     1391    23862    37934 11-04-13 19:25:49 11-03-01 23:00:12
497wh6n7hu14f         49        0     5498    69438 11-04-12 19:01:17 11-04-13 12:03:54

We can add a histogram of execution spread to the query

col 1 for 99999
col 2 for 99999
col 3 for 9999
col 4 for 999
col 5 for 99
col av for 99999.9
col ct for 99999
col mn for 999
col av for 99999.9

with pivot_data as (
select
      sql_id,
      ct,
      mxdelta mx,
      mndelta mn,
      avdelta av,
      -- high_boundary is set to maxdelta + 1 so that
      -- nothing ends up in the overflow bucket
      -- otherwise the max value ends up in the overflow bucket
      width_bucket(delta,mndelta,mxdelta+1,5) as bucket  ,
      substr(times,12) max_run_time
from (
select
   sql_id,
   delta,
   count(*) OVER (PARTITION BY sql_id) ct,
   max(delta) OVER (PARTITION BY sql_id) mxdelta,
   min(delta) OVER (PARTITION BY sql_id) mndelta,
   avg(delta) OVER (PARTITION BY sql_id) avdelta,
   max(times) OVER (PARTITION BY sql_id) times
from (
   select
        sql_id,
        sql_exec_id,
        max(delta) delta ,
        -- lpad sets a fixed width on delta so it can be
        -- stripped off above with substr
        -- delta in "times" is just for sorting not displaying
        lpad(round(max(delta),0),10) || ' ' ||
        to_char(min(start_time),'YY-MM-DD HH24:MI:SS')  || ' ' ||
        to_char(max(end_time),'YY-MM-DD HH24:MI:SS')  times
   from ( select
                                            sql_id,
                                            sql_exec_id,
              cast(sample_time as date)     end_time,
              cast(sql_exec_start as date)  start_time,
              ((cast(sample_time    as date)) -
               (cast(sql_exec_start as date))) * (3600*24) delta
           from
              dba_hist_active_sess_history
           where sql_exec_id is not null
        )
   group by sql_id,sql_exec_id
)
)
where ct > &min_repeat_executions_filter
)
select * from pivot_data
PIVOT ( count(*) FOR bucket IN (1,2,3,4,5))
order by mx,av;
/
SQL_ID            CT         MX   MN       AV MAX_RUN_TIME                             1      2     3    4   5
------------- ------ ---------- ---- -------- ----------------------------------- ------ ------ ----- ---- ---
8rua4c9agcqkb    978        820    0     72.7 11-03-01 23:13:50 11-03-01 23:13:56    830    116    21    9   2
fdagtr1sdk8tn    313        841    0    241.8 11-04-08 21:30:31 11-04-08 21:30:38    194     45     0    0  74
7b5w34atn8q1h    124        898    0     24.8 11-03-07 20:53:17 11-03-07 20:53:23    122      0     0    0   2
83ngzw213v0cs    302       1122    1    207.5 11-04-11 16:26:32 11-03-07 16:12:11    200     68    21    6   7
g5u58zpg0tuk8     97       1359    1     62.1 11-04-13 02:49:33 11-04-12 02:23:39     92      3     1    0   1
4d6m2q3ngjcv9    320       1701    3    484.7 11-04-10 18:00:54 11-03-01 23:10:55     92    168    50    9   1
5k7vccwjr5ahd   2653       1963    0     33.4 11-04-12 09:06:08 11-03-01 23:10:15   2623     15     8    4   3
ds8cz0fb8w147    161       2531   13    273.8 11-04-12 16:10:21 11-03-01 23:11:55    136     18     5    1   1
bzyny95313u12    114       2599    0     46.5 11-03-03 03:06:18 11-03-03 03:06:25    113      0     0    0   1
0hbv80w9ypy0n    161       4089    0   1183.9 11-04-12 22:26:09 11-03-02 04:36:15     27    116     9    6   3
71fwb4n6a92fv     49       4481   30    676.9 11-04-10 17:01:17 11-03-01 23:40:45     38      6     2    2   1
0bujgc94rg3fj    604       4929    0     24.7 11-04-08 10:53:34 11-04-08 12:15:43    601      1     1    0   1
64dqhdkkw63fd   1083       7147    0      7.2 11-03-07 04:01:01 11-03-07 06:00:08   1082      0     0    0   1
990m08w8xav7s    591       7681    0     51.8 11-04-13 00:39:27 11-04-13 00:39:37    587      0     0    2   2
2spgk3k0f7quz    251      29607    0    546.1 11-04-12 12:11:47 11-04-12 12:11:48    247      2     0    0   2
497wh6n7hu14f     49      69438    0   5498.2 11-04-12 19:01:17 11-04-13 12:03:54     44      1     0    1   3

Finally add the SQL_EXEC_ID of the longest running execution. This SQL_EXEC_ID can then be used to go look at the events in ASH for that particular execution

col 1 for 99999
col 2 for 99999
col 3 for 9999
col 4 for 999
col 5 for 99
col av for 99999
col ct for 99999
col mn for 999
col av for 99999.9
col longest_sql_exec_id for A10

WITH pivot_data AS (
SELECT
      sql_id,
      ct,
      mxdelta mx,
      mndelta mn,
      round(avdelta) av,
      WIDTH_BUCKET(delta_in_seconds,mndelta,mxdelta+.1,5) AS bucket  ,
      SUBSTR(times,12) max_run_time,
      SUBSTR(longest_sql_exec_id, 12) longest_sql_exec_id
FROM (
SELECT
   sql_id,
   delta_in_seconds,
   COUNT(*) OVER (PARTITION BY sql_id) ct,
   MAX(delta_in_seconds) OVER (PARTITION BY sql_id) mxdelta,
   MIN(delta_in_seconds) OVER (PARTITION BY sql_id) mndelta,
   AVG(delta_in_seconds) OVER (PARTITION BY sql_id) avdelta,
   MAX(times) OVER (PARTITION BY sql_id) times,
   MAX(longest_sql_exec_id) OVER (PARTITION BY sql_id) longest_sql_exec_id
FROM (
   SELECT
        sql_id,
        sql_exec_id,
        MAX(delta_in_seconds) delta_in_seconds ,
        LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
        TO_CHAR(MIN(start_time),'YY-MM-DD HH24:MI:SS')  || ' ' ||
        TO_CHAR(MAX(end_time),'YY-MM-DD HH24:MI:SS')  times,
        LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
        TO_CHAR(MAX(sql_exec_id)) longest_sql_exec_id
   FROM ( SELECT
                                            sql_id,
                                            sql_exec_id,
              CAST(sample_time AS DATE)     end_time,
              CAST(sql_exec_start AS DATE)  start_time,
              ((CAST(sample_time    AS DATE)) -
               (CAST(sql_exec_start AS DATE))) * (3600*24) delta_in_seconds
           FROM
              dba_hist_active_sess_history
           WHERE sql_exec_id IS NOT NULL
        )
   GROUP BY sql_id,sql_exec_id
)
)
where ct > &min_repeat_executions_filter
 and  mxdelta > &min_elapsed_time
)
SELECT * FROM pivot_data
PIVOT ( COUNT(*) FOR bucket IN (1,2,3,4,5))
ORDER BY mx DESC,av DESC
;

SQL_ID            CT         MX   MN       AV MAX_RUN_TIME                        LONGEST_SQ      1      2     3    4   5
------------- ------ ---------- ---- -------- ----------------------------------- ---------- ------ ------ ----- ---- ---
2spgk3k0f7quz    251      29607    0    546.0 11-04-12 12:11:47 11-04-12 20:25:14 16781748      247      2     0    0   2
990m08w8xav7s    591       7681    0     52.0 11-04-13 00:39:27 11-04-13 02:47:28 16786685      587      0     0    2   2
64dqhdkkw63fd   1083       7147    0      7.0 11-03-07 04:01:01 11-03-07 06:00:08 16777218     1082      0     0    0   1
0bujgc94rg3fj    604       4929    0     25.0 11-04-08 10:53:34 11-04-08 12:15:43 16814628      601      1     1    0   1
0hbv80w9ypy0n    161       4089    0   1184.0 11-03-02 04:36:10 11-04-12 23:34:18 16777290       27    116     9    6   3
bzyny95313u12    114       2599    0     47.0 11-03-03 03:06:18 11-03-03 03:49:37 16781191      113      0     0    0   1
ds8cz0fb8w147    161       2531   13    274.0 11-03-01 23:11:48 11-04-12 16:10:37 16777285      136     18     5    1   1
5k7vccwjr5ahd   2653       1963    0     33.0 11-03-01 23:10:12 11-04-12 09:06:12 16778244     2623     15     8    4   3
4d6m2q3ngjcv9    320       1701    3    485.0 11-03-01 23:10:53 11-04-10 18:01:26 16777261       92    168    50    9   1
g5u58zpg0tuk8     97       1359    1     62.0 11-04-12 02:23:37 11-04-13 02:51:09 16777217       92      3     1    0   1
34cgtc9xkgxny     61       1272  978   1163.0 11-03-02 10:06:24 11-03-02 10:27:36 16777250        4      4    14   30   9
09-02-05-623_950x510

Oracle, performance, sql

Oracle CPU Time

August 27th, 2013


 

 

oracle_cpu_bertrand_drouvot.png

Image from 

UPDATE: thanks to a discussion in the comments it’s come to my attention that there should be some more clarification on values used and what the mean at the beginning of this post.

Ever wonder where CPU wait comes from in EM performance screens as seen above? well the following discussion will give you a SQL query to calculate CPU wait. In the above image we see both CPU and CPU Wait as two colors (light green and dark green) in EM as well as two values from the output of the SQL script (CPU_ORA and CPU_ORA_WAIT).

Oracle CPU statistics are measured from calls to the OS to see how much CPU is burned over a given elapsed time (i.e. certain quantity of code). The CPU values are cycles used and not time, thus it does not include time on the run queue waiting to get the CPU.

Oracle ASH, on the other hand, lists all Oracle sessions that want to run from Oracle’s perspective, i.e. they aren’t idle and they aren’t waiting for a non-idle wait event like I/O. Thus ASH includes time spent both running on CPU burning cycles and time spent waiting to get on the CPU.

Thus we can take the the amount of time “On CPU” from ASH and subtract the amount of CPU in Oracle statistics for CPU usage then the remainder is roughly time spent by Oracle sessions waiting to get onto the CPU.

The two challenges to getting the value of “Wait for CPU” are getting  CPU cycles burned and ASH time “ON CPU” into the same units and making sure that we are measuring both over the same interval.

Oracle already reports CPU in % used and in centi-seconds used.  I like to transform this into Average Active Sessions on CPU so I can compare it to my main way of showing ASH data which is Average Active Sessions on “ON CPU” which includes wait for CPU.

There are 3 kinds of CPU in the Oracle stats.

  1. Oracle CPU used
  2. System CPU used
  3. Oracle demand for CPU

Starting in 10g Oracle records both the CPU used by the instance as well as the load on the system in v$sysmetric. This is awesome as we can see how busy the system is and how much of the CPU Oracle is responsible for:

col metric_name for a25
col metric_unit for a25
select metric_name, value, metric_unit from v$sysmetric where metric_name like'%CPU%' and group_id=2;
METRIC_NAME                         VALUE METRIC_UNIT
------------------------------ ---------- ------------------------------
CPU Usage Per Sec              251.067016 CentiSeconds Per Second
CPU Usage Per Txn              5025.52477 CentiSeconds Per Txn
Host CPU Utilization (%)       11.6985845 % Busy/(Idle+Busy)
Database CPU Time Ratio        76.3291033 % Cpu/DB_Time

Now the question is how do we convert these to something useful? For me I put it into the equivalent of AAS and compare it to the core count:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                        CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)    sAAS
            from
              ( select value busy from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                 and group_id=2 ) prcnt,
             ( select value cpu_count
                 from v$parameter
                where name='cpu_count' )  parameter;

CLASS                  AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_OS                 .022

An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joining in ASH:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                         CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)     AAS
            from
              ( select value busy from v$sysmetric
                 where metric_name='Host CPU Utilization (%)'
                   and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter
                 where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .001
CPU_ORA_DEMAND          .02
CPU_OS                 .019

So the demand for CPU was higher than the amount consumed. Now the demand for CPU is coming from ASH which is sampled so the accuracy is weak, but in larger sample sets or busier systems it’s pretty darn good. The demand alerts us to CPU starvation on a busy  system.

I like to wrap all this up into a query with all the wait classes to see the overall load on Oracle including CPU consumed by Oracle, CPU demanded by Oracle and CPU used at the OS level:

select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_ORA_DEMAND          .03
CPU_OS                 .023
Commit                    0
User I/O                  0
Wait                      0

Ideally I’d want the CPU stats to be subsets of each other so that I could have a graphically stack-able set of statistics

now rolling it all together
with AASSTAT as (
           select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60))
)
select
       ( decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) +
       CPU_ORA_CONSUMED +
        decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) CPU_TOTAL,
       decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS,
       CPU_ORA_CONSUMED CPU_ORA,
       decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT,
       COMMIT,
       READIO,
       WAIT
from (
select
       sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED,
       sum(decode(CLASS,'CPU_ORA_DEMAND'  ,AAS,0)) CPU_ORA_DEMAND,
       sum(decode(CLASS,'CPU_OS'          ,AAS,0)) CPU_OS,
       sum(decode(CLASS,'Commit'          ,AAS,0)) COMMIT,
       sum(decode(CLASS,'User I/O'        ,AAS,0)) READIO,
       sum(decode(CLASS,'Wait'            ,AAS,0)) WAIT
from AASSTAT)
/

    CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ------------ ---------- ---------- ----------
       .02       .002            0          0          0          0

Now let’s run up some load on a machine and database.
Take two databases, run up the CPU demand on both and add some wait contention. The machine has 24 cores so there is a definitely a problem when the CPU_TOTAL goes over 24. I’m running 14 sessions each trying to burn a core on two different databases. The first few lines the test is ramping up

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    14.887       .387     13.753         .747          0          0       .023

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    21.989      7.469     12.909        1.611          0          0       .044

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    26.595     12.125     11.841        2.629          0          0       .025

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    27.045     12.125     11.841        3.079          0          0       .025

Historically CPU used by Oracle was derived from

v$sysstat.name=’CPU used by this session’

but this statistic had problems as the value was only updated every time a call ended. A call could be a 1 hour PL/SQL procedure which would thus report zero cpu usage in the stats until it finished and the CPU would spike off the scale.

ASH had always been the most stable way to gather CPU demand, though Oracle has made improvements in gathering CPU statistics. I believe that the time model gathers CPU every 5 seconds in 10g, and in 11g it’s possible that CPU stats are gathered every second

Here is a visual example of a machine that has server memory contention, massive amounts of paging. There is OS CPU being used, but hardly any CPU being used by Oracle which makes sense as it’s an idle database, but what is revealing is the massive amount of CPU wait by Oracle. Oracle only has a little bit of work to do to take care of an idle database but we can see that most of Oracle’s CPU time is wait for CPU time as when it wants to work, pages have to be read back in,

I have my doubts as to the clarity of the layout of the above graph. A possibly clearer graph would be simply adding a line representing available CPU and take out the OSCPU bars. In the above graph I’ve charted OSCPU usage as AAS, ie average active sessions, mixing AAS of the database with AAS at the OS level. I think a  possible clear representation would be to show the Core count line, and draw the OSCPU usage shown upside down from the # of core lines, thus the space from the bottom axis to where the OSCPU reaches down would be available CPU.

 UPDATE

Thanks to the eagle eyes of John Beresniewicz a small error was identified in the above script. The last script didn’t correlate the time windows of v$sysmetric with v$active_session history. They both reported the last minute of statistics but the last minute reported in v$sysmetric could be up to a minute behind those in v$active_session_history, so here is a version that tries to correlate to two time windows so they are in sync

with AASSTAT as (
           select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS,
                 BEGIN_TIME ,
                 END_TIME
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait'), BEGIN_TIME, END_TIME
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS,
                 BEGIN_TIME ,
                 END_TIME
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS,
                 BEGIN_TIME ,
                 END_TIME
            from
              ( select value busy, BEGIN_TIME,END_TIME from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS,
               cast(min(SAMPLE_TIME) as date) BEGIN_TIME ,
               cast(max(SAMPLE_TIME) as date) END_TIME
             from v$active_session_history ash
              where SAMPLE_TIME >= (select BEGIN_TIME from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 )
               and SAMPLE_TIME < (select END_TIME from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 )
)
select
       to_char(BEGIN_TIME,'HH:MI:SS') BEGIN_TIME,
       to_char(END_TIME,'HH:MI:SS') END_TIME,
       CPU_OS CPU_TOTAL,
       decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS,
       CPU_ORA_CONSUMED CPU_ORA,
       decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT,
       COMMIT,
       READIO,
       WAIT 
       -- ,(  decode(sign(CPU_OS - CPU_ORA_CONSUMED), -1, 0, 
       --                (CPU_OS - CPU_ORA_CONSUMED))
       --    + CPU_ORA_CONSUMED +
       --  decode(sign(CPU_ORA_DEMAND - CPU_ORA_CONSUMED), -1, 0, 
       --             (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) STACKED_CPU_TOTAL 
from ( 
        select 
                min(BEGIN_TIME) BEGIN_TIME,
                max(END_TIME) END_TIME, 
                sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED, 
                sum(decode(CLASS,'CPU_ORA_DEMAND' ,AAS,0)) CPU_ORA_DEMAND, 
                sum(decode(CLASS,'CPU_OS' ,AAS,0)) CPU_OS, 
                sum(decode(CLASS,'Commit' ,AAS,0)) COMMIT, 
                sum(decode(CLASS,'User I/O' ,AAS,0)) READIO, 
                sum(decode(CLASS,'Wait' ,AAS,0)) WAIT 
         from AASSTAT) 
/

The output now looks like

BEGIN_TI END_TIME  CPU_TOTAL	 CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO	    WAIT
-------- -------- ---------- ---------- ---------- ------------ ---------- ---------- ----------
07:23:35 07:24:35	.044	   .024       .002	   .018 	 0	    0	    .001

I’m still open that there might be some more tweaking to do, so your milage may vary. Test, Test, Test and only trust yourself!
As a reminder the code above is on Github as part of the ASH Masters project. Feel free to get a Github account, fork the code and make changes. If you find cool new code or errors in old code, let me know and we will merge it into the ASH Masters project

 

https://github.com/khailey/ashmasters/blob/master/cpu_consumed_verses_cpuwait.sql

Oracle, performance ,

Where to begin with Oracle and SQL

August 26th, 2013


Seeing more and more questions on “where do I start with Oracle if I want to be a DBA?”  My perspective is a bit off since I’ve been surrounded by Oracle for over 20 years.  I hardly remember what it was like to start with Oracle and starting with Oracle now in 2013 is quite different than starting with Oracle in 1990.

Here is my list and everything on this list is excellent. I’m sure I missed a few good ones, but maybe people can add them in the comments.

Start with Oracle Docs, they are free and good!

Get the best books and read them

A bit old, but this is a great overview of Oracle: Practical Oracle 8i by Jonathan Lewis

After you have read the above, dive into the important details:

If you know SQL but want to get great at SQL performance, read these books

Join Online discussions and follow blogs

Create your own Oracle blog aggregator, here is mine: http://oracle.collected.info/

Go to conferences. First of all your local Oracle user groups and if you have the time and resources then go to the bigger conferences:

If you are in the Bay Area, the Northern California Oracle User Group has exceptional quarterly meetings

Oracle, performance, wait events , ,

TCP Trace Analysis for NFS

August 23rd, 2013


How do we know where latency comes from when  there is a disparity in reported I/O latency on  the I/O subsystem and that of the latency reported on the  client box requesting the I/O.

For example if I have an Oracle database requesting I/O  and Oracle says an 8Kb request takes 50 ms yet the I/O storage subsystem says 8Kb I/Os are taking 1ms (averages) , then where does the 49  extra ms come from?

When the I/O subsystem is connected to Oracle via NFS  then there are a lot of layers that could be causing the extra latency.

Screen Shot 2013-08-23 at 1.35.20 PM

Where does the difference in latency come from between NFS Server and Oracle’s timing of pread?

One strategy is to take tcp traces on NFS server and NFS client  (the Oracle Host) at the same time, during a period of load that shows the disparity.

The two trace files can then be analyzed to show latency at each side and the delta of latency between packets found in both traces

 

Files Required

 

Parsing script is ( thanks to Matt Amdur from Delphix for the  core code)

https://github.com/khailey/tcpdump/blob/master/parsetcp.pl

The script requires access to  tethereal command which comes with the wireshark install on LINUX:

      yum install wireshark

Example situation

 

For example in the following table stats were collected on the NFS server   (running Open Solaris) with DTrace and on the Oracle database using   oramon.sh.  The stats show the average latency for 8Kb read requests

Avg latency of 8Kb reads measured in ms


NFS Server Oracle
30.2 126.15
24.4 103.91
21.9 117.33
23.7 96.46
32 78.43
25.5 91.94

There is a substantial difference between the latency seen by NFS server and the latency seen by Oracle. Where is the latency coming from? Is it a problem on the NFS server, the NFS client or the network? How do we find out? One strategy is to collect TCP dumps on the NFS client and NFS server and compare them.

Setup problem situation

First, set up a situation where there is a load running on the database and there is a discrepancy between latency on NFS server and as seen by Oracle. During this period of latency discrepancy collect TCP traces. The traces will be large, so run them for 20 second for example, to limit the size.

 

Collecting TCP dumps

TCP dumps can be collected on Linux with tcpdump or  on Solaris with snoop:

NFS server (Solaris NFS server example)

snoop -q -d aggr0 -s 512 -o nfs_server.cap 172.16.100.102
  • -d : device
  • -o : output file, give “.cap” extension to be recognized automatically by wireshark
  • -s : amount of packet to keep

NFS client host is the IP at the end of the command line

Linux (linux client example)

tcpdump -s 512  host 172.16.100.87 -w nfs_client.cap
  • -s : amount of packet to keep
  • -w : output file, give “.cap” extension to be recognized automatically by wireshark
  • host : IP of NFS server

Analyzing the Traces

 

The trace files can be analyzed on a linux machine with wireshark installed and with the script parsetcp.pl  on the two traces.

$ ./parsetcp.pl nfs_server.cap nfs_client.cap
 ==================== RAW DATA ============

Parsing NFS server trace: nfs_server.cap
type       avg ms count
   READ : 18.10,  12493  
  WRITE : 30.08,   4918   

Parsing NFS client trace: nfs_client.cap
type       avg ms count  
   READ : 18.74,  25974 
  WRITE : 30.05,  10197   

 ==================== MATCHED DATA  ============

READ
type       avg ms 
nfs server: 18.12
nfs_client: 19.26
     diff :  1.14 
Processed 14010 packets (Matched: 11964 Missed: 2046)

WRITE
type       avg ms  
 server : 30.86    
 client : 31.83
   diff :  0.96 
Processed 5806 packets (Matched: 4391 Missed: 1415)

The script will find the average latency for reads and writes on both ends and then try to match the packets found both at the client and server and report the average latency for reads and writes on the packets that match.
In the above output the read time on on the NFS client receiving end is on par with what is seen on NFS server.
Thus if the receive time on the NFS client host is on par with the NFS server and Oracle latency is much greater than the receive time, then there looks to be some issue on the NFS client host machine such as heavy memory paging that would cause the latency seen by Oracle to be much higher than that seen by the tcp traces.

 

Example problem analysis

 

In the following 2 examples a swingbench load was run on the Oracle database on the NFS client. The database files were mounted via NFS from the NFS server. In one case the NFS client and database host was Open Solaris and in the other case it was LINUX. In both cases the Oracle database reported much slower latency than the NFS server threads.

test 1 ) Open Solaris is the NFS client

NFS Server is Open Solaris and NFS Server latency is measured with DTrace of using probes nfs start/done .

NFS client is a different machine running Open Solaris. Latency is measured from the Oracle perspective with oramon.sh:


NFS Server Oracle
1.4 29.92
1.8 32.62
1.1 50.38
1.1 56.38
1.5 65.45
0.9 46.88
0.7 45.13
1.1 44.93
1 39.48
0.9 55.8

 

Test 1 TCP trace analysis

What do the tcp traces say? Tracing TCP on the NFS server side, and NFS client side indicates that the slow latencies are coming from the NFS server

 ==================== Individual HOST DATA ============
Parsing nfs server trace: nfs_server.cap
type       avg ms count   
   READ : 44.60,   7731    

Parsing client trace: client.cap
type       avg ms count    
   READ : 46.54,  15282    

 ==================== MATCHED DATA  ============
READ
type       avg ms   
 server : 48.39,   
 client : 49.42,   
   diff :  1.03,    
Processed 9647 packets (Matched: 5624 Missed: 4023)

TEST 1 summary:

The NFS Server says it is responding with 1ms latency (from DTrace) but the TCP traces show latency more on the order of 48ms which is in alignment with what Oracle sees on the NFS client, thus the NFS server must be introducing the large latency somehow and not the client.

 

Test 2) LINUX NFS client

NFS Server is Open Solaris and NFS Server latency is measured with DTrace of using probes nfs start/done .

NFS client is a different machine running Linux. Latency is measured from the Oracle perspective with oramon.sh:


NFS Server Oracle
0.1 35.73
0.1 58.9
0.1 44.88
0.1 54.43
0.1 54.57
0.1 63.92

Again we see the same stats as the first test case, ie the NFS Server says it’s latency is low and the Oracle database on the NFS client says the latency is slow

TCP trace analysis

==================== Individual HOST DATA ============
Parsing NFS server trace: nfs_server.cap
type       avg ms count   
   READ :  1.17,   9042            

Parsing client trace: client.cap
type       avg ms count    
   READ :  1.49,  21984   

==================== MATCHED DATA  ============
READ
type       avg ms count    
 server :  1.03  
 client :  1.49
   diff :  0.46

Processed 13815 packets (Matched: 8169 Missed: 5646)
summary of test 2:

In this case the NFS Server TCP traces show fast latency as well as the NFS client TCP traces, thus the long latency times must be introduced on the client machine somehow and not the NFS Server.

 

Summary of test 1 and test 2

Tests are run with swingbench benchmark.
The Oracle databases are set with the parameter “filesystemio_options=SETALL” which enables direct I/O, so that all I/O requests from Oracle should be become NFS requests to NFS server.
Roughly the latency stats for 8k reads originating from the Oracle are

 

 Oracle on Linux  Oracle on Solaris  latency data source
 NFS Server  .1 ms   2 ms dtrace nfs:::op-read-start/op-read-done
 TCP trace NFS Server   1 ms  44 ms snoop
 TCP trace NFS Client   1.5  45 ms tcpdump on LINUX and snoop on Open Solaris
 Oracle  58 ms  47 ms “db file sequential read” wait (which is basically a timing of “pread” for 8k random reads specifically

Screen Shot 2013-08-23 at 2.09.43 PM

In the above examples there was nothing outstanding in the vmstat results on the NFS client and the NFS server.

In both cases the problem arose with high level of swingbench sessions. The latency disparity appeared above a user load of 16. The latency disparity increased as user load increased.

It turned out that in the case where the NFS server was introducing the disparity that the maximum NFS threads on the server was 16. The solution was to increase the NFS server thread max: sharectl set -p servers=512 nfs.

In the case where the LINUX client introduce the disparity it turns out that the maximum outstanding NFS requests was 16. The solution was to raise sunrpc.tcp_slot_table_entries = 128.

Screen Shot 2013-08-23 at 2.33.00 PM

The tcpdump.pl script didn’t find the answers but it did identify which side of the connection was at fault and it would identify if the network itself was the problem.

One thing to keep in mind is that snoop and tcpdump can be resource intensive and may introduce a major amount of latency. In the above cases with and without tcp tracing enabled showed no more than 1ms of difference and the latency being investigated was an order of magnitude greater. In other cases where there has been more I/O between the NFS server and client and/or the transfer sizes were higher (like 1M) then the impact of TCP tracing has eclipsed the latency disparity being investigated and  tcp tracing would not have been appropriate investigation path.

 


Other information

Network , , ,

Oracle time units in V$ views

August 23rd, 2013


Oracle has a crazy mix of units of time in various v$ views

  • seconds
  • centi-seconds
  • milliseconds
  • microseconds

Some are straight forward such as time_waited_micro, but what unit is “TIME_WAITED”  or “WAIT_TIME” in? For example

v$session
WAIT_TIME –  centi
SECONDS_IN_WAIT – seconds

v$session_wait
WAIT_TIME – centi
SECONDS_IN_WAIT – seconds

v$system_event
TIME_WAITED – centi
AVERAGE_WAIT – centi
TIME_WAITED_MICRO – micro

v$system_wait_class
TIME_WAITED – centi

v$eventmetric
TIME_WAITED – centi

v$waitclassmetric
DBTIME_IN_WAIT – “percentage of the measured wait time that was actually in foregrounds and therefore part of DB time” *
TIME_WAITED – centi

v$waitclassmetric_history
DBTIME_IN_WAIT – “percentage of the measured wait time that was actually in foregrounds and therefore part of DB time” *
TIME_WAITED – centi

dba_hist_system_event
TIME_WAITED_MICRO – micro

v$active_session_history
WAIT_TIME –  micro, not for general use
TIME_WAITED – micro, only the last sample is fixed up, the others will have TIME_WAITED=0*

dba_hist_active_sess_history
WAIT_TIME –  micro , not for general use
TIME_WAITED = micro

v$session_wait_history

WAIT_TIME  – centi
WAIT_TIME_MICRO  –  micro, 11g only
TIME_SINCE_LAST_WAIT_MICRO – micro, 11g only

in 10g, v$session_wait_history is pretty worthless IMO as one of the best uses of it would be to find average wait times for events, and even histograms of wait times and better yet,  correlating I/O sizes with I/O times, but alas as most interesting operations are in the micro to millisecond times and wait_time is in centi, most of the interesting data is lost, luckily this is fixed in 11g

 

With the list in one place it looks like everything is centi unless otherwise stated except for ASH  which is micro.

Please correct and/or add other examples to this list – thanks

* thanks to John Beresniewicz for this info.

 

Timings in SQL Trace files

 

 

Optimizing Oracle Performance
Cary Millsap,Jeff Holt
Chapter 5 Interpreting Extended SQL Trace Data
tim
If a tim value is 0, then TIMED_STATISTICS for the session was false when
the database call time would have been calculated. You can thus confirm whether TIMED_STATISTICS was true by observing tim values. In our field work, my colleagues and I have found that specific non-zero tim values associated with PARSING IN CURSOR sections are largely irrelevant.
In Oracle 9i [and higher] , tim is a value expressed in microseconds (1 us = 0.000 001 seconds).
On some systems (such as our Linux research servers), tim field values are unadulterated gettimeofday values. On other systems (like our Microsoft Windows research machines), the origin of tim field values can be much more mysterious.In releases prior to Oracle9i, tim is a V$TIMER.HSECS value expressed in centiseconds (1 cs = 0.01 seconds).

You can read more in
http://books.google.lv/books?id=l9eLIh2ylekC&lpg=PP1&dqĘry%20millsap%20optimizing%20oracle%20performance%20tim&hl=lv&pg=PT107#v=onepage&qĘry%20millsap%20optimizing%20oracle%20performance%20tim&fúlse

Thanks to Gints Plivna  for the reference to Cary’s book info

Oracle, wait events ,

Latency Heat Maps in SQL*Plus

August 23rd, 2013


This is so cool !

Screen Shot 2013-05-10 at 1.13.15 PM

The above is so cool.

The graphic shows the latency heatmap of “log file sync” on Oracle displayed in SQL*Plus! SQL*Plus ?! Yes, the age old text interface to Oracle showing colored graphics.

How did I do this? All I did was type

sqlplus / as sysdba
@OraLatencyMap_event 3 "log file sync"

The script  OraLatencyMap_event was created by  , see http://externaltable.blogspot.com/2013/05/latency-heat-map-in-sqlplus-with.htm

Now if we combine Luca’s  monitoring, with the I/O throttling documeted by Frits Hoogland here https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/ , we can really have some fun and even draw latency words:

analytics-3-heatmaps-crop

 
In the graphic at the top of the page I put lgwr in an I/O write throttle group and played with the I/O throttle.  I was running a swingbench load and at the same as throttling I/O such that latencies started off good then got worse and then back to normal.
 
The full steps are:
 

Run an auto refresh color coded heatmap on “log file sync” in sqlplus by typing

sqlplus / as sysdba
@OraLatencyMap_event 3 "log file sync"

where OraLatencyMap_event.sql and OraLatencyMap_internal.sql are  your current directory or sqlpath

Now to play with LGWR latency with cgroup throttles see

https://fritshoogland.wordpress.com/2012/12/15/throttling-io-with-linux/

# install cgroups on 2.6.24 LINUX or higher
yum intall cgroup

# setup /cgroup/blkio
grep blkio /proc/mounts || mkdir -p /cgroup/blkio ; mount -t cgroup -o blkio none /cgroup/blkio
cgcreate -g blkio:/iothrottle

# find the device you want
df -k
# my Oracle log file divice was
ls -l /dev/mapper/vg_source-lv_home
lrwxrwxrwx. 1 root root 7 May  1 21:42 /dev/mapper/vg_source-lv_home -> ../dm-2

# my device points to /dev/dm-2
ls -l /dev/dm-2
brw-rw----. 1 root disk 253, 2 May  1 21:42 /dev/dm-2

# my device  major and minor numbers are "253, 2"
# create a write throtte on this device (for read just replace "write" with "read"
# this limits it to 10 writers per second
cgset -r blkio.throttle.write_iops_device="253:2 10" iothrottle

# look for lgwr
ps -ef | grep lgwr
oracle   23165     1  0 13:35 ?        00:00:19 ora_lgwr_o1123

# put lgwr pid into throttle group
echo 23165     >  /cgroup/blkio/iothrottle/tasks

# now play with different throttles
cgset -r blkio.throttle.write_iops_device="253:2 1" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 10" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 100" iothrottle
cgset -r blkio.throttle.write_iops_device="253:2 1000" iothrottle

# if you are finished then delete the throttle control group
cgdelete  blkio:/iothrottle

graphics, Uncategorized , , ,

Recovering a hacked wordpress site

August 22nd, 2013


A friend’s wordpress site just got hacked, so reposting this info which I have found useful a number of times.

OK,  so wordpress got hacked. I’ve had problems with this in the past and tried tactical surgery, but this time decided to do a full re-install. My first attempt today left me with the wordpress blank screen of death, so here I’m outlining the steps I took that finally got the new version working:

   back up wordpress database

       http://codex.wordpress.org/WordPress_Backups#Simple_Backup

   # go onto hosting box and downloaded new workdpress :
   # there are better ways to download wordpress than zip, but this worked for me
   lynx -source -dump http://wordpress.org/latest.zip > wordpress.zip
     or
   curl http://wordpress.org/latest.zip  --O latest.zip
   unzip wordpress.zip
   mv myblog.com oldmyblog.com
   mv wordpress myblog.com
   cd myblog.com
   cp wp-config-sample.php wp-config.php
   vi wp-config.php
      # change mysql info
      /** The name of the database for WordPress */
      define('DB_NAME', 'mydbname');
      /** MySQL database username */
      define('DB_USER', 'mydbuser');
      /** MySQL database password */
      define('DB_PASSWORD', 'mydbpassowrd');
      /** MySQL hostname */
      define('DB_HOST', 'mysql.myhost.com');
      # added new keys
      # get keys at
      #    https://api.wordpress.org/secret-key/1.1/salt/
      # replacing  these lines
      # define('AUTH_KEY',         'put your unique phrase here');
      # define('SECURE_AUTH_KEY',  'put your unique phrase here');
      # define('LOGGED_IN_KEY',    'put your unique phrase here');
      # define('NONCE_KEY',        'put your unique phrase here');
      # define('AUTH_SALT',        'put your unique phrase here');
      # define('SECURE_AUTH_SALT', 'put your unique phrase here');
      # define('LOGGED_IN_SALT',   'put your unique phrase here');
      # define('NONCE_SALT',       'put your unique phrase here');
      # save file

    cp .htaccess .htaccess.orig
    # the downloaded .htaccess was giving me 404 errors
    # so I used the one from the previous blog. Not sure
    # if it's fully up to date but at least it got me going
    vi .htaccess
      # include these lines
      # BEGIN WordPress
       RewriteEngine On
       RewriteBase /
       RewriteRule ^index\.php$ - [L]
       RewriteCond %{REQUEST_FILENAME} !-f
       RewriteCond %{REQUEST_FILENAME} !-d
       RewriteRule . /index.php [L]
       # END WordPress

    cp -R  ../oldmyblog.com/wp-content/themes        wp-content
    cp -R  ../oldmyblog.com/wp-content/uploads       wp-content
    cp -R  ../oldmyblog.com/wp-content/profile-pics  wp-content

    reinstall plugins, in my case I use these:
     amr-shortcode-any-widget/amr_shortcode_any_widget.php
     google-analytics-for-wordpress/googleanalytics.php
     profile-pic/profile-pic.php
     sbs-blogroll/sbs-blogroll.php
     syntax-highlighter-compress/syntax-highlighter-compress.php
     w3-total-cache/w3-total-cache.php
     wordpress-popular-posts/wordpress-popular-posts.php
     wp-widget-cache/widget-cache.ph

   For more info see

http://codex.wordpress.org/FAQ_My_site_was_hacked

http://smackdown.blogsblogsblogs.com/2008/06/24/how-to-completely-clean-your-hacked-wordpress-installation/

       suggest reinstall and checking wordpress database for these code usages for possible hacks
       SELECT * FROM wp_posts WHERE post_content LIKE '%<iframe%'
       UNION
       SELECT * FROM wp_posts WHERE post_content LIKE '%<noscript%'
       UNION
       SELECT * FROM wp_posts WHERE post_content LIKE '%display:%'
    http://ocaoimh.ie/2008/06/08/did-your-wordpress-site-get-hacked/
       check for usage of  base64_decode()

Uncategorized