Archive for May, 2014

SQL*Plus Procedures and Password Encryption

May 20th, 2014

One  small but bothersome  issue I’ve had for 20 years is  how to drive a program like SQL*Plus with a shell script to make it procedural.  One approach is to just run single commands to SQL*Plus causing a connection and exit for every SQL statement. Connecting and disconnecting is costly. Ideally, I just want to open up a  connection and send commands to the connection and get the response. Of course languages like java, perl, python all have Oracle connection APIs but what if I just want a simple shell script and don’t have access to perl or python with the Oracle APIs or access to java? Can’t I just do it in shell?  Can’t I just connect SQL*Plus to a named pipe and echo my commands into the pipe? Well yes but there is an annoying obstacle. After echoing the first command the pipe, which SQL*Plus dutifully executes, SQL*Plus then exits. I first ran into this problem about 20 years ago and didn’t solve it at the time.  A few years later, I figured out a trick to make it work and have been using it ever since. The trick is to have a process run a “tail -f” of an empty file into the pipe. With this second process tailing, SQL*Plus doesn’t exit when reading from the pipe. Since I first started using this I’ve never looked into exactly why. It think that when SQL*Plus tries to read from the pipe after the first command has been sent, the OS says, no more data , and SQL*Plus exits. With the second process doing the “tail -f”, then the OS tells SQL*Plus, waiting for more data to send you, and SQL*Plus waits. Would love a more detailed explanation.

function usage
       echo "Usage: $(basename $0)    [sid] [port]"
       echo "  username        database username"
       echo "  username        database password"
       echo "  host            hostname or IP address"
       echo "  sid             optional database sid (default: orcl)"
       echo "  port            optional database port (default: 1521)"
       exit 2
[[ $# -lt 3 ]] && usage
[[ $# -gt 5 ]] && usage
[[ $# -gt 0 ]] && UN=$1
[[ $# -gt 1 ]] && PW=$2
[[ $# -gt 2 ]] && HOST=$3
[[ $# -gt 3 ]] && SID=$4
[[ $# -gt 4 ]] && PORT=$5
  sh ./$CLEAN > /dev/null 2>&1
  echo "" > $CLEAN
  echo "rm $OPEN" >> $CLEAN
  echo "rm $PIPE" >> $CLEAN
  rm $OPEN $PIPE > /dev/null 2>&1
  touch  $OPEN
  cmd="$MKNOD $PIPE p"
  eval $cmd
  tail -f $OPEN >> $PIPE &
  echo "kill -9 $OPENID" >> $CLEAN
  # run SQLPLUS silent unless DEBUG is 2 or higher
  if [ $DEBUG -lt 2 ]; then
                     (ADDRESS_LIST=             \
                         (ADDRESS=              \
                             (PROTOCOL=TCP)     \
                             (HOST=$HOST)       \
                             (PORT=$PORT)))     \
                      (CONNECT_DATA=            \
                             (SERVER=DEDICATED) \
   cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE > /dev/null &"
   cmd="sqlplus $SILENT \"$CONNECT\" < $PIPE  &"
   echo "$cmd"
   echo "PIPE $PIPE"
   eval $cmd
   echo "kill -9 $SQLID" >> $CLEAN

Example execution

$ ./  scott tiger
PIPE sqlplus.pipe
$ echo 'select * from dual;' > sqlplus.pipe
$ echo 'exit' > sqlplus.pipe
$ sh sqlplus.clean

The above code will create a SQL*Plus connection reading it’s input from a pipe. It also creates a cleanup file to remove the pipe file and kill the tail process. If creating multiple connections then the file names will have to be pre/post-pended with some string to keep them separate. could be a timestamp. Could be info about which target.

Now what does this have to do with password encryption?
Well there are packages that handle password encryption. Oracle has as cool thing called wallets, that can be setup so that SQL*Plus can connect without a password.
Oracle’s Doc
Here is a quick setup:
Unfortunately Oracle’s wallet method requires files that can’t be redistributed. These file dependencies legally have to be installed by the end user,  where as Oracle instant client can be redistributed. So what if I’m redistributing a tool that uses instant client? Then the wallet solution is out of the question, at least for easy installs.
Now what if I create my own binary to handle password encryption, like “Oracle Password Repository”
This doesn’t help that much for protection, because a user can actually get the text password. What I want is a way to encrypt passwords and hide  the unencrypted passwords from the user. Sure I want  them to connect, then they have access to the database,  but I want to prevent them from walking off with a file full of clear text passwords.  One solution, like the Oracle wallet with the “cwallet.sso” files, is to have a binary that creates the connections  for SQL*Plus over a pipe and then pass the pipe to the users. Bit of a hack, especially for an interactive users, but for scripts that run SQL*Plus it not only centralizes the passwords, but it encrypts and  helps prevent the user from getting access to and walking away with a set of clear text passwords.


Reguarding the beginning of this blog post and the problem of SQL*Plus exiting after receiving the first command via a named pipe, here is what truss looks like echoing ‘select * from dual’ into the pipe with and without having a second process tailing (nothing) into the pipe. First example has a second process doing a “tail -f ” of an empty file into the pipe while echoing ‘select * from dual’ into the pipe which SQL*Plus reads, executes and stays connected

fstat64(1, 0x08044440)                          = 0
write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
read(0, 0x0813FAD4, 5120)       (sleeping...)

Second example, there is no “tail -f” and we just do “echo ‘select * from dual;’” into the pipe which SQL*Plus executes then exits:

write(1, "\n", 1)                               = 1
write(1, " D\n", 2)                             = 2
write(1, " -\n", 2)                             = 2
write(1, " X\n", 2)                             = 2
write(1, "\n", 1)                               = 1
write(1, " S Q L >  ", 5)                       = 5
read(0, 0x0813F714, 5120)                       = 0
write(4, "\0\r\0\006\0\0\0\0\003\t".., 13)      = 13
read(4, "\011\0\006\0\0\0\0\0\t01".., 2064)     = 17
write(1, " D i s c o n n e c t e d".., 95)      = 95
write(1, " W i t h   t h e   P a r".., 78)      = 78
write(4, "\0\n\0\006\0\0\0\0 @", 10)            = 10

Here is a good explanation

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) 
from the pipe will see end-of-file (read(2) will return 0).

The part that isn’t explained, for me, is that a reader will wait until at the write end has been opened.  So the EOF doesn’t happened until there is an open and a close and all open write file descriptors have to be closed, thus adding a never finishing write will keep the reader from reading an EOF.

Here is the code for OpenSolaris (thanks to Erik Schrock for this links)

network security


Sampling Performance Data on Oracle

May 19th, 2014

Oracle wait events took years to become widely known for their power in performance tuning. Similarly it has taken years for most DBAs to adopt Active Session History (ASH). ASH represents a revolution in performance tuning data as radical as the introduction of wait events.

Wait events revolutionized tuning way back in Oracle 7 by identifying performance bottlenecks in the database and explicitly quantifying their impact. Still lacking in the wait events interface was the ability to identify the causes of the problems. The solution has come with ASH. ASH clearly identifies the SQL and Sessions at the root of wait bottlenecks.

ASH ironically has be technically feasible by a savvy DBA since Oracle 7 but it took Oracle 10g for anyone to put this cutting edge technology to work, and in no small way, but by actually implanting it in the Oracle kernel and basing Oracle 10g performance tools on ASH.

ASH is founded upon a data gathering technique called “Sampling.” Sampling is a method of collecting data by taking snapshots every second. The snapshots record the state of every active session in the database. For every active session, Oracle records the SQL being executed as well as the state of the session. A session state can be running on the CPU or waiting on any of several hundred wait events. With this simple amount of information, the session + state + sql , we can quickly identify SQL that are CPU intensive, IO intensive or involved in any number of the possible resource contention issues that can arise in an Oracle database.

Looking at the situation more concretely, when there is a slow down on the database we look at our trusted AWR or statspack report for the period of the slowdown. The first step in analyzing the statspack report is to look at the Top 5 Timed Events

The top 5 timed wait events will tell us if any wait event has crept up to cause a bottleneck. If we do find a wait event bottleneck we will want to know who or what is causing the problem in order to solve it. For example if there is a CPU bottleneck, we want to know what SQL statement is hogging the CPU. If there is an IO bottleneck we want to know what SQL statement is stuck on IO and needs tuning. If there is a complex situation like a buffer busy waits or latch contention we want to know which sessions were involved, what the wait event arguments were and what SQL they were executing. Statspack fails to give us the necessary detailed information but ASH does.

Real Life Example

Let’s look at a real life example. This example is one of the harder wait bottleneck to solve, so if you can follow this one then it’s downhill from here. This example is one of the more complex tuning problems and thus the analysis requires a deep understanding of Oracle wait events.

Imagine a situation where users call into the help desk complaining that the application has slowed down. The help desk determines it’s not the application and calls me, the DBA, to fix it. To solve the situation I run the statspack re-ports over the last hour and look at “Top 5 Timed Events”

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                        % Total
Event                 Waits    Time (s) Call Time
--------------------------- ----------- ---------
buffer busy waits     2,748         250     78.72
CPU time                             32     10.16
free buffer waits     1,588          15      4.63
write complete waits     10           8      2.51
log buffer space        306           5      1.51

Sure enough there is a bottleneck and the bottleneck is on “buffer busy waits”, but now what do I do to solve it? There are some guesses I can make, and I can even use some data in statspack to help guess, but there is no way to conclusively analyze this wait bottleneck or most wait events with statspack. To solve most wait event bottlenecks we want more detailed information than is available in statspack. What information do we need? We mainly want to know what

  • SQL
  • Sessions
  • Objects
  • wait details

are involved. How do we get that information? We get that information by sampling v$session_wait, i.e. via ASH. We also have to understand how to solve each wait event and specifically in this case how to solve “buffer busy waits.” A complete explanation of buffer busy see the previous blog post on buffer busy waits.

How do we find the objects and SQL involved in the buffer busy waits? The information we want can be found in a table v$session_wait when the problem exists but v$session_wait only represents the current state of the system and if our problem is intermittent or in the past then the information we want from v$session_wait will be lost. V$session_wait’s lack of history is where the new Oracle 10g concept Active Session History (ASH) comes in. ASH at its simplest is just reading v$session_wait (actually v$session which includes v$session_wait starting in 10g) every second and storing the data. Oracle does this automatically in 10g but it has been possible in every version since Oracle 7 by running the following query:
column sid format 999
column username format a10
column serial format 99999
column hash_value format 9999999999
column event format a15
column p1 format 9999999999
column p2 format 99999999
column p3 format 9999
   nvl(s.username,s.program) username,
   s.sid sid,
   s.serial# serial,
   s.sql_hash_value hash_value,
          0, w.event,
         'ON CPU'),1,15) event ,
   w.p1  p1,
   w.p2  p2,
   w.p3  p3
from         v$session        s,
    v$session_wait   w
where     w.sid=s.sid
and s.status='ACTIVE'
and s.type='USER';

From the above query we get output like this.

USERNAME   SID   SERIAL   HASH_VALUE EVENT                P1       P2       P3
---------- ----- -------- ---------- -------------------- -------- -------- ----
SYS        64    8717     4116021597 PL/SQL lock timer    300      0        0
SYS        58    19467     961168820 ON CPU               16508152 1        0
STARGUS    71    6251     1311875676 direct path write    201      2155902  127
(CJQ0)     9     1                 0 rdbms ipc message    500      0        0

In the case of the buffer busy wait, I’ve been running the above query and storing it in a table that I call v$ash. I can now read v$ash and look for the detail information I need to solve the buffer busy wait problem. First I want to know what block type the buffer busy waits are on.

SQL> Select count(*), p3
from v$ash
where event = 'buffer busy waits'
group by p3;

COUNT(*)    P3
---------- ----
3423          1

All the waits are on block type 1. What is block type 1? we can check in v$waitstat translate block type 1 into the actual block type:

select rownum n,
 from v$waitstat;

--- ------------------
 1 data block
 2 sort block
 3 save undo block
 4 segment header
 5 save undo header
 6 free list
 7 extent map 
 8 1st level bmb
 9 2nd level bmb
 10 3rd level bmb
 11 bitmap block
 12 bitmap index block
 13 file header block
 14 unused
 15 system undo header
 16 system undo block
 17 undo header
 18 undo block

 Block type 1 is a data block.

When there are buffer busy waits on block type 1 it is typically an insert problem. The problem could be either on index or a table. Depending whether it is a table or an index with the problem we take different actions. Lets find out what object the buffer busy wait is on.

We can find this out by looking at p1 and p2 for the buffer busy waits which represents the file and the block:

select count(*),  p1 filen, p2 blockn , hash_value
from v$ash
where event='buffer busy waits'
group by p1, p2, hash_value;

---------- -------- --------
1   11       90644
2   11       90651
3   11       98233
1   11       104767
3   11       113291
1   11       119842
1   11       119856
3   11       121632
1   11       126334

I’ll just pick the last block and find out what object it is coming from:

ccolumn segment_name format a30
select owner,
block_id, blocks+block_id
from dba_extents
where file_id = 1
and 126334 between block_id AND block_id + blocks-1;

---------- ------------------ ---------- ---------- ---------------
SYSTEM     TOTO1               TABLE          125201          127249

From this result I know it’s a data block of a table and not a header block (because the header block would be the first block in the object). Now the question is what are the sessions doing? I can look at the sql and see what they are executing:

SQL> select count(*),  p1 filen, p2 blockn , hash_value
2  from v$ash
3  where event='buffer busy waits'
4  group by p1, p2, hash_value;

---------- -------- -------- ----------
3 1        94609     558666863
2 11       81163     558666863
2 11       87123     558666863
1 11       90644     558666863
2 11       90651     558666863
3 11       98233     558666863
1 11       104767    558666863
3 11       113291    558666863
1 11       119842    558666863
1 11       119856    558666863
3 11       121632    558666863
1 11       126334    558666863

I can see that the same SQL is causing all the “buffer busy waits” because the hash value is the same. Now I can find out what that sql statement is by:

select sql_text
from v$sqltext
where hash_value=558666863;

INSERT into toto1 values (:b1,lpad('a',1000,'a'))

With this final piece of information it looks like the segment is probably missing free lists or not in an ASSM tablespace. Let’s check these possibilities.


from dba_tables
where table_name='TOTO1' and owner='SCOTT';

---------- ------------------------------

Automatic Segment Space Management:

select tablespace_name, SEGMENT_SPACE_MANAGEMENT
from dba_tablespaces;

------------------------------ ------
SYSTEM                         MANUAL
UNDOTBS1                       MANUAL
TEMP                           MANUAL
USERS                          MANUAL
TOOLS                          AUTO

We see that that the table is in a table space, USERS, that does not have ASSM active.

In summary, the table TOTO1 has wait contention on writes to blocks and multiple sessions are concurrently inserting into it and it has no free lists nor is it in an Automatic Segment Space Managed (ASSM) table space. This causes all sessions inserting to insert into the same block, the last free one on the one free list and thus fight over exclusive write access to that last block. The solutions is to either put the segment into a ASSM table space and let Oracle handle the contention or put free lists on the segment which is more precise. If I choose free lists, then how many do I need? I need as many as there are concurrent users inserting into the table which I can get with

select count(*), sid, serial  from v$ash where event='buffer busy waits'
group by sid, serial;

---------- ----- --------
7 12    79
4 14    99
4 19    770
8 20    176

There were 4 sessions inserting during the period of ASH data I looked at, thus in my case I need 4 free lists.

Extended Power of ASH

The above example is simplified. Two other important pieces of data missing in the above example that also need to be tracked are sql_text and sample_time. Sample_time is required to group data by performance bottleneck time periods and sql_text should be collected on an ongoing basis in case the SQL found in v$session_wait gets kicked out of the shared pool. At the end of the day collecting ASH becomes quite complicated with the need to partition and purge data, send data to a different repository to reduce overhead on the target and the need to collect other bits of information like maps of user# to username and object# to object name. To see a package that does much of this, refer to


Even with a good collection running, it takes some nontrivial sql to mine the richness of the data.
Even with a good set of scripts to analyze the data it takes a number of well written sql statements to drill down sufficiently into the problem.


Luckily Oracle EM does all of the above for you. Oracle collects sampled data in v$active_session_history, displays is powerfully and graphically in Top Activity screen and provides automated analysis with ADDM.


On top of this, the data is displayed in a graphically in Top Activity condenses the equivalent dozens of sql statements that would take time and effort to run and comprehend into visual that can be immediately understood.
Even in this reduced graphic we can quickly and easily see that there is a performance problem on the machine. In this case there is clearly a CPU bottleneck, because the CPU (in the top chart, the green) is above the maximum CPU on the machine (the dotted red line). This means that the machine is at 100% CPU and there are processes in the run queue. We can also see the chart at the bottom left that one SQL statement is using much more CPU than any other sql statement, so this is the statement to tune. It only takes a second to scan the graphic and see where the problem is and it’s all based on ASH.


ASH is a leap in performance tuning technology. It took a clear vision of the future and a bit of a leap of faith to let go of the compulsive need to have exact measurements and instead embrace sampling a technology that statistically approximates the same value with little loss of accuracy but at the same time brings a great wealth of information necessary for analyzing the performance of a database.


ASH can be simulated on any version of Oracle since v7 with the simple act of repeatedly sampling and storing the values from v$session_wait into a historic table. Although the idea is easy, the implementation turns out to be fairly complex. Luckily Oracle implements not only ASH in version 10g but more importantly Oracle implements a diagnostic package that reads the data, analyzes the data and report the findings and solutions automatically so you don’t even need to know all the esoteric wait analysis. The best part of all is Oracle EM exposes all the data and analysis graphically provide an interface that communicates quickly and clearly the date of the system.


I want to be a simplifier – thanks Steve Jobs

May 16th, 2014


Any intelligent fool can make things bigger and more complex. It takes a touch of genius – and a lot of courage – to move in the opposite direction.

                                             Albert Einstein

I just turned up the short blog/article below on Steve Jobs. Steve Jobs has always been an inspiration for me even if I was a bit  jealous at times of Apple. Apple never interested me as a work place because for one I didn’t feel I had the experience to stand out at Apple and even at 21 years old when I graduated in 1988, 4 years after when Macintosh came out, I felt Apple was already too big be for me to make a difference especially when they attracted the best and the brightest designers. I wasn’t a designer. I wasn’t a code geek either. I wasn’t even a computer scientist for that matter. I never knew what I was. But now I know – I’m a simplifier or at least that’s what I want to be. I want to make things easy to understand and I want make things that engage the user to understand.  Thanks to Steve Jobs  who crystallized this insight about simplification and to the years of inspiration.

John Sculley on Steve Jobs:

Steve said, “If I asked someone who had only used a personal calculator what a Macintosh should be like, they couldn’t have told me. There was no way to do consumer research on it, so I had to go and create it, and then show it to people, and say now what do you think?”

A friend of mine was at meetings at Apple and Microsoft on the same day. … . He went into the Apple meeting (he’s a vendor for Apple), and as soon as the designers walked in the room, everyone stopped talking, because the designers are the most respected people in the organization. Everyone knows the designers speak for Steve because they have direct reporting to him. It is only at Apple where design reports directly to the CEO.

Later in the day he was at Microsoft. When he went into the Microsoft meeting, everybody was talking and then the meeting starts and no designers ever walk into the room. All the technical people are sitting there trying to add their ideas of what ought to be in the design. That’s a recipe for disaster.

He’s a minimalist and constantly reducing things to their simplest level. It’s not simplistic. It’s simplified. Steve is a systems designer. He simplifies complexity.

– John Sculley



Why did Google succeed? Yahoo shuts altavista down

May 15th, 2014

Yahoo shut  Altavista down a couple of years ago. Why did Google succeed when Altavista had index the web before Google ever started?

Dan Rosensweig former COO at Yahoo responding to the question: Why did Google succeed in search over Yahoo?

We were many things to many people.
They were one great thing to everybody
Sometimes it's just what you believe in.
It's how the question get's phrased.
When I came to yahoo people were not interested
in investing in search because there was no way
to make money.
Google didn't care about making money
They cared about improving peoples ability to search
and they were originally going to be enterprise.
Yahoo was actually using their search engine.
Then something happened.
Somebody created a business model around search.
To get around patents Google improved on Overtures business model.
The logic [Yahoo] had was you bid the most you come up first.
It was sort of self policing.
Why would you bid the most to come up first
if you weren't the most relevant?
Google figured something out.
It's not who bids the most,
its how you generate the most amount of value
off each click.
Its a combination of location and price and click through.
That enabled them to make more money on every click
than we did which allowed them to  invest a lot more money in increased
monetization engines and then buy distribution.
They could buy distribution at half the cost that we could
or a third the cost that we could.
Here we were as a public company trying to balance multiple
businesses and they were focused on one
and they were making more money on every click
and they were able to run the table successfully in search.

Why do companies succeed? Big question with lots of answers.
Google and the mass media seem to propagate the story that Google’s search is better and that’s why Google won. Clean and simple. Black and white. People like black and white, but the story is more complicated than that. The search space was competitive when Google came on the scene. I personally used Altavista before Google came one the scene and well after. I compared many of the search results and I was always impressed with Altavista. Google finally won my heart when the came out with The simple map interface was so logical and so simple that it put mapquest to shame for good reason.

But the story of search much more complicated and interesting. For more reading check out


Date conversions

May 14th, 2014

Dates are a surprising pain for correlating data an at the same time being understandable to the human reader.

For a generic form we can take seconds since 1970, on UNIX, with

$ date '+%s'

but then how to we convert it back? There is a neat trick for that, “date -d @”

$ date -d @1311372938  +%H:%M:%S

so in scripts we can load up the date into seconds, that can be subtracted to get deltas and at the same time converted easily to any date format :

date_secs=` date '+%s'`
date -d @$date_secs +%H:%M:%S

converting human back to seconds :

date -j -f "%Y-%m-%d" 2006-06-30 +"%a %F"

date -j -f “%Y-%m-%dT%H:%M:%SZ” 2016-12-23T14:12:28Z +”%s”


AWK takes a bit of a trick to convert seconds to human format:

# setup the UNIX command
d="date -d @"date_secs" +%H:%M:%S"
# run the command to getline and put it into "date_human_format"
d | getline date_human_format
# be sure and close the open descriptor

Dtrace will output nano seconds since 1970 in “walltimestamp” function, so just lop off the nanoseconds and you can convert it to date


Postgres pl/pgSQL

Date deltas in seconds ( multiply by 1000 to get ms as the values preserves the decimal)

extract(epoch from


Oracle  has the nice feature of returning the days along with fractional days when subtracting two dates, so it seems like just subtracting off 1970 should work and then multiplying by the seconds in a day

SELECT (SYSDATE - to_date('01-JAN-1970','DD-MON-YYYY')) * (86400) from dual;

But since Oracle can already subtract dates there is little need to do the conversion unless extracting the data to be used by an external consumer.

If Oracle is reading data from an external source in seconds since 1970 we can just reverse the process

select to_char((TO_DATE('01-JAN-1970','DD-MON-YYYY') + seconds_since_1970 / 86400 ),'DD-MON-YYYY') from dual;


SQL> select to_char((TO_DATE('01-JAN-1970','DD-MON-YYYY') +1311372938  / 86400 ),'DD-MON-YYYY') from dual;


SQL> SELECT (SYSDATE - to_date('01-JAN-1970','DD-MON-YYYY')) * (86400) from dual;



One issue on Oracle though, the variable type timestamp can be a bit frustrating to work with, but it can be cast into a date, for example:

select cast(begin_interval_time as date ) from DBA_HIST_SNAPSHOT;

UPDATE: just found this link which is pretty good at covering a wide range of languages and databases:

though the didn’t include AWK :)  – who uses AWK these days anyway?

NOTE:  highcharts uses epoch for X-axis which makes formatting and manipulation easy, but be aware that highcharts usesmilliseconds instead of seconds.

For example to format the hover tooltip of a point in highchart to hour colon minute. (highchart seems to use the standard date command formatting on UNIX)

 tooltip: {
      formatter: function () {
        return Highcharts.dateFormat("%H:%M ", this.x) ;

highchart date format info:


I/O wait histograms in Oracle

May 13th, 2014

DBA_HIST_EVENT_HISTOGRAM has a history of I/O latency histograms. DBA_HIST_EVENT_HISTOGRAM is only on 11g, so for 10g, it requires collecting the histogram deltas manually off of V$EVENT_HISTOGRAM, something like

set pagesize 150
col event format a25
col tm format a14
select event,
       to_char(sysdate,'J')||to_char(round(sysdate-trunc(sysdate),5)) tm,
from v$event_histogram
where event in ('log file parallel write' ,
                'db file scattered read' ,
                'db file sequential read' )
order by event;

and collecting this information every N seconds and taking the deltas.

If you want the buckets in columns instead of rows:

    to_char(sysdate,'J')||to_char(round(sysdate-trunc(sysdate),5)) tm,
    sum (case when WAIT_TIME_MILLI=1 then WAIT_COUNT else 0 end) b1,
    sum (case when WAIT_TIME_MILLI=2 then WAIT_COUNT else 0 end) b2,
    sum (case when WAIT_TIME_MILLI=4 then WAIT_COUNT else 0 end) b3,
    sum (case when WAIT_TIME_MILLI=8 then WAIT_COUNT else 0 end) b4,
    sum (case when WAIT_TIME_MILLI=16 then WAIT_COUNT else 0 end) b5,
    sum (case when WAIT_TIME_MILLI=32 then WAIT_COUNT else 0 end) b6,
    sum (case when WAIT_TIME_MILLI=64 then WAIT_COUNT else 0 end) b7,
    sum (case when WAIT_TIME_MILLI=128 then WAIT_COUNT else 0 end) b8,
    sum (case when WAIT_TIME_MILLI=256 then WAIT_COUNT else 0 end) b9,
    sum (case when WAIT_TIME_MILLI=512 then WAIT_COUNT else 0 end) b10,
    sum (case when WAIT_TIME_MILLI=1024 then WAIT_COUNT else 0 end) b11,
    sum (case when WAIT_TIME_MILLI=2048 then WAIT_COUNT else 0 end) b12,
    sum (case when WAIT_TIME_MILLI=4096 then WAIT_COUNT else 0 end) b13,
    sum (case when WAIT_TIME_MILLI=8192 then WAIT_COUNT else 0 end) b14,
    sum (case when WAIT_TIME_MILLI=16384 then WAIT_COUNT else 0 end) b15,
    sum (case when WAIT_TIME_MILLI=32768 then WAIT_COUNT else 0 end) b16,
    sum (case when WAIT_TIME_MILLI=65536 then WAIT_COUNT else 0 end) b17,
    sum (case when WAIT_TIME_MILLI=131072 then WAIT_COUNT else 0 end) b18,
    sum (case when WAIT_TIME_MILLI=262144 then WAIT_COUNT else 0 end) b19,
    sum (case when WAIT_TIME_MILLI=524288 then WAIT_COUNT else 0 end) b20,
    sum (case when WAIT_TIME_MILLI=1048576 then WAIT_COUNT else 0 end) b21,
    sum (case when WAIT_TIME_MILLI=2097152 then WAIT_COUNT else 0 end) b22,
    sum (case when WAIT_TIME_MILLI=4194304 then WAIT_COUNT else 0 end) b23,
    sum (case when WAIT_TIME_MILLI > 4194304 then WAIT_COUNT else 0 end) b24
from v$event_histogram
where event='log file parallel write'
group by event,to_char(sysdate,'J')||to_char(round(sysdate-trunc(sysdate),5)) ;

Exploring I/O histogram values in AWR. In the AWR repository I have several databases, all reading off the same LUNs, so below I try to  take the deltas between each snapshot and sum up all the waits for each bucket in the histogram

set pagesize  50000
set feedback off
set echo off
SET markup HTML on
spool hist.html
with hist as  (
       to_char(trunc(cast(begin_interval_time as date ))+
         (ROUND ((cast(begin_interval_time as  date) -
         TRUNC (cast(begin_interval_time as date))) * 24) / 24),
        'YYYY/MM/DD HH24:MI') btime,
  from dba_hist_event_histogram  h,
       dba_hist_snapshot sn
         h.instance_number = 1
     and sn.instance_number = 1
     and h.event_name like 'db file seq%'
     and sn.snap_id=h.snap_id
     and sn.dbid=h.dbid
select  a.btime,
        sum(b.wait_count - a.wait_count)
from hist a,
     hist b
where a.dbid=b.dbid
  and a.snap_id=b.snap_id-1
  and a.wait_time_milli = b.wait_time_milli
group by a.btime, a.wait_time_milli
having  sum(b.wait_count - a.wait_count) > 0;
spool off
SET markup HTML off

With the HTML output it’s easy to read into excel.

The data can be graphed in Excel using pivot tables or pivot charts, but if I simply save the data in an excel worksheet, then I can open it up in Tableau, which in some circumstances, can be easier to use than excel


The Y axis is in log scale. I don’t find that the data speaks to me immediately (other than there are some seriously slow I/Os) but it is easy to see the major outliers.

One advantage of Tableau is the easy color coordination. Using the “cyclic” pallet, the colors come out pretty good for the I/O histograms (this would be time consuming on Excel)

Other histogram display options

Screen Shot 2014-05-12 at 11.25.46 PM

Screen Shot 2014-05-12 at 11.26.03 PM


Version control a multi-terrabyte database?! Yes !

May 12th, 2014


Can you version control database?

How do you manage databases in application development?  For all the advances in application development such as machine virtualization , agile development, devops practices, the database still stands as a huge barrier to development agility.

For code  management  it’s easy to use source control such as git, svn, perforce etc and for provisioning environments it can be accomplished efficiently with chef, puppet,vagrant,  jenkins. The tools support source control code, spinning up environments and testing via continuous integration and continuous delivery.

But what about the database? Databases can be in the multi-terrabytes. How does one provision environments with multi-terrabyte databases and version control these databases as changes are made by multiple different developers and as multiple versions of code and database states need to be maintained?

Answer: companies don’t version control databases. They suffer. They suffer thinking that it’s just the way life is. They suffer the consequences of lack of databases for development environments and lack of version control for the databases. The result is more bugs and most importantly slow application development times. But ..

There is a solution. The solution is data virtualization. Data virtualization empowers companies to spin up even multi-terrabyte databases in minutes for almost no storage overhead and it allows them to easily tag, branch and version control their databases.

On average with the solution of Delphix data virtualization and DB Maestro code management tools we see a 2x improvement in application development output. It’s like taking a team of 100 developers and getting the output of 200 developers. It’s actually better than that because just adding more people to a project doesn’t necessarily make the project go faster where as with Delphx and DB Maestro we do.

Find out more in a webinar Manage Database like Codebase with

  • Tim O’Brian analyst at Gleanster
  • Kyle Hailey of Delphix
  • Yaniv Yehuda  of DB Maestro

Industry analyst Tim O’Brian will share insights into problems and solutions they have analyzed in the industry and then Kyle Hailey and Yaniv Yehuda will explain how the new technologies of data virtualization and database code control work together to eliminate bottlenecks and problems in application development projects that work with databases.

Register Here


Companies rely on applications to supply the information that business uses to make decisions that generate revenue. Companies are constantly improving applications or creating new applications.
Application development depends upon provisioning environments for developers and QA.

The slower and more costly provisioning development and QA environments the more delays, bugs there will be in applications and the less revenue businesses will generate.
Businesses are constantly trimming IT budgets which leads to slower and few environment being provisioned which means more delays in the applications that the business depends on to generate revenue which in turn puts more pressure to trim IT budgets.
As examples
  • costly and slow environment provisioning means most of QA costs are spent on provisioning instead of actually QA’ing applications
  • Slow QA environment provisioning means it takes longer to find bugs and the longer it takes to find bugs the costlier it is to fix them (and sometimes the bugs actually have to go do production because of time pressures)
  • Developers sharing environments means developers block each other over those shared resources slowing down development
  • Developers who have their own synthetic , non-representational of production, environments create bugs that are not caught until late in the development cycle or even in production.
  • slow environment builds in general mean delays
In an interview with Gene Kim, author of The Phoenix Project, he said

“One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it“

Delphix agile data platform transform environment provisioning from an expensive slow process that requires days or weeks and teams of people and TB of disk to an automated, self service process that can be done in minutes for almost no storage overhead.
With Delphix, all developers  can be given full size copies of the production database and make changes as fast as they want.

The trouble is how does one integrate all those changes?

DBMaestro tracks all the changes and and will merge the changes and highlight conflicts.



NFS max “rsize” on Solaris – gotcha

May 9th, 2014

Laptop and Stethoscope

When mounting NFS file systems there is an option to set the max rsize requested. For example:

mount -o rsize=1048576,wsize=1048576,proto=tcp,vers=3 /foo

The general rsize used is 32K,  for example in Oracle documentation, but for large sequential I/O the larger rsize can make a big difference. In some tests the larger rsize was twice as fast.

Unfortunately though, if the client is Solaris the larger rsize is ignored because of a kernel parameter. That parameter is nfs3_bsize which defaults to 32K and limits the rsize. To change the value either add it to /etc/system for use on reboot or for changing it immediately, use mdb:

mdb -kw
> nfs3_bsize/D
nfs3_bsize:     32768
> nfs3_bsize/W 100000
nfs3_bsize:     0xd             =       0x100000


Query slower on exact clone

May 8th, 2014

A customer  was working with a clone copy of their production database for reports, but the cloned database reports were running slower than the same reports on production and they wanted to know why.

Ostensibly the machines were similar and the memory was the same, though it’s always worth verifying.  The first thing I checked was the load profile. If you have OEM you can look at the performance page and/or top activity page to see the load profile.  The load profile of production was much more CPU bound and much less I/O bound than the clone and their was clearly a difference.

The customer knew the time they the report was run on the clone  but was unsure exactly when it was run on production, though knew it had been run in the past day.

On the clone box, during the report run the load profile using SQL script to reproduce the performance graph looked like :

----- ----  ----------------------------------------------------
11:00   .2  +              6
12:00   .2  -              6
13:00   .1                 6
14:00   .1                 6
15:00   .1                 6
16:00 14.9  ++++++ooooooooo6ooooooooooooooooooooooooooooooooooo-
17:00   .0                 6

The reports were run between 4pm and 5pm  (ie 16:00-17:00) and that’s easy to see from the load chart.
The “+” represent CPU, “o” represent I/O, and “-” other wait time.
Now the reports on production were suppose to be run around 7am but the client wasn’t sure.
Here is what the load profile looked like on production looked like

----- --- ---------------------------------------------------------------------
00:00 1.5 ++++o
01:00 1.1 +++
02:00  .4 +
03:00  .2 +
04:00  .4 ++
05:00 1.5 ++ooo
06:00 1.6 +++oo
07:00 3.2 ++++++++ooo-
08:00 3.6 ++++++++++ooo--
09:00 6.1 +++++++++++++++++oooooo----
10:00 4.7 +++++++++++++++++ooo
11:00 5.3 +++++++++++++++++++ooo-
12:00 0.0 +++++++++++++++++++++++++++++++++oooooooo---------------------------
13:00 0.5 ++++++++++++++++++++++++oooooooooooooooooooo
14:00 19.2++++++++++++++++++++++++++++++++oooooooooooooooooooooooooooooooooooooo
15:00 6.5 +++++++++++++++oooooo

The customer thought the report had been run at 8am on production and at 8am there is little I/O wait and some  CPU. Looking at the top SQL from the clone during the report verses the production database at 8am didn’t show any of the same top SQL.  At this point I ask the customer to send me AWR exports of the two databases. I imported the AWR exports giving the clone database DBID=1 and production DBID=2

The first thing I check was the I/O latency in the AWR repository data. Both production and the clone showed I/O averaging 6ms, so latency was unlikely to be the problem.

Clone Latency  during the report:

BTIME           EVENT_NAME                        AVG_MS         CT
--------------- ------------------------------ --------- ----------
07-NOV-11 16:00 db file scattered read              6.48        4,246
07-NOV-11 16:00 db file sequential read             8.15    4,760,454   **
07-NOV-11 16:00 direct path read                    3.50      680,192
07-NOV-11 16:00 direct path write                  10.22           14
07-NOV-11 16:00 direct path write temp               .62            2

Production looked like (for example)

BTIME           EVENT_NAME                        AVG_MS         CT
--------------- ------------------------------ --------- ----------
07-NOV-11 14:00 db file scattered read              2.93    148,226
07-NOV-11 14:00 db file sequential read             6.35  4,961,761  **
07-NOV-11 14:00 direct path read                    2.32  2,706,322
07-NOV-11 14:00 direct path write                    .63        204
07-NOV-11 14:00 direct path write temp               .86        820

Thus the clone looks slightly slower,  but not enough to explain I/O load seen on the clone.

Then I check for the top SQL during the report on the clone which gave:

Top SQL on clone during the report

     SQL_ID ,
     sum(decode(session_state,'ON CPU',1,0))     "CPU",
     sum(decode(session_state,'WAITING',1,0))    -
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "WAIT" ,
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "IO" ,
     sum(decode(session_state,'ON CPU',1,1))     "TOTAL"
from   dba_hist_active_sess_history
where SQL_ID is not NULL
 and  dbid=1
group by sql_id
order by sum(decode(session_state,'ON CPU',1,1))   desc
   SQL_ID           CPU   WAIT     IO  TOTAL
   ------------- ------ ------ ------ ------
  4sfx95xxxxc03   31.0     .0  736.0    767

I then looked for the reports most expensive query on the clone in the AWR from production. The load profile for the same query on production was completely different.

Load on Production of top SQL from report on Clone

     SQL_ID ,
     sum(decode(session_state,'ON CPU',1,0))     "CPU",
     sum(decode(session_state,'WAITING',1,0))    -
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "WAIT" ,
     sum(decode(session_state,'WAITING', decode(wait_class, 'User I/O',1,0),0))    "IO" ,
     sum(decode(session_state,'ON CPU',1,1))     "TOTAL"
from   dba_hist_active_sess_history
where SQL_ID is not NULL
 and  dbid=2
and sql_id = '4sfx95xxxxc03',
group by sql_id
order by sum(decode(session_state,'ON CPU',1,1))   desc
   SQL_ID           CPU   WAIT        IO  TOTAL
   ------------- ------ ------ --------- ------
   4sfx95xxxxc03   12.0     .0     39.00     51

So the load profile of the most expensive report query on the clone looks completely different than production. On production the overall load is much lower and in particular I/O is drastically lower.


Now I wanted verify the stats for each SQL statement to make sure the query was doing the same work:

   SNAP_ID      EXECS       DISK        LIO        RWS        CPU     ELAPSE    IO_TIME       READS
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------  ----------
     18798      25648    2746798   21669637      12554 1768040000 7867477859 7430523627     2746594
     18887     116449      52123    3026366      77023   67260000  377033758  313874605       52122

So query is being executed more on Prod yet  doing much less disk and less CPU.


Let’s fine the sql plan hash for production and the clone  :

--------- ------------- ---------------
    clone 4sfx95xxxxc03      1805550729
     Prod 4sfx95xxxxc03      3038004819

They are different!

Lets look at the two execution plans

col operation for a75
      LPAD(' ',depth)||
      OPTIONS||' '||
      OBJECT_NAME  operation
    and DBID=&DBID

(I used a direct query on DBA_HIST_SQL_PLAN but you can also used the package provided by Oracle, for example:

select * from table(dbms_xplan.display_awr('4sfx95xxxxc03',1805550729,1)); -- sql_id, plan_hash, DBID


Plan Hash
CLONE                                           Production
-----------                                     ------------
1805550729                                      3038004819  

----------------------------------              --------------------
SELECT STATEMENT_                               SELECT STATEMENT_
 SORT_ORDER BY                                    SORT_ORDER BY FILTER_
  FILTER_                                           FILTER_
   NESTED LOOPS_                                    NESTED LOOPS_
    NESTED LOOPS_                                    NESTED LOOPS_

So the execution plans have changes. Now why have they changed? That’s a story for another day, but the plan would only change if the data and/or the statistics on the tables had changed.

The clone in fact wasn’t an exact clone. Either it had different object statistics or the actual data was changed. Ih this case and with more investigation, it turns out the statistics were run more recently on the clone than on production.



photo by Daniel Sjöström


Full table scan runs way slower today!

May 7th, 2014

Every so often a DSS query that usually takes 10 minutes ends up taking over an hour.  (or one that takes an hour never seems to finish)

Why would this happen?

When investigating the DSS query, perhaps with wait event tracing,  one finds that the query which is doing full table scans and should be doing large multi-block reads and waiting for “db file scattered read” is instead waiting for single block reads, ie “db file sequential read”.  What the heck is going on?

Sequential reads during a  full table scan scattered read query is a classic sign of reading rollback and reading rollback can make that minute(s) full table scan take hours.

What can happen especially after over night jobs, is that if an overnight job fails to finished before the DSS query is run and if that overnight job  does massive updates without committing till then end, then the DSS query will have to rollback any changes made by the updates to the tables the DSS query is accessing.

How do we quickly identify if this our issue?

ASH is good at identify it. On the other hand it’s often impractical to whip up from scratch an ASH query and that’s where ashmasters on Github comes in. This ASH query and others are on Github under ashmasters.


For this case specifically see:

Here is the output (slight different format than in the github repository) of a query I used in my Oracle Performance classes

AAS SQL_ID           %  OBJ              TABLESPACE
----- -------------  ---  ---------------  ----------
  .18 0yas01u2p9ch4    6  ITEM_PRODUCT_IX  SOEINDEX
                       6  ORDER_ITEMS_UK   SOEINDEX
                      88  ITEM_ORDER_IX    SOEINDEX
  .32 6v6gm0fd1rgrz    6  MY_BIG_Table     SOEDATA
                      94  UNDO             UNDOTBS1

i.e. 95% of the second SQL_ID’s i/o was coming from UNDO. The reads will be single block reads and tremendously slow down the full table scans.

Time business concept.

Oracle, sql