Archive for July, 2014

ASH visualized in R, ggplot2, Gephi, JIT, HighCharts, Excel, SVG

July 29th, 2014

There is more and more happening in the world of visualization, and specifically in visualizing Oracle performance with v$active_session_history.

Of these visualizations, the one pushing the envelope the most is Marcin Przepiorowski. Marcin is responsible for writing S-ASH, i.e. Simulated ASH, versions 2.1, 2.2 and 2.3.

Here are some examples of what I have seen happening out there in the web with these visualizations grouped by the visualization tool.


The first example is using Gephi. The coolest example of Gephi I’ve seen is Greg Rahn’s analysis of the voting for Oracle World mix sessions.

Here is Marcin’s example using Gephi with ASH data:


Here are two more examples from Marcin Przepiorowski using JIT, the JavaScript InfoVis Toolkit

Click on the examples to go to the actual HTML and not just the image. On the actual page from Marcin you can double click on nodes to make them the center of the network.

TOP 5 SQL_ID, SESSIONS and PROGRAMS joined together, with additional joins to points not included in the top 5. For example:

  • TOP 5 SQL_IDs with their list of sessions and programs
  • TOP 5 SESSIONS with their list of sql_ids and programs
  • TOP 5 PROGRAMS with their list of sessions and sql_ids



Frits Hoogland gives a great blog entry on getting started with R and then using R to analyze 10046 tracefiles (sort of ASH on steroids)

Here from Greg Rahn again is one of the cooler examples of R. In this case it’s the ASH data of a parallel query execution showing the activity of the different processes:


Here is an example basically reproducing the Top Activity screen with HighCharts, a SQL script on v$session and a CGI script to feed the data to the web page:


OEM 12c shows the new face of Enterprise Manager with the load maps

PL/SQL and SVG : EMlite

(quick APEX example)



Karl Arao has been doing a good bit of ASH visualization using his own Excel work as well as Tanel’s Excel Perf Sheet (see a video here)





To subset or not to subset

July 28th, 2014

What I see over and over again are development and QA teams using subsets of data. A full set of data is not used for testing until near the end of the development release cycle. Once a full set of data is used, testing flushes out more bugs than can be fixed in the time remaining forcing release dates to be pushed or releases to include bugs:

Screen Shot 2015-04-22 at 10.19.28 AM
Why do people use subsets? Here is one story. At one customer, full copies of production for developers and QA were causing excessive storage usage in application development, and the customer wanted to reduce costs, so they decided to use subsets of production data in development and QA:
  • Data growing, storage costs too high, decided to roll out subsetting
  • App teams and IT Ops teams had to coordinate and manage the complexity of the  shift to subsets in dev/test
  • Scripts had to be written to extract the correct and coherent data, such as correct date ranges and respect referential integrity
  • It’s difficult to get 50% of the data with 100% of the skew; a naive subset gives 50% of the data with only 50% of the skew, losing rare values and edge cases
  • Scripts were constantly breaking as production data evolved requiring more work on the subsetting scripts
  • QA teams had to rewrite automated test scripts to run correctly on subsets
  • Time lost in ADLC, SDLC to enable subsets to work (converting CapEx into higher OpEx) put pressure on release schedules
  • Errors were caught late in UAT, performance, and integration testing, creating “integration or testing hell” at the end of development cycles
  • Major incidents occurring post deployment, forcing more detailed tracking of root cause analysis (RCA)
  • 20–40% of production bugs causing downtime were due to non-representative data sets and volumes.
The moral of the story: if you roll out subsetting, it’s worth holding the teams accountable and tracking the total cost and impact across teams and release cycles. What is the real cost impact of going to subsetting? How much extra time goes into building and maintaining the subsets, and more importantly, what is the cost of letting bugs slip into production because of the subsets? If, on the other hand, you can test on full-size data sets, you will flush bugs out early, where they can be fixed fast and cheaply.
Screen Shot 2015-04-22 at 10.12.32 AM
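To see why skew is so hard to preserve, here is a toy sketch in Python (hypothetical data, not from any customer): with one common value and a handful of rare edge-case rows, a uniform 50% sample keeps about half of everything, so the rare rows that trigger edge-case bugs can mostly or entirely disappear.

```python
# Toy demonstration (made-up data): one common value plus 5 rare rows.
import random

random.seed(42)  # fixed seed so the run is repeatable

# 1,000,000 rows: value 0 is the common case, values 1..5 are rare edge cases
rows = [0] * 999995 + [1, 2, 3, 4, 5]

# a uniform 50% subset, the way naive row sampling works
subset = [r for r in rows if random.random() < 0.5]

rare_kept = sum(1 for r in subset if r != 0)
print(len(subset), rare_kept)  # roughly half the rows, but few (maybe zero) rare ones
```

The common case is sampled faithfully; the edge cases are a coin flip each.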
A robust, efficient and cost-saving alternative is database virtualization. With database virtualization, database copies take up almost no space and can be made in minutes, and all the overhead and complexities listed above go away. In addition, database virtualization reduces CapEx/OpEx in many other areas, such as:
  • Provisioning operational reporting environments
  • Providing controlled backup/restore for DBAs
  • Providing full-scale test environments
And subsets do not provide the data control features that database virtualization provides to accelerate application projects (including analytics, MDM, ADLC/SDLC, etc.). Our customers repeatedly see 50% acceleration in project timelines and cost, savings which generally dwarf the CapEx and OpEx storage expense lines, thanks to the features we make available in our virtual environments:
  • Fast data refresh
  • Integration
  • Branching (split a copy of dev database off for use in QA in minutes)
  • Automated secure branches (masked data for dev)
  • Bookmarks for version control or compliance preservation
  • Share (pass errors + data environment from test to dev, if QA finds a bug, they can pass a copy of db back to dev for investigation)
  • Reset/rollback (recover to pre-test state or pre-error state)
  • Parallelize all steps: have multiple QA databases to run QA suites in parallel. Give all developers their own copy of the database so they can develop without impacting other developers.


Finding the blocking SQL in a lock wait

July 25th, 2014

One of my pet peeves with Oracle is the inability to find out what SQL took out a lock that another user is waiting on. It’s easy to find the waiting user and their SQL with v$session: look for sessions where v$session.event is an “enqueue” wait (v8 and v9) or “enq: TX – row lock contention”, then look up their SQL via v$session.sql_hash_value, which joins to v$sql.hash_value for the v$sql.sql_text.

So far so good and easy.
The second step, finding the blocker, is really easy starting in 10g because Oracle added a new column, v$session.blocking_session, which can be joined back to v$session.sid to find information on that user.
The rub is that there is no way to find the SQL text that the blocking session ran when it took out the original lock.
For the 2-day course I teach on Active Session History (ASH) and Oracle wait events, I wanted to show students how to actually get the blocking SQL text if they really had to.
I went as far as looking at Log Miner to try and get the blocking SQL text; this works sometimes and sometimes it doesn’t. At that point I gave up, knowing the next step was dumping the redo logs, which was more research than I felt like doing at the time. Luckily someone has picked up the torch – Doug Burns!
On the Oaktable email list I shared my research with Doug and Doug took it even farther and posted it on his blog:
Long story short, the best way to try and see what changed (when there was a change, and not just a “select for update”) to cause the lock is to use flashback information. For example, if our contention table was TEST_TAB and the field we knew was modified was VAL1, then we could try to find what it was changed from:

Session 1

update test_tab set val1='aa' where id=1;

Session 2

update test_tab set val1='aaa' where id=1;

Blocking info from ASH where wait is enq: TX – row lock contention

select
      to_char(p2,'XXXXXXXX') p2hex,
      to_char(p3,'XXXXXXXX') p3hex,
      trunc(p2/65536) usn,
      mod(p2,65536) slot,
      p3 sqn, xid wait_xid
from v$active_session_history
where event like 'enq: T%'
and sample_time > sysdate - &v_minutes/(60*24);

BLOCK_XID	      P2HEX     P3HEX	    USN         SLOT      SQN  WAIT_XID
----------------  --------- --------- ---------- ---------- ---------- ----------------
0A0001007264000       A0001      6472	      10          1      25714
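The USN/SLOT/SQN arithmetic in the query is easy to sanity check; here is the same decoding in a few lines of Python (just illustrative), applied to the sample row above (P2HEX = A0001, P3HEX = 6472):

```python
# Decode the TX enqueue parameters (ASH p2/p3) the same way the SQL does:
# usn = trunc(p2/65536), slot = mod(p2,65536), sqn = p3.

def decode_tx_enqueue(p2, p3):
    usn = p2 // 65536    # undo segment number: high 16 bits of p2
    slot = p2 % 65536    # slot in the undo segment: low 16 bits of p2
    sqn = p3             # sequence (wrap) number
    return usn, slot, sqn

# sample row above: P2HEX = A0001, P3HEX = 6472
print(decode_tx_enqueue(0xA0001, 0x6472))  # (10, 1, 25714)
```

The result matches the USN, SLOT and SQN columns in the output.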

Data from flashback, after session 1 commits (before the commit there is no data returned)

select VERSIONS_XID
       ,      VERSIONS_STARTTIME
       ,      VERSIONS_STARTSCN
       ,      VERSIONS_OPERATION
       ,      VERSIONS_ENDTIME
       ,      VERSIONS_ENDSCN
       ,      id
       ,      val1
       FROM   TEST_TAB
              VERSIONS BETWEEN SCN MINVALUE AND MAXVALUE
     where VERSIONS_XID=HEXTORAW('0A00010072640000');
---------------- --------------------- ------- -------- -------  - --------------
0A00010072640000 15-OCT-13 06.46.30 PM         17042888	         U            aa

Now that’s not the blocking SQL, but at least you can see the value the blocker changed the field to, so you can guess to some degree what the actual SQL was. Not great, but better than nothing.


Excel connect to Oracle – 64bit and 32bit issues

July 24th, 2014

Wow, thanks to

Process Monitor

I was able to track down why I couldn’t connect to Oracle from Excel.

I had wanted to try some of the examples Charles Hooper has posted on connecting to and monitoring Oracle, for example

I kept getting the error “Provider not found”.
Now what kind of trace info is there for an error like this in Excel? None, AFAIK. Time to start guessing.
I’m on Windows 7 64-bit and have 64-bit Oracle 11gR2 installed. Excel shows up in Task Manager as “EXCEL.EXE *32”. My first guess was “oh, Excel must want the 32-bit libraries,” so I got the 32-bit instant client from Oracle, unzipped it into a directory and put that directory first in the path. Still no dice.
A lot of Google searches turned up that I needed the Oracle OLE DB provider DLL, but this wasn’t in any of the instant client zips that I downloaded.

Turns out it’s in a download halfway down the page:

*Instant Client Package – ODAC: Includes ODP.NET, Oracle Services for MTS, Oracle Providers for ASP.NET, Oracle Provider for OLE DB, and OO4O with Oracle Instant Client

Downloaded this, put all the dlls in the first directory in my path. Still no dice.

I tried “strace”, but it gave no output. Rrrr.

Then I tried process monitor – bingo.

With Process Monitor I filtered on processes containing “excel”, ran the connect, and got tons of output, but I knew it was a problem with libraries. I found “oraoledb10.dll” among the output, listed in an old Oracle 10 install directory. Oracle 10 32-bit had initially been installed; that install gave no warnings but bombed out late in the game, so I removed the 10g install and installed Oracle 11gR2 64-bit. (Oracle 11gR2 32-bit won’t even begin the install.)
So then I scoured the registry with regedit, found the reference to the old oraoledb10.dll, changed it to the new location of oraoledb11.dll, and now it works.


Delphix 4.1 releases! Oracle 12c PDBs, Sybase, Amazon AWS and Developer Jetpack

July 23rd, 2014

Delphix 4.1 just came out last week. It may sound like only a point release, but there is an amazing amount of new technology:

I’m most excited about Amazon AWS support, Oracle 12c PDB support and the developer jetpack, aka Jet Stream. More coming on these features in upcoming blogs.

AWS Support

AWS support is super exciting because, as a cloud-enabling technology, Delphix is a perfect fit for AWS. AWS is currently hosting a replicated Delphix appliance replicating from our labs in Menlo Park. This means that I can provision, in minutes, into AWS any database linked to the Delphix appliance in our office in Menlo Park. For customers wanting to migrate and/or sync databases in the cloud, this makes it a breeze. As I’ve blogged before, Delphix replication makes data center migration easy, fast and efficient. If I have a source database of 3 TB and 4 clones of the source that I want to move to the new data center, that would be 15 TB of data to move (3 TB x 5 copies: 1 source + 4 clones). With Delphix replication it’s about 1 TB, yes 1/15th the space, because Delphix compresses by about 1/3 and all the clones share most of the same blocks as the source! Also, Delphix comes with masking, so if you are using AWS for testing and QA, the databases replicated to AWS can all be masked.
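The back-of-envelope math above, sketched in Python (the 1/3 compression is the rule-of-thumb ratio quoted in the paragraph, not a guarantee):

```python
# Migration storage math: full physical copies vs. Delphix replication.

source_tb = 3.0   # source database size
clones = 4        # full-size clones of the source

# moving physical copies: the source plus each clone travels in full
physical_tb = source_tb * (1 + clones)
print(physical_tb)              # 15.0 TB with full copies

# Delphix replication: one copy of the shared blocks, compressed ~1/3
virtual_tb = source_tb * (1.0 / 3.0)
print(round(virtual_tb, 1))     # 1.0 TB, roughly 1/15th of the physical move
```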

If you want to use AWS for your database testing, QA, or even for things like elastic compute, then how (the heck) are you going to get the data to AWS? Delphix provides that solution, not to mention the speed and agility of using virtualized data in the cloud.


Screen Shot 2014-03-17 at 4.13.58 PM

Oracle 12c PDB support

Oracle 12c PDBs are cool. They reduce memory consumption by about 400 MB per database, but what about storage space? The storage of cloned PDBs is still full size unless you use a NetApp or ZFS storage appliance. Now, on any storage, clone storage can be almost negligible and memory usage minimal, allowing one to put many databases as PDBs on the same hardware and storage infrastructure. This is awesome for teams of developers and QA who want to spin up either a PDB per developer and/or multiple PDBs to run QA and testing in parallel.

Jet Stream

Jet Stream will get its own blog post, but for now suffice it to say that Jet Stream is an interface created specifically for developers. Up until now, Delphix offered self-service access through the same interface that a Delphix admin would use. A Delphix admin of course sees more and has access to more than a developer. A developer has their own login which limits which sources they see, how many virtual databases (VDBs) they can create, how much storage they can use, etc., but developers used the same UI to take their actions of spinning up a VDB, refreshing, rolling back, etc. Now there is a UI made specifically for developers that graphically shows the lineage of their VDBs and allows them to graphically branch VDBs and share those branches with other developers.

Sybase Support

Delphix, interestingly enough, can support any database, but supporting an arbitrary database requires the user to take a lot of time-consuming manual steps. Delphix automation for supported databases reduces all the required steps to a few clicks of the mouse. Delphix continues adding automation and management support for more and more databases. Sybase support is awesome for those companies using Sybase, and SAP with Sybase.

Performance Benchmarking

In past blogs I’ve talked about the I/O subsystem benchmarking we’ve been working on at Delphix. Now Delphix includes the other half of performance benchmarking: built-in network latency and throughput benchmarking. The network benchmarking combined with the I/O benchmarking tells us upfront what the performance characteristics of the installation hardware will be before launching into a deployment.

Additional Features



Advance your career contest

July 21st, 2014

Want to advance your career?

We’ve seen DBAs become managers, managers become directors, directors become VPs and CIOs go from lesser known companies to some of the best known in the world. Why did they get promoted? Because they brought in Delphix.

Delphix increases the speed and agility of IT, often enabling development teams to go twice as fast, an increase that is unprecedented.

Companies that have this advantage will outperform their competitors.

How do you learn Delphix? Up to now you had to buy Delphix but now for a short time we will be giving a few people copies of Delphix for learning purposes.

Here’s the deal:

   – We will provide 15 smart techies with a copy of the Delphix Engine good for 6 months
   – Then, we want to see who can demonstrate the coolest or wackiest use case for Delphix involving…
        * creating virtual environments
        * securing or hardening environments
        * improving analytics
        * improving DevOps using Puppet, Chef, or your favorite scripting package
   – Demonstrate and blog about it

The three coolest use cases will be awarded prizes at Oracle Open World and featured in video interviews, and their blogs will be promoted by Delphix.

More information coming.

For now, feel free to send in your information (who you are, what your blog is) if you are interested in being 1 of the 15.

What is Delphix?


Delphix is a software solution enabling thin cloning of Unix/Windows file systems and databases (i.e. Oracle, SQL Server, PostgreSQL, and Sybase) for self-service provisioning of entire application stacks, eliminating the biggest infrastructure constraints in development and testing, increasing the tempo of DevOps for projects, and allowing dedicated environments even for the most trivial of tasks (such as testing changes for tuning a single SQL statement). This technology also provides new alternatives for backup, high availability, and analytics/reporting/ETL, as well as data masking to reduce the surface area of risk in non-production environments.

Of course, that’s just me saying all that.  I work for Delphix, so you’d expect us to say any old thing, right?

But it really is true, and it really changes a lot of things.  Think cold fusion.  Think sliced bread.

And we’re looking for a few good folks to prove it.

This technology is fast becoming the new norm.  Right now, shops using Delphix have a distinct competitive advantage, but a year or two from now, shops not using Delphix will be falling behind faster, because they will be at a distinct disadvantage as more people settle into the new norm.

The same is true for database administration skills.  As talented as you are personally, you’re only one person, and even if you did nothing but script and automate all day every day, you can’t fight the changes in the very laws of physics that virtualized storage brings.  You need to learn new tools, to stay ahead.



I/O benchmarking chat with Uday Vallamsetty from Delphix

July 17th, 2014

Uday Vallamsetty from the Delphix performance group just posted a great blog entry on evaluating I/O performance in Amazon AWS with EBS. I had a chance to talk with him about I/O benchmarking, some of its surprises and challenges, and the importance of producing a report card on any I/O subsystem one is using.


DevOps & Delphix : Chef recipes

July 16th, 2014

Delphix Engines expose all features via a stable web API built on top of HTTP and JSON.

Clients can choose any HTTP client to interact with Delphix and integrate it within their environment.

Delphix Engines are also bundled with a command line interface which guides users through automation and integration with third-party tools.

Delphix CLI example

Adding a SQL Server Source Environment:

Enter these commands through the command line interface:


   set type=HostEnvironmentCreateParameters;

   set hostEnvironment.type=WindowsHostEnvironment;
   set<Source environment name>;
   set hostEnvironment.proxy=<target host name>;

   set hostParameters.type=WindowsHostCreateParameters;
   set"<Source host IP address or hostname>";

   set primaryUser.credential.type=PasswordCredential;
   set primaryUser.credential.password=<password>;




Delphix Curl API example

Example of refreshing a virtual database back to the newest version of the source database

  curl -v -X POST -k --data @- http://delphix-server/resources/json/delphix/database/ORACLE_DB_CONTAINER-13/refresh  \
    -b ~/cookies.txt -H "Content-Type: application/json" <<EOF
  {
       "type": "OracleRefreshParameters",
       "timeflowPointParameters": {
               "type": "TimeflowPointSemantic",
               "timeflow": "ORACLE_TIMEFLOW-13",
               "location": "LATEST_SNAPSHOT"
       }
  }
EOF
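The same refresh call can be sketched in any HTTP client; here it is in Python (the host name and object ids are the ones from the curl example, and the POST itself is commented out since it needs the third-party requests package and a valid login cookie):

```python
import json

# Build the same refresh request body as the curl example
payload = {
    "type": "OracleRefreshParameters",
    "timeflowPointParameters": {
        "type": "TimeflowPointSemantic",
        "timeflow": "ORACLE_TIMEFLOW-13",
        "location": "LATEST_SNAPSHOT",
    },
}

url = ("http://delphix-server/resources/json/delphix/"
       "database/ORACLE_DB_CONTAINER-13/refresh")

body = json.dumps(payload)
print(body)

# The POST itself (requires the requests package and a session cookie
# from a prior login call):
# import requests
# requests.post(url, data=body,
#               headers={"Content-Type": "application/json"},
#               cookies=cookies)
```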



 Chef as a database provisioning tool

Chef is an automation platform for provisioning physical or virtual environments to a specific state in a controlled and repeatable way.

Chef may install binaries, manage users, and control configuration.

Chef as a database provisioning tool:

  • Install binaries of the DBMS
  • Configure the DBMS
  • Provision data

Installing the binaries and configuring the database might be standard work for Chef, but actually provisioning the data, such as a copy of a source database, is generally out of Chef’s purview, at least until now. With Delphix, provisioning the data, no matter the size, can be done in minutes with a few API calls.

With Chef, one usually does not describe actions to take (“provision a VDB”) but the desired state of the system (“there must be a VDB running on that port”). This allows rules to be applied to heterogeneous systems in different states, continuously comparing the actual state with the desired state (for instance, if a new server is added after the fact, Chef will notice that it needs to provision a VDB to reach the desired state). This is achieved through the separation of resource providers and recipes. The resource provider is the “library” which knows the low-level details of the implementation (Is there a VDB running? How do I connect to a Delphix Engine? Which steps need to be taken to go from the current state to a provisioned VDB?). The “recipe” describes the desired state, leveraging the resource providers.
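The converge-to-desired-state idea can be boiled down to a toy sketch (Python, with made-up names; this is not the actual Chef provider code): compare actual with desired, act only on the difference, so re-running is a no-op.

```python
# A resource provider, reduced to its essence: look at the actual state,
# compare with the desired state, act only on the difference.

def converge(desired_vdbs, running_vdbs, provision):
    """Provision every desired VDB that is not already running.
    `provision` stands in for the Delphix API call."""
    actions = []
    for name in sorted(desired_vdbs):
        if name not in running_vdbs:
            provision(name)
            actions.append(name)
    return actions  # empty list means the system was already converged

running = set()
print(converge({"HR"}, running, running.add))  # ['HR'] - provisions the VDB
print(converge({"HR"}, running, running.add))  # []     - second run is a no-op
```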


Chef: Provisioning data from Delphix

Chef cookbooks (recipes) can use the Delphix Engine API to provision data.

Screen Shot 2014-07-14 at 2.10.14 PM

Chef recipes describe the desired state of the system.

Chef & Delphix example to provision a virtual database

In the following, “dlpx_pgsql_vdb” is the block in the recipe which indicates the desired state of the system. It means “we want a PostgreSQL (pgsql) VDB running”.

dlpx_pgsql_vdb "HR" do
        action :provision
        port node[:dlpx][:port] || 5433
        container "HR"
        delphix_server node[:dlpx][:delphix_server]
end
In this code the details mean the following:
dlpx_pgsql_vdb "HR" do
We are describing the desired state of a postgresql Delphix VDB, and the Chef name for that is “HR”
action :provision
We want the VDB to be provisioned (as opposed to deleted or refreshed daily for instance)
port node[:dlpx][:port] || 5433
The port is read from the host configuration and defaults to 5433 (the recipes could be shared by many hosts or servers, each of them with their own config)
container “HR”
We want Delphix to name that VDB “HR”
delphix_server node[:dlpx][:delphix_server]
Look at the configuration for the hostname and credentials of the Delphix engine

Chef Providers build the library of utilities which can be leveraged in recipes. A Delphix Chef Provider can be built on top of the HTTP API.

def provision(group, mount_base, name)
    # POST the provision parameters to the Delphix Engine as JSON
    response =
      "#{@url}/database/provision",
      {
        :type => "OracleProvisionParameters",
        :container => {
          :type => "OracleDatabaseContainer",
          :name => name,
          :group => group
        },
        :source => {
          :type => "OracleVirtualSource",
          :mountBase => mount_base
        }
      }.to_json,
      :content_type => 'application/json',
      :cookies => @cookies
    )
end

In this example:

  • source – hard-coded as OracleVirtualSource
  • group – the Delphix user group
  • mount_base – the NFS mount location on the target
  • name – the new VDB name




SAP deployed with Delphix

July 15th, 2014

How does Delphix benefit a SAP project?

  • Speed up time to delivery for ASAP implementation methodology
  • Enable adoption of “Continuous Application Delivery” methodology
  • Reduce infrastructure overhead
  • Deliver higher quality projects

The ASAP methodology is a framework for delivering large IT projects. SAP professionals are familiar with it; however, given the size and complexity of SAP projects, many fail to fully adopt it. Delphix helps customers adopt and use the ASAP methodology.

SAP projects today are big, hairy beasts. They run 18 months or more, and it is extremely difficult to do more than one project at a time. Delphix helps SAP customers move to a continuous development model where they are no longer delivering one or two massive projects a year but are delivering smaller, more nimble projects on an ongoing basis.

Once we’ve established what Delphix does for projects, we want to give a few examples.

  • Delphix dramatically cuts down the time it takes to deploy new SAP projects by parallelizing testing
  • Delphix improves the quality of SAP training and maximizes system up time
  • A real life example of Delphix

Delphix with SAP: Bringing success to otherwise tragic projects

We all know the story. In the beginning there is much excitement as the new ERP project is kicked off. Now the company will go from old, inefficient methods to the latest and greatest processes, technology and fast data access. Then, as the project gets underway, deadlines start to be missed, hidden obstacles arise, customizations are demanded, project sprawl creeps in, etc.

The project that was targeted for 18 months either ends up taking 36 months to complete or the decision is made to go live at 18 months with half the functionality.



Screen Shot 2014-07-14 at 3.29.32 PM
An ASAP Project example

The general approach for the ASAP methodology has five columns. We want to introduce the methodology first and then use it as a framework for showing where Delphix helps and how we shrink the individual project pieces.

Screen Shot 2014-07-14 at 3.39.06 PM



Assuming that companies have some level of the ASAP methodology in place, their SAP projects will look something like this: at least 18 months long and mostly single-threaded.

Screen Shot 2014-07-14 at 3.40.54 PM


Delphix significantly reduces project time



The areas in red are where Delphix can help. Note Delphix can dramatically impact the first four, but the last column is mostly just standard end of project tasks.

Screen Shot 2014-07-14 at 3.41.50 PM


Delphix keeps the exact same rollout processes in place but speeds up certain steps significantly:

  • Elimination of data constraints shortens total project time
  • Deliver same project scope in less time
  • Deliver higher quality projects


Screen Shot 2014-07-14 at 3.43.44 PM


Delphix Productivity Gains Enable Continuous Application Development

  • Delphix enables continuous, parallel development
  • Enables move from monolithic, infrequent projects to parallel, small projects
  • Business realizes ROI much sooner

Screen Shot 2014-07-15 at 5.04.59 AM

Case Studies

Phase 1 : Storage reduction

A Fortune 50 company reduced managed storage costs for SAP by $12M over 3 years by moving 1,500 SAP Basis developers to VDBs.

Screen Shot 2014-07-15 at 5.10.22 AM

Phase 2 : Data Conversion

A major US retailer uses Delphix to convert data from legacy systems to SAP, reducing the conversion effort from 6 months to 2 months and reducing conversion errors at go-live to zero.

  • Speed up iterative testing with VDBs
  • Automate cutover to production
  • Maximize test time and improve data quality

Screen Shot 2014-07-15 at 5.16.19 AM

Phase 3 :  Data Integration

A device manufacturer uses Delphix to make virtual copies of all systems and create an integrated system for business-process testing.

Problem                                              Delphix Solution
21 systems comprise the quote-to-cash process        Virtual copy of all systems
New apps limited by proper testing                   Create business-process baseline for regression testing
Transactional testing required through full system   Upgrade apps to support business opportunities
Delayed releases due to complexity                   35% more projects completed, with flat budget

Phase 4 : Training Rollout

A major pet supplies retailer uses Delphix for their business users’ training. By enabling trainers to use Delphix self-service data reset, they were able to reduce the training effort from 3 months to 1 month.

Screen Shot 2014-07-15 at 5.29.08 AM



Phase 5 : Break Fix

A Fortune 50 company used Delphix to identify a currency corruption in hours instead of days by creating a VDB in minutes with valid production data from before the corruption, then used that data to resolve the error and extract the correct data.
Delphix is used to ensure protection on an ongoing basis. Without Delphix in place, customers rely on intermittent backups that give poor RPO and RTO. With Delphix, we maintain a continuous set of data, meaning that in the case of data corruption or error, Delphix can provision data from any point in time.

Live Archive

Screen Shot 2014-07-15 at 5.33.23 AM



Benefits Delphix delivers in a SAP implementation:

Conversion – Delphix enables developers to have multiple staging areas where they can test their ETL programs. The ability to rewind environments in minutes allows developers to test multiple versions in a day. This reduces conversion effort from months to weeks.

Development/Sandbox – Delphix allows developers to have their own individual sandbox environments for functional testing at no infrastructure overhead. In addition, Delphix allows different functional areas to develop in parallel on various projects. The result is an N:N architecture which improves productivity by 200 to 300% on average. Delphix also allows testing against the full production dataset, so all use cases (known or unknown) can be tested before rolling out to QA and production. This enables a shorter development cycle with better-quality deliverables.

Training – You can have a full training environment with real-life, production-like data (instead of synthetic data) in a matter of minutes. This allows multiple training environments to be created on demand. The result is better change management and user acceptance in the eventual production rollout.

Production backup and break-fix – Delphix backs up all source data by default. In case of data corruption, you can simply use Delphix to diagnose and troubleshoot the issue.




Much thanks to Mick Shieh for the majority of the above content.




Orion I/O calibration tool bug

July 14th, 2014

I use fio for all my I/O testing. Why not Orion from Oracle, since almost all of my I/O testing and benchmarking has been geared toward Oracle? Several reasons.

fio is

  • super flexible – it can be configured for almost any type of test
  • actively maintained – updates almost every week, many by Jens Axboe (who wrote much of the Linux I/O layer)
  • reliable – if there are problems, it’s open source and one can discuss them on the fio community email list
  • easy to distribute – just one executable; it doesn’t require getting, for example, a full Oracle distribution

Orion on the other hand unfortunately has had some problems that have made it too undependable for me to trust.

In some cases Orion re-reads the same blocks, covering a much smaller data set size than requested. The following strange behavior is with Orion on x86 Solaris. The Orion binary was from an 11g distribution. The root of the strange behavior is that Orion seems to revisit the same blocks over and over when doing its random read testing.

A dtrace script was used to trace which blocks orion was reading. The blocks in the test were on /domain.

    #!/usr/sbin/dtrace -s
    #pragma D option quiet
    /* the probe (assumed here to be fop_read:entry, where args[0] is the
       vnode and args[1] the uio) is filtered for files under /domain */
    fbt::fop_read:entry
    / strstr((args[0])->v_path, "/domain") != NULL /
    {  printf("%lld\n", args[1]->_uio_offset._f); }
  1. Created a 96GB file and put its path in /domain/mytest.lun
  2. Modified io.d to filter for /domain.
  3. Ensured no non-Orion I/O was going to the filesystem.
  4. Started running io.d > blocks-touched.txt
  5. Kicked off Orion with:
    export LD_LIBRARY_PATH=.  
    ./orion -testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 \
               -duration 60 -simulate raid0 -write 0 -num_large 0

-run advanced : users can specify customizations
-matrix row : only small random I/O
-num_disks 5 : actual number of physical disks in test. Used to generate a range of loads
-cache_size 51200 : defines a warmup period
-duration 60 : duration of each point
-simulate raid0 : simulate striping across all the LUNs specified. There is only one LUN in this test
-write 0 : percentage of I/O that is write, which is zero in this test
-num_large 0 : maximum outstanding I/Os for large Random I/O. There is no large random I/O in this test.

Once the test was finished, I stopped the dtrace script io.d.

Example output from a run
   Command line:
   -testname mytest -run advanced -matrix row -num_disks 5 -cache_size 51200 -duration 60 
   -simulate raid0 -write 0 -num_large 0 

   These options enable these settings:
   Test: mytest
   Small IO size: 8 KB
   Large IO size: 1024 KB
   IO types: small random IOs, large random IOs
   Sequential stream pattern: RAID-0 striping for all streams
   Writes: 0%
   Cache size: 51200 MB
   Duration for each data point: 60 seconds
   Small Columns:,      0,      1,      2,      3,      4,      5,      6,      7,      8,      9,     10,     11,     12, 
                       13,     14,     15,     16,     17,     18,     19,     20,     21,     22,     23,     24,     25
   Large Columns:,      0
   Total Data Points: 26

   Name: /domain0/group0/external/lun96g	Size: 103079215104
   1 files found.

   Maximum Small IOPS=62700 @ Small=16 and Large=0
   Minimum Small Latency=81.81 usecs @ Small=2 and Large=0
Things look wrong right away.
The average latency is in the hundreds of microseconds (above, the fastest data point averaged 81 us) over a 96 GB file, twice the size of the 48 GB cache.
The max throughput was 489 MB/s.
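Quick check on that throughput number: 62,700 small IOPS at 8 KB per I/O works out to the ~489 MB/s observed.

```python
iops = 62700    # maximum small IOPS reported by Orion
io_kb = 8       # small I/O size in KB

mb_per_sec = iops * io_kb / 1024.0
print(round(mb_per_sec))   # 490 MB/s, i.e. the ~489 MB/s observed
```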
Total blocks read
    # wc -l blocks-touched.txt
    78954834 blocks-touched.txt
Unique blocks read
    # sort blocks-touched.txt | uniq -c | sort -rn > block-count.txt
    # wc -l block-count.txt
    98305 block-count.txt
We only hit 98,305 unique offsets in the file, yet a 96 GB file has 12,582,912 unique 8k offsets.
The unique blocks hit total around 768 MB of data, which is easily cached.
Block access frequency
    # tail block-count.txt
    695 109297664
    694 34532360192
    693 76259328
    693 34558271488
The least frequently hit blocks were hit almost 700 times and the average was over 800, yet there were 78,954,834 block accesses against
12,582,912 unique blocks, so the average should have been about 6 hits per block.
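The arithmetic above, checked in Python:

```python
file_bytes = 96 * 1024**3     # the 96 GB test file
block = 8 * 1024              # 8 KB small-I/O size

unique_offsets = file_bytes // block
print(unique_offsets)                     # 12582912 unique 8k offsets

touched = 98305               # unique offsets actually read (from dtrace)
print(touched * block // 1024**2)         # 768 MB - easily cached

reads = 78954834              # total block reads traced
print(round(reads / unique_offsets, 1))   # ~6.3 expected hits per block
print(reads // touched)                   # 803 actual average hits per block
```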

This may be caused by multiple streams starting from the beginning of the file, or at the same “random” offset, every 60-second test duration. I’m not sure. If that is the case, the only workaround would be to increase the duration enough to ensure most of the blocks from the beginning of the test get evicted from cache. If each thread starts at the same location and reads the same set of “random” blocks, then there is no workaround. Ideally I’d want each stream to start from a different random location and read a different set of random blocks.