Archive for September, 2013

Oaktable World give aways Day 2

September 23rd, 2013

Thanks to a generous gift from Apress, every 10th arrival at Oaktable World tomorrow will win a copy of

Pro Oracle Database 12c Administration

by  Darl Kuhn



Schedule for Oaktable World Day 2








Oaktable World 2013 Day 1

September 23rd, 2013

A few vignettes from this morning’s Oaktable World.

Kickoff from Mogens Nørgaard on some of the origins of the Oaktable Network.




John Beresniewicz, father of the ASH tree map, talking on ASH analytics



DevOps round table with Jonathan Lewis, Cary Millsap, James Murtagh, Eric Schrock and Kent Graziano



Gwen Shapira talking on Hadoop



Francisco Sanchez talking on cross-platform conversion



Riyaj Shamsudeen answering questions after his presentation on live hacking on RAC with truss, pstack, Wireshark, pmap, and strace




Christian Antognini talking on row chaining and migration




It’s an Oracle World



Crowds waiting outside to get into Oaktable World


Iggy Fernandez comparing an Amazon NoSQL implementation to an equivalent eBay implementation using an Oracle relational model.


Fabian Pascal talking about some of the shortcomings (should I say errors?) in Codd’s relational model.



Riyaj Shamsudeen live hacking on RAC with truss, pstack, Wireshark, pmap, and strace



Women in Technology (WIT) discussion.



Oaktable World 2013

September 23rd, 2013



A few vignettes from this morning’s Oaktable World.

Kickoff from Mogens Nørgaard on some of the origins of the Oaktable Network.

Fabian Pascal talking about some of the shortcomings (should I say errors?) in Codd’s relational model.

Iggy Fernandez comparing an Amazon NoSQL implementation to an equivalent eBay implementation using an Oracle relational model.

Riyaj Shamsudeen live hacking on RAC with truss, pstack, Wireshark, pmap, and strace

Women in Technology (WIT) discussion.






Oracle World talks of interest

September 22nd, 2013



Here are some talks of interest during Oracle OpenWorld, other than Oaktable World, where I’ll be all day Monday and Tuesday, of course.

Publicly editable version of  OOW talks 

Please feel free to modify the above spreadsheet and/or email me with talks to add to this blog post – thanks!


Sunday IOUG talks:

Moscone West, Level 3 – Rooms 3002–3004, 3003, 3005, 3014, 3016, 3018, and 3001
Tracks (in column order): Database 12c/Upgrades | Big Data & Business Analytics | Oracle Enterprise Manager | Database | RAC/Engineered Systems | Middleware/Cloud

8:00 am – 9:00 am
  • UGF9757 – An Exploration of the Next Generation of the Oracle Database – Jim Czuprynski
  • UGF9810 – Big Data Panel Discussion – Alex Gorbachev
  • UGF9793 – Extending OEM12c’s Features and Functionality with APEX – Todd Bellaver
  • UGF9776 – Optimizing Internal Serialization Control (Latches and Mutexes) – Craig Shallahamer
  • UGF9804 – RAC Performance with OEM 12c and AWR Data: Deep Interconnect Troubleshooting – Tim Quinlan
  • UGF9799 – WebCenter: Displaying Documents and Site Studio Content in Sites and Portal – Brian Huff

9:15 am – 10:15 am
  • UGF9760 – Oracle 12c Best New Features – Rich Niemiec
  • UGF9755 – Deep Dive to OBIEE / Oracle Business Intelligence Applications – Shyam Nath
  • UGF9772 – I Have to Patch Again? Patch Management with Oracle Enterprise Manager 12c – Bobby Curtis
  • UGF9769 – How to Survive a Disaster with RMAN – Gustavo Antunez
  • UGF9742 – Oracle Database Appliance – RAC in a Box, Some Strings Attached – Fuad Arshad
  • UGF9797 – WebCenter Portal: Level of Customization (0–100%) – Soy Joseph

10:30 am – 11:30 am
  • UGF9759 – In the Dark of the Night (While You Were Sleeping) – Kyle Quass
  • UGF9754 – Deep Dive into Oracle BI Applications Using Oracle Data Integrator 11g – Mark Rittman
  • UGF9794 – Managing Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware – Leighton Nelson
  • UGF9740 – RDBMS Forensics: Troubleshooting Using ASH – Tim Gorman
  • UGF9731 – Metadata Governance, the Next Evolution in Data Modeling and Management – Henry Olson
  • UGF9798 – WebCenter Portal ADF: Designing for Desktop, Mobile & Tablet – Responsive and Adaptive Design Integration – John Sim

11:45 am – 12:45 pm
  • UGF9765 – The Next Generation of the Oracle Database and 11gR2 New Features for Developers: What You Need to Know – John King
  • UGF9722 – Understanding Data Movement in Enterprise Platforms – George Trujillo
  • UGF9771 – The Essence of Great Leadership – John Matelski, Maria Anderson, Michelle Malcher, Rich Niemiec, Matt Vranicar
  • UGF9739 – Physical Ordering of Data: Is It Ever Useful? – Mark Farnham
  • UGF9795 – How to Minimize the Brownout Due to RAC Reconfiguration – Amit Das
  • UGF9775 – Using Oracle SOA Suite 11g to Integrate with PeopleSoft 9.1 – Brad Bukacek

1:00 pm – 2:00 pm
  • UGF9807 – Simplifying Database Performance Analysis Using OEM12c to Monitor Database and Storage Together – Sam Lucido
  • UGF9764 – Oracle Enterprise Manager 12c, Oracle Database 12c, and You! – Bobby Curtis
  • UGF9721 – Agile, Real-Time BI with Business Intelligence, Data Integrator and GoldenGate – Stewart Bryson
  • UGF9801 – DBA Expert Panel – Ray Smith, Seth Miller and Maria Anderson
  • UGF9803 – Clustering/HA with Oracle VM & OEM12c – 360 Degrees – Kai Yu
  • UGF9796 – De-Mystifying WebLogic for Application Users – Abhinav Banerjee

2:15 pm – 3:15 pm
  • UGF9758 – Flashback Techniques for Oracle Database 11g and the Next Generation of the Oracle Database – Carl Dudley
  • UGF9806 – Manage the Data Lifecycle of Big Data Environments – Brian Vile
  • UGF9773 – OEM Command Line Interface Illustrated – Ray Smith
  • UGF9741 – You Probably Don’t Need an RMAN Catalog Database – Yury Velikanov
  • UGF9791 – Accelerate Your Oracle Exadata Deployment with DBA Skills You Already Have – Marc Fielding
  • UGF9756 – Cloud Computing, Oracle VM & OEM12c – Expert Panel – Michael Ault
  • UGF9792 – Leadership: Not Just for Managers – Maria Anderson

3:30 pm – 4:30 pm
  • UGF9753 – Data Wrangling with Oracle Big Data Connectors – Gwen Shapira
  • UGF9808 – OEM Panel Discussion – Kris Cook, Bobby Curtis, Ray Smith, Seth Miller
  • UGF9802 – Exadata Panel Discussion – Shyam Nath and Aaron Werman
  • UGF9790 – Where Did My CPU Go? – Karl Arao
  • UGF9774 – Oracle RAC SIG Expert Panel
  • UGF9809 – Virtualization Panel Discussion

Oracle Open World

Sunday (9/22)

10:30 – 11:30 Tim Gorman: RDBMS Forensics UGF9740
11:45 – 12:45 Kyle Hailey: Thin Cloning UGF9966 Moscone West 2009 Christian Antognini: The Query Optimizer in Oracle Database 12c: What’s New?, Moscone West – 3002
13:00 – 14:00 Ray Smith, Seth Miller, Tim Hall, Leighton Nelson, Maria Anderson: DBA Expert Panel UGF9801
14:15 – 15:15 Connor, Analytic functions, Moscone West – 2000
15:30 – 16:30 Ray Smith, Seth Miller, Bobby Curtis, Kris Cook: Oracle Enterprise Manager Panel Discussion UGF9808 Gwen Shapira: Data Wrangling with Oracle Big Data Connectors. Moscone West – 3003
17:15-19:00 Larry Ellison Oracle Database 12c In-Memory Database and M6 Big Memory Machine Moscone North, Hall D

Monday (9/23)

8:00 a.m.–9:45 a.m. Mark Hurd Transforming Businesses with Big Data and Analytics Moscone North, Hall D
10:45 a.m.–11:45 a.m.
12:15 p.m.–1:15 p.m. Mark Scardina, Markus Michalewicz: Understanding Oracle RAC Internals CON8806
1:45 p.m.–2:45 p.m. Tim Gorman: Making Star Transformations Sing CON7387 Mauro Pagano: SQL Tuning Made Easier with SQLTXPLAIN [CON8257], Moscone South – 305
3:15 p.m.–4:15 p.m. Graham Wood: Automatic Workload Repository Soup to Nuts: Fundamentals of Database Performance Analysis [CON9578]
4:45 p.m.–5:45 p.m. Mahesh Subramaniam, Kodi Umamageswaran: Oracle Exadata Technical Deep Dive: Architecture and Internals [CON8384], Moscone South – 305

Tuesday (9/24)

10:15 a.m.–11:15 a.m.
11:45 a.m.–12:45 p.m.
1:15 p.m.–2:15 p.m.
1:30 p.m.–3:15 p.m. Larry Ellison: The Oracle Cloud Platform: Database as a Service. Java as a Service. Moscone North, Hall D
5:15 p.m.–6:15 p.m. Eric Grancher: Consolidation Without Tears! Oracle Database 12c and Database Replay CON4366 Jonathan Lewis (with Maria Colgan), CON8643: Session Title: Oracle Optimizer Boot Camp: 10 Optimizer Tips You Can’t Do Without. 102 Moscone South

Wednesday (9/25)

10:15 a.m.–11:15 a.m.
11:45 a.m.–12:45 p.m.
1:15 p.m.–2:15 p.m.
3:30 p.m.–4:30 p.m. Tanel Poder: Moving Data Between Oracle Exadata and Hadoop. Fast. Moscone South 305. CON7586 Connor, Optimizer Statistics: A fresh approach, Moscone South 308
Gwen Shapira, Marc Fielding: It’s a Solid-State World: How Oracle Exadata X3 Leverages Flash Storage. Westin San Francisco – Metropolitan I
Active Session History Deep Dive: Advanced Performance Analysis Tips (co-speaker Kellyn Pot’vin, Enkitec and Oaktable). Moscone South 104. CON9577
The Journey to DBaaS, Part 1: Early Lessons. Room ??, speaker ??
5:00 p.m.–6:00 p.m. Jeremiah Wilton: Being Sure: Confident Consolidations with Oracle Real Application Testing 12c [CON3255], Moscone South – 306. The Journey to DBaaS, Part 2: Early Lessons. Room ??, speaker ??

Thursday (9/26)

9:00 a.m.–10:00 a.m.
11:00 – 12:00pm Connor, SQL Tuning 101, Moscone South 102 Ryan Lolli, Nick Wagner, Maria Pratt, Marie Couch: Deep Dive into Oracle GoldenGate HOL10026 Alex Gorbachev: Under the Hood of Pluggable Databases [CON7593], Moscone South – 305
12:00 p.m.–1:00 p.m.
1:30 p.m.–2:30 p.m.
2:00 p.m. – 3:00 p.m Gwen Shapira, Data Wrangling with Oracle Big Data Connectors, Moscone West 300
3:00 p.m.–4:00 p.m.




Alleviate load on SANs with Data Virtualization

September 20th, 2013


by Matt Hayward

One of the less obvious advantages of database virtualization is a reduction in the read I/O issued against the underlying physical storage (henceforth called “the SAN”) that ultimately stores the data for virtual databases.

In practice, the Delphix cache prevents around 60% of all non-production database I/O* from ever being issued to the SAN.

This is possible because Delphix Server accommodates large amounts of RAM**, which is used as an auxiliary cache above and beyond the database buffer cache that resides on the database server.  When the database needs to read some data that is not present in the database buffer cache, an I/O is issued to the Delphix Server. Delphix then checks its own cache, and only passes the I/O request down to the SAN if the necessary data is not already in the Delphix cache.
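The read path just described amounts to a two-level cache hierarchy sitting in front of the SAN. A minimal sketch of the idea (the class, block naming, and data are purely illustrative, not Delphix internals):

```python
class TieredReadCache:
    """Toy model of the read path: buffer cache -> Delphix cache -> SAN."""

    def __init__(self):
        self.buffer_cache = {}   # on the database server
        self.delphix_cache = {}  # large RAM cache on the Delphix Server
        self.san_reads = 0       # I/Os that actually reach the SAN

    def read(self, block_id):
        if block_id in self.buffer_cache:            # served by the DB server itself
            return self.buffer_cache[block_id]
        if block_id in self.delphix_cache:           # Delphix cache hit: SAN untouched
            data = self.delphix_cache[block_id]
        else:                                        # miss: the I/O goes to the SAN
            self.san_reads += 1
            data = f"block-{block_id}"
            self.delphix_cache[block_id] = data
        self.buffer_cache[block_id] = data
        return data

cache = TieredReadCache()
for b in [1, 2, 1, 3, 2]:
    cache.read(b)
print(cache.san_reads)  # only the 3 distinct blocks ever reach the SAN
```

Repeated reads of the same blocks, even from different virtual databases, are absorbed before they reach physical storage.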

Delphix cache hits bring dual performance benefits:

  • Read I/O service times for virtual databases are fast: in the range of a few hundred microseconds plus network latency.  These ~1 millisecond latencies are 5-20 times faster than traditional SAN random read access times.
  • By serving these I/O requests from the Delphix cache, the load on the SAN is reduced.

Delphix’s minimum system configuration requires 16 GB of RAM; however, most customers configure their Delphix Servers with 64 GB or more.  Thanks to these large caches, Delphix Servers consistently have a 60-70% cache hit ratio.

In the last two years, I’ve collected database I/O statistics from 469 production and non-production Oracle databases across 25 companies in a variety of industries and with diverse applications.  Studying these statistics gives the following findings for this particular sample:

  • Non-production databases account for 70% of all databases

Focusing on these non-production databases, which are the first candidates for database virtualization:

  • Reads accounted for 87% of database I/O, writes for only 13%.
  • Grouped by company, the highest read proportion observed was 97%, the lowest 61%.
  • With Delphix’s typical 60-70% cache hit ratio, full virtualization of non-production environments would eliminate between 52.2% and 60.9% of total database I/O from the SAN.
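That range follows directly from multiplying the 87% read share by the 60-70% cache hit ratio:

```python
read_share = 0.87                # reads as a fraction of non-production database I/O
for hit_ratio in (0.60, 0.70):   # Delphix's typical cache hit ratio range
    eliminated = read_share * hit_ratio
    print(f"{hit_ratio:.0%} hit ratio -> {eliminated:.1%} of total I/O kept off the SAN")
# 60% hit ratio -> 52.2%; 70% hit ratio -> 60.9%
```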

To give those findings a sense of scope: across the two years I’ve been working at Delphix, eliminating 60% of non-production I/O would have amounted to eliminating 59 petabytes of I/O.

To put that into perspective, Amazon Elastic Block Store charges $0.10 per million I/O requests.  Assuming an average I/O request size of 8 kilobytes, it would cost you around $800,000 to do 59 petabytes of I/O in Amazon EBS.

That’s a truckload of I/O – literally: it takes around 120 milliwatt hours of energy to read 256 megabytes of data from disk, reading 59 petabytes would require around 29 megawatt hours – roughly equivalent to the energy in 2.55 tons of oil (or 25 tons of TNT*** if that’s more your style).
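Both back-of-the-envelope figures check out, assuming binary petabytes and the 8 kB average request size stated above:

```python
PB = 2**50                               # binary petabyte, in bytes
total_bytes = 59 * PB

# Amazon EBS: $0.10 per million I/O requests, at 8 kB per request
requests = total_bytes / (8 * 1024)
ebs_cost = requests / 1e6 * 0.10
print(f"EBS cost: ${ebs_cost:,.0f}")     # on the order of $800,000

# Energy: 120 milliwatt-hours per 256 MB read from disk
chunks = total_bytes / (256 * 2**20)
energy_mwh = chunks * 120 / 1e9          # milliwatt-hours -> megawatt-hours
print(f"Energy: {energy_mwh:.1f} MWh")   # roughly 29-30 MWh
```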

By using inexpensive server RAM as a secondary cache for virtual databases, Delphix can dramatically improve database read I/O performance, and eliminate 50-60% of non-production database I/O from the SAN.  This improves performance of virtual databases, and allows other applications to get more out of a shared SAN.



* Delphix eliminates around 60% of I/O associated with the operation of non-production databases.  It also eliminates huge amounts of I/O on both production and non-production systems formerly used to create, copy, and restore full backups – although this is not quantified in this blog post.

** The upper bound on RAM assignable to the Delphix Server is constrained by limitations of the hypervisor or underlying physical host long before it approaches limits inherent in the Delphix OS.




Cache Everything Without Breaking the Bank

September 19th, 2013



by Matt Hayward

I’m often surprised at how small the buffer caches allocated to Oracle databases are. When I started as a DBA at Amazon in the early 2000s, we were moving our Oracle databases to 32-bit Linux.  I shudder to recall all the crazy things we used to do, such as fiddling with the stack address space, to get an SGA over 1.2 GB or so.  It’s alarming to me that these techniques are still documented for Oracle 11gR2 – I hope no one is still doing this; if you are, you have my sympathies.

In 64-bit architectures, it would seem that there is no reason to have such a small buffer cache.  RAM prices have dropped precipitously, while the datasets within databases have grown like kudzu. Nonetheless, I still find small Oracle buffer caches wherever I go. The median buffer cache size I observe in our customer base is 1,040 megabytes – which would fit in my 2002 32-bit Linux machine without too much bother.

Here is a histogram of customer buffer cache sizes I’ve encountered since 2011:

The implication of unchecked dataset growth and relatively static buffer cache sizes is that a smaller and smaller proportion of the database is cached.  Here is a histogram showing the percentage of a database that can be cached, from the same customers as above (although here I’ve excluded databases smaller than 64 GB):

How much performance impact would increasing these tiny caches have? It is hard to say with certainty – but there are some clues.  Oracle has a cache advice feature that populates the v$db_cache_advice table.  This feature forecasts the reduction (or increase) in physical reads that would come from growing (or shrinking) the buffer cache.  The table includes forecasts for buffer cache sizes up to around twice the current size.

The results are often shockingly non-linear, here is an example table from a customer database showing the forecast proportion of physical reads as the buffer cache increases to twice the current size of 928 MB:

Buffer cache (MB)   Forecast physical reads (% of current)
   928              100%
   960              99.91%
 1,040              99.80%
 1,120              99.72%
 1,200              99.65%
 1,280              99.59%
 1,360              99.56%
 1,440              99.52%
 1,520              99.23%
 1,600              6.43%

You can see that at around 1,600 MB of buffer cache, physical reads drop off dramatically.  Evidently, the working set of the database basically fits inside 1,600 MB, but not 1,520 MB. Increasing the buffer cache from 928 to 1,520 MB gets us basically nothing (less than a 1% reduction in physical reads), while going from 1,520 to 1,600 MB would eliminate over 90% of physical reads.
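Finding that knee programmatically is straightforward; a sketch using the figures from the table above (in a real workflow the pairs would come from querying v$db_cache_advice):

```python
# (buffer cache MB, forecast physical reads as % of current) from the table above
advice = [(928, 100.00), (960, 99.91), (1040, 99.80), (1120, 99.72),
          (1200, 99.65), (1280, 99.59), (1360, 99.56), (1440, 99.52),
          (1520, 99.23), (1600, 6.43)]

# the working-set "knee" is the step with the largest drop in physical reads
drop, knee_lo, knee_hi = max((prev[1] - cur[1], prev[0], cur[0])
                             for prev, cur in zip(advice, advice[1:]))
print(f"Biggest drop: {drop:.2f} points, between {knee_lo} and {knee_hi} MB")
```

Here the knee falls between 1,520 and 1,600 MB, matching the eyeball reading of the table.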

When I look at the output from v$db_cache_advice on customer databases I see a wide range of impacts from doubling the buffer cache, from basically no impact, to eliminating pretty much all read I/O:

It’s hard to know exactly what the working set size of a database will be, and you can see from the table above that the differences between caching nearly the entire working set and the entire working set can be enormous.  As discussed in a prior blog post, reads accounted for 87% of database I/O on the sample I have access to.

Of course, an upper bound on the size of the working set would be the size of the database.

Why wouldn’t you just cache the entire database?

Doing so would eliminate ~87% of database I/O to the SAN in typical cases.

One obvious reason is cost.  Assuming I’m interpreting this price list correctly, the cost difference for a single storage engine in a VMAX 40K with 48 GB of RAM, versus one with 256 GB of RAM, is $222,165 – or $1.04 per megabyte.  It’s not likely you’ll be able to spend a cool million in order to cache your 1 TB database.

Delphix uses cheap server RAM to cache database data.  HP, Cisco, and Dell all offer servers that can house 1 TB or more of RAM.  You can price out a Dell PowerEdge M820 with 32 cores and 1.0 TB of RAM for around $45,000.

Delphix’s cache consolidation feature stretches that server RAM a long way, getting much more benefit than you would if you placed it in the database servers themselves.  Just as Delphix consolidates shared physical storage blocks among the databases it manages, its cache also consolidates those blocks in memory.

Consider a typical scenario with a ~1 TB production database that has several copies for:

  • Development
  • QA
  • Staging
  • Production firefighting
  • Offloading production reports

If each of these copies were managed by the same Delphix Server with 1 TB of RAM, you would be able to cache the entire working set for all the systems:

As an aside, another nice feature of this kind of caching is that each database benefits from the warm Delphix cache created by the others.

vSphere 5.1 has a 1 TB limit for the RAM available to a single VM. Because Delphix is deployed as a virtual machine, it also inherits this 1 TB limit on its cache size.

This 1 TB limit is not very constraining, as it turns out.  72% of the databases I see at our customers are 1 TB or less, so this approach can cache the entirety of a substantial portion of all databases out there.


A decade ago RAM was expensive and 32 bit architectures (when used) limited Oracle buffer cache sizes.  Despite exponential decreases in the cost of RAM, and the elimination of architectural barriers to assigning more RAM to Oracle, buffer cache sizes remain tiny as a proportion of the total database size.

Delphix’s consolidated cache feature combined with the low cost of server RAM makes it possible to put entire non-production database environments in cache at modest prices.



Supercharged performance at a fraction of the price

September 18th, 2013


Virtual Flash Database  Delivers 10x Price/Performance Improvement


See Press release at 

Databases drive almost every application that powers any enterprise business process. Historically, enterprise databases have resided on slow, expensive disk-based storage, burning energy in the data center and bogging down IT operations. Developers, testing teams, and business analysts can wait for days or weeks to receive copies with the data they need.

Flash storage promises to fix many of these problems. Flash-based databases run faster, use less power, and are more flexible as architecture needs change. But organizations have been unable to run most of their databases on flash due to cost concerns. Pure Storage and Delphix bring the speed of flash and the power of virtualization to one of the most important components of the enterprise: the database.

Delphix and Pure Storage deliver virtual flash database copies that transform the way application and analytics projects operate. The combination enables customers to create multi-terabyte database copies on demand, to refresh or rewind these copies in minutes, and to link individual versions of databases to code builds with almost no effort or cost.

Performance Benchmarking


Delphix and Pure Storage recently benchmarked the performance of the combined all-flash solution, comparing it to running databases on ordinary enterprise spinning disk. The test simulated an example from a leading e-commerce company, where the firm makes between 25 and 30 copies of its product database for development and test purposes.

The traditional option, a disk array storing physical copies of the database files, consumes 1 TB per copy and completes 35,000 transactions per minute (TPM), maxing out the disk array’s capabilities. Each copy requires a full duplicate: an additional 1 TB of storage with enough performance to service 35,000 TPM. In total, supporting the 26 copies required by the customer takes over 26 TB of storage, and the associated number of disks, to achieve 910,000 TPM (26 × 35,000), at a cost of approximately $2 million.
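The disk-based option scales strictly linearly with the number of copies, which is where the 26 TB and 910,000 TPM figures come from:

```python
copies = 26
tb_per_copy = 1.0        # each physical copy is a full 1 TB duplicate
tpm_per_copy = 35_000    # the array maxes out at 35,000 TPM per copy

total_storage_tb = copies * tb_per_copy
total_tpm = copies * tpm_per_copy
print(total_storage_tb, total_tpm)   # 26.0 TB and 910000 TPM
```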


The Delphix-Pure Storage solution created and supported the same 26 copies, each with full data, in a total space of 1.5 TB, performing at approximately 1 million TPM. The cost of this higher-performing solution is one tenth that of the disk-based option above. Unlike the disk-based option, the virtual flash databases can be created, refreshed, or rolled back in minutes, via self-service, by development or test teams. Block sharing across copies enables creation of large databases without duplication or movement of data across networks or arrays.


Visit Delphix and Pure Storage at Oracle OpenWorld


  • Delphix: Booth 1132
  • Pure Storage: Booth 7


Delphix and Pure Storage are both Gartner 2013 Cool Vendors in Storage


Benchmarking done at Delphix in collaboration with Pure Storage by Uday Vallamsetty





Lies, Damned Lies, and I/O Statistics

September 16th, 2013



by Matt Hayward

Given a description of gas dynamics and the atmosphere, you would be hard pressed to forecast tornadoes. The term emergence denotes the phenomenon of surprising behaviors arising in complex systems.  Modern storage systems are complex and full of emergent behaviors that make forecasting application I/O performance fiendishly difficult.

In collaboration with Kyle Hailey, Adam Leventhal, and others I’ve learned some rules of thumb for how to make accurate I/O performance forecasts.  I’ve also stepped on every rake imaginable along the way. For those of you who also may wish to forecast the storage performance an application will receive, this post summarizes some lessons learned.

When I began evaluating storage performance, I had a naive idea that the process would be like this:

I quickly discovered that the actual process was much more like this:

In going through this process, I’ve compiled a bestiary of performance anomalies you may expect to encounter if you interact with a variety of storage platforms, along with root causes for those anomalies and some explanatory notes. Following that are some approaches for designing I/O simulations or tests that avoid them.

 Bestiary of I/O Performance Anomalies

Caching
Symptoms: Impossibly good performance:
  • Higher throughput than the connection to storage could provide
  • Latencies which imply faster-than-light travel over the cables to the storage
Cause: Often the operating system and the underlying storage array will have large memory caches. Drivers will also tend to cache small amounts of data. This mostly occurs during read tests, but depending on the application write semantics it can also occur during write tests.

Shared drives
Symptom: Inconsistent performance.
Cause: It is common in storage systems to allocate LUNs or file systems from storage pools that are composed of large numbers of physical drives shared with other LUNs.

Shared connection to storage
Symptoms: Inconsistent performance, especially for:
  • High throughput tests
  • NAS storage with a 1 Gb Ethernet connection
Cause: For storage tests being done within a VM, other VMs on the same physical server can contend with your tests for access to the storage.

I/O request consolidation
Symptom: Somewhat paradoxically, both higher latency and higher throughput than expected. Particularly common for small sequential non-O_[D]SYNC writes.
Cause: Various I/O layers can group together multiple I/Os issued by your application before issuing them to the storage or a lower layer.

I/O request fragmentation
Symptom: Higher latency and lower throughput than expected, particularly for large I/Os or NFS-based NAS storage.
Cause: Large application I/O requests can be broken down into multiple, smaller I/Os that are issued serially by intervening layers.

Read ahead
Symptoms:
  • Improbably good sequential read performance
  • Unexpectedly poor random I/O performance
  • Performance that changes dramatically midway through a test
Cause: Many layers may decide to read ahead – that is, to optimistically fetch data adjacent to the requested data in case it is needed. If you have a sequential read workload, read ahead will substantially improve performance. If you have a random read workload, read ahead ensures the storage subsystem components will be doing a lot of unnecessary work that may degrade performance. Finally, some systems will try to discern the random or sequential nature of your workload and dynamically enable or disable read ahead. This can lead to inconsistent behavior; for example, a sequential read test may start slowly and then speed up once read ahead kicks in.

Tiered storage migration
Symptom: Unexpectedly bad performance, especially during initial tests on high-powered SANs such as EMC VMAX.
Cause: Some storage systems cleverly use a mix of very high performance flash drives, fast hard disks, and slower large-capacity hard disks. These systems dynamically move data among the tiers depending on access patterns. Often data newly created for a test will initially be located on the slow high-capacity disks – I have seen 8 kB random read latencies averaging around 20 ms, with spikes to around 100 ms, for initial tests on these kinds of ‘high performance’ ‘Enterprise’ storage systems.

First write penalty
Symptom: Unexpectedly bad write performance, especially if it happens early in testing and is not reproducible.
Cause: Many storage systems, volume managers, and some file systems use some form of thin provisioning. In these systems, when an initial write happens into some region, additional overhead is required, such as adjusting metadata and formatting the region. Subsequent writes to the same region will be faster. For example, a thin-provisioned VMDK on VMware must be zeroed on first write – so a 1 kB application write can trigger a write of an entire VMFS block of 1 megabyte or more.

Elided reads
Symptom: Unexpectedly good read performance on raw devices or regions that have not been written.
Cause: Some file systems and storage systems know whether a region has been written to. Attempts to read from uninitialized regions can result in an immediate software-provided response of “Here you go, all zeros!” – without actually engaging the disk hardware at all. Both VMFS and ZFS will do this, depending on configuration.

Compressed I/O
Symptom: Unexpectedly, or even impossibly, good write or read performance.
Cause: Some file systems will compress data. If your I/O test writes out a pattern that compresses well (such as all 0s or all 1s), the amount of I/O submitted to and read from the physical storage will be a tiny fraction of your test’s intended I/O workload.

Storage maintenance
Symptom: Unexpectedly poor performance.
Cause: Often when I speak to a storage administrator after getting unacceptable performance results, I learn there was some kind of maintenance happening at the time, such as migration of data to another storage pool, rebuilding of RAID configurations, etc.

 Avoiding Anomalies While Testing

Here is a summary of how to avoid these anomalies, with details below:

  • Use a real workload if possible
  • When simulating, be sure to simulate the actual application workload
  • Evaluate latencies using histograms, not averages
  • Verify your performance tests give reproducible results
  • Run test at the same time as the production application will run, and for sufficiently long durations
  • Ensure the test data is similar to what the application will use and produce

Use a real workload if possible. Unfortunately, this often isn’t possible. For example, you probably won’t be able to determine the exact workload of the month-end close for the new ERP system while that system is being architected, which is when you’ll need to design and select the storage.

When you must simulate, be sure to simulate what the application actually does in terms of I/O. This means understanding the read and write mix, I/O sizes and rates, as well as the semantics of the I/O that is issued.  For example: are writes O_SYNC or O_DSYNC, is Direct I/O used?  fio is an amazing tool for performing I/O simulation and tests, it can reproduce most application workloads, has good platform support, and an active development and user community.

When measuring I/O performance, be sure to use histograms to evaluate latencies rather than looking just at averages. Histograms show the existence of anomalies, and can clarify the presence of caches as well as the actual I/O performance the disks are delivering.  See, for example, these two images from an actual test on a customer system:

First a sequential read I/O test was run, followed by a random read I/O test. If we looked only at averages, we would have seen a sequential read latency of around 4 ms – quite good. Looking at the histogram distribution, however, it is clear we are getting a mix of 10-20 ms disk latencies and 0.5-2 ms latencies, presumably from cache. In the subsequent random I/O test we see improbably good performance, with an average of 1 ms and I/Os ranging as low as 100 microseconds. Clearly our working set has been mostly cached here – we can see the few actual random read disk accesses that are occurring by looking at the small bar in the 10-20 ms range.  Without histograms it would be easy to mistakenly conclude that the storage was screaming fast and that we would never see latencies over 5 ms.  For this reason, at Delphix, we use the 95th percentile latency as the guideline for how storage is responding to our tests.  Again, fio is an excellent tool for I/O testing that reports latency histogram information.
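The 95th-percentile guideline is easy to compute from raw latency samples; a sketch with made-up numbers (fio will report percentiles for you, but the underlying idea is just a nearest-rank percentile):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value >= pct% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[rank]

# made-up latencies: mostly cache hits (~0.5-2 ms) plus a few real disk reads
latencies_ms = [0.5, 0.7, 1.1, 0.9, 1.5, 0.6, 12.0, 0.8, 1.2, 18.0]
avg = sum(latencies_ms) / len(latencies_ms)
print(f"average: {avg:.1f} ms")                           # hides the slow disk reads
print(f"p95:     {percentile(latencies_ms, 95):.1f} ms")  # exposes them
```

The average looks respectable while the 95th percentile immediately surfaces the 10-20 ms disk accesses that the cache was masking.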

Run tests multiple times to verify reproducibility. If your second and third test runs show different results than the first, none of them is a good basis for forecasting eventual application performance. If performance increases on later tests, most likely your data is becoming cached. If performance moves up and down, most likely you are on shared infrastructure.

Since shared infrastructure is common, run at least one test at the same time as when the key workload will run. On shared infrastructure it is important to test at the time when the actual application performance will matter. For example, test during the peak load times of your application, not overnight or on a weekend. As an aside, I am occasionally misinformed by customers that the infrastructure is not shared, only to learn later that it is.

For read tests, ensure your test data size is comparable with the size of the eventual application data – or at least much larger than any intervening caches. For example, if you are developing a 1 TB OLTP system, try to test over 1 TB of data files. Typical SANs have on the order of 10 GB of cache shared among multiple users. Many operating systems (notably Linux and Solaris) will tend to use all available system RAM as a read cache. This suggests 100 GB would be the absolute minimum test data size that wouldn’t see substantial caching.

Run tests for long enough so that ramp up or dynamic workload detection changes don’t contribute substantially to your result. In practice, I find an adequate duration by running the test workloads over and over while progressively doubling the duration until I get two runs whose performance is within a close margin of one another.
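That progressive-doubling procedure can be written as a loop; a sketch in which run_test and fake_run are stand-ins for whatever benchmark you are actually running:

```python
def find_stable_duration(run_test, start_secs=30, margin=0.05, max_secs=3600):
    """Double the test duration until two consecutive runs agree within `margin`."""
    duration = start_secs
    prev = run_test(duration)
    while duration * 2 <= max_secs:
        duration *= 2
        cur = run_test(duration)
        if abs(cur - prev) / prev <= margin:   # e.g. within 5% of the prior run
            return duration, cur
        prev = cur
    return duration, prev                      # never converged; report the longest run

# stand-in workload: throughput ramps up as caches warm, then stabilizes
def fake_run(duration_secs):
    return min(1000.0, 400.0 + duration_secs)  # MB/s, say

print(find_stable_duration(fake_run))          # converges once back-to-back runs agree
```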

Initialize your test files with data similar to what your application will use. This avoids first-write penalties during the test, and ensures your tests are representative of application performance when the storage uses compression.
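The initialization pass and the measured run can be combined in a single fio job file. The sketch below is a hypothetical example, not a tuned recommendation: the directory path and sizes are placeholders, and options such as `buffer_compress_percentage` should be checked against your fio version's documentation.

```ini
; Sketch of an fio job: fill files with half-compressible data,
; then run a timed random-read test against them.
[global]
directory=/testfs             ; placeholder path
ioengine=libaio
direct=1                      ; bypass the OS page cache
size=100g                     ; well beyond SAN cache + system RAM
buffer_compress_percentage=50 ; ~50% compressible data, not zeros

[fill]
rw=write
bs=1m

[randread]
stonewall                     ; start only after the fill pass completes
rw=randread
bs=8k
iodepth=16
runtime=600
time_based
```

fio's output for the `randread` job then includes the completion-latency percentiles discussed above, so the 95th percentile can be read straight from the report.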


Architects must often forecast the I/O performance an application will receive from existing storage, and making an accurate forecast is surprisingly tricky. If you are trying to use a test workload to evaluate storage, there is a risk that the test will trigger some anomaly that makes the results invalid. If you are lucky, these invalid results will be clearly impossible and lead you to do more testing; if you are unlucky, they will appear reasonable and problems will arise during the production rollout of the application.

An awareness of the common root causes of I/O performance anomalies, and some rules of thumb for avoiding them while testing, can improve the accuracy of a performance forecast and reduce risks to an application rollout.



Best Oracle Performance Tools

September 13th, 2013


What tools do you use? What are your favorites and why? As someone who has been working with Oracle performance tools for the last 15 years, I’m always interested in what the best tools are.

In my past four blog posts I’ve reviewed some tools I like, such as Method-R tools, Confio, Lab128 and EMlite.

There are other tools that I especially like and that are totally free, such as Tanel Poder’s MOATS and snapper, two important tools for a DBA to know about.

Here is my list of the tools that I’m aware of (what other tools are out there?)


| Tool | Price | Would I use it? | DB perf | Multi-DB | SQL tune | Notes |
|------|-------|-----------------|---------|----------|----------|-------|
| Specialty | free | yes | | | x | sql trace 10046 parser |
| orasrp | free | yes | | | x | sql trace analyzer |
| 10053 Viewer | free | yes | | | x | replace .doc with .zip and unzip |
| sqltxplain | free | yes | | | x | 215187.1: SQLT (SQLTXPLAIN) – enhanced explain plan and related diagnostic information for one SQL |
| xplan | free | yes | | | x | extended explain plan info from Alberto Dell’Era |
| xtrace | free | yes | | | x | java trace file viewer from Alberto Dell’Era |
| latchprof | free | yes | x | | | Tanel’s latch collection query |
| waitprof | free | yes | x | | | Tanel’s wait collection query |
| tvdxstat | free | yes | | | x | Trivadis Extended Tracefile Analysis Tool |
| spviewer | free | yes | x | | | looks interesting for statspack/AWR viewing and comparison |
| awr formatter | free | yes | x | | | chrome plugin for AWR browsing – see screencast |
| | free | yes | x | | | script to combine top with top session in same output |
| snapper | free | yes | x | | | Tanel’s command line performance analysis |
| moats | free | yes | x | | | character-mode top activity screen! |
| Method-R tools (Cary Millsap) | ?? | yes | | | x | sql trace analyzer and other stuff; mrkew, mrkew examples. I’ve never had access to these tools but from what I can gather they should be awesome |
| emlite | free/$256 | yes | x | | | super cool lightweight tool that replaces OEM, i.e. DBA info in a web browser |
| Ashviewer | free | yes | x | | | collects ASH data itself or uses v$active_session_history – pretty cool |
| ASHMon | free | yes | x | | | reads ASH data (requires license) unless SASH is installed and pointed to; see BASH as well |
| MyOra | free | | x | x | | never tried it; looks fast and covers broad functionality, but the UI looks busy |
| Mumbai | free | yes | x | | | integrates snapper and orasrp |
| tora | free | yes | x | | | seems like a cool tool – haven’t really used it myself but it’s been around a while |
| Richmon | free | | x | | | |
| Lab128 | $500/seat | yes | x | x | | |
| DB Optimizer | $1500/seat | yes | x | | x | visual sql tuning diagrams |
| Quest Spotlight | $1000/cpu | | x | x | x | |
| Quest SQL Optimizer | $1690/seat | | | | x | |
| Lightly | $2000/seat | yes | x | x | | looks cool, haven’t personally tried |
| Quest Foglight / Performance Analyzer | $3000/cpu | yes | x | x | | |
| Oracle Diag Pack | $5000/cpu | yes | x | x | | |
| Oracle SQL Tuning Pack | $5000/cpu | | | x | x | |
| Confio Ignite | $1500/core? | | x | x | x | |
| Precise Indepth I3 | $2000/cpu? | | x | x | | |
| Monicle | ?? | | | | | |

Other stuff: fourth elephant

Some of the prices I got off of Programmer’s Paradise; others are word of mouth, so any clarifications on prices would be appreciated. The tools marked “yes” are ones that I use or would use. I would use Quest’s Performance Analyzer given the chance; I’ve never used it, but from the demos of the tool I like what I see. I have never used Mumbai either, but I plan to, and again I like what I see in the demos. All the other tools marked “yes” I use. All the tools except those under “other stuff” I consider reasonable tools for one job or another; some are better than others in different ways. The “other stuff” tools I don’t see much point in. I generally consider OEM with the Diagnostics Pack too expensive and slow, but hey, if customers have it, I’ll use it! Plus the Diagnostics Pack comes with v$active_session_history and all of AWR, which I can query directly without OEM.

On enterprise tools, I like this quote. It’s 5 years old, but not a lot has changed since then:

The lack of decent tools to manage your databases on an enterprise level is just staggering. It’s hard to believe that nobody can cross the finish line on delivering a monitoring solution.

This looks interesting:

Tracefile analyzer tools:

This looks quite cool:

another tool to check out

Link from Yong Huang on sql command tools like Toad:

Tools that come with Oracle support:

  • Tools – LTOM, OS Watcher, & HangFG
  • Scripts – SQLTXPLAIN, Trace Analyzer, PL/SQL Profiler

Log in to My Oracle Support, click the Knowledge tab, then on the left choose Tools/Training > Diagnostics.




September 12th, 2013


EMlight might not be as important now that Oracle 12c is out with Oracle’s EM Express, but EMlight is still pretty cool. I tried it about a year ago and was impressed. EMlight is a lightweight rewrite of Enterprise Manager that is both more dependable and improved. How is it improved? Let’s look at my favorite part, the “Top Activity” screen. EMlight’s “Top Activity” screen adds a section on I/O latencies, which is sorely missing from OEM’s performance page and Top Activity page. It also includes a list of the top events (as I did in DB Optimizer). Very cool.

How is EMlight more dependable? With EMlight you just connect to the database and voilà, it works. It doesn’t require any of the brittle configuration files, agents, or processes that OEM depends on.

Here is an example of the Top Activity screen in EMlight:

I tested version 1.2. Since then versions 1.3 and 1.4 have been released. I’ve been meaning to test them but haven’t yet, and I didn’t want to wait any longer before posting about this promising tool.

I did give 1.4 a quick look, and it is pretty much the same, with some obvious improvements: the “Top Activity” screen above now has a super responsive slider to choose the aggregation time range, and the CPU that is missing from two of the three aggregates in the screenshot above is now fixed in 1.4 (I think the fix actually made it into 1.3 originally).

Check out more screen shots here


Here are a few other screenshots from my testing:

Datafiles along with single-block I/O latency, which is a nice addition:


I/O along with I/O latency (I don’t think OEM has ever added I/O latency):

Buffer cache contents and advisory:



Archive file dates, times, sizes and rates:


Tablespace usage and attributes:
Top Session, PL/SQL, SQL, Events (kind of like the Top Activity page without the timeline)