Oaktable World give aways Day 2
Thanks to a generous gift from Apress, every 10th arrival tomorrow to Oaktable World will win a copy of
Pro Oracle Database 12c Administration
by Darl Kuhn
schedule for Oaktable WOrld Day 2
Thanks to a generous gift from Apress, every 10th arrival tomorrow to Oaktable World will win a copy of
by Darl Kuhn
schedule for Oaktable WOrld Day 2
A few vignettes from this mornings Oaktable World.
Kickoff from Mogens Nørgaard on some of the origins of Oaktable Network.
John Beresniewicz , father of the ASH tree map, talking on ASH analytics
DevOps round table with Jonathan Lewis, Cary Millsap, James Murtagh, Eric Schrock and Kent Graziano
Gwen Shapira talking on Hadoop
Francisco Sanchez talking on cross platform converison
Riyaj Shamsudeen answering questions after his live hacking on RAC with truss pstack wireshark pmap strace presentation
Christian Antognini talking on row chaining and migration
It’s an Oracle World
Crowds waiting outside to get into Oaktable World
Iggy Fernandez comparing a Amazon NoSQL implementation to an equivalent Ebay implementation using an Oracle relational model.
Fabian Pascal talking about some of the short comings, should I say errors, in Codd’s relational model.
Riyaj Shamsudeen live hacking on RAC with truss pstack wireshark pmap strace
Women in Technology (WIT) discussion.
A few vignettes from this mornings Oaktable World.
Kickoff from Mogens Nørgaard on some of the origins of Oaktable Network.
Fabian Pascal talking about some of the short comings, should I say errors, in Codd’s relational model.
Iggy Fernandez comparing a Amazon NoSQL implementation to an equivalent Ebay implementation using an Oracle relational model.
Riyaj Shamsudeen live hacking on RAC with truss pstack wireshark pmap strace
Women in Technology (WIT) discussion.
Here are some talks of interest during Oracle Open World other than Oaktable World where I’ll be all Monday and Tuesday of course
Publicly editable version of OOW talks
Please feel free to modify the above spread sheet and/or email me with talks to add to this blog post – thanks!
Moscone West Level 3 | Room 3002 – 3004 | Room 3003 | Room 3005 | Room 3014 | Room 3016 | Room 3018 | room 3001 |
Room Focus | Database 12c/Upgrades | Big Data & Business Analytics | Oracle Enterprise Manager | Database | RAC/Engineered Systems | Middleware/Cloud | |
8:00 am – 9:00 am | UGF9757 – An Exploration of the Next Generation of the Oracle Database – Jim Czuprynski | UGF9810 – Big Data Panel Discussion – Alex Gorbachev | UGF9793 – Extending OEM12c’s features and functionality with APEX – Todd Bellaver | UGF9776 – Optimizing Internal Serialization Control (latches and mutexs) – Craig Shallahamer | UGF9804 – RAC performance with OEM 12c and AWR data. Deep interconnect troubleshooting – Tim Quinlan | UGF9799 – WebCenter: Displaying Documents and Site Studio Content in Sites and Portal – Brian Huff | |
9:15 am – 10:15 am | UGF9760 – Oracle 12c Best New Features – Rich Niemiec | UGF9755 – Deep Dive to OBIEE / Oracle Business Intelligence Applications – Shyam Nath | UGF9772 – I have to patch again? : Patch Management with Oracle Enterprise Manager 12c – Bobby Curtis | UGF9769 – How to survive a disaster with RMAN – Gustavo Antunez | UGF9742 – Oracle Database Appliance – RAC in a box Some Strings Attached. – Fuad Arshad | UGF9797 – WebCenter Portal : Level of Customization (0- 100%) – Soy Joseph | |
10:30 am – 11:30 am | UGF9759 – In The Dark Of The Night (While You Were Sleeping) – Kyle Quass | UGF9754 – Deep Dive into Oracle BI Applications using Oracle Data Integrator 11g – Mark Rittman | UGF9794 – Managing Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware – Leighton Nelson | UGF9740 – RDBMS Forensics: Troubleshooting using ASH – Tim Gorman | UGF9731 – Metadata Governance, the Next Evolution in Data Modeling and Management – Henry Olson | UG9798 – WebCenter Portal ADF Designing for Desktop, Mobile & Tablet – Responsive And Adaptive Design Integration – John Sim | |
11:45 am – 12:45 pm | UGF9765 – The Next Generation of the Oracle Database and 11gR2 New Features for Developers: What You Need to Know – John King | UGF9722 – Understanding Data Movement in Enterprise Platforms – George Trujillo | UGF9771 – The Essence of Great Leadership – John Matelski, Maria Anderson, Michelle Malcher, Rich Niemiec, Matt Vranicar | UGF9739 – PHYSICAL ORDERING OF DATA: IS IT EVER USEFUL? – Mark Farnham | UGF9795 – How to minimize the brownout, due to RAC reconfiguration – Amit Das | UGF9775 – Using Oracle SOA Suite 11g to Integrate with PeopleSoft 9.1 – Brad Bukacek | |
1:00 pm – 2:00 pm | UGF9807 – Simplifying Database Performance Analysis using OEM12c to Monitor Database and Storage Together – Sam Lucido | UGF9764 – Oracle Enterprise Manager 12c, Oracle Database 12c, and You! – Bobby Curtis | UGF9721 – Agile, Real-Time BI with Business Intelligence, Data Integrator and GoldenGate – Stewart Bryson | UGF9801 – DBA Expert Panel – Ray Smith, Seth Miller and Maria Anderson | UGF9803 – Clustering/HA with Oracle VM & OEM12c – 360 Degrees – Kai Yu | UGF9796 – De- Mystifying Weblogic for Application Users – Abhinav Banerjee | |
2:15 pm – 3:15 pm | UGF9758 – Flashback Techniques for Oracle Database 11g and The Next Generation of the Oracle Database – Carl Dudley | UGF9806 – Manage the Data Lifecycle of Big Data Environments – Brian Vile | UGF9773 – OEM Command Line Interface Illustrated – Ray Smith | UGF9741 – You probably don’t need RMAN catalog database – Yury Velikanov | UGF9791 – Accelerate your Oracle Exadata deployment with DBA skills you already have – Marc Fielding | UGF9756 – Cloud Computing, Oracle VM & OEM12c – Expert Panel – Michael Ault | UGF9792 – Leadership: Not Just For Managers – Maria Anderson |
3:30 pm – 4:30 pm | UGF9753 – Data Wrangling with Oracle Big Data Connectors – Gwen Shapira | UGF9808 – OEM Panel Discussion – Kris Cook, Bobby Curtis, Ray Smith, Seth Miller | UGF9802 – Exadata Panel Discussion – Shyam Nath and Aaron Werman | UGF9790 – Where did my CPU go? – Karl Arao | UGF9774 – Oracle RAC SIG Expert Panel | UGF9809 – Virtualization Panel Discussion |
Oracle Open World
Sunday (9/22) |
||||
10:30 – 11:30 | Tim Gorman: RDBMS Forensics UGF9740 | |||
11:45 – 12:45 | Kyle Hailey: Thin Cloning UGF9966 Moscone West 2009 | Christian Antognini: The Query Optimizer in Oracle Database 12c: What’s New?, Moscone West – 3002 | ||
13:00 – 14:00 | Ray Smith, Seth Miller, Tim Hall, Leighton Nelson, Maria Anderson: DBA Expert Panel UGF9801 | |||
14:15 – 15:15 | Connor, Analytic functions, Moscone West – 2000 | |||
15:30 – 16:30 | Ray Smith, Seth Miller, Bobby Curtis, Kris Cook: Oracle Enterprise Manager Panel Discussion UGF9808 | Gwen Shapira: Data Wrangling with Oracle Big Data Connectors. Moscone West – 3003 | ||
17:15-19:00 | Larry Ellison Oracle Database 12c In-Memory Database and M6 Big Memory Machine Moscone North, Hall D http://www.oracle.com/openworld/keynotes/index.html?src=7328809&Act=741 | |||
Monday (9/23) |
||||
8:00 a.m.–9:45 a.m. | Mark Hurd Transforming Businesses with Big Data and Analytics Moscone North, Hall D | |||
10:45 a.m.–11:45 a.m. | ||||
12:15 p.m.–1:15 p.m. | Mark Scardina, Markus Michalewicz: Understanding Oracle RAC Internals CON8806 | |||
1:45 p.m.–2:45 p.m. | Tim Gorman: Making Star Transformations Sing CON7387 | Mauro Pagano Pagano: SQL Tuning Made Easier with SQLTXPLAIN [CON8257], Moscone South – 305 | ||
3:15 p.m.–4:15 p.m. | Graham Wood: Automatic Workload Repository Soup to Nuts: Fundamentals of Database Performance Analysis [CON9578] | |||
4:45 p.m.–5:45 p.m. | Mahesh Subramaniam, Kodi Umamageswaran: Oracle Exadata Technical Deep Dive: Architecture and Internals [CON8384], Moscoe South – 305 | |||
Tuesday (9/24) |
||||
10:15 a.m.–11:15 a.m. | ||||
11:45 a.m.–12:45 p.m. | ||||
1:15 p.m.–2:15 p.m. | 1:30 p.m.–3:15 p.m. Larry Ellison The Oracle Cloud Platform: Database as a Service. Java as a Service. Moscone North, Hall D | |||
5:15 p.m.–6:15 p.m. | Eric Grancher: Consolidation Without Tears! Oracle Database 12c and Database Replay CON4366 | Jonathan Lewis (with Maria Colgan), CON8643: Session Title: Oracle Optimizer Boot Camp: 10 Optimizer Tips You Can’t Do Without. 102 Moscone South | ||
Wednesday (9/25) |
||||
10:15 a.m.–11:15 a.m. | ||||
11:45 a.m.–12:45 p.m. | ||||
1:15 p.m.–2:15 p.m. | ||||
3:30 p.m.–4:30 p.m. | Tanel Poder: Moving Data Between Oracle Exadata and Hadoop. Fast. Moscone South 305. CON7586 | Connor, Optimizer Statistics: A fresh approach, Moscone South 308 | ||
Gwen Shapira, Marc Fielding: It’s a Solid-State World: How Oracle Exadata X3 Leverages Flash Storage. Westin San Francisco – Metropolitan I | Active Session History Deep Dive: Advanced Performance Analysis Tips. Moscone South 104. CON9577 (co-speaker Kellyn Pot’vin, Enkitec and Oaktable) | the journey to DBaaS , part 1 , early lessons room ?? speaker ?? | ||
5:00 p.m.–6:00 p.m. | Jeremiah Wilton: Being Sure: Confident Consolidations with Oracle Real Application Testing 12c [CON3255], Moscone South – 306 | the journey to DBaaS , part 2 , early lessons room ?? speaker ?? | ||
Thursday (9/26) |
||||
9:00 a.m.–10:00 a.m. | ||||
11:00 – 12:00pm | Connor, SQL Tuning 101, Moscone South 102 | Ryan Lolli, Nick Wagner, Maria Pratt, Marie Couch: Deep Dive into Oracle GoldenGate HOL10026 | Alex Gorbachev: Under the Hood of Pluggable Databases [CON7593], Moscone South – 305 | |
12:00 p.m.–1:00 p.m. | ||||
1:30 p.m.–2:30 p.m. | ||||
2:00 p.m. – 3:00 p.m | Gwen Shapira, Data Wrangling with Oracle Big Data Connectors, Moscone West 300 | |||
3:00 p.m.–4:00 p.m. |
by Matt Hayward
One of the less obvious advantages of database virtualization is a reduction in the read I/O issued against the underlying physical storage (henceforth called “the SAN”) that ultimately stores the data for virtual databases.
In practice, Delphix prevents around 60% of all non-production database I/O* from ever being issued to the SAN with the Delphix cache.
This is possible because Delphix Server accommodates large amounts of RAM**, which is used as an auxiliary cache above and beyond the database buffer cache that resides on the database server. When the database needs to read some data that is not present in the database buffer cache, an I/O is issued to the Delphix Server. Delphix then checks its own cache, and only passes the I/O request down to the SAN if the necessary data is not already in the Delphix cache.
Delphix cache hits bring dual performance benefits:
Delphix’s minimum system configuration requires 16 GB of RAM, however most customers configure their Delphix Servers with 64 GB or more. Thanks to these large caches, Delphix Servers consistently have a 60-70% cache hit ratio.
In the last two years, I’ve collected database I/O statistics from 469 production and non-production Oracle databases across 25 companies in a variety of industries and with diverse applications. Studying these statistics gives the following findings for this particular sample:
Focusing on these non-production databases, which are the first candidates for database virtualization:
To give those findings a sense of scope, in the two years I’ve been working at Delphix, eliminating 60% of non-production I/O would amount to eliminating 59 petabytes of I/O.
To put that into perspective, Amazon Elastic Block Store charges $0.10 per million I/O requests. Assuming an average I/O request size of 8 kilobytes it would cost you around$800,000 to do 59 petabytes of I/O in Amazon EBS.
That’s a truckload of I/O – literally: it takes around 120 milliwatt hours of energy to read 256 megabytes of data from disk, reading 59 petabytes would require around 29 megawatt hours – roughly equivalent to the energy in 2.55 tons of oil (or 25 tons of TNT*** if that’s more your style).
By using inexpensive server RAM as a secondary cache for virtual databases, Delphix can dramatically improve database read I/O performance, and eliminate 50-60% of non-production database I/O from the SAN. This improves performance of virtual databases, and allows other applications to get more out of a shared SAN.
* Delphix eliminates around 60% of I/O associated with the operation of non-production databases. It also eliminates huge amounts of I/O on both production and non-production systems formerly used to create, copy, and restore full backups – although this is not quantified in this blog post.
** The upper bound on RAM assignable the Delphix Server is constrained by limitations of the hypervisor or underlying physical host long before it approaches limits inherent in the Delphix OS.
*** http://en.wikipedia.org/wiki/Tonne_of_oil_equivalent,http://en.wikipedia.org/wiki/Tons_of_TNT
by Matt Hayward
I’m often surprised at how small the buffer caches allocated to Oracle databases are. When I started as a DBA at Amazon in the early 2000′s, we were moving our Oracle databases to 32-bit Linux. I shudder to recall all the crazy things we used to do, such as fiddling with the stack address space, to get an SGA over 1.2 GB or so. It’s alarming to me that these techniques are still documented for Oracle 11gR2 – I hope no one is still doing this – if you are, you have my sympathies.
In 64-bit architectures, it would seem that there is no reason to have such a small buffer cache. RAM prices have dropped precipitously, the datasets within databases themselves have grown like kudzu. Nonetheless, I still find small Oracle buffer caches wherever I go. The median buffer cache size I observe in our customer base is 1,040 megabytes – which would fit in my 2002 32-bit Linux machine without too much bother.
Here is a histogram of customer buffer cache sizes I’ve encountered since 2011:
The implication of unchecked dataset growth and relatively static buffer cache sizes is that a smaller and smaller proportion of the database is cached. Here is a histogram showing the percentage of a database that can be cached, from the same customers as above (although here I’ve excluded databases smaller than 64 GB):
How much performance impact would increasing these tiny caches have? It is hard to say with certainty – but there are some clues. Oracle has a cache advice feature that populates the v$db_cache_advice table. This feature forecasts the reduction (or increase) in physical reads that would come from growing (or shrinking) the buffer cache. The table includes forecasts for buffer cache sizes up to around twice the current size.
The results are often shockingly non-linear, here is an example table from a customer database showing the forecast proportion of physical reads as the buffer cache increases to twice the current size of 928 MB:
CACHE SIZE – MB | PERCENTAGE OF PHYSICAL READS RELATIVE TO BASELINE |
---|---|
928 | 100% |
960 | 99.91% |
1,040 | 99.80% |
1,120 | 99.72% |
1,200 | 99.65% |
1,280 | 99.59% |
1,360 | 99.56% |
1,440 | 99.52% |
1,520 | 99.23% |
1,600 | 6.43% |
You can see that at around 1,600 MB of buffer cache physical reads drop off dramatically. Evidently, the working set of the database would basically fit inside 1,600 MB, but not 1,520 MB. Increasing our buffer cache from 928 to 1520 gets us basically nothing (less than 1% reduction in physical reads), while going from 1,520 to 1,600 would eliminate over 90% of physical reads.
When I look at the output from v$db_cache_advice on customer databases I see a wide range of impacts from doubling the buffer cache, from basically no impact, to eliminating pretty much all read I/O:
It’s hard to know exactly what the working set size of a database will be, and you can see from the table above that the differences between caching nearly the entire working set and the entire working set can be enormous. As discussed in a prior blog post, reads accounted for 87% of database I/O on the sample I have access to.
Of course, an upper bound on the size of the working set would be the size of the database.
Doing so would eliminate ~87% of database I/O to the SAN in typical cases.
One obvious reason is cost. Assuming I’m interpreting this price list correctly, the cost difference for a single storage engine in a VMAX 40K with 48 GB of RAM, versus one with 256 GB of RAM, is $222,165 – or $1.04 per megabyte. It’s not likely you’ll be able to spend a cool million in order to cache your 1 TB database.
Delphix uses cheap server RAM to cache database data. HP, Cisco, and Dell all offer servers that can house 1 TB or more of RAM. You can price out a Dell PowerEdge M820 with 32 cores and 1.0 TB or RAM for around $45,000.
Delphix’s cache consolidation feature stretches that server RAM a long way, getting much more benefit than you would if you placed it in the database servers themselves. Just as Delphix consolidates shared physical storage blocks among the databases it manages, it’s cache also consolidates the blocks in memory.
Consider a typical scenario with a ~1 TB production database, that has several copies for:
If each of these copies were managed by the same Delphix Server with 1 TB of RAM, you would be able to cache the entire working set for all the systems:
As an aside, another nice feature of this kind of caching is that each database benefits from the warm Delphix cache created by the others.
vSphere 5.1 has a 1 TB limit for the RAM available to a single VM. Because Delphix is deployed as a virtual machine, it also inherits this 1 TB limit on its cache size.
This 1 TB limit is not very constraining as it turns out. 72% of databases I see at our customers are 1 TB or less, so this approach can cache the entirety of a substantial portion of all databases out there.
A decade ago RAM was expensive and 32 bit architectures (when used) limited Oracle buffer cache sizes. Despite exponential decreases in the cost of RAM, and the elimination of architectural barriers to assigning more RAM to Oracle, buffer cache sizes remain tiny as a proportion of the total database size.
Delphix’s consolidated cache feature combined with the low cost of server RAM makes it possible to put entire non-production database environments in cache at modest prices.
See Press release at http://www.digitaljournal.com/pr/1475286
Databases drive almost every application that powers any enterprise business process. Historically, enterprise databases have resided on slow, expensive disk-based storage, burning energy in the data center and bogging down IT operations. Developers, testing teams, and business analysts can wait for days or weeks to receive copies with the data they need.
Flash storage promises to fix many of these problems. Flash-based databases run faster, use less power, and are more flexible as architecture needs change. But organizations have been unable to run most of their databases on flash due to cost concerns. Pure Storage and Delphix bring the speed of flash and the power of virtualization to one of the most important components of the enterprise: the database.
Delphix and Pure Storage deliver virtual flash database copies that transform the way application and analytics projects operate. The combination enables customers to create multi-terabyte database copies on demand, to refresh or rewind these copies in minutes, and to link individual versions of databases to code builds with almost no effort or cost.
Delphix and Pure Storage recently benchmarked the performance of the combined all— flash solution, comparing to running databases on ordinary enterprise spinning disk. The test simulated an example from a leading e-commerce company, where the firm makes between 25 and 30 copies of its products database for development and test purposes.
The traditional option, a disk array storing physical copies of the database files, consumes 1 TB per copy and completes 35,000 transactions per minute (TPM), maxing out the disk array’s capabilities. Each copy requires a full duplicate: an additional 1TB of storage with enough performance to service 35,000 TPM. In total, supporting the 26 copies required by the customer requires over 26 TB of storage, and the associated number of disks, to achieve 910,000 TPM (26 X 35K), at a cost of approximately $2 Million.
The Delphix-Pure Storage solution created and supported the same 26 copies, each with full data, in a total space of 1.5 TB, and performing at approximately 1 Million TPM. The cost of this higher- performing solution is 1/10th the cost of the above disk-based option. Unlike the disk-based option, the virtual flash databases can be created, refreshed, or rolled back in minutes, via self-service, by development or test teams. Block sharing across copies enables creation of large databases without duplication or movement of data across networks or arrays
Benchmarking done at Delphix in collaboration with Pure Storage by Uday Vallamsetty
Given a description of gas dynamics and the atmosphere, you would be hard to pressed forecast tornadoes. The term emergence denotes the phenomena of surprising behaviors arising in complex systems. Modern storage systems are complex, and full of emergent behavior that make forecasting application I/O performance fiendishly difficult.
In collaboration with Kyle Hailey, Adam Leventhal, and others I’ve learned some rules of thumb for how to make accurate I/O performance forecasts. I’ve also stepped on every rake imaginable along the way. For those of you who also may wish to forecast the storage performance an application will receive, this post summarizes some lessons learned.
When I began evaluating storage performance, I had a naive idea that the process would be like this:
I quickly discovered that the actual process was much more like this:
In going through this process, I’ve compiled a bestiary of performance anomalies you may expect to encounter if you interact with a variety of storage platforms, along with root causes for those anomalies and some explanatory notes. Following that are some approaches for designing I/O simulations or tests that avoid them.
ANOMALY NAME | CHARACTERISTICS | NOTES |
---|---|---|
Caching | Impossibly good performance:
|
Often the operating system and the underlying storage array will have large memory caches. Drivers will also tend to cache small amounts of data. This mostly occurs during read tests, but depending on the application write semantics can also occur during write tests. |
Shared drives | Inconsistent performance | It is common in storage systems to allocate LUNs or file systems from storage pools that are composed of large numbers of physical drives shared with other LUNs. |
Shared connection to storage | Inconsistent performance, especially for:
|
For storage tests being done within a VM, other VMs on the same physical server can contend with your tests for access to the storage. |
I/O request consolidation | Somewhat paradoxically both higher latency and higher throughput than expected. Particularly common for small sequential non-O_[D]SYNC writes | Various I/O layers can group together multiple I/Os issued by your application before issuing them to the storage or a lower layer. |
I/O request fragmentation | Higher latency and lower throughput than expected, particularly for large I/Os or NFS based NAS storage | Large application I/O requests can be broken down into multiple, smaller I/Os that are issued serially by intervening layers. |
Read ahead |
|
Many layers may decide to Read Ahead – that is to optimistically fetch data adjacent to the requested data in case it is needed. If you have a sequential read workload, read ahead will substantially improve performance. If you have a random read workload, read ahead ensures the storage subsystem components will be doing a lot of unnecessary work that may degrade performance. Finally, some systems will try to discern the random or sequential nature of your workload and dynamically enable / disable read ahead. This can lead to inconsistent behavior, for example a sequential read test may start slowly and then speed up once read ahead kicks in. |
Tiered storage migration | Unexpectedly bad performance, especially during initial tests on high powered SANs such as EMC VMAX | Some storage systems cleverly use a mix of very high performance flash drives, fast hard disks, and slower large capacity hard disks. These systems dynamically move data among these tiers depending on their access patterns. Often data newly created for a test will be initially located on the slow high capacity disks – I have seen 8 kB random read latencies averaging around 20 ms, with spikes to around 100 ms, for initial tests on these kinds of ‘high performance’ ‘Enterprise’ storage systems. |
First write penalty | Unexpectedly bad write performance, especially if it happens early in testing and is not reproducible | Many storage systems, volume managers, and some file systems will use some form of thin provisioning. In these systems when an initial write happens into some region, additional overhead is required, such as adjusting some meta-data and formatting the region. Subsequent writes to the same region will be faster. For example, a thin provisioned VMDK on VMware must be zeroed on first write – so a 1 kB application write can trigger a write of an entire VMFS block of 1 megabyte or more. |
Elided reads | Unexpectedly good read performance on raw devices or regions that have not been written | Some file systems and storage systems know whether a region has been written to. Attempts to read from uninitialized regions can result in an immediate software provided response of: “Here you go, all zeros!” – without actually engaging the disk hardware at all. Both VMFS and ZFS will do this, depending on configuration. |
Compressed I/O | Unexpected, or even impossibly good write or read performance | Some file systems will compress data. If your I/O test is writing out a pattern that compresses well (such as all 0s or all 1s), the amount of I/O submitted to and read from the physical storage will be a tiny fraction of your test’s intended I/O workload. |
Storage Maintenance | Unexpectedly poor performance | Often when I speak to a storage administrator after getting unacceptable performance results, I learn there was some kind of maintenance happening at the time, such as migration of data to another storage pool, rebuilding of RAID configurations, etc. |
Here is a summary of how to avoid these anomalies, with details below:
Use a real workload if possible. Unfortunately, often this isn’t be possible. For example, you probably won’t be able to determine the exact workload of the month end close for the new ERP system while that system is being architected, which is when you’ll need to design and select the storage.
When you must simulate, be sure to simulate what the application actually does in terms of I/O. This means understanding the read and write mix, I/O sizes and rates, as well as the semantics of the I/O that is issued. For example: are writes O_SYNC or O_DSYNC, is Direct I/O used? fio is an amazing tool for performing I/O simulation and tests, it can reproduce most application workloads, has good platform support, and an active development and user community.
When measuring I/O performance, be sure to use histograms to evaluate latencies rather that looking just at averages. Histograms show the existence of anomalies, and can clarify the presence of caches as well as the actual I/O performance that the disks are delivering. See, for example, these two images from an actual test on customer system:
First a sequential read I/O test was run, followed by a random read I/O test. If we looked only at averages, we would have seen a sequential read latency of around 4 ms, quite good. Looking at the histogram distribution however, it is clear we are getting a mix of 10-20 ms disk latencies, and 0.5-2 ms latencies, presumably from cache. In the subsequent random I/O test we see the improbably good performance with an average of 1 ms and I/Os ranging as low as 100 microseconds. Clearly our working set has been mostly cached here – we can see the few actual random read disk accesses that are occurring by looking at the small bar in the 10-20 ms range. Without histograms it would be easy to mistakenly conclude that the storage was screaming fast and we would not see latencies over 5 ms. For this reason, at Delphix, we use the 95th percentile latency as the guideline for how storage is responding to our tests. Again,fio is an excellent tool for I/O testing that reports latency histogram information.
Run tests multiple times to verify reproducibility. If your second and third test runs show different results than the first, none are good basis for making a forecast of eventual application performance. If performance is increasing for later tests, most likely your data is becoming cached. If performance moves up and down, most likely you are on shared infrastructure.
Since shared infrastructure is common, run at least one test at the same time as when the key workload will run. On shared infrastructure it is important to test at the time when the actual application performance will matter. For example, test during the peak load times of your application, not overnight or on a weekend. As an aside, I am occasionally misinformed by customers that the infrastructure is not shared, only to learn later that it is.
For read tests, ensure your test data size is comparable with the size of the eventual application data – or at least much larger than any intervening caches. For example, if you are developing a 1 TB OLTP system try to test over 1 TB of data files. Typical SANs have order 10 GB of cache shared among multiple users. Many operating systems (notably Linux and Solaris) will tend to use all available system RAM as a read cache. This suggests 100 GB would be the absolute minimum test data size that wouldn’t see substantial caching.
Run tests for long enough so that ramp up or dynamic workload detection changes don’t contribute substantially to your result. In practice, I find an adequate duration by running the test workloads over and over while progressively doubling the duration until I get two runs whose performance is within a close margin of one another.
Initialize your test files with data similar to what your application will use. This avoids first write penalties in the test, and ensures your tests are consistent with application performance when the storage uses compression.
Architects must often forecast the I/O performance an application will receive from existing storage. Making an accurate forecast is surprisingly tricky. If you are trying to use a test workload to evaluate storage, there is a risk that the test will trigger some anomaly that makes the results invalid. If you are lucky, these invalid results will be clearly impossible and lead you to do more testing, if you are unlucky they will appear reasonable problems will arise during the production roll out of the application.
An awareness of common root causes of I/O performance anomalies, and some rules of thumb for avoiding them while testing, can improve the accuracy of a performance forecast and reduce risks to an application roll out.
ping back: http://swasheck.wordpress.com/2014/10/15/storage-testing-with-sqlio-and-fio/
To get Delphix Express, the free version of Delphix, go to
and create an account, then log in. You should see a DelphixExpress folder.
What tools do you use? What are your favorites and why? As someone who has been working Oracle performance tools for the last 15 years, I’m always interested in what the best tools are.
For the past 4 blog posts I’ve reviewed some tools I like such as Method-R tools, Confio, Lab128 and EMlite.
There are other tools that I especially like and are totally free such as Tanel Poder’s Moats and snapper which are two important tools for a DBA to know about.
Here is my list of tools that I’m aware of (what other tools are out there?)
price | would I use it ? | db perf | multi-db | sql tune | notes | ||
Specialty | |||||||
10046.pl | free | yes | x | sql trace 10046 parser | |||
orasrp | free | yes | x | sql trace analyzer | |||
10053 Viewer | free | yes | x | replace .doc with .zip and unzip | |||
sqltxplain | free | yes | x | 215187.1: SQLT (SQLTXPLAIN) – Enhanced Explain Plan and related diagnostic information for one SQL | |||
xplan | free | yes | x | extend explain plan info from Alberto Dell Era | |||
xtrace | free | yes | x | java trace file viewer from Alberto Dell Era | |||
latchprof | free | yes | x | Tanel’s latch collection query | |||
waitprof | free | yes | x | Tanel’s wait collection query | |||
tvdxstat | free | yes | x | Trivadis Extended Tracefile Analysis Tool | |||
spviewer | free | yes | x | this looks interesting for statspack/AWR viewing and comparison | |||
awr formatter | free | yes | x | chrome plugin for AWR browsing – seescreencast | |||
perfmon.sh | free | yes | x | script to combine top with top session in same output | |||
snapper | free | yes | x | Tanel’s command line performance analysis | |||
moats | free | yes | x | character mode top activity screen ! | |||
Method-R tools (Cary Millsap) | ?? | yes | x | sql trace analyzer, other stuff,mrkew, mrkew examplesI’ve never had access to these tools but from what I can gather they should be awesome | |||
DBA | |||||||
emlite | free/$256 | yes | x | super cool lite weight tool that replace OEM – ie DBA info in a web browser | |||
Ashviewer | free | yes | x | collects ASH data itself or uses v$active_session_history – pretty cool | |||
ASHMon | free | yes | x | reads ASH data (requires license) unlessSASH is installed and pointed toSee BASH as well | |||
MyOra | free | x | x | Never tried it. Looks fast and covers broad functionalitybut UI looks busy | |||
Mumbai | free | yes | x | integrates snapper and orasrp | |||
tora | free | yes | x | seems like a cool tool – haven’t really used it myself but it’s been around a while | |||
Richmon | free | x | |||||
Lab128 | $500/seat | yes | x | x | |||
DB Optimizer | $1500/seat | yes | x | x | visual sql tuning diagrams | ||
Quest Spotlight | $1000/cpu | x | x | x | |||
Quest SQL Optimizer | $1690/seat | x | |||||
Lightly | $2000/seat | yes | x | x | looks cool, haven’t personally tried | ||
Enterprise | |||||||
QuestFoglight /Performance Analyzer | $3000/cpu | yes | x | x | |||
Oracle Diag Pack | $5000/cpu | yes | x | x | |||
Oracle SQL Tuning Pack | $5000/cpu | x | x | ||||
Confio ignite | $1500/core? | x | x | x | |||
Precise Indepth I3 | $2000/cpu? | x | x | ||||
Monicle | ?? | ||||||
other stuff | |||||||
fourth elephant | |||||||
DBtuna |
Some of the prices I got off of programmers paradise, others are word of mouth, so any clarifications on prices would be appreciated. The tools marked “yes” are ones that I use or would use. I would use Quest’s Performance Analyzer given the chance. I’ve never used it but from the demos of the tool, I like what I see. I have never used Mumbai, but plan to and like again what I see in the demos. All the other tools marked “yes” I use. All the tools except “other stuff”, I consider reasonable tools for one job or another. Some are better than others in different ways. The “other stuff” tools I don’t see much point in. I generally consider OEM with diagnostic pack too expensive and slow, but hey, if customers have it, I’ll use it ! Plus diag pack comes with v$active_session_history and all of AWR which I can query directly with OEM.
On Enterprise tools I like this quote form http://www.infoworld.com/d/
The lack of decent tools to manage your databases on an enterprise level is just staggering. It’s hard to believe that nobody can cross the finish line on delivering a monitoring solution.
This looks interesting :http://windows.podnova.com/trends/oracle_performance_trace.html
Tracefile analyzer tools: http://ba6.us/node/177
this looks quite cool: http://jagjeet.wordpress.com/2012/09/20/sql-dashboard-v2/
another tool to check out http://dbspeed.com/product.html
Link from Yong Huang on sql command tools like Toad: http://yong321.freeshell.org/
Tools that come with Oracle support: http://blogs.oracle.com/Support/entry/support_tools_for_performance
Login to My Oracle Support, Click the Knowledge Tab, On the left, Tools/training > Diagnostics
Image from
EMlight might not be as important now with Oracle 12c coming out and with Oracle’s EM Express, but EMlight is still pretty cool. I tried it about year ago, and was impressed. EMlight is a light weight re-write of Enterprise Manager that is dependable and improved. How is it improved? Let’s look at my favorite part, the “Top Activity” screen. The “Top Activity” screen has added a section on I/O latencies which is sorely missing from OEM’s performance page and Top Activity page. The OEM light version also includes a list of the top events (as I did in DB Optimizer). Very cool.
How is EMlight more dependable? With EMlight you just connect to the database and voila it works. It doesn’t require any of the brittle configuration files that OEM depends on nor the agents and processes that OEM depends on.
Here is an example of the To Activity screen in EMlight:
I tested version 1.2. Since then versions 1.3 and 1.4 have been released. I’ve meant to test these but haven’t yet and didn’t want to wait any longer before posting about this promising tool.
Just gave a quick look at 1.4 and looks pretty much the same but obvious improvements such as the “top activity” screen above now has a super responsive slider window to choose the aggregation time range, and you might notice in the above screen shot that CPU is missing in two of the 3 aggregates which is now fixed in 1.4 (I think the fix actually made it into 1.3 originally)
Check out more screen shots here http://obzora.com/screenshots.html
Here are a few other screen shots fro my testing:
Datafiles along with single block I/O latency which is a nice addition
I/O along with I/O latency (I don’t think OEM has every added I/O latency )
Buffer cache contents and advisory:
Archive file date times sizes and rates
Tables space usage and attributes
Top Session, PL/SQL, SQL, Events (kind of like the top activity page without the timeline)
Recent Comments