Archive for June, 2014

Redo over multiple weeks

June 30th, 2014

I’ve always wanted some sort of calendar view of load where I could see patterns across the same days of the week and same hours of the day and then be able to pick different periods and diff them:

The first step toward such a display is selecting the data in a way that supports the graphic. The graphic itself would be harder to produce than a similar, though less powerful, ASCII representation.

So here is a quick sketch of looking at redo generation over time. Right now I have to monitor how well redo can be archived, which depends on how fast it is generated. (Typically I’m more interested in general performance, which is what the above graphic addresses.)

The output below shows the redo generated per hour on average (picking the maximum hour within each bucket, which is 4 hours long below) and the maximum generation at the minute level. I may or may not be able to archive redo minute by minute, but I do want to make sure I can keep up with each hour.
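The bucketing arithmetic used here can be sketched in Python. This is a hypothetical standalone illustration (the function names are my own, and Python's ISO week numbering differs slightly from Oracle's 'WW' format):

```python
from datetime import datetime

def bucket(ts):
    """Map a timestamp to (day name, 4-hour bucket, week slot 0-4),
    roughly mirroring the TO_CHAR/TRUNC/MOD arithmetic in the SQL."""
    hh = 4 * (ts.hour // 4)            # 4-hour buckets: 0, 4, 8, 12, 16, 20
    dy = ts.strftime('%a').upper()     # day name, e.g. SAT
    ww = ts.isocalendar()[1] % 5       # fold week numbers into 5 slots
    return dy, hh, ww

def max_per_bucket(samples):
    """samples: iterable of (timestamp, MB/sec); return the max rate
    observed in each (day, 4-hour bucket, week) cell."""
    cells = {}
    for ts, mb in samples:
        key = bucket(ts)
        cells[key] = max(cells.get(key, 0), mb)
    return cells
```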

DAY    HOUR  MAX(MB)    w1     w2      w3       w4       w5
SAT     0       9               5       4       4       3
SAT     4       12              4       5       4       4
SAT     8       1               0       0       0       0
SAT     12      0               0       0       0       0
SAT     16      1               0       0       0       0
SAT     20      4               1       0       0       1
SUN     0       10              3       3       1       3
SUN     4       13              6       5       3       5
SUN     8       7               6       6       0       6
SUN     12      7               0       3       0       4
SUN     16      1               0       0       1       0
SUN     20      8               3       2       0       3
MON     0       8               3       2       3       2
MON     4       7               2       3       2       1
MON     8       5               1       0       2       0
MON     12      1               0       0       1       0
MON     16      1               0       0       0       0
MON     20      7               2       2       0       2
TUE     0       14              6       5       7       4
TUE     4       7               1       1       1       2
TUE     8       3               0       0       0       0
TUE     12      1       1       0       0       0       0
TUE     16      1       1       0       0       0       0
TUE     20      3       1       1       1       1       1
WED     0       8       3       2       3       2       2
WED     4       7       2       1       3       2       2
WED     8       8       1       0       0       2       3
WED     12      7       1       0       0       1       1
WED     16      1       1       0       0       0       1
WED     20      4       1       1       1       1       1
THU     0       15      7       8       6       6       6
THU     4       8       2       1       1       1       1
THU     8       1       1       0       0       0       1
THU     12      16      1       11      0       0       1
THU     16      1       1       0       0       0       1
THU     20      4       1       1       1       1       1
FRI     0       11      2       2       2       2       2
FRI     4       8       3       1       1       1       1
FRI     8       4       1       0       0       0       0
set heading off
set feedback off
set pagesize 0
with pivot_data as (
          WW pivot_col
        , DY
        , D
        , HH
        , max(maxval) mv
        , max(average) av
      ( select distinct
               4*trunc(to_char(begin_time,'HH24')/4)     HH,
               to_char(begin_time,'DY')       DY,
               mod(to_char(begin_time,'D'),7)  D,
               mod(to_char(begin_time,'WW'),5)       WW,
               average/(1024*1024) average,
               maxval/(1024*1024) maxval,
        where  dbid=&&DBID and
               metric_name='Redo Generated Per Sec'
           and begin_time > sysdate - 31
   group by HH,D,DY,WW
select DY ||'   '||
       HH ||'   '||
       round(max(mv),0)    ||'  '||
       round(max(w1),0)  ||'    '||
       round(max(w2),0)  ||'    '||
       round(max(w3),0)  ||'    '||
       round(max(w4),0)  ||'    '||
from (
   select *
   from  pivot_data
   pivot ( avg(av)
           for pivot_col in ( 1 as w1,2 as w2,3 as w3 ,4 as w4 ,5 as w5 )
group by DY,D,HH
order by D,HH
set heading on
set feedback on
set pagesize 30

PS: the above PIVOT is for 11g; for 10g, here is the query without PIVOT:

set pagesize 100
col DY for A4
col HH for 99
col mx for 99
col w1 for 99
col w2 for 99
col w3 for 99
col w4 for 99
col w5 for 99
with pivot_data as (
   select
          WW
        , DY
        , D
        , HH
        , max(maxval) mv
        , max(average) av
   from
      ( select distinct
               4*trunc(to_char(begin_time,'HH24')/4)  HH,
               to_char(begin_time,'DY')               DY,
               mod(to_char(begin_time,'D'),7)         D,
               mod(to_char(begin_time,'WW'),5)        WW,
               average/(1024*1024)                    average,
               maxval/(1024*1024)                     maxval
        from   dba_hist_sysmetric_summary
        where  dbid=&&DBID
           and metric_name='Redo Generated Per Sec'
           and begin_time > sysdate - 31
      )
   group by HH,D,DY,WW
)
select DY, HH,
      round(max( mv ),0) mx,
      round(max( decode( WW, 0 , av, null ) ),0) w1,
      round(max( decode( WW, 1 , av, null ) ),0) w2,
      round(max( decode( WW, 2 , av, null ) ),0) w3,
      round(max( decode( WW, 3 , av, null ) ),0) w4,
      round(max( decode( WW, 4 , av, null ) ),0) w5
   from  pivot_data
group by DY,D,HH
order by D,HH;

Just seeing the redo per hour can be done more simply as:

set pagesize 100
col stat_name format a30
col MB format 9,999.99
select
       btime, stat_name,
       round((end_value-beg_value)/(1024*1024),2) MB
from (
   select
       to_char(s.begin_interval_time,'DD-MON-YY HH24:MI') btime,
       e.stat_name,
       e.value end_value,
       lag(e.value) over (partition by e.stat_name
                          order by s.snap_id)             beg_value
   from dba_hist_sysstat  e,
        dba_hist_snapshot s
   where s.snap_id         = e.snap_id
     and s.dbid            = e.dbid
     and s.instance_number = e.instance_number
     and s.begin_interval_time > sysdate - 2
     and e.stat_name = 'redo size'
)
where end_value-beg_value > 0
order by btime;
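The LAG arithmetic in these queries, taking deltas of a cumulative counter, can be sketched in Python (a hypothetical illustration with made-up snapshot values):

```python
def redo_per_snapshot(snapshots):
    """snapshots: list of (snap_time, cumulative 'redo size' in bytes)
    ordered by snap_id.  Emulates LAG(value) OVER (ORDER BY snap_id):
    each interval's redo is the current value minus the previous one."""
    out = []
    prev = None
    for btime, value in snapshots:
        if prev is not None and value - prev > 0:
            out.append((btime, round((value - prev) / (1024 * 1024), 2)))
        prev = value
    return out
```

Rows where the delta is non-positive (for example after an instance restart resets the counter) are skipped, just as the outer WHERE clause skips them.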


Big data or Agile data ?

June 27th, 2014

The big data phenomenon threatens to break the existing data supply chain (DSC) of many information providers, particularly those whose chains are neither flexible nor scalable and include too many error-prone, manual touch points.  – Cognizant

Big data is getting big hype, but what exactly is driving it? The driving force is demand by businesses to answer revenue-driving questions. Questions that drive revenue-generating decisions depend on getting the right data to the right people at the right time, yet that access unfortunately remains elusive with big data: according to Gartner, 85% of Fortune 500 organizations in 2015 will still be unable to exploit big data for competitive edge. How can companies get the competitive edge? It can be had with a new technology called data virtualization, which uses existing relational databases but accelerates access to the data.

Data is a hot topic.

Big Data is an even hotter topic.

But data agility? I don’t hear much about it.

As Forbes magazine put it :

It is time to stop the stampede to create capacity to analyze big data and instead pursue a more balanced approach that focuses on finding more data sets and understanding how to use them to improve your business.

What are we all striving for? We are striving to access the data we want, when we want it, both quickly and efficiently. That’s what I call data agility.

Data agility is the big pink elephant in the room. Everyone is talking about big data, but no one is talking about how you get the data to where you want it, when you want it. If you want to do big data, how do you get the terabytes (TB) of data onto your Hadoop cluster from wherever it was collected? How long does it take? How much work is it?

As Infoworld put it:

The bigger big data gets, the more challenging it becomes to manage and analyze to deliver actionable business insight. That’s a little ironic, given that the main promise of big data is the ability to make better business decisions based on compute-intensive analysis of massive data sets. The solution is to create a supply chain that identifies business goals from the start — and deploy the agile infrastructure necessary to make good on those objectives.

Getting results from big data, even after you have the data, is difficult. Most Fortune 500 companies don’t know how to get results from big data:

Through 2015, 85% of Fortune 500 orgs will be unable to exploit big data for competitive advantage.

Unlike with big data, most companies already have burning questions they know how to answer, if only they could get access to the data in their current relational databases faster. With relational databases the questions are clear and the methods for answering them are known; the problem is getting the right data to the right place at the right time. Getting data faster to answer critical business questions is one of the main goals of ERP, and it is why companies invest millions in ERP every year: they want answers to important business questions faster and with fresher data. Fresh data means getting the right data to the right people at the right time, which is data agility.

Data agility means getting the right data to the right people at the right time. Data is the lifeblood of more and more of the economy, which is becoming a data economy, and data agility is crucial for companies to succeed in that economy, now and in the future.

Agile data is the solution to getting the right data to the right place at the right time. Agile data’s technological core relies on data virtualization, which requires tracking data blocks at the storage level. By tracking data blocks at the storage level, duplicate blocks can be shared across many different copies of data, while any changed block is stored in such a way that only the copy that made the change sees it. Tracking block changes and sharing duplicate data blocks is the core of data virtualization. That core has been around for almost 20 years in the form of storage system snapshots, but like the internet without the web or gasoline without a car, agile data can’t happen without an agile data platform.
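The block-sharing idea can be sketched as a toy model in Python. This is my own illustration of copy-on-write sharing, not any particular vendor's implementation:

```python
class ThinClone:
    """Toy thin clone: reads fall through to the shared source blocks;
    writes land in a private overlay that only this clone sees."""
    def __init__(self, source_blocks):
        self.source = source_blocks    # shared with every other clone
        self.overlay = {}              # blocks changed by this clone only

    def read(self, block_no):
        return self.overlay.get(block_no, self.source[block_no])

    def write(self, block_no, data):
        self.overlay[block_no] = data  # the source is never modified

source = {0: "header", 1: "aaa", 2: "bbb"}   # made-up source blocks
clone1 = ThinClone(source)
clone2 = ThinClone(source)
clone1.write(1, "AAA")   # clone1 sees its change; clone2 still shares "aaa"
```

Both clones share every unchanged block with the source, which is why thin copies consume almost no extra storage until they diverge.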

A virtual data platform automates all the pieces of tracking and provisioning data and encapsulates them in hardware-agnostic software with a user-friendly self-service interface.

For example, how would one supply data to

  • a developer who required a copy of production as of yesterday at noon, when bugs were seen
  • a QA team who required a copy of the development database with all its current changes
  • a BI team who required 24×7 access to production for ETL batch jobs
  • a production database team who needs to extract data as it was before it was inadvertently deleted or incorrectly modified on production.

Problems with cloning a database via file system snapshots

Let’s look at a concrete example of cloning a database with file system snapshots. Almost any experienced storage admin can take a snapshot of an Oracle database running on specialized storage capable of supplying snapshots. The snapshot itself is easy, though it may still be necessary to shut the database down before taking it to ensure the data is consistent. Even when the database can remain running during the snapshot, specialized functionality or extra vendor packages may be required if the database spans multiple LUNs whose snapshots need to be synchronized. Once the snapshot is made, an experienced DBA, in coordination with the storage admin, can start up a database using it. Starting up such a database requires renaming the database and changing the location of files that were typically not part of the snapshot, such as log files, trace files and initialization files, and then the database has to be recovered. If the database is being started on some other machine, the snapshot files must be made accessible to that machine over fibre channel or mounted via NFS. If the NFS or fibre channel configuration changes the paths to the datafiles, then the parameter files, and possibly other files, have to be updated with the new datafile locations before starting the database. If the copy is required up to the most recent point in time, redo files may have to be fetched from the source database to recover the clone down to the last minute or second. All of that is the easy case.

In the easy case we have a clone database, a “thin copy”, that shares duplicate blocks with the original database but stores changed blocks separately. The problem is we now have a development, test, reporting or some other type of copy running on the same storage as the source database. If the source database is an important database, our copy will be impacting its performance, and protecting the source database from performance impact is one of the major reasons for creating copies in the first place. If the storage snapshot is copy-on-write, there is an even bigger impact, as every write induces a read and two writes (read the original block, write it somewhere else, and then write out the new copy). To solve this we want to get the database files onto separate storage. We can copy the entire database to a separate filer, call it the development filer, and make our thin copies there. The next problem arises when someone wants a copy of the source database as of tomorrow and all we have is a copy from yesterday. In that case we have to copy the entire source database across to the development storage array again, which defeats the purpose of thin cloning: fast, storage-efficient clones. How do we solve all these complications using thin cloning technology?

Solution to thin cloning: data virtualization

Thin cloning obstacles are solved by data virtualization. Data virtualization consists of three technologies. The first continuously collects all the changes from a data source and writes them to storage capable of file system snapshots. The second manages that storage, saving all changes within a time window and purging data older than the window that is no longer needed. The third harnesses the file system snapshots and the time window to provision thin database clones to target machines, either over fibre channel or NFS. All of this technology can be rolled into a software stack that runs on commodity hardware and maps a file system onto any storage. It can be cobbled together using a file system such as open source ZFS plus scripting, or it can come repackaged as self-contained software such as Delphix.
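The second technology's retention logic can be sketched in Python. This is a hypothetical illustration; the window length and data layout are my own:

```python
from datetime import datetime, timedelta

def purge_outside_window(change_log, now, window_days=14):
    """change_log: list of (change_time, block) collected from the source.
    Keep only changes inside the retention window; anything older can no
    longer be used to provision a clone and is purged."""
    cutoff = now - timedelta(days=window_days)
    return [(ts, blk) for ts, blk in change_log if ts >= cutoff]
```

Within the window, any point in time can be reconstructed from the retained changes; outside it, the old change blocks are reclaimed to bound storage growth.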

Data Virtualization is exploding

Four years ago data virtualization technology was non-existent. Since then hundreds of companies have moved to virtual data platforms. When looking at agile data technology, some of the key functionalities to look for are:

  • Databases – support cloning major databases such as Oracle, SQL Server, Postgres
  • Applications – support thin cloning of application stacks
  • Self Service – the interface should be easily usable by application developers, QA staff and BI teams to provision their own clones on demand
  • Branching – support branching clones, meaning making clones of clones, which is crucial for multi-version development or even just patching a previous version
  • Synchronization – support cloning multiple related databases such that each clone comes from the exact same point in time. Any Fortune 500 company with multiple revenue tracking databases will need synchronized clones of each to analyze financial close discrepancies
  • Cloud Ready – support any storage and efficient low-bandwidth replication across heterogeneous storage types; install on any commodity hardware
  • Any Time – make clones at any time, down to the second, within the timeflow window
  • Live Archive – save specific points in the timeflow forever to support compliance and auditing

Now that data virtualization is quickly maturing, the next frontier is quickly arriving which is data supply chain.

Data Supply Chain

It’s time to start treating your data less as a warehouse and more as a supply chain. Having identified your sources of data, you must corral it for analysis, in the same way that the various components come together on an assembly line. Recognize that the data won’t be static—it will be manipulated as it goes through the supply chain, added to other pieces of data, updated as more recent data comes along, and transformed into new forms as you look at different pieces of data in aggregate.      – Syed Rasheed Redhat

Data Supply Chain  features are quickly evolving but  include
  • Security
    • Masking
    • Chain of custody
  • Self Service
    • Login and Roles
    • Restrictions
  • Developer
    • Data Versioning and Branching
    • Refresh, Rollback
  • Audit
    • Live Archive
  • Modernization
    • Unix to Linux conversion
    • Data Center migration
    • Federated data cloning
    • Consolidation

Data Supply Chain re-invents data management and provisioning by virtualizing, governing, and delivering data on demand.

As Accenture put it:

Yes, data technologies are evolving rapidly, but most have been adopted in piecemeal fashion. As a result, enterprise data is vastly underutilized. Data ecosystems are complex and littered with data silos, limiting the value that organizations can get out of their own data by making it difficult to access. To truly unlock that value, companies must start treating data more as a supply chain, enabling it to flow easily and usefully through the entire organization—and eventually throughout each company’s ecosystem of partners too.






Importing AWR repositories from cloned databases

June 26th, 2014

I sometimes want to import AWR repositories from cloned databases to compare performance differences between databases that should have the same performance characteristics. Unfortunately the clones have the same DBID, thus importing their AWR repositories into a central AWR repository would be problematic, as I wouldn’t have a way to tell which database the data came from. So here is a procedure to change the DBID. The procedure requires importing the data into the awr_stage user, changing the DBID, then moving the data into the AWR repository:
(see elsewhere for info on the normal export and import of AWR repositories)

The procedure uses a script, awr_change_dbid.sql, to defer the constraint on DBID, change the DBID, and then put the constraint back:

   create tablespace AWR datafile '/home/oracle/oradata/AWR_01.dbf' size 200M;
   Drop Directory AWR_DMP;
   Create Directory AWR_DMP AS '/home/oracle/awr_dumps';
-- create staging user
   drop user awr_stage cascade; 
   create user awr_stage
     identified by awr_stage
     default tablespace awr
     temporary tablespace temp;
   grant connect to awr_stage;
   alter user awr_stage quota unlimited on awr;
   alter user awr_stage temporary tablespace temp;
-- load data
   begin
     dbms_swrf_internal.awr_load(schname  => 'AWR_STAGE',
                                 dmpfile  => 'my_awr_dump', -- file w/o .dmp extension
                                 dmpdir   => 'AWR_DMP');
   end;
   /
-- change dbid
   @awr_change_dbid.sql
-- move data
   def schema_name='AWR_STAGE'
   select  '&schema_name' from dual;
   variable schname varchar2(30);
   begin
     :schname := '&schema_name';
     dbms_swrf_internal.move_to_awr(schname => :schname);
   end;
   /
-- verify the new DBID in the repository
   col host_name for a30
   select distinct dbid, db_name, instance_name, host_name
   from   dba_hist_database_instance;


Number one bottleneck in IT ?

June 25th, 2014

“Any improvement not made at the constraint is an illusion.” – Gene Kim paraphrasing “The Theory of Constraints”

What is the constraint in IT?

The constraints in IT are

  1. Provisioning environments for development
  2. Setting up test and QA environments
  3. Architecting development to facilitate easy changes in code
  4. Development speed
  5. Product management  input

Meaning, until the first constraint is eliminated, it is pointless and even potentially counterproductive to tune the following constraints.

The first constraint for most organizations to tackle is thus the speed and agility with which they can provision environments for development.

The above list comes from Gene Kim, the author of The Phoenix Project. He lays out these top constraints in this Gene Kim interview.

In the interview Gene Kim talks about what causes the biggest delays in application development in IT.  He says, starting around minute 6:45

“I’ve been trained in the theory of constraints and one of the things I think is so powerful is the notion of the constraint in the value stream. What is so provocative about that notion is that any improvement not made at the constraint is an illusion. If you fix something before the constraint you end up with more work piled up in front of the constraint. If you fix something after the constraint you will always be starved for work.

In most transformations, if you look at what’s really impeding flow, the fast flow of features, from development to operations to the customer, it’s typically IT operations.

Operations can never deliver environments upon demand.

People have to wait months or quarters to get a test environment. When that happens, terrible things happen. People actually hoard environments. They invite people to their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn’t actually achieve the goal.

One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it.

After that, it’s about test setup time and test run time.

After that is eliminated, it’s typically about architecture. How do we make changes that don’t require 15 other changes simultaneously? How do we create looser couplings?

Then after that the constraint moves into development or product management.

A very technical and cultural obstacle is just making environments available for the people who need them, whether for production, development or test.”




It’s about : Data Supply Chain

June 24th, 2014

There have been a number of questions coming my way about Delphix versus snapshot technologies. The capabilities of Delphix can be differentiated from snapshot technologies through the following hierarchy:


  1. Data Supply Chain (Delphix approach to data management)
  2. Data Virtualization (end-to-end collection and provisioning of thin clones)
  3. Thin Cloning
  4. Storage Snapshots
On top we have the most powerful and advanced data management features that enable fast, easy, secure, audit-able data flow through organizations.
Data Supply Chain is built on top of other technologies. On the bottom we have the minimal building blocks, starting with storage snapshots. Storage snapshots can be used to make “thin clone” databases. Storage snapshots have been around for nearly two decades but have seen little use for database thin cloning due to the technical and managerial hurdles. Part of the difficulty is that creating thin clones requires work by multiple people and/or teams: DBAs, system admins, storage admins and so on.
In order to overcome the obstacles to creating thin clones, all the steps can be optimized and automated in a process called data virtualization.
Data Virtualization is just the first step in automation. The next step is adding all the processes, functionality and control to manage the virtual data, which is the Data Supply Chain.
File system snapshots such as ZFS address the very bottom of the hierarchy; that is, they only manage storage snapshots. They have no automated thin cloning of databases. Without automated thin cloning of databases there is no end-to-end processing of data from source to thin cloned target, i.e. data virtualization, and without data virtualization there is no data supply chain.
Data Supply Chain  features, all of which are encompassed by Delphix, include
  • Security
    • Masking
    • Chain of custody
  • Self Service
    • Login and Roles
    • Restrictions
  • Developer
    • Data Versioning and Branching
    • Refresh, Rollback
  • Audit
    • Live Archive
  • Modernization
    • Unix to Linux conversion
    • Data Center migration
    • Federated data cloning
    • Consolidation

Data Supply Chain re-invents data management and provisioning by virtualizing, governing, and delivering data on demand.

Most businesses manage data delivery with manual, ad hoc processes: users file change requests, then wait for DBAs, systems administrators, and storage administrators to push data from system to system, bogging down production applications, networks, and target systems with long load times. Data delays cost businesses billions a year in lost productivity and low utilization of systems and software resources.

As a result, there is an enormous opportunity to optimize data management. Optimizing it with a data supply chain yields significant business impact:

  • Drive revenue, competitive differentiation with faster application time to market
  • Enable faster growth via better release management of enterprise applications
  • Improve customer intimacy, upsell, cross-sell with faster, more flexible analytics
  • Free budget for innovation by reducing IT maintenance costs
  • Reduce compliance risk through better governance, data security.

Businesses need to manage data as a strategic asset across their operations, applying the same rigor as supply chain optimization for manufacturing companies.

Data Supply Chain Transformation Process with Delphix

Delphix applies a three-step process to transform the data supply chain:

  • Analyze: survey systems, processes, teams across data supply chains
  • Transform: virtualize, automate data delivery with centralized governance
  • Leverage: drive business value via new data products, process optimization

Businesses typically manage multiple data supply chains simultaneously, all of which are targets for data chain optimization:

  • Compliance retention, reporting
  • Modernization, migration projects
  • Application projects and development
  • BI, analytics
  • Data protection.

Delphix re-invents the data supply chain with its Virtual Data Platform:

  • Install data engines in hours across all repositories, locations (including cloud)
  • Connect: non-disruptively sync data across sites, systems, architectures
  • Control: secure data, track release versions, preserve and prove data history
  • Deploy: automatically launch virtual data environments in 10x less space, time
  • Leverage data with self service refresh, reset, branching, bookmarks, integration.

According to an IDC study, Delphix pays for itself in IT savings, with an average payback of 4.3 months.




If I can’t picture it, I can’t understand it

June 23rd, 2014

“If I can’t picture it, I can’t understand it.” – Albert Einstein

Along the same lines

“It is impossible to even think without a mental picture.” – Aristotle

“Man’s mind cannot understand thoughts without images of them.” – Thomas Aquinas

“The evolution of images is a kind of intermediate between that of the perceptions and that of the intelligence.” – Jean Piaget

“Mathematics is cognitive process-thinking-that requires the dual coding of imagery and language. Imagery is fundamental to the process of thinking with numbers. Albert Einstein, whose theories of relativity helped explain our universe, used imagery as the base for his mental processing and problem solving. Perhaps he summarized the importance of imagery best when he said, ‘If I can’t picture it, I can’t understand it.’ “– NANCI BELL AND KIMBERLY TULEY

One of my favorite examples of the power of graphics to easily, quickly and powerfully display quantitative information is Anscombe’s Quartet.


Anscombe’s Quartet
x1 y1   x2 y2   x3 y3   x4 y4
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89


Edward Tufte uses this example from Anscombe to show 4 datasets of x and y that have the same mean, standard deviation, and regression line, but which are qualitatively different.
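The identical summary statistics are easy to verify; here is a sketch using Python's statistics module (data transcribed from the table above):

```python
from statistics import mean, variance

def fit(x, y):
    """Least-squares slope and intercept."""
    n, mx, my = len(x), mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    slope = cov / variance(x)
    return slope, my - slope * mx

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# All four datasets share mean x = 9, mean y = 7.50, and the
# regression line y = 3.00 + 0.500x, despite looking nothing alike.
for name, (x, y) in quartet.items():
    slope, icept = fit(x, y)
    print(name, round(mean(x), 1), round(mean(y), 2), round(slope, 3), round(icept, 2))
```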




Oracle EM 12c ASH Analytics

June 13th, 2014

I was emailing with my esteemed colleague John Beresniewicz at Oracle in the OEM group. John and I worked together on OEM 10g, and thank goodness he is still there, as he is generally behind any good quantitative visualizations you might see in the product. Here is one cool example he sent me:

The database load, in average active sessions (AAS), can be time-selected, and from the selection a load map is shown, in this case showing which objects are creating the most I/O load, grouped by type of I/O. Super cool. Congrats JB, and I look forward to exploring more in OEM 12c “cloud control”.

Notes from John Beresniewicz

This screenshot is basically the “punch line” from a demo I did at Oracle Openworld this year of the “ASH Analytics” feature that was introduced in EM 12c.

ASH Analytics starts with a view of the “Activity” tab that looks almost identical to the existing “Top Activity” page from EM 10, but with some major improvements. The first improvement is the more flexible time picker chart at the top of the page; it can be resized to any width, kind of like the time pickers in Google Finance charts. Below the time picker is the detail region, which includes an activity chart and two “top lists” that default to top sessions and top SQL as in Top Activity. The activity chart defaults to a view of Average Active Sessions broken down by wait class as in Top Activity, but this can be modified to break down the load by any of the numerous ASH dimensions that ASH Analytics exposes for analysis.

The dimension selection drop-downs expose a structured list of ASH dimensions, but also a small graphical widget showing a load breakdown summary for some of the most commonly useful dimensions. This enables efficient interactive exploration by exposing dimensions that have a high level of skew (interesting) or ones without any breakdown (uninteresting, like instance_id in a single instance database).

In the activity chart clicking on a band will apply a filter to select out only the load corresponding to that specific value of the dimension shown. So for instance in the view shown here the load was filtered to only show “User IO” wait class activity. The filtered activity is then presented in the activity chart along with a dark line to show the total activity, so user is kept aware of how much of the total load is being analyzed. The dimensional breakdown/filtering cycle can be repeated to further zero in on load of interest, so for instance we could have broken down the “User IO” activity by wait event and then filtered out only “db file sequential read”.

Now for the punch line, which is the “Loadmap” view that you’ve posted here. The loadmap is a treemap visualization of database load over a dimensional hierarchy of up to three dimensions selected by the user. Cell size is always Average Active Sessions (of course) and colors are selected based on the final cell dimension value (so all the “SALES” cells are the same color). There is algorithmic color selection for most dimensions, but if wait class is the final dimension then we use the existing coloring scheme developed for the Performance Page and Top Activity to keep that scheme “branded”. Reading treemaps can take a bit of practice, but it’s really quite simple: hierarchical relationships are represented by rectangle containment, i.e. the load for a child dimension will appear as a rectangle within the rectangle representing the parent dimension.

So in the demo I selected a time period of interest because of unusually high I/O using the time picker. In the Activity view broken down by wait class I isolated the User IO load by applying a filter. Switching to the Loadmap, I broke down the User IO waits hierarchically by wait event > SQL plan operation > object. And the punch line is that we can eyeball that over half the load in that time period was “db file parallel read”, all generated on behalf of “TABLE ACCESS BY LOCAL INDEX ROWID” execution plan steps, almost all of which were done against the “SALES” table (which is partitioned and thus appears as multiple cells, one for each partition).
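The data preparation behind that hierarchical breakdown can be sketched in miniature. This is an illustrative toy, not real ASH output (the sample rows and object names are invented): each ASH sample counts as roughly one active-session-second, so counting samples per (wait event > plan operation > object) path gives the relative cell sizes a treemap would draw.

```python
# Toy sketch: aggregate ASH-style samples into the nested hierarchy a treemap
# cell layout needs. Cell area is proportional to the sample count, which is
# itself proportional to Average Active Sessions.
from collections import Counter

samples = [
    # (wait_event, plan_operation, object_name) per ASH sample -- invented data
    ("db file parallel read", "TABLE ACCESS BY LOCAL INDEX ROWID", "SALES_P1"),
    ("db file parallel read", "TABLE ACCESS BY LOCAL INDEX ROWID", "SALES_P2"),
    ("db file parallel read", "TABLE ACCESS BY LOCAL INDEX ROWID", "SALES_P1"),
    ("db file sequential read", "INDEX RANGE SCAN", "SALES_IDX"),
]

# Count samples per (event > plan op > object) path; each distinct path becomes
# a treemap cell whose area is its share of the total load.
cells = Counter(samples)
total = sum(cells.values())
for path, n in cells.most_common():
    print(" > ".join(path), f"{100 * n / total:.0f}%")
```

Running it prints each leaf path with its share of load, largest first, which is exactly the containment relationship the treemap draws as nested rectangles.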

So ASH Analytics is a pretty powerful tool for interactive visual exploration of ASH data. I sure hope customers like it, because ASH treemaps are an idea I researched and became convinced of in early 2007, so as you can see it took a long time to finally get it into the product.

More info

For more info see Niall’s blog at
Install guides at
download at


Compliance, Reporting & Dodd Frank

June 12th, 2014

What is one of the most daunting issues in IT departments now? What kind of database issue can land people in orange jump suits, and yet how to avoid such issues remains unclear? Regulatory compliance. Regulatory compliance is driving the IT departments of many Fortune 1000 and worldwide corporations over the edge of their abilities. With regulatory compliance, how much data do you have to keep, how long do you have to keep it, how fast do you have to be able to access it, and what technology do you use to maintain it?

Fortune 1000 companies, in order to meet compliance demands, typically must collect, model, and report on data from multiple systems in multiple sites:

  • Collection: adds load to production systems; requires extraction and staging at every site
  • Modeling: requires multiple development and test environments, plus data refreshes
  • Reporting: compliance reports drive unexpected and unplanned load
  • Reconciliation: required to reconcile data between systems, e.g. GL vs. source apps
  • Time-based reporting: regulations require reports from specific dates in the past
  • Auditing: best practice is to capture the data as of the report date, plus the report itself, for later auditing

Delphix eliminates these data constraints and accelerates the reporting process:

  • Quickly provision fresh or historical data from any point in time
  • Eliminate production load by offloading ETLs to VDBs
  • Elastically provision multiple environments for development and test teams
  • Reduce the data tax on systems and networks by 90%
  • Reset, rewind, or roll back any system to reduce reconciliation effort and time
  • Bookmark environments to facilitate reporting scenarios, audits

For example, if a company has several source systems with financial information that are rolled up into the company’s general ledger (GL), then any discrepancies between the source systems and the final GL roll-up have to be reconciled.

Struggling to Keep up with Regulatory Compliance Overload? 


Live Webinar: Streamline Compliance Initiatives with Agile Data 


The tidal wave of regulatory compliance is overwhelming for any large organization today. At the end of the day, data is the ultimate target of most mandates. Not surprisingly, data related bottlenecks also represent one of the biggest barriers to cost effective risk mitigation and compliance.

Date: Tuesday, June 17

Time: 10:00am PT/1:00pm ET

Speaker: Ansh Patnaik, Sr. Director of Product Marketing, Delphix

Join this session to learn how eliminating data constraints can streamline compliance initiatives through:
  • Data Security: mask data centrally and distribute secure copies on-premise, offshore, and into public or hybrid clouds in minutes
  • Data Governance: preserve application states at a fraction of current costs while ensuring on-demand accessibility
  • Data Reporting: deliver synced virtual data sets from any point in time to offload sources and accelerate delivery of reports

Click here to register for our webinar.



Oracle EM errors after 11gR2 install

June 11th, 2014

Had a machine running the default database from the Oracle installer. This instance was called o1123.
I then created a second database by hand (for swingbench and SLOB tests) instead of using dbca (maybe that was a mistake), and then I wanted to access OEM. The second instance was called SOE60G (a 60GB swingbench dataset database).

First tried to start up dbconsole:

$ emctl start dbconsole
OC4J Configuration issue.
not found.

OK, missing file. EM wasn’t configured for this database; makes sense. Let’s configure it:

$ emca -config dbcontrol db -repos create
SEVERE: Dbcontrol Repository already exists.  
Fix the error(s) and run EM Configuration Assistant again in standalone mode.

Looks like there is already a repository, so let’s configure EM without creating the repository:

# emca -config dbcontrol db
WARNING: ORA-00942: table or view does not exist
SEVERE: The existing database you have specified has Grid Control repository. 
This operation is not supported.

Hmm, what table is missing? Put the database in SQL trace mode to see which SQL statement ran into the 942 error:

sqlplus / as sysdba
alter system set sql_trace=true;

Rerun emca and grep for 942 in the trace files in user_dump_dest:

cd /home/oracle/oracle1123/diag/rdbms/soe60g/SOE60G/trace
grep 942 *

This shows a dozen or so files.
Opening each one and searching for 942 turned up two distinct SQL queries:


select count(*) from sysman.mgmt_versions where status > 0 and component_mode
like 'SYSAUX' and component_name in ('DB','CORE')

The first table is from not having run $ORACLE_HOME/sqlplus/admin/pupbld.sql.
The second is the problem emca was running into. Let’s look at sysman’s objects and see what’s there:

sqlplus sysman/sys
select object_name from user_objects;
-> no rows returned

OK, sysman looks empty, so let’s drop it and recreate it:

sqlplus / as sysdba
drop user sysman cascade;
drop user MGMT_VIEW cascade;

Someone’s blog had suggested dropping the following, but I didn’t:

# drop role MGMT_USER;
# drop public synonym MGMT_TARGET_BLACKOUTS;
# drop public synonym SETEMVIEWUSERCONTEXT;
#                                                        host      port  SID
# $ORACLE_HOME/sysman/admin/emdrep/bin/RepManager antarctic 2483 sprod -action drop
# ./emca -repos create

Rerun emca:

$ emca -config dbcontrol db -repos create
Do you wish to continue? [yes(Y)/no(N)]: y
Apr 20, 2013 7:27:51 AM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at 
Apr 20, 2013 7:27:51 AM oracle.sysman.emcp.ParamsManager checkListenerStatusForDBControl
WARNING: ORA-01031: insufficient privileges
Apr 20, 2013 7:27:51 AM oracle.sysman.emcp.EMConfig perform
Database connection through listener failed. Fix the error and run EM Configuration Assistant again.
Some of the possible reasons may be:
1) Listener port 1521 provided is incorrect. Provide the correct port.
2) Listener is not up. Start the Listener.
3) Database service SOE60G is not registered with listener. Register the database service.
4) Listener is up on physical host and ORACLE_HOSTNAME environment variable is set to
 virtual host. Unset ORACLE_HOSTNAME environment variable.
5) Listener is up on virtual host. Set environment variable ORACLE_HOSTNAME=<virtual host>.
6) /etc/hosts does not have correct entry for hostname.

Notice the ORA-01031. Hmm, I can connect with sqlplus / as sysdba, but connecting as sysdba through the listener requires a password file. Let’s see if password files are set up:

sqlplus / as sysdba
show parameters remote_login_passwordfile
remote_login_passwordfile            string      EXCLUSIVE

That’s set correctly, but is there a password file?

ls orapw$ORACLE_SID
ls: cannot access orapwSOE60G: No such file or directory

Not there; let’s create it.
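Something along these lines (a sketch, not the exact command from my terminal history: the file name convention is orapw$ORACLE_SID under $ORACLE_HOME/dbs on Linux, and <sys_password> is a placeholder you substitute):

```shell
# Sketch: rebuild the missing password file with orapwd.
# Name convention on Linux is orapw$ORACLE_SID under $ORACLE_HOME/dbs.
ORACLE_SID=SOE60G
ORACLE_HOME=/home/oracle/oracle1123/product
echo orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password='<sys_password>' entries=5
```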


Now emca works!

Now, what URL do I use to access OEM for the second database?
It’s the same URL as the first database but with a different port. We can find the ports for each database in the following file:

cat $ORACLE_HOME/install/portlist.ini
Enterprise Manager Console HTTP Port (o1123) = 1158
Enterprise Manager Agent Port (o1123) = 3938
Enterprise Manager Console HTTP Port (SOE60G) = 5500
Enterprise Manager Agent Port (SOE60G) = 1830
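Putting it together, here is a quick sketch of pulling the console port for a given SID out of portlist.ini and building the dbconsole URL (dbconsole serves HTTPS by default, per the post script below; the cat of sample file contents is just to make the sketch self-contained):

```shell
# Sketch: extract the EM console port for a SID from portlist.ini and build the
# dbconsole URL. Sample file contents reproduced from above for illustration.
cat > portlist.ini <<'EOF'
Enterprise Manager Console HTTP Port (o1123) = 1158
Enterprise Manager Agent Port (o1123) = 3938
Enterprise Manager Console HTTP Port (SOE60G) = 5500
Enterprise Manager Agent Port (SOE60G) = 1830
EOF

SID=SOE60G
# Grab the line for this SID's console port and strip everything up to "= "
PORT=$(grep "Console HTTP Port ($SID)" portlist.ini | sed 's/.*= //')
echo "https://$(hostname):$PORT/em"
```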

Post script

After the above steps, one of the bizarre things I ran into was being able to access the OEM database home page in Chrome but nothing else. Coming into OEM, I’m first asked for a login. I log in as system and see the database home page, but if I click on the performance tab, I get prompted for the login again. I log in again, and I’m back at the database home page. No matter which tab I click, I end up back at the database home page after giving login credentials.
In Firefox, though, once I log in I can go to any of the other tabs and it works!

PS: when running emca above I was prompted for a number of variables. I didn’t try this, but it looks like all the arguments can be given on the command line:

emca -repos create -silent -ORACLE_HOSTNAME perf234-target1 -SID SOE60G 
-SERVICE_NAME SOE60G -ORACLE_HOME /home/oracle/oracle1123/product

Post Script II:

Never seen this one before, and it was super annoying. I set up a VM in VirtualBox on my Mac with Linux, Oracle, and OEM. On the VM I could access OEM, but not from my Mac. On my Mac I kept getting

This webpage is not available

The connection to was interrupted.
Turns out the problem was that I was using “http://” and not “https://”, rrr.



Version controlled databases and data

June 10th, 2014

To drive revenue and growth, companies are constantly improving existing applications or creating new ones. This ongoing application development depends upon provisioning environments for developers and QA teams. Once they are up and running in these environments, code development calls for efficient change management and, later, deployment of changes. The slower and more costly provisioning and managing development environments becomes, the more delays and bugs there will be in the applications, and the less revenue the business will generate.



Code management has become straightforward with the use of source control tools such as Git, SVN, and Perforce. Provisioning development environments has been made more efficient by Chef, Puppet, and Jenkins.

But what about the database? How can you provision environments with multi-terabyte databases, then version control these databases as changes are made by multiple developers and as multiple versions of code and database states need to be maintained? For all the advances in application development, such as machine virtualization, Agile development, and DevOps practices, the database still stands as a huge barrier to agility.

Database branching, development, merges, and deployments are tricky. Unlike code, a database is not a collection of files. Following best practices using proven file-based tools often fails when dealing with the database, which is the container of your most valued asset – your business data. Too often the database is left behind, and becomes the weakest link in the chain.


Proper implementation of continuous integration and continuous delivery for database is here at last!

Delphix and DBmaestro have joined forces to deliver a comprehensive solution to get your database up to speed with modern best practices. The Delphix-DBmaestro solution enables you to spawn development database branches on-the-fly, creating parallel team-centric virtual-DB branches, or developer-centric virtual environments with a click of a button.

Enforced version control makes sure all changes are tracked, so you always have full control over who is doing what, where, and why.

Once a development cycle has concluded, relevant changes are easily and automatically implemented to the integration environment, identifying change conflicts and merging them manually or automatically based on your preferences.

Additionally, making sure a release is ready for production has never been safer, as no changes are overlooked, no code is overridden, and most importantly, all relevant implemented changes are validated against a virtual pre-production copy of the actual production environment – true to its latest data, without incurring any storage penalties or wasting time creating these environments.


With Delphix and DBmaestro, development productivity, automation, and risk mitigation are easier to achieve than ever before.


Delphix Agile Data


Delphix is radically changing the way companies deliver their data – providing fast, flexible, efficient access. Delphix enables companies to clone their databases, no matter the size, in just minutes with almost no storage overhead. Delphix allows for Agile data management through intelligent software that eliminates redundant infrastructure and slow processes.

As a result, Delphix customers deliver higher quality business applications in less time and at lower cost. Today, IT organizations regularly make compromises to reduce costs, limiting the ability to support project teams. The result is delayed projects, missing features, and frustrated users. Delphix breaks this cycle by enabling IT to deliver unlimited data wherever and whenever it is needed.


Unlock Your Data


DBmaestro DevOps for DB

DBmaestro is the pioneer and leading provider of DevOps for Database solutions that enable full control of the database. Its flagship product, DBmaestro Teamwork, is the leading Database Enforced Change Management (DECM) solution, empowering Agile team collaboration by enforcing change policy and best practices.

DBmaestro Teamwork enables continuous integration, continuous delivery, and continuous deployment processes, which reduce the number of potential risks threatening database development and deployment by 60% and cut deployment costs by 95%.


Take Control of Your Database