Archive for June, 2014

Financial close pressures and performance

June 9th, 2014

How long does your financial close take? How long would you like it to take? How much access do your internal business analysts have to the financial data?

We’ve worked with a number of companies and, using data virtualization and the Delphix appliance, taken their financial close down from weeks to a couple of days. Using data virtualization we’ve also removed the constraints of limited access to financial data, allowing business analysts 24×7 access even in the days ramping up to quarter ends.

Problems we see

  • Analysts get locked out of financial systems near quarter closes to alleviate pressure on the financial systems
  • Businesses want to run early financial closes to catch errors early and make sure everything is correct, but they don’t have the resources
  • Analysts want access to BI systems with fresh data from production financials, but refreshes are too costly in terms of time, personnel and load on production systems

Delphix solves this by providing fresh copies in minutes for almost no storage with a self service user interface that even analysts can use themselves.  For copies being provided to development teams, Delphix integrates masking to hide sensitive data.

Frequent access to financial production systems is often a big challenge. For FI/CO applications, companies tell us that their desired refresh frequency is once a week in the early part of the quarter, once a day in the 3rd week, and 6 times a day in the last week! How do you do that when refreshes take hours to days of effort? Or when the source financial systems of record are at peak usage late in the quarter? It’s even worse if financial analysts want multiple environments to analyze financial results in parallel.

The entire problem is even more pronounced when results have to be aggregated across locations, as is the case with many young, rapidly growing startups expanding into new countries, where data has to be pulled in from each local FI/CO source.
Another challenge is around early disclosure of results. Fixes for FI/CO system bugs are usually a high priority for application development teams. Say you’re in the middle of a quarter and you find a bug that would lead to misstating financial results (SOX issue). The business side of the company wants the problem fixed but they don’t want the developers to test the fix against real data because they would have early visibility into financial results. Imagine the insider trading value that would have! So they want to solve this with data masking.

Finally, comparison reporting poses a challenge. Annual and quarterly reporting requires comparative analysis: this quarter vs. last quarter, this year vs. last year, etc. The internal analysts know that the street is going to look at such trends, so they do the same proactively and then explain the trends during analyst calls. Once again Delphix becomes very useful in quickly spinning up these point-in-time environments.

Financial close means doing

  • month-end,
  • quarterly and
  • year end

closes with both external financial accounting (FI) and internal financial and management control accounting (CO).

External Financial (FI) Close


External financial close (FI) is relatively straightforward. The first step of FI close is a trial balance close, which involves closing accounts payable (AP) and accounts receivable (AR) and then manually performing journal entries (JE) to zero everything out. Delphix accelerates the JE piece by allowing users to perform different types of JEs and reset the data should things not reconcile and need to be repeated. The reconciling process is particularly complicated when the company is multinational or has different divisions. Delphix speeds up that process and makes it clean.

Internal Financial Control (CO) Close


Financial close also involves closing the books internally (CO).  Internal accounting is often overlooked but is actually more important for the company: all the cost centers and profit centers need to be fully allocated and reconciled. Delphix speeds up the process by setting up multiple alternate Cost/Profit Center Hierarchies and running different cost allocation scenarios to understand exactly how the business behaves. The internal close (CO) is fundamentally more important than FI, as FI is really meant to comply with reporting and audit requirements, whereas CO actually shows the health of the business.


How Delphix Improves Financial Closes

Delphix addresses both external and internal accounting and speeds up the closing process by

  1. Creating multiple “play areas” for accountants to run different trial balance JEs to get to the final statements
  2. Helping reconcile both FI and CO by fully allocating revenues and expenses in cost accounting, and allowing controllers to run various “what if” analyses to truly understand the health of the company

One company only ran their financial close at the standard times. After Delphix, they ran trial closes frequently to find and correct discrepancies, such that at the real financial close the work could be accomplished in a day or two.

For example, if a company has several source systems with financial information being rolled up into the company’s general ledger (GL), then any discrepancies between the source systems and the final roll-up into the GL have to be reconciled.

Delphix can provide a sandbox exact copy of the GL, and the source application data can be rolled into the GL, fixed, rolled back, and tested again as many times as necessary. Provisioning the GL clones using Delphix only takes minutes, and they can be rolled back in minutes.
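As a toy illustration of that roll-in, fix, roll-back loop (my sketch, not Delphix’s implementation), think of the sandbox as a working copy that can always be rewound to the provisioned point-in-time image:

```python
import copy

class Sandbox:
    """Toy model of a rewindable GL clone: keep the provisioned
    point-in-time image, and rewind the working copy on demand."""
    def __init__(self, gl):
        self._base = copy.deepcopy(gl)   # provisioned image, never mutated
        self.gl = copy.deepcopy(gl)      # working copy the accountants modify

    def rollback(self):
        self.gl = copy.deepcopy(self._base)

def roll_up(gl, source_totals):
    """Roll source-system totals into the GL working copy."""
    for account, amount in source_totals.items():
        gl[account] = gl.get(account, 0) + amount

# first attempt uses a bad feed; rewind and retry with the corrected feed
sb = Sandbox({"revenue": 0})
roll_up(sb.gl, {"revenue": 900})    # doesn't reconcile vs. the expected 1000
sb.rollback()
roll_up(sb.gl, {"revenue": 1000})   # corrected feed reconciles
```

The point is that the rewind is cheap, so the reconcile/fix/retry cycle can be repeated as many times as needed.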



For example, if there are multiple sources of revenue, they have to be aggregated into the General Ledger. If there are discrepancies, they have to be ironed out. Ironing out the discrepancies requires copies of the source systems, but making these copies requires time, resources and personnel, and these trial environments are limited.



With Delphix it only takes minutes and almost no storage to spin up copies of the source systems and to provision multiple copies for testing and reconciliation.




  • Faster financial close lets CEO, CFO, and company know exactly where they are earlier in the cycle after quarter end
  • Allows for more prep time before quarterly earnings call (e.g. time to do damage control)
  • Enables them to make business changes earlier/faster
  • If they can master the close, they can move to trial closes WITHIN the quarter
  • That will give them even more data about how they might drive material changes to operations and spending (e.g. corrective damage control)
  • For some companies (e.g. some types of insurance companies) they have to estimate revenue earnings; here the quality of data and estimate #s are VERY material to stock price and therefore company credibility/volatility.
Financial close timelines will not improve perceptions on the street; they are unlikely to be street visible. But the positive and negative business implications ARE street visible.




Oracle EM : connect to database broken

June 6th, 2014

OEM just seems to have too many brittle esoteric configuration files and process dependencies. Ideally I just want to connect with system/password and go. Is that too simple to ask for?

Today I tried out OEM and got the general broken page:

And my first reaction was just to give up and move on, but then I noticed the error message sounded somewhat simple:

ORA-28001: the password has expired (DBD ERROR: OCISessionBegin)

Hmm, maybe this *is* easily fixable. Well, guess again. Luckily someone has well documented the fix.

Read the above post for details, but basically you have to find ORACLE_HOME/<HostName_SID>/sysman/config/ and set



After the “fix” EM still wouldn’t work because DBSNMP’s password had expired as well, but at least it gave a password screen to change it. So to simply launch OEM to get to the performance page after my system password had expired, I had to change the passwords for not one but 3 users: SYSTEM, SYSMAN and DBSNMP, which is crazy enough, but the worst part is that SYSMAN’s password is hardcoded, though encrypted, in a flat file that has to be found (good luck) and edited.

PS: to turn off password expiration, which is on by default in 11g but was off by default in 10g:

alter profile default limit password_life_time unlimited;




Oracle’s SQL Tuning Pack, part III

June 5th, 2014

How does the STA work in 11gR2 with the query from “Oracle’s SQL Tuning Pack – part II”?

In Part II, the STA in 10g proposed a new profile for the query, but that profile actually caused the query to run slower. The question is, does the STA do better in 11gR2?

Below I ran the query load, identified the query, and submitted it to the STA. The STA spent 30 minutes burning CPU trying to tune the query and finally ended with an error that a better plan could not be found.

Here is the view on the Top Activity Page:
The first spike is when I ran the query. I passed the gray window over the spike, selected the first query and submitted it to the tuner. The rest of the load, the next 30 minutes, is the STA trying to find a better plan. After 30 minutes the STA times out saying no better plan was found.
On 10gR2 the STA finishes in less than 2 minutes and finds a profile, though the profile puts a higher load on the database for this query. On 11gR2, the STA runs for 30 minutes and times out without finding any suggestions. Yet in DB Optimizer a new plan is found in minutes that runs faster than the original. So a better plan was out there, relatively easy to get, yet the STA in 11gR2 couldn’t find it in half an hour. I guess that’s better, though, than suggesting a worse profile as it did in 10gR2.

Here is a video on Vimeo of the tuning with DB Optimizer which takes less than 60 seconds to find a faster plan:


Oracle’s SQL Tuning Pack, part II

June 4th, 2014

Above is the AAS load produced by three runs of the same batch. There are 3 distinct humps labeled ORIGINAL, STA TUNED and DBO TUNED. The load is a single query being run by 10 users 20 times each, with a sleep of random duration between 1 and 2 seconds between executions.
The graph measures the load on the database. Green is CPU, blue is I/O and the reddish brown is concurrency.
First, on the left is the load of the original default execution plan.
Second, in the middle is the load after applying a profile from the SQL Tuning Advisor (STA).
Third, on the right is the load after having been tuned by DB Optimizer.
All this goes to show that the STA’s profile can be sub-optimal even when a better plan exists.
Here is the view of the same load from the “Statistics” tab of the SQL Details in OEM 10g.
First, dark blue, is load from the default path.
Second, light blue is the load from the STA suggested profile.
Third is the load after tuning the query with DB Optimizer.
Here is the query, actually a PeopleSoft query:

For the courageous or bored with idle time, the export of the data and SQL  statement are available at

and for kicks, the query’s VST diagram
A couple of thoughts and questions come to mind.
First, if the reason for picking the sub-optimal path in the first place is a bug, then it would make sense that the STA would still have the same bug and make the same mistakes.
Second, I wonder if the STA is really trying to find better plans, or if it is trying to pull the statistics used by the optimizer more in line with the actual statistics observed when running the query, along the lines of TCF, or tuning by cardinality feedback. When I look at the hints in the profile, they are of the nature of skewing stats one way or another:
OPT_ESTIMATE(@"SEL$485D066A", TABLE, "C"@"SEL$1", SCALE_ROWS=4.412144743)
OPT_ESTIMATE(@"SEL$485D066A", JOIN, ("B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=12.28434396)
OPT_ESTIMATE(@"SEL$485D066A", JOIN, ("B"@"SEL$1", "VW_SQ_2"@"SEL$09936D23"), SCALE_ROWS=3.049755426)
OPT_ESTIMATE(@"SEL$485D066A", JOIN, ("C"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1", "VW_SQ_2"@"SEL$09936D23"), SCALE_ROWS=15.63082791)
OPT_ESTIMATE(@"SEL$485D066A", JOIN, ("C"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=5.133606627)
OPT_ESTIMATE(@"SEL$485D066A", JOIN, ("B"@"SEL$1", "A"@"SEL$1", "VW_SQ_2"@"SEL$09936D23"), SCALE_ROWS=37.50606343)
OPT_ESTIMATE(@"SEL$B186933D", TABLE, "C"@"SEL$1", SCALE_ROWS=4.412144743)
OPT_ESTIMATE(@"SEL$B186933D", JOIN, ("B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=9.626410845)
OPT_ESTIMATE(@"SEL$B186933D", JOIN, ("E"@"SEL$1", "D"@"SEL$1", "C"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=3.896066538)
OPT_ESTIMATE(@"SEL$B186933D", JOIN, ("E"@"SEL$1", "B"@"SEL$1", "A"@"SEL$1"), SCALE_ROWS=9.626410845)
OPT_ESTIMATE(@"SEL$6E19F182", TABLE, "C"@"SEL$1", SCALE_ROWS=4.412144743)
The above profile “hints” are mostly concerned with scaling the expected rows.
Speaking of TCF, it’s interesting to note that in DB Optimizer the TCF values of estimated versus actual rows come out closer in the optimized query than in the original.
Below are the explain plans for the default plan, the plan with the profile, and the DB Optimizer tuned plan. Notice the “Orders of Magnitude” column. This column shows the orders of magnitude by which the estimated rows differed from the actual rows per line in the execution plan. The larger the difference, the bigger the discrepancy between what the optimizer expected and what actually happened.
Default Plan and TCF
Profile Explain and TCF
DB Optimized Plan and TCF
Ironically the plan using the profile has the worst TCF discrepancies.
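The “Orders of Magnitude” idea is just the base-10 log of the ratio between actual and estimated rows for a plan line. A hypothetical calculation (my sketch, not DB Optimizer’s actual code):

```python
import math

def magnitude_off(estimated, actual):
    """Orders of magnitude between the optimizer's row estimate and the
    actual rows for a plan line (0 means the estimate was spot on)."""
    # clamp to 1 so zero-row lines don't blow up the log
    return abs(math.log10(max(actual, 1) / max(estimated, 1)))

# an estimate of 10 rows vs. an actual 100,000 rows is off by 4 orders of magnitude
print(magnitude_off(10, 100_000))  # 4.0
print(magnitude_off(500, 500))     # 0.0
```

A plan whose lines all score near 0 is one where the optimizer’s cardinality model matched reality; large values flag the lines where the plan choice was made on bad information.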


Oracle’s SQL Tuning Pack, part I

June 3rd, 2014

Have you used Oracle’s SQL Tuning Pack? What were your experiences like?
At my last consulting gig, 2 out of 3 statements I “tuned” with the SQL Tuning Pack actually ran significantly more slowly after applying the recommended profiles. I thought it was just my bad karma and luck of the draw until recently. Recently I’ve started asking other DBAs what their experiences were, and I am generally finding less than enthusiastic responses. In classes on Oracle 10g and OEM, I’ve always cautioned that profiles should not be blindly trusted, and I give an example using a Swingbench query where the SQL runs more slowly after the profile. If applying the profile, I recommend jumping immediately to the SQL performance page and verifying the effect of the profile on the load of the query.

Above is a page from OEM after the SQL Tuning Advisor has been run, a profile is suggested, and now I can implement the profile.

After the profile has been implemented, I’d recommend clicking on the SQL_ID to go to the SQL details page and see what the new execution plan looks like on the load chart.

In the above SQL details we can see the new execution plan hash 467377872 and that its load is now higher. The number of executions by the application, Swingbench, remained the same, but the load from the SQL statement with the new profile is higher.
At this point I’d click on “Tuning Information” and back out the profile.

I thought the above example was a fluke, along with the couple of bad experiences I had at a customer site, but after asking a number of other DBAs who have used the SQL Tuning Advisor and hearing less than enthusiastic responses, I’m now curious to poll a wider audience.

A couple of thoughts:
One, I think the reason the STA’s profile created more load is that the application, Swingbench, actually only uses one row from the query result set and ignores the rest. I’m guessing that the STA tuned the query for the entire result set.
Two, it seems to be a serious flaw that the STA doesn’t actually run the query with the new profile to compare the before and after. At least this should be offered as an option.


It’s Not About the Snapshot

June 2nd, 2014

I hear this from time to time:

“Can’t I create a database copy in seconds with a file system snapshot?”

First let’s take a step back. There is a huge need to clone databases quickly, easily and with as few resources as possible for development, QA, UAT, debugging, reporting and backup, yet hardly anyone in the industry uses file system snapshots for cloning, even though cloning with file system snapshots has been possible for almost 20 years. Why, if there is a solution to a huge problem, is no one using it? Because it requires specialized hardware, storage system experts coordinating with DBAs and system admins, and a lot of scripting. (Unless using Delphix.)

It’s a bit like saying I can make a copy of the database using the UNIX “cp” command. Any good DBA knows using “cp” to back up a database is naive. Why? Because either I have to shut the database down or I have to put the database in hot backup mode, and then I have to be able to get a list of all the datafiles, control files and redo files. The cloning process must also perform prerequisite tasks like producing startup parameter files, database definition files, or other pre-creation tasks for the target database to function. For example, Oracle requires control files, password files, pfiles/spfiles, and other pre-created components before the target can be opened for use. Control files, log files, and datafiles together constitute a database that can be opened and read by the appropriate DBMS version. Most of these required tasks are done automatically by RMAN. RMAN will also do many things that only RMAN can do, such as skip empty blocks, run incremental backups instead of having to copy the entire database, and do corrupt block checking at backup time. Cloning databases from file system snapshots has similar complexities and limitations.

Can you create a thin clone database for almost no storage from a file system snapshot?

Yes! Definitely.

Can you provision a database copy in seconds with a file systems snapshot?

No :(

The longer answer is you might be able to do it in minutes if you were an expert system admin, storage admin and Oracle DBA, you didn’t mind having some serious drawbacks to your clone and impact on production, you had all the resources you needed, and you had everything aligned perfectly. (See comments on the previous blog post for a start on how you can do this yourself.)

Let’s get into the real world of actually using storage system snapshots. Talking to one of the largest video game makers, a Fortune 500 company and one of the top software companies in the world, about their database cloning, I asked them: “When you use Netapp storage snapshots, how long does it take you to create a database clone from the snapshots?” I expected the answer to be in minutes and was shocked when they responded with 2-4 days. My response was “What?! 2-4 days? Why?” Their answer was that if everybody required was in the same room working together, maybe they could do it in 2-4 hours, but because each stage of the process was handled by a different person or group, such as the storage admin, system admin, DBA etc., the total process ended up taking days.

Taking 2-4 days to provision a database from a file system snapshot is the good case. I asked another subsidiary of one of the world’s largest software companies how long it took them to create a database clone with file system snapshots and their response was 3 months! The reason given was the amount of bureaucracy one had to go through to get the clones.

Why does it take so long to clone databases with file system snapshots? There are two reasons:

  • bureaucracy
  • technical challenges


Depending on your company, there will be more or fewer bureaucratic steps to get a thin clone database allocated. If you are the DBA, storage and systems guru at a small company, that might all be you, and if so, bravo, you can do it pretty quickly. On the other hand, if you wear all those hats, you are probably a “Brent”, as in Brent from The Phoenix Project, the crucial person in IT: most critical IT processes grind to a halt because they depend on him and he is super busy.



Why does it take so long to pass tasks between people and groups? Because a task that might take an hour when someone is completely free and idle will take multiple days once that person is 95% busy or more. See the following chart from the book The Phoenix Project:

[Chart from The Phoenix Project: task wait time vs. how busy the resource is]
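The shape of that chart comes from basic queueing behavior: wait time is proportional to the busy/idle ratio, so it explodes as utilization nears 100%. A quick sketch of that relationship (my illustration of the chart’s formula, not a reproduction of the book’s data):

```python
def wait_multiplier(busy):
    """Wait time as a multiple of the task's own duration,
    using the busy/idle ratio behind the chart."""
    assert 0 <= busy < 1
    return busy / (1 - busy)

for pct in (0.50, 0.90, 0.95, 0.99):
    print(f"{pct:.0%} busy -> wait multiplier {wait_multiplier(pct):.0f}x")
```

At 50% busy a task waits about as long as it takes to do; at 95% busy it waits roughly 19 times as long, which is exactly why hand-offs between near-saturated specialists stretch a few hours of work into days.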


Technical Challenges



The easiest way to create a clone is to snapshot the production storage. To snapshot the production storage, either shut down the source database and take a snapshot, or, more likely, put all the tablespaces in hot backup mode, take a snapshot, and then take all of the tablespaces out of hot backup mode. If the database spans more than one LUN it may take special storage array options to snapshot all the LUNs at the same point in time. Once all the database LUNs are snapshotted, you can use the snapshots to create a “thin clone” of the production database on the same storage as production. If you are using EMC, this is most likely going to hurt write performance, as snapshots on EMC typically use copy on write, which entails a read and two writes to manage the file system snapshot. Also, on EMC one is typically limited to 16 snapshots, and going over that limit incurs a full copy of the LUNs upon snapshot. On EMC, snapshots typically cannot themselves be snapshotted, preventing branching of clones. On Netapp there is no write performance penalty, snapshots can be taken of snapshots, and the limit on the number of snapshots is 255. The question is why limit oneself to 255 snapshots when they can be made unlimited with ZFS or with Delphix DxFS.
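The copy-on-write penalty above is easy to quantify. A rough model (simplified, ignoring metadata updates) of the physical I/Os triggered by the first write to a block held by a snapshot, using the read-plus-two-writes cost just described versus a redirect-on-write scheme that simply writes the new block elsewhere:

```python
def io_ops_per_first_write(scheme):
    """Physical I/Os for the first write to a snapshotted block
    (toy model; real arrays also update snapshot metadata)."""
    if scheme == "copy-on-write":
        # read the old block, write it aside to the snapshot area, write the new data
        return {"reads": 1, "writes": 2}
    if scheme == "redirect-on-write":
        # just write the new block to a fresh location; the old block stays put
        return {"reads": 0, "writes": 1}
    raise ValueError(f"unknown scheme: {scheme}")

print(io_ops_per_first_write("copy-on-write"))      # {'reads': 1, 'writes': 2}
print(io_ops_per_first_write("redirect-on-write"))  # {'reads': 0, 'writes': 1}
```

That 3-I/Os-versus-1 difference is why copy-on-write snapshots hurt production write latency while redirect-on-write designs largely do not.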

The problem with this scenario, no matter what storage you use, is that the clone is doing I/O on the same LUNs as production. The whole point of cloning production is to protect production, but in this case the clone’s I/O will be hurting production. Oops.


What we want to do is somehow get a copy of production onto some non-production storage where we can snapshot it. This means making a full physical copy of production onto a “development filer.” Once a copy has been made we can make clones by snapshotting the copy. These snapshots then require configuration to make them available to target machines, either over fibre channel or by mounting them over NFS, and then recovering the database on the target machines.

The problem with this scenario: what if tomorrow we want a clone of production as it is that day? We only have the copy from yesterday, so we have to copy the whole of production across to the “development filer” again. Continually copying the source each time we need a clone at a different point in time defeats the purpose of creating thin clones in the first place.

What can we do? Netapp offers a feature called SnapManager for Oracle (SMO) that is supposed to copy file system snapshots across from the production filer to the development filer. One problem is that it’s complex, and problems are difficult to debug.



What alternatives do we have?

Well, for EMC one could use SRDF if both the production filer and development filer were Symmetrix or VMAX arrays. SRDF will replay the changes from production onto development, but development will still be limited to 16 snapshots, and surpassing that limit results in a full copy. At Delphix we have customers running up to 400 virtual databases on a single set of storage, and customers taking snapshots as often as every 5 minutes. That wouldn’t work very well on EMC or Netapp.

For ZFS one could use Oracle Data Guard to relay changes from production onto the development array. It makes sense in theory, but I haven’t seen any users clearly document actual usage. The best document I’ve seen on it is an old Oracle white paper, and I’ve yet to find a company doing this.


All of the technical solutions require specialized expertise, manual intervention and scripting. Scripting for oneself can be relatively straightforward, but making those scripts production ready and runnable by anyone is another matter. As anyone who has written scripts or software to be used by someone else knows, it is orders of magnitude harder.

Ways to do it yourself

Automation and Integration


Snapshots are the fuel and Delphix is the Ferrari.

Thin cloning obstacles are solved with Delphix. Delphix consists of 3 technologies. The first technology continuously collects all the changes from a data source and writes them to any storage. The second technology manages the storage, implementing snapshot capabilities, saving all changes in a time window and purging data older than the time window that is no longer needed. The third technology harnesses the file system snapshots and the time window to provision database thin clones to target machines over NFS. All of this technology is rolled into a software stack that can run on commodity hardware and map a filesystem onto any storage.

Delphix takes care of syncing with the source database, which means continuously pulling in changes from the source database. Delphix takes care of many other things automatically, such as snapshotting the storage, cloning the storage, compressing the storage, and then provisioning the thin clone databases, which means exposing the file system on the data virtualization tier to the hosts that run the thin clones, renaming the clones, setting up the startup parameters, and recovering the databases.


Delphix technology has 3 core parts:

  1. Source sync
    • initial full copy
    • forever incremental change collection
    • rolling window of saved changes, with data older than the window purged
  2. DxFS storage on Delphix
    • storage agnostic – maps snapshot file system onto any storage
    • compression – typically compress data by 1/3
    • memory sharing of data blocks (only technology AFAIK to do this)
  3. VDB provisioning and management
    • self service interface
    • roles, security, quotas, access control
    • branching, refresh, rollback

Each of these 3 parts has important features.

1. Source database copying

Delphix not only copies the source database at initial setup time but also continuously pulls in changes from the source database, so that one can create virtual databases from the source at different points in time. Pulling in changes requires a timeflow, meaning Delphix saves a time window of changes from the source and purges changes older than the window. The time window allows the system to run continuously and reach a storage equilibrium without using up more and more storage.
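A minimal sketch of such a rolling window (my illustration, not Delphix code): incremental changes are kept only while they fall inside the retention window, so storage stops growing once the window fills.

```python
from collections import deque

class TimeFlow:
    """Toy rolling window of incremental changes: anything older than
    `window` time units is purged as new changes arrive."""
    def __init__(self, window):
        self.window = window
        self.changes = deque()  # (timestamp, change) pairs, oldest first

    def ingest(self, ts, change):
        self.changes.append((ts, change))
        # purge everything that has aged out of the retention window
        while self.changes and self.changes[0][0] <= ts - self.window:
            self.changes.popleft()

flow = TimeFlow(window=7)            # e.g. retain 7 days of changes
for day in range(30):
    flow.ingest(day, f"redo-{day}")
print(len(flow.changes))             # equilibrium: only the last 7 days remain
```

Any point inside the retained window can still be materialized as a virtual database; anything older has been traded away for bounded storage.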

2. The storage 

Delphix maps its own file system onto any storage and manages snapshots, clones and compression of the data for efficient storage. Delphix not only shares data blocks on disk but also in memory. Delphix handles and orchestrates access to the data it manages, meaning it shares unmodified duplicate data blocks between all the thin clones and keeps modified blocks private to the clone that made the modification.
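Block sharing between thin clones can be sketched as reads that fall through to a shared base image, with writes landing in a per-clone private overlay (an illustration of the idea, not Delphix’s DxFS):

```python
class Clone:
    """Thin clone over a shared base image: unmodified blocks are read
    from the shared image; modified blocks are private to this clone."""
    def __init__(self, base_blocks):
        self.base = base_blocks   # shared by all clones, never mutated
        self.private = {}         # blocks this clone has overwritten

    def read(self, n):
        return self.private.get(n, self.base[n])

    def write(self, n, data):
        self.private[n] = data    # the change stays local to this clone

base = {0: "hdr", 1: "data-A", 2: "data-B"}
c1, c2 = Clone(base), Clone(base)
c1.write(1, "data-A'")
print(c1.read(1), c2.read(1))  # data-A' data-A  (only c1 sees its change)
```

Ten clones of a database therefore cost roughly one copy of the data plus each clone’s modified blocks, which is why the storage footprint stays near zero.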

3. Provisioning

Delphix automates the provisioning of thin clones, providing a self service interface. Provisioning handles exposing the datafiles over NFS to the target machines that run the database, and automatically handles renaming the database, setting startup parameters, recovering and opening the thin clone. Provisioning is self service: anyone can provision clones, be they a DBA or a developer. In order to allow access to anyone, Delphix handles logins, security groups and defining which groups have access to which source databases, how many clones can be made, which target machines can run clone instances, what operations each user is allowed to do and how much extra storage each user is allowed to incur on the system. Delphix also provides functionality such as rolling back, refreshing, branching and tagging virtual databases.


With all 3 core functions integrated and automated, Delphix provides key features:

  • Database Virtualization – thin clone support for major databases such as Oracle, SQL Server, Postgres
  • Application Virtualization – support for thin cloning application stacks
  • Self Service – an interface easily usable by application developers, QA staff and BI teams to provision their own clones on demand
  • Branching – support for branching clones, meaning making clones of clones, which is crucial for multi-version development, or even just for re-provisioning a previous application development stack for patching while working on the current version
  • Synchronization – support for cloning multiple related databases such that each clone comes from the exact same point in time. Any Fortune 500 company with multiple revenue tracking databases will need synchronized clones of each for analyzing financial close discrepancies
  • Cloud Ready – supports any storage and efficient low bandwidth replication across heterogeneous storage types; installs on any commodity hardware
  • Any Time – make clones at any time, down to the second, within the timeflow window
  • Live Archive – save specific points in the timeflow forever to support compliance and auditing




The following video shows how to create a full RAC clone of a RAC source database in 5 minutes using Delphix. To see how Delphix links to the source database, see the second video below.

Here is the RAC Delphix setup video, which similarly finished in 5 minutes.