
Data IS the constraint.

December 20th, 2013


photo by David Blackwell

by Woody Evans

You are paying a data tax: the cost of moving and copying data, over and over again. Moving the data IS the big gorilla, and this gorilla of a tax is hitting your bottom line hard. When moving data is too hard, the data in non-production systems such as reporting, development, or QA grows older, and the older the data, the less actionable intelligence your BI or analytics can give you. The less intelligence, the more missed revenue. The longer it takes you to match data from system A and system B, the more opportunities for your customer to get frustrated that the left hand hasn’t talked to the right hand – that you have no intimacy with them – that you don’t seem to even know them. The longer it takes to test your systems (because it takes so long to reload the data), the fewer real features make it to market, and the more you put your market share at risk.

Business skeptics tell themselves that data processes are just a rounding error in most of their project timelines, and that their IT has surely developed processes to fix that. That’s the fundamental mistake. The very large and often hidden data tax lies in all the ways we’ve optimized our software, data protection, and decision systems around the expectation that data is simply not agile. The belief that there is no agility problem is part of the problem.

How big is the data tax? One way to measure it is to look at the improvements in project timelines at companies that have eliminated this tax by implementing a data virtualization appliance (DVA) and creating an agile data platform (ADP). Agile data is data delivered to the exact spot it’s needed, just in time, and with much less time, cost, and effort. By comparing productivity rates after implementing an ADP to those before, we can get an idea of the price of the data tax without one. IT experts building mission-critical systems for Fortune 500 companies have seen real project returns averaging 20-50% productivity increases after implementing an ADP. That’s a big data tax to pay. The data tax is real, and once you understand how real it is, you realize how many of your key business decisions and strategies are affected by the agility of the data in your applications.

Took us 50 days to develop an insurance product … now we can get a product to the customer in 23 days with Delphix

Data Agility by the Numbers

Let’s take a high-level look at the kinds of cost, revenue, and risk impact that agile data can have on your business in four key areas: Business Level Initiatives, Application Operations, IT Operations, and IT Systems. In each of these cases, we’re incurring cost we could avoid, missing revenue we could be capturing, or accepting business risk when we don’t have to.

Business Level

At the business level, we’ve lived with the constraint of slow data for a long time. We may offshore the data management task of our nightly BI refresh. We may only allow business users to run their reports at night because the cost to copy data off production is too high, and the need for fresh data never goes away. We may live with week-old data in our BI because our network is built for data protection and can only handle a full copy during off hours. To get features out the door, we may spend less time testing, or simply accept that there will be more errors post-production and ignore that cost because it is borne by operations. But what if data were agile?

If data were agile, then instead of paying for two full-time offshore resources to get data that is already a day old when we receive it, we could have minutes-old data in minutes, automatically. With agile data, margins go up and revenue opportunities increase (for example, wouldn’t it be good for Walmart in California to know that Lego Batman sold like hotcakes as soon as it hit the shelves in New York that morning?). Multiply that by 100 applications, and you’re talking about real money and competitive advantage. Instead of running 5 tests in two weeks (because it takes 2 days to roll back after each 1-hour test) and paying the cost of bugs slipping into production, what if I could run 15 tests in the same two weeks and have no bugs in production at all? Costs fall, quality rises, customer satisfaction rises, competitive position strengthens. Even better, what if I could get more features into my release because I knew my quality testing would be robust enough to handle it, and I had enough time in my schedule? How much is having new features faster worth?

What about really big problems, like consolidating data center real estate or moving to the cloud? If you can non-disruptively collect the data, and easily and repeatedly present it in the target data center, you take huge chunks out of these migration timelines. Moreover, with data so easy to move on demand, you neutralize the hordes of users who insist that there isn’t enough time, or that it’s too hard, or too risky. Moving the data IS the big gorilla, and eliminating the data tax is crucial to the success of your company. If huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy. We know from our experience that some data center consolidations carry $1B+ price tags. Taking even 30% of the cost out of that, and cutting the timeline, is a powerful way to improve margin.

When the cost to get fresh data falls precipitously, better decisions and faster delivery mean better margins, higher profitability, better revenue growth, and faster time to market. And, for those capable and willing to change the way they do business to take maximum advantage, it means better EPS, higher EBITDA, and a business ready to succeed.

Application Operations

Forrester estimates that companies will spend $97.5B on application outsourcing and management in 2013 (Forrester publication 83061). For large ecosystems such as Oracle E-Business Suite or SAP, the complexity of data management can be sprawling, with each landscape consuming dozens of databases, and landscapes being built not only as lifecycle copies for current development (Dev, Test, QA, etc.) but for multiple streams of development (Release 2.1, 2.2, 2.3, 2.4, etc.).

For application teams, data constraints are often masked as something else. For example, maybe the application is only allocated so much storage, so there can only be so many full-size copies, so the developers have to make do with stripped-down copies (which can produce unexpected results in production) or shared copies (which often involve all sorts of extra reset/rollback operations, as well as freezes and holds so that one developer doesn’t clobber another). We trade productivity for cost in this phase. But the primary cost sink is the data – storing it, moving it, copying it, waiting to be able to use it. So the business responds. Sometimes the business lives with the risk of using subsets and slower test cycles by pushing the timeline: saving cost at the expense of delivering to market quickly. Sometimes the business invests in a lot more hardware (and, for many of our customers, a runaway storage bill): delivering quickly at the expense of higher cost and lower margin. Sometimes the business just squeezes testing: delivering lower-quality applications sooner at the cost of higher remediation and lower customer satisfaction. The point is that data is the big culprit in all of these.

Agile data – virtualized data – uses a small footprint. A truly agile data platform can deliver full-size datasets cheaper than subsets. It can move the time or location pointer on its data very rapidly, and can store any version that’s needed in a library at an unbelievably low cost. And it can massively improve application quality by making it reliable and dead simple to return one or many databases to a common baseline in a very short amount of time. Applications delivered with agile data can afford many more full-size virtual copies, eliminating the wait time, extra work, and side effects caused by sharing. With the cost of data falling so dramatically, businesses can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized – and servers that sit idle because it would just take too long to rebuild them can now switch roles on demand.
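That small footprint comes from copy-on-write sharing: every virtual copy references one common baseline and stores only its own changed blocks. The toy Python sketch below illustrates the idea only; it is not how Delphix or any real platform is implemented, and the class name and block counts are invented.

```python
# Toy copy-on-write clone: every clone shares one baseline and
# stores only the blocks it has changed locally.
class ThinClone:
    def __init__(self, baseline):
        self.baseline = baseline   # shared, read-only block map
        self.delta = {}            # private changed blocks only

    def read(self, block_id):
        # Prefer the clone's own version, fall back to the shared baseline.
        return self.delta.get(block_id, self.baseline[block_id])

    def write(self, block_id, data):
        self.delta[block_id] = data  # the baseline is never touched


# A 1,000-block "production" source...
baseline = {i: f"block-{i}" for i in range(1000)}

# ...and ten full-size virtual copies of it.
clones = [ThinClone(baseline) for _ in range(10)]
clones[0].write(42, "patched")

# Each clone sees the full dataset, but the extra storage is only the deltas.
assert clones[0].read(42) == "patched"
assert clones[1].read(42) == "block-42"
extra_blocks = sum(len(c.delta) for c in clones)
print(extra_blocks)  # 1 changed block stored for 10 full-size copies
```

Ten "full" copies cost one block of new storage here, which is why full-size virtual datasets can be cheaper than physical subsets.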

Next post: IT Operations and IT Systems.


Data Simplicity

November 22nd, 2013

Complexity costs us a lot, and managing data in databases is a big chunk of that cost. Applications voraciously consume ever-larger quantities of data, driving storage spend and increased IT budget scrutiny. Delivering application environments is already so complex that the teams of experts dedicated to that delivery demand many control points and much coordination. The flood of data and the complex delivery process make delivery of environments slower and more difficult, and can lengthen refresh times so much that stale data becomes the norm. Complexity also grows as IT tries to accommodate the flood of data while application owners expect Service Level Agreements, backup/recovery protections, and compliance requirements to remain constant.

What’s the result? Even our most critical projects can get behind schedule and stay there because delivery of environments is so slow. We find ourselves accepting and even anticipating production outages and their reputation risk because we just couldn’t test long enough on data that’s fresh enough, or on environments big enough, to find those problems before we went live. The cost and complexity of consolidating our data onto fewer platforms and datacenters has grown so high that we’re stuck year after year with a patchwork of datacenters, database versions, and old infrastructure draining already over-strained IT budgets. Our processes for data management are a patchwork too, with no central point of control to provide the accountability needed to ensure compliance.

When we talk to the folks who support these applications, they tell us that data management is complex, and that’s just the way it is. And it’s not only these high-visibility problems. We have a lot of highly paid experts who spend a lot of time copying, moving, and babysitting bits for complex architectures like Business Intelligence or Master Data Management. The striking thing about many of these situations is that we don’t think there is a data management problem. We’ve concluded that data management must be complex.

But that conclusion is the problem. And instead of entertaining the idea that data management can be simpler, we find that many leading technologists and business leaders shake their heads and say, “We’ve got the best tools, the best technology, and the latest processes. Our data management problems just aren’t that extreme.” Really?

We recently deployed Data Virtualization technology for a company on its own internal MDM project. That company is clearly expert at MDM, and as an industry leader it was certainly using the best tools and processes. It cut its delivery timeline by 50%. We also deployed Data Virtualization technology for a Fortune 500 company with a large SAP implementation. Instead of delivering 2 similarly sized SAP releases every 6 months, they are delivering 11 with the same team. Industry leaders are unlocking tremendous value because they are realizing that their processes can evolve and simplify, and the bottlenecks can be removed.

Will you experience the same benefits? Maybe. Maybe not. But you’d agree that ignoring that much potential value is always a wrong decision. Disruptive technology is never well understood at first, because the essence of disruptive technology is that it finds a lever no one else can see to unlock value no one knew was there. It requires a new kind of thinking that will challenge the way we’ve managed data. Data Virtualization can help you reduce workloads, streamline data management processes, remove bottlenecks by pushing work closer to end users, shorten the critical path and the application delivery cycle, deliver more features to market sooner, and gain competitive advantage. With Data Virtualization technology, we can massively simplify. And that’s where the value is.


Why Data Agility is more valuable than schema consolidation.

November 20th, 2013

If you’ve been an Oracle data professional for any length of time, you’ve undoubtedly had a conversation about reducing your total number of databases by consolidating application schemas from individual databases into separate schemas in one monolithic database. Certainly, in the days before shared dictionaries, pluggable databases, and thin cloning, this argument was easy to make because of the high price of the database license. Consolidation is a winning argument, but doing it at the cost of data agility turns it into a loser. Let’s explore why.

The argument For Schema Consolidation

Essentially, the pro-side of this argument is cost driven. Using 18 schemas in 1 database instead of 18 databases with 1 schema each means:
• Fewer database licenses – and this usually holds true even if you use CPU-based pricing.
• Fewer hosts – even if you need a slightly larger host to handle the traffic, it’s usually cheaper than N individual hosts, especially when you consider the management cost.
• Less Storage – the binaries and dictionary objects are often identical, yet we end up storing them on disk and in memory N times, and then backing them up to disk even more times.
• Less CPU and I/O – 18 databases simply require more CPU and I/O than 18 schemas in one database even if you push the same traffic to the 18 schemas.
• Less Database Management Overhead – fewer databases to manage means less time managing.
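The pro side can be put into a back-of-the-envelope cost model. Every price below is a hypothetical placeholder, not a real quote; the point is only the shape of the comparison.

```python
# Back-of-the-envelope annual cost: 18 separate databases vs. 18 schemas
# consolidated into one database. All dollar figures are invented.
LICENSE, HOST, STORAGE_OVERHEAD, ADMIN = 47_500, 8_000, 1_200, 3_000

def annual_cost(n_databases, host_size_factor=1.0):
    """Cost = per-database license + host + redundant storage + admin time."""
    return n_databases * (
        LICENSE + HOST * host_size_factor + STORAGE_OVERHEAD + ADMIN
    )

separate = annual_cost(18)                        # pay for everything 18 times
# One license, one dictionary, one DBA workload, one (larger) host.
consolidated = annual_cost(1, host_size_factor=4)

print(separate, consolidated)  # consolidation wins big on paper
assert consolidated < separate
```

Even with a host four times larger, one set of licenses and one admin workload dominates, which is why the pro side looks so compelling before the cons are priced in.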

The argument Against Schema Consolidation

The con side of this argument is very sensitive to conditions. In some cases, I’ve seen these cons amount to very little cost and effort. In other cases, even without the newer technologies, the cost was so high that the approach was abandoned. Key things that usually cause consolidation efforts trouble include:
• Namespace Clobber – especially when you are consolidating applications that weren’t designed that way, all sorts of namespace conflicts can arise. I’ve seen synonyms, links, and packages wreak havoc on a consolidation effort, sometimes even requiring large re-coding because of the nature of the beast.
• No Isolability – traffic, resources, and limits are no longer fully isolated. A lot of traffic to one schema can affect the performance of another. A huge update may rob you of the ability to get good response rates. A crash in one application can cause a domino effect – whether the fail begins at the database tier or the app tier. One failure affects all.
• Conjoined Backup and Restore – often the database is backed up as a collective and restored as one, so great care must be exercised when only a single schema related to a single app needs a restore. You can work around this by creating separate tablespaces for each schema and then using tablespace point-in-time recovery, but that itself takes time and resources.
• Risky Planned and Unplanned Outages – If you’ve got to patch the database, everybody goes down. If someone pulls the plug on your database host in the data center, everyone is hosed.
• Resource Pool Management Complexity – if you’ve only got one database, then you’ve probably got one storage pool. So, unless you’re very carefully carving and monitoring it (which itself takes time and resources), you can cause all sorts of unintended side effects.
• One Attack Surface – If 18 apps share a database, then they share an attack surface. An exploit against one is an exploit against all.
• More Management Headaches – A lot more focus, concern, and worry about Quotas, Security, backing up/restoring schemas, isolation measures, and control. This is such a headache that a lot of work has gone into automation.

The Tide has Turned

First, the benefits aren’t as strong as they used to be. The marketing around Oracle 12c provides ample evidence that the same amount of work, consolidated from a collection of databases, can take up to 6x less hardware. Pluggable databases, and databases with shared dictionaries, make the cost side of the equation significantly less attractive. Thin cloning technology neutralizes most of the rest of the equation, as it provides a way to have copies of the database at almost no cost, virtually eliminating the storage argument.

Then there are the cons. And this is where I contend that we have systematically ignored or underestimated the value of Data Agility.

The value of Data Agility

Getting the right data to the right person at the right time is such a key value for organizations precisely because there are so many obstacles to doing it. And instead of understanding the value of agility, we’ve spent a lot of time, energy, and effort finding solutions that minimize the impact of not having it. Like what, for example? Letting developers code on smaller sets of data, OR live with older data, OR write their own custom “rollback” scripts. Encouraging testers to accept that a rewind has to take 8 hours, or that they have to wait for a refresh because they might clobber someone else’s app, or that they can’t test in their schema today because another application is at a “critical” stage. Telling BI folks that they can’t get their data refreshed because other apps can’t be taken offline, and it just takes too long to ship the whole schema over. Telling all of the apps that they have to be down, whether they like it or not, because we can’t patch one schema at a time.

Using Schema consolidation saves money at the cost of data agility, and shifts the burden in ways where we’ve been trained not to miss it, or where we think it’s an IT problem.

Delphix thin cloning lets you keep your individual databases but pay the price of a consolidated one. Developers can code on a full and fresh set of data at a fraction of the cost, and never write custom rollback scripts again. Testers can rewind in minutes without a huge resource drain, avoiding wait times that exist mostly to prevent secondary effects outside their app. BI folks never have to go offline to get fresh data, and refresh is a minutes-long operation every time. Patching and upgrading can not only be accomplished on a rolling basis but, using intelligent refresh, can be performed once and distributed to all sorts of downstream systems.

So what?

If you’re considering schema consolidation, look hard at the ROI. What used to make sense even 3 years ago is completely upside down now. Pluggable databases destroy the cost argument at the memory tier, and Delphix Thin Cloning does the same at the storage tier. Schema Consolidation just doesn’t make the economic sense it used to make.


Designing IT for Data Manufacture

November 16th, 2013


photo by Jason Mrachina

As a (recovering) Mechanical Engineer, one of the things I’ve studied in the past is Design for Assembly (DFA). In a nutshell, the basic concepts of DFA are to reduce assembly time and cost by using fewer parts and process steps, making the parts and process steps you do use standard, automating, making it easy to grasp/insert/connect parts, removing time wasting process steps like having to orient the parts and so on.

In a meeting with a customer a couple of days ago, it struck me that our IT departments, and increasingly our database and storage professionals, have become very much like a database factory. What also became clear is that the processes that exist are cumbersome, expensive, and often burdened with low-quality outputs.

Process complexity

If you think about the standard cycle to get a clone of a dataset provisioned in an organization, it can be long and expensive. There’s the process flow itself, which, although often automated, is also chock full of stop points, queues, wait times, and approvals. (At one major customer of mine on Wall St., a single request required 34 approvals.) Then there are many decision points. For a simple clone, you might have to involve the project lead, the DBA, the storage team, the backup guy, the System Administrator, and in some cases even management, just to get a simple clone made. And each of these people has a queue, and has to decide how important your request is, whether there’s enough space, whether there’s enough time, and so on. Then, even when it’s approved, maybe the Oracle RAC has a different methodology than the SQL Server; maybe we store our datasets locally whereas our sister application has to retrieve a backup from tape to use as a restore. All of this creates a process flow with lots of steps, moving parts, complexity, localizations and customizations, and potential for error and rework.
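The lead time in a flow like that is dominated by queues and hand-offs, not by the actual work. The sketch below models this with entirely invented step names and durations; the shape, not the numbers, is the point.

```python
# Rough lead-time model for a manual clone request: each approval or
# hand-off adds queue time even when the work itself is short.
# All step names and durations (in days) are invented for illustration.
steps = [
    ("request ticket",        0.5),
    ("project lead approval", 1.0),
    ("DBA queue",             2.0),
    ("storage allocation",    3.0),
    ("backup restore",        1.5),
    ("sysadmin host prep",    1.0),
    ("final sign-off",        1.0),
]

manual_days = sum(days for _, days in steps)
automated_days = 10 / (60 * 24)   # a minutes-long automated provision

print(f"manual: {manual_days} days, automated: {automated_days:.3f} days")
assert manual_days > 100 * automated_days
```

Cutting any single step helps little; removing the hand-offs entirely, as automated provisioning does, is what collapses days into minutes.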

Principles of an efficient Data Factory

Considering IT as a data factory, we could apply the DFA principles to data and enumerate them as:
• Reduce the number of people, steps, and parts to get data clones provisioned.
• Simplify and standardize the steps that must remain.
• Automate provisioning wherever possible.
• Use standard Data Management Processes across data clones of all flavors.
• Simplify the process to connect data clones to the hosts and applications that need them.
• Encapsulate or eliminate the need for special knowledge to get a data clone to operate; Make it Dead Simple.

Delphix is the Data Factory

One great capability of the Delphix Engine is that it fulfills the tenets of an efficient Data Factory.

First, by automating 99% of the collection, storage, provision, and synchronization of data clones, it radically reduces the provisioning and refreshing process. Storage Administration often becomes a one-time activity, and provisioning or refreshing becomes a 3-click operation. Hours and Days (and sometimes even Weeks and Months) become minutes.

Second, it simplifies data clone management to such a degree that developers and even business application managers can build databases themselves – whether they are Oracle or SQL Server.

Third, in addition to radical provision and refresh automation, all of the housekeeping – gathering change, integrating change, building snapshots, retaining and releasing data – is automated to such an extent that refreshing, rewinding, and restoring are also 3-click operations. And doing things like a daily refresh for BI is a set-and-forget operation.

Fourth, Data Management processes are standard across all flavors of supported databases. A refresh is a refresh. It shouldn’t matter to the end user that it’s a refresh for Oracle vs. a refresh for SQL Server or Postgres.

Fifth, by integrating with the mechanisms that let the database be ready for action (such as automatically registering with the Oracle Listener, or automatically applying masking scripts to make sure you’ve got obfuscated data to ship to your databases at Rackspace), the hosts and applications may not need to do anything except wait for the refresh to finish. No Request Necessary. No ticket to file. Nothing but fresh data in your database every morning ready to go!

Sixth, by encapsulating all of the difficult knowledge through automation or smart templating, it empowers a whole class of non-data professionals to perform their own Data Management. Letting developers refresh for themselves cuts out the middleman completely. No process needed at all.

So What?

If you’re a CIO, you may know that you’ve been operating your data factory like it’s 1965. You’ve been so far ahead of the game for so long that it has been inconceivable that there is a radically better way. That was the way that the American manufacturers thought before Deming and the other Total Quality Management gurus changed the way cars were manufactured in Japan. It’s time to bring your data factory into the 21st century. Stop trusting your data provisioning process to an outdated, overly complex, error-prone factory based on the IT organization of the 90s. Start trusting Delphix to deliver high quality database clones at much lower cost just-in-time for your needs.


Database Upgrade – What’s the Bottleneck?

November 15th, 2013

I met with a customer today who described for me the challenges they had in their previous 10g to 11g Oracle database upgrade. Their requirements boiled down to this:
• The business can’t afford a lengthy cutover time for the upgrade.
• The business can’t afford any data loss.
• The business had to be able to roll back the upgrade in the event of a failure.
• The 8-10 downstream systems needed to be upgraded soon after.

To meet these requirements, they had to make a whole variety of difficult choices that exacerbated all of the limitations and bottlenecks an upgrade can pose. Instead of upgrading their 10g in place, they had to make a full copy, upgrade that copy to 11g, and figure out how to ship the changes from the old 10g to the new 11g, during which both databases were essentially down. And then, once the cutover was complete, there was still the job of making a backup of the new 11g that could be used to create all of the downstream systems. They faced most of the typical bottlenecks of an upgrade:

1. For databases in the 5 TB+ range, the time it takes to run a database upgrade can be significant.
2. An upgrade is typically a one-way transformation on a physical file system.
3. Downstream systems either go through the upgrade as well or have to be restored or cloned from the “new” database, which can be very expensive as well.

What’s the real bottleneck?

An Oracle database is just a collection of blocks. Even if you go from 10g to 11g, typically you’re only changing a few of those blocks in that database. And, the reason why we’re faced with choices such as upgrade or re-clone on our downstream environments is because we just don’t have the data agility to be able to rapidly reproduce the change – we are forced to pay a tax in time, copy, or transmission to make it happen. But, again, the real change in the data is very minimal. What’s the real bottleneck? Data Agility.
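The cost asymmetry above is easy to quantify: if an upgrade touches only a small fraction of blocks, a block-sharing clone pays only for that delta. In the sketch below, the 2% changed-block rate, the 5 TB size, and the 8 downstream copies are illustrative assumptions, not measurements.

```python
# Storage cost of refreshing N downstream copies after an upgrade,
# physical re-clone vs. block-sharing (thin) clones.
# The sizes and the 2% changed-block rate are assumed for illustration.
db_size_tb = 5.0          # size of the upgraded source database
changed_fraction = 0.02   # fraction of blocks the upgrade actually changed
downstream_copies = 8

# Physical: every downstream system gets a brand-new full copy.
physical_tb = downstream_copies * db_size_tb

# Thin: each copy shares the source's blocks and stores only the delta.
virtual_tb = physical_tb * changed_fraction

print(physical_tb, virtual_tb)
assert virtual_tb < physical_tb / 10
```

Under these assumptions the physical route moves and stores 40 TB while the thin route stores under 1 TB, which is the "data agility" gap in storage terms.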

Delphix to the rescue

I like to think of a database upgrade as made up of 3 distinct process steps. First, there’s the rehearsal, where you create copies of your existing database and rehearse the upgrade until you’re happy and confident that everything will go well. Second, there’s the cutover, where you either quiesce and convert in place, or stand up a new target and quiesce and apply the delta to it. And third, there’s the propagation, where you take the newly minted environment and copy it around for Prod Support, Reporting, Dev, Test, QA, etc., to bring everyone up to the same version.

Delphix has several powerful features that cut through the noise and get to the data agility problem: Virtual-to-Physical, Any Point In Time Database Clones on Demand, Database Rewind, and upgradeable source and virtual databases.

Consider this same client’s situation if they had used Delphix. Since Delphix has on tap a near-real-time version of the database to be upgraded, and can spin up a copy of that database in minutes, it’s easy to reduce the cycle time of each iteration of testing. So the big gorilla in the room – the time it takes to roll back and reset for each rehearsal – just goes away with Delphix. Second, if you’re using Delphix to virtualize all of the downstream copies of the database, they will take up minimal space BOTH before AND after the upgrade (again, since the upgrade typically doesn’t change more than a small percentage of blocks). Third, if you upgrade your primary data source from 10g to 11g, then the operation to “upgrade” virtual downstream systems can literally be a couple of clicks and a few minutes.

So What?

In my experience, the vast majority of the time people spend on their upgrade projects is not for the execution of the actual upgrade script, it’s in fact mostly around migration and cutover – moving bits, synchronizing bits, etc. When you see these things as the Data Agility problems that they are – and find ways to optimize those operations with a tool like Delphix, then you realize that the only real bottleneck in the operation is the actual upgrade script – and that’s the one thing you can’t change anyway.

The power of this sort of approach to upgrading databases is significant. I can recall a customer who had put together an entire 6-week timeline to prepare, test, and execute their cutover of a large multi-TB data warehouse. With Delphix, that entire operation was complete in 6 hours. With thin cloning from Delphix, you can remove bottlenecks related to your Data Agility, focus on the true bottleneck, and by so doing reduce your Total Cost of Data by delivering your upgrades faster at a fraction of the cost you pay today.


Data Management from a Theory of Constraints Perspective

November 14th, 2013


When I read Eliyahu Goldratt’s The Goal in grad school, one of the key things that stuck with me is that there’s always a bottleneck, and the process only moves as fast as the bottleneck allows. The Theory of Constraints methodology posits three key measures for an organization: Throughput, Inventory, and Operational Expense. Sitting down to think about this, I reasoned that we could use those same metrics to measure the total cost of data for copies, clones, and backups.

For every bit of data that enters the door through Production, we could offer that the Throughput represents the data generated to create or update the copies, clones and backups. Inventory could represent the number of copies of each bit of production data that sits on a copy, clone, or backup. And, Operational Expense represents all of the Labor and Resources spent creating and transmitting that data from Production to its final resting place.

When expressed in these terms, the compelling power of thin cloning is clear. Let me show you what I mean by a little Thought Experiment:

Thought Experiment

If I had a fairly standard application with 1 production source, 8 downstream copies, a 4-week weekly-full/daily-incremental backup scheme, and a plan to refresh the downstream systems once per week on average, what would the metrics look like with and without thin cloning?

TOC Metrics for Cloned/Copied/Backed-up Data

Throughput
8 * Daily Change Rate of Production

Inventory
8 * Full Size of Production
4 * Full Size of Production (1 full/week for 4 weeks)
24 * Daily Change Rate of Production (6 incrementals/week for 4 weeks)

Operational Expense
8 shipments and applications of change data / day
1 Backup Operation / day

With Delphix thin cloning, these metrics change significantly. The shared data footprint eliminates most of the shipment, application, and redundancy. So:

TOC Metrics for Cloned/Copied/Backed-up Data using thin clones

Throughput
1 * Daily Change Rate of Production
(Change is shipped and applied to the shared footprint once.)

Inventory
1 * Full Size of Production (before compression)
28 * Daily Change Rate of Production
(A full copy is only ever taken once; after that, it is incremental forever.)

Operational Expense
1 shipment and application of change data / day
0 Backup Operations / day
(Since change is applied to the common copy, separate backups are just redundant operations.)
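The two scenarios above reduce to simple arithmetic. Here is a sketch that recomputes the tables' unit counts from the experiment's parameters:

```python
# TOC metrics for 1 production source, 8 downstream copies, and a
# 4-week weekly-full / daily-incremental backup scheme, as in the
# thought experiment above. Units are multiples of production size/change.
copies, weeks, incrementals_per_week = 8, 4, 6

# Without thin cloning: every copy and backup moves and stores its own data.
full_copies = {
    "throughput_daily_change_units": copies,                 # 8x change shipped/day
    "inventory_full_size_units": copies + weeks,             # 8 copies + 4 full backups
    "inventory_change_units": weeks * incrementals_per_week, # 24 retained incrementals
    "backup_ops_per_day": 1,
}

# With thin cloning: one shared footprint, change shipped and applied once.
thin_clones = {
    "throughput_daily_change_units": 1,
    "inventory_full_size_units": 1,    # one (compressed) baseline, taken once
    "inventory_change_units": 28,      # retained daily change, incremental forever
    "backup_ops_per_day": 0,
}

ratio = (full_copies["throughput_daily_change_units"]
         / thin_clones["throughput_daily_change_units"])
print(ratio)  # the 8x throughput reduction from the experiment
```

The throughput advantage scales with the number of copies, while the thin-clone inventory depends only on change rate and retention, which is why the gap widens as you scale.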

So what?

The thought experiment reflects what we see every day with Delphix. The Throughput of data that has to move through a system is significantly less (8x less in our experiment). And, it gets relatively more efficient as you scale. The Inventory of data that has to be kept by the system is not driven by the number of copies, but rather is driven by the change rate and the amount of change kept. Unless you are completely flopping over your copies downstream (in which case you have different problems), this also gets relatively more efficient as you scale. And finally, when it comes to Operational Expense, you’re not just getting more efficient, you’re actually eliminating whole classes of effort and radically simplifying others.

The bottom line here is that Data has been THE BIG BOTTLENECK for a long time in our applications. And, with thin cloning from Delphix, you’ve finally got the power to take control of that bottleneck, measure the true impact, and reduce your Total Cost of Data by delivering the right data to the right person at a fraction of the cost you pay today.



The Thin Cloning Left Shift

November 13th, 2013

The DevOps approach to software delivery manages risk by applying change in small packages instead of big releases. As release frequency increases, overall risk falls, since more working capabilities are delivered more often. The consequence, however, is that problems with your data can be amplified: you can squeeze the risk out of one aspect of your delivery just to introduce it in another. Thin cloning attacks that risk, enhancing and amplifying the value of DevOps by reducing the data risk inherent in your architecture.

Data Delivery

How is there risk in your architecture? Well, just because you've embraced Agile and DevOps doesn't mean that your architecture can support it. For example, one customer with whom I spoke had a 3-week infrastructure plan to go along with every 2-week agile sprint, because it took them that long to get their data backed up, transmitted, restored, and ready for use. So, sure, the developers were a lot more efficient. But the cost in infrastructure resources, and the corresponding Total Cost of Data, was still very high for each sprint. And if a failure occurred in data movement, the result would be catastrophic to the Agile cycle.

Data Currency and Fidelity

Another common tradeoff has to do with the hidden cost of using stale data in development. The reason this cost is hidden (at least from the developer's viewpoint) is that it shows up as a late breakage event. For example, one customer described their data as evolving so fast that a query developed against stale data might work just fine in development but then fail to handle several cases that appear in more recent production data. Another customer had a piece of code tested against a subset of data that slowed to a crawl 2 months later during production-like testing. Had they not caught it, it would have resulted in a full outage.

I contend that the impact of these types of problems is chronically underestimated because we place too much emphasis on the number of errors, and not enough on their early detection. I contend that being able to remediate errors sooner is significantly more important than being able to reduce the overall error count. Why? First, because the cost of errors rises dramatically as you proceed through a project. Second, because remediating faster means avoiding secondary and tertiary effects that can result in time wasted chasing ghost errors and root causing things that simply would not be a problem if we fixed things faster and operated on fresher data.

Thought Experiment

To test this, I ran a simple thought experiment comparing two scenarios. In both scenarios, time is measured by 20 milestones and the cost of an error rises exponentially from "10" at milestone 7 to "1000" at milestone 20. In Scenario A, I hold the number of errors constant and force remediation to occur in 10% less time. In Scenario B, I hold the time for all remediation constant and shrink the total number of errors by 10%.

Scenario A
Scenario A: Defects Held Constant; Remediation Time Reduced by 10%

Scenario B
Scenario B: Remediation Time Held Constant; Defects Reduced by 10%

In each graph, the blue curve represents the before state and the green curve the after state. In the before state of both scenarios, the total cost of errors was $2.922M. Comparing the two graphs, the savings from shrinking the total time to remediate by 10% was $939k, versus $415k for shrinking the total number of errors by 10%. In other words, even though the graphs barely changed, the dollar value of the change was significant when time to remediate was the focus: the value of reducing the time to remediate by 10% was more than twice the value of reducing the number of defects by 10%. In this thought experiment, TIME is the factor driving the cost companies pay for quality; the sooner and faster something gets fixed, the less it costs. In other words, shifting left saves money. And it doesn't have to be a major shift left to produce a big increase in savings.
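A minimal version of this experiment can be run in a few lines of Python. The post does not specify how defects are distributed across milestones, so a uniform 100 defects per milestone is assumed here; the point is the shape of the result, not the exact dollar figures.

```python
# Minimal sketch of the shift-left thought experiment. The defect
# distribution is an assumption (uniform: 100 defects per milestone).
MILESTONES = range(7, 21)                      # cost rises from milestone 7 to 20
growth = (1000 / 10) ** (1 / 13)               # exponential: 10 at 7, 1000 at 20
cost = {m: 10 * growth ** (m - 7) for m in MILESTONES}
defects = {m: 100 for m in MILESTONES}         # assumed distribution

def total_cost(defects, shift=0):
    # Remediate `shift` milestones earlier; cost never falls below milestone 7's.
    return sum(n * cost[max(7, m - shift)] for m, n in defects.items())

baseline   = total_cost(defects)
scenario_a = total_cost(defects, shift=2)      # remediate 10% of 20 milestones sooner
scenario_b = total_cost({m: 0.9 * n for m, n in defects.items()})  # 10% fewer defects

print(f"shift-left saves {1 - scenario_a / baseline:.0%}, "
      f"fewer defects saves {1 - scenario_b / baseline:.0%}")
```

Because the cost curve is exponential, cutting defect counts saves a flat 10%, while moving remediation even two milestones earlier cuts the cost of every late defect by a multiple, which is why time dominates.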

The promise of thin cloning

The power of thin cloning is that it addresses both key aspects of data freshness: currency and timeliness. Currency measures how stale data is compared to the source [see Segev ICDE 90], and timeliness how old it is since its creation or update at the source [see Wang JMIS 96]. These two concepts capture the real architectural issue in most organizations: there is a single point of truth somewhere that has the best data (high timeliness), but it's very difficult to make all of the copies of that data maintain fidelity with that source (currency), and the difficulty of doing so rises in proportion to the size of the dataset and the frequency with which the target copy needs currency. Yet it's clear that DevOps pushes in exactly this direction.

Today, most people accept the consequences of low fidelity and lack of currency because of the benefits of a DevOps approach. That is, they accept that some code will fail because it's not tested on full-size data, that they will miss cases because data is evolving too quickly, and that they will chase down ghost errors because of old or poor data. And they accept it because the benefit of DevOps is so large.

But with thin cloning solutions like Delphix, this issue just goes away. Large (even very large) databases can be fully refreshed in minutes. That means full-size datasets with minutes-old timeliness and minutes-old currency.

So what?

Even in shops that are state of the art, with the finest minds and the best processes, the results of thin cloning can be dramatic. One very large customer was struggling to close their books each quarter: the close period ran over 20 days, with more than 20 major errors requiring remediation. With Delphix, that close now takes 2 days, and the errors have become undetectable. Across a large swath of customers, we're seeing an average reduction of 20-30% in the overall development cycle. With Delphix, you're DevOps ready, prepared for short iterations, and capable of delivering a smooth data supply at much lower risk.

Shifting your quality curve left saves money. Data Quality through fresh data is key to shifting that curve left. Delphix is the engine to deliver the high quality, fresh data to the right person in a fraction of the time that it takes today.


The Principle of Least Storage

November 12th, 2013

We’re copying and moving redundant bits

In any application environment, we're moving a lot of bits. We move bits to create copies of Prod for BI, Warehousing, Forensics, and Production Support. We move bits to create Development, QA, and Testing environments. And we move bits to create backups. Most of the time, most of the bits we're moving aren't unique, and as we'll discover, that means we're wasting time and resources moving data that doesn't need to be moved.

Unique Bits and Total Bits

Radically reducing the bulk and burden of caring for all of the data in the enterprise has to start with two fundamental realizations: First, the bits we store today are often massively redundant. Second, we’ve designed systems and processes to ship this redundant data in a way that makes data consolidation difficult or impossible. Let’s look at a few examples:

Backup Redundancy

Many IT shops at major companies follow the Weekly Full / Daily Incremental model and keep 4 weeks' worth of backups on hand for recovery. If we assume that a data store (such as a database) has a daily churn rate of 5%, then we can describe the total number of bits in the 4-week backup cycle as follows (using X as the current size of the database and ignoring annual growth):

Total Bits: 4*X + 24*5%*X = 5.20*X

But how much of that data is really unique? Again, using X as the current size of the database and ignoring annual growth (one full copy plus the remaining 27 days of unique changes):

Unique Bits: X + 27*5%*X = 2.35*X

The ratio of total to unique bits is 5.2 / 2.35, or about 2.21. That is, our backups are 55% redundant at a bit level. Moreover, the key observation is that the more full backups you perform, the more redundant your data is.

Environment Redundancy

According to Oracle, the average application has 8 copies of its production database, and this number is expected to rise to 20 in the next year or two. In my experience, where backups see about a 5% daily change rate, Dev/Test/QA classes of environments see about a 2% daily change rate, and are in general 95% similar to their production parent database even when accounting for data masking and obfuscation.

If we assume an environment with 8 copies that are refreshed monthly, start out 5% divergent, and churn at a rate of 2% per day, then we can describe the total number of bits in these 8 environments as follows (using X as the current size of the database and ignoring annual growth):

Total Bits: 8*95%*X + 2%*15*8*X = 10*X (on average, a copy is 15 days into its 30-day refresh cycle)

But how much of that data is really unique? Again, using X as the current size of the database and ignoring annual growth:

Unique Bits: X + 2%*15*8*X = 3.40*X

The ratio of total to unique bits is 10 / 3.4, or about 2.94. That is, our copies are 66% redundant at the bit level. Moreover, the key observation is that the more copies you make, the more redundant your data is.
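Both redundancy calculations can be checked with a few lines of Python (same assumptions as above; the 15 reflects that, on average, a monthly-refreshed copy is 15 days into its 30-day cycle):

```python
def redundancy(total_bits, unique_bits):
    """Fraction of stored bits that are redundant copies."""
    return 1 - unique_bits / total_bits

X = 1.0  # current size of the database (normalized to 1)

# Backup cycle: 4 weekly fulls + 24 daily incrementals at a 5% change rate,
# but only 1 full + 27 days of changes are actually unique.
backup_total = 4 * X + 24 * 0.05 * X          # 5.20 * X
backup_unique = X + 27 * 0.05 * X             # 2.35 * X

# 8 dev/test copies refreshed monthly, 95% similar to prod, 2%/day churn,
# with each copy on average 15 days into its 30-day refresh cycle.
env_total = 8 * 0.95 * X + 0.02 * 15 * 8 * X  # 10.0 * X
env_unique = X + 0.02 * 15 * 8 * X            # 3.40 * X

print(f"backups:      {redundancy(backup_total, backup_unique):.0%} redundant")
print(f"environments: {redundancy(env_total, env_unique):.0%} redundant")
```

Try changing `copies` or the churn rates: more fulls and more copies always push the redundancy figure up, which is the whole argument for a shared footprint.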

Movement is the real redundancy

Underlying this discussion of unique bits vs. total bits is the fact that most of the time, the delta in bits between the current state of our environment and the state we need it to be in is actually very small. In fact, if we eliminate the movement of bits to make operations happen, we can reduce the total work in any operation to almost nothing. If you’re hosting not just one copy but every copy from a shared data footprint, you have a huge multiplying effect on your savings.

The power of a shared data footprint is that it makes a variety of consolidations possible. If the copy of production data is shared in the same place as the data from the backup, redundant bits can be removed. If that same data is shared with each development copy, even more redundant bits can be removed. (In fact, we see a pattern emerging of storing only unique bits.) Finally, if we need to refresh development, we can move almost NO bits: since every bit we want already exists in the production copy, we just have to point to those bits and do a little renaming. And because it's a shared footprint, we don't have to export huge amounts of data to a distant platform; we can simply present those bits (e.g., via NFS).

Consider a developer who needs to refresh a 1 TB database from the production host to the development host in concert with 2-week agile sprints. In a world without thin clones, this means we transmit 1 TB over the network every 2 weeks. In a world with thin clones and a shared footprint, we copy 8 GB locally and don't have to transmit anything to achieve the same thing.
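A toy copy-on-write model shows why the shared footprint moves so few bits. This is an illustrative sketch only, not how Delphix is actually implemented: the clone stores only the blocks it changes and reads everything else straight from the shared parent image.

```python
# Toy copy-on-write "thin clone": unchanged blocks are read from the shared
# parent; only written blocks consume space in the clone's private delta.
class ThinClone:
    def __init__(self, parent):
        self.parent = parent   # shared, read-only block map
        self.delta = {}        # private copy-on-write blocks

    def read(self, block_id):
        return self.delta.get(block_id, self.parent[block_id])

    def write(self, block_id, data):
        self.delta[block_id] = data  # only now does this block take space

prod = {i: f"block-{i}" for i in range(1000)}  # stand-in for a 1000-block source
dev = ThinClone(prod)
dev.write(42, "patched")

print(f"blocks stored by the clone: {len(dev.delta)} of {len(prod)}")
```

A refresh in this model is just re-pointing `parent` at a newer image and clearing `delta`; no bulk data moves, which is exactly the 1 TB-to-8 GB effect described above.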

The better answer

Regardless of our implementation, we reach maximum efficiency when we achieve our data management operations at the lowest cost. Reducing the cost of movement is part of that, so I offer the Principle of Least Movement:

Move the minimum bits necessary the shortest distance possible to achieve the task.

So what?

There's a workload attached to moving these bits around: a cost measured in bits on disk or tape, network bandwidth consumed, and hours spent. Since we're moving a lot of redundant bits, much of that work is unnecessary. There's money to be saved, and it isn't a small amount of money. And that cost doesn't end in IT. It costs the business every time a Data Warehouse can't get the fresh data it needs so that real-time decisions can be made. (Should I increase my discount now, or wait until tomorrow? Should I stock more of Item X because there is a trend of people buying it?) It costs the business when a production problem continues for an extra 4 or 6 or 8 hours because that's how long it takes to restore a forensic copy. In fact, in my experience, the business benefit to applications far outweighs the cost advantage, which is itself not insignificant.


The Inferior Subset

November 9th, 2013

Why Subsets qualify as an inferior good

Why are you sub-setting your data? Even with the cost of spinning disk falling by half every 18 months or so, and the cost and power of flash rapidly catching up, several large customers I've encountered in the last three years are investing in large-scale or pervasive programs that force their non-prod environments onto subsets of data as a way to save storage space.

However, there are also several trade-offs with sub-setting and potential issues it can create, including:

* The performance of code under small sets of test data can be radically different than results on full sets of data.
* The creation of the subset can be CPU and Memory intensive, and may need to be repeated often.
* The process to create consistent subsets can be arduous, iterative, and error prone. In particular, corner cases are often missed, and creating subsets that maintain referential integrity can be quite difficult.

It's difficult to get 50% of the data with 100% of the skew; instead, you tend to get 50% of the data with 50% of the skew. Without the proper skew, QA could miss important cases and SQL optimization could come out completely wrong, not to mention that SQL queries could hit zero rows instead of thousands.
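The skew problem is easy to demonstrate. In this toy Python sketch (with made-up country data), a naive 50% random sample keeps roughly 50% of every value, so a corner case with only a handful of rows survives as just a few rows, or sometimes none at all:

```python
import random

random.seed(7)
# Skewed "orders" table: one value dominates, a few rare corner cases.
rows = ["US"] * 9900 + ["BR"] * 90 + ["NZ"] * 10

subset = random.sample(rows, len(rows) // 2)   # naive 50% subset

for country in ("US", "BR", "NZ"):
    full, sub = rows.count(country), subset.count(country)
    print(f"{country}: {full} rows in full data, {sub} in the 50% subset")
```

The dominant value is barely affected, but the 10-row corner case may be cut to a level where a query that should return rows returns almost none, which is exactly the QA and optimizer problem described above.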

Why thin cloning makes subsets an inferior good

As we've discussed in other blogs, a thin cloning solution such as Delphix causes the total cost of data to fall dramatically, which increases a CIO's purchasing power (in the context of data), allowing much more data to be readily available at a much lower price point. The dramatic result we observe is that people are abandoning subsets in droves. In fact, as the price of data has fallen with the implementation of Delphix, the desire for subsets is being replaced by a desire for full-size datasets. Certainly, customers will still want subsets for particular reasons: limiting data to a specific business line or team, serving as a security measure, or consuming fewer CPU and memory resources. But it is also clear that the reduction in the total cost of data has customers switching to full-size datasets to avoid performance-related late breakage and the cost of subset creation. Beyond this, it's causing them to rethink their investment in a sub-setting apparatus altogether.

At Delphix, the data we see from customers bears this out. Subsets cost a lot to make, and with their storage advantage gone, they are simply inferior to full-size sets for many applications. With storage eliminated as the primary reason to subset (thanks to the savings from thin clones), the inferiority of the subset is quickly being realized.


Production Possibility Frontier

November 8th, 2013

By: Woody Evans

"The most powerful thing that an organization can do is to enable development and testing to get environments when they need them."

Gene Kim, author of The Phoenix Project

App Features vs. Data Work

The power of a technology change, especially a disruptive technology shift, is that it creates opportunities to increase efficiency. The downside is that companies take a long time to realize that someone has moved their cheese. Data virtualization (i.e., automated thin cloning of databases, VM images, app stacks, etc.) alters the production possibility frontier dramatically, provided customers can get past the belief that their IT is already optimized.

An Ideal Frontier

An idealized Production Possibility Frontier describing the tradeoff between Application Features and Data Related Work might look like the following, where an engineering team of developers and IT personnel can shift focus smoothly between producing feature work and data-related work.


Companies on the blue line are able to shift efficiently between the data-related work and the application-feature work in their IT projects. That self-assessed efficiency, however, can become a barrier to adoption when a technology shift occurs, especially if you believe that certain tradeoffs are already optimized.

Suppose a developer needs to execute a refresh as part of a testing cycle. In this idealized world, the developer may be able to refresh the database in 2 hours, or may be able to refresh by spending 2 hours writing a piece of throwaway rollback code. Either way, that developer trades off 2 hours that would have been spent writing new application features in order to accomplish the refresh.

The Thin Cloning Technology Shift

Using a broad brush, we can classify much of the time and effort of application projects as Data Related:
• Waiting for data to arrive at a certain place
• Performing extra work because of a lack of the right data
• Trying to keep data in sync for all of the various purposes and work streams that an application endeavors to complete.

The research we have shows that 85% of the work in application delivery is really data related. The technology shift brought about by thin cloning, and in particular Delphix technology, pushes the Production Possibility Frontier and dramatically reshapes IT’s understanding of efficiency.

Because application feature work depends on data-related work such as setting up development environments, creating builds, and building QA environments, the feature work is constrained by the efficiency of the data and IT work. If we make the data and IT work much more efficient, we accomplish more data work and thus more feature work.


This new production possibility frontier dwarfs the initial one. In fact, the massive size of the shift contributes greatly to IT's resistance, because to unlock this value IT has to change what it believes to be already optimized processes.

But, the proof is out there. And, at Delphix, we’re gathering powerful proof points every day demonstrating how customers are creating powerful efficiencies.

Waiting for data to arrive is affecting customers today. One Delphix customer was spending 96% of their testing cycle time waiting for data to refresh. That meant only 4% of the testing time frame was used to actually test the product, shifting error detection to the right, where it is more expensive. Using Delphix to refresh, they now spend less than 1% of their time waiting. That's a 99:4 improvement in usable testing time, better than 20:1!

We started this post with the example of an ideal situation where a developer could choose to refresh a database in 2 hours or spend 2 hours writing a piece of throwaway rollback code. But the reality is that it's more often a 10-hour wait to get your copy of the 3 TB database (if you can get the DBA's attention). There's a lot of code being written out there because we've accepted the "optimized" way of doing things: we accept that we can't get fresh data, so we write our own workaround. This kind of wasted effort just evaporates with Delphix.

And if you're thinking that this is a small-scale problem, think about all of the ETL and Master Data Management applications out there, where developers spend endless hours writing code (and business users do the same configuring apps) so that data can be properly synchronized. With immediate access to data that is already being synchronized in near real time, all of that work just goes away.

What IT isn’t considering and CIOs should

Disruptive technology is exactly that: it uncovers an opportunity for efficiency that you don't already see. So whatever was optimal before simply isn't now. In fact, if you don't challenge the current optimization, you'll likely never reap the benefits of the disruptive technology. All the same, accepting that a new optimization is possible, and that change can be revolutionary rather than merely evolutionary, just isn't in the DNA of war-weary, battle-hardened DBAs and developers. CIOs need to consider and understand the powerful imagery of the Production Possibility Frontier for application development using thin cloning.

Thin cloning is such a powerful shift that IT shops will often shake their heads in disbelief. CIOs need to see through that and understand that data virtualization with thin cloning is a seismic shift. 10 years ago, no one knew what VMware was; now you can't walk into a data center without it. 10 years from now, the idea of having physical data instead of thin clones will be laughable. Careers are about to be made on Data Virtualization, and Delphix is the star to which you should hitch your wagon.