Production data is clearly critical to the core of a business, for example:
- Product orders received, opened and processed
- Bank account deposits, withdrawals and interest accrued
- Telephone calls, their times and durations
but the data is also fundamental to other parts of the business, and it is constantly being copied and pumped across corporate infrastructure. Critical business databases always have a failover database that can take over if the primary ever goes down. The failover database constantly pulls in changes from production, and only the changes, but on top of the failover database, businesses incur a triple data copying tax. For every piece of production data, a copy is typically also made for each of:
- Business intelligence and analytics
- Project data for development, testing, QA, integration, UAT and training
- Data protection backups for recovery
These copies of production data are made repeatedly, taxing corporate IT infrastructure.
Copying the production data also incurs considerable operational expense, demanding work from application administrators, DBAs, system administrators, storage administrators, backup administrators and network administrators.
All of these copies of course require considerable capital expenditure, such as storage space, which in turn requires more hardware and datacenter space. The data has to be transferred over company networks, clogging communication pathways. It has to be read from one storage system, taxing that system, and written to a remote storage system, impacting the performance of that system as well. For example, the recommended method of backing up an Oracle database is to take a full backup every weekend and incremental backups every day in between. Thus every weekend, for the full backup, significant strain is put on disk, network and infrastructure.

The key here is that it's not just the backup copy that is created; other systems require the data as well, and the data is copied to all of those systems. Data warehouses require the data and are either refreshed wholesale or populated by running extract, transform and load (ETL) jobs against the production data. ETL jobs and data warehouse refreshes are limited to nightly batch windows, but as data sets grow these jobs are pushing up against those windows. If a job overruns the batch window and has to be killed, a business decision group can be left working on old data for the next week. With old data the business risks making decisions that are incorrect and have negative financial impact. Furthermore, corporations are going global, which eliminates any quiet window to run data warehouse refreshes or ETL jobs, further complicating the system and in some cases forcing businesses to run these jobs at times when some part of the global corporation has to suffer the performance impact.
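To make the backup load concrete, here is a back-of-envelope sketch of the weekly full plus daily incremental strategy described above. The figures (a 10 TB database, a 5% daily change rate) are hypothetical assumptions for illustration only, not measurements from any real system:

```python
DB_SIZE_TB = 10.0          # assumed production database size
DAILY_CHANGE_RATE = 0.05   # assumed fraction of blocks changed per day

def weekly_backup_volume_tb(db_size_tb, change_rate):
    """Data moved per week: one full backup plus six daily incrementals."""
    full = db_size_tb
    incrementals = 6 * db_size_tb * change_rate
    return full + incrementals

def changes_only_volume_tb(db_size_tb, change_rate):
    """Data moved per week if only changed blocks are ever captured."""
    return 7 * db_size_tb * change_rate

weekly = weekly_backup_volume_tb(DB_SIZE_TB, DAILY_CHANGE_RATE)
changes = changes_only_volume_tb(DB_SIZE_TB, DAILY_CHANGE_RATE)
print(f"full + incrementals per week: {weekly:.1f} TB")   # 13.0 TB
print(f"changed blocks only per week: {changes:.1f} TB")  # 3.5 TB
```

Even with these modest assumptions, the weekend full backup alone moves roughly three times as much data as a whole week of capturing only the changed blocks would.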
Due to the considerable capital expenses, operational expenses, infrastructure tax and work required to make copies of production data, the copies are often slow to make or hit delays. These delays and wait times impact the business analysts who make the decisions the business depends on to guide the company. The delays impact application development teams who build the applications that companies depend on to generate revenue. The delays reduce productivity and quality and ultimately hurt revenue.
All three of these data copy classes (BI, backup and application development) cross organizational division boundaries, adding dependencies and management complexity. The more complex the management and the more teams involved, the more delays these copies suffer.
All of the infrastructure, operational and capital expenses, as well as the management overhead, add up. For example, in financial services, one study calculated that the cost of data alone is responsible for over 90% of overall business cost. Much of the cost of handling data is due to acquiring and of course processing it, but surprisingly more than 60% of the cost comes from the more pedestrian tasks of storing, retrieving, distributing and delivering the data. (http://www.wsta.org/resources/industry-articles/)
Of course, as anyone who has been reading my blog over the last 2 or 3 years knows, the majority of this data flood can be eliminated with Delphix. With Delphix, there is only one copy of each unique data block, but that one copy can be shared by multiple databases. Each database copy shares the duplicate blocks; any blocks a database modifies are kept private to that database. As for new changes from the source, or production, database, only the changes are captured into Delphix, putting only a light load on infrastructure. Delphix is fully automated, taking care of all the steps of capturing changes from the production database, and it makes creating data copies a simple self-service exercise that anyone can do in a few minutes.
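The sharing model above is the general copy-on-write technique: clones read from one shared set of blocks, and only a clone's own writes are stored privately. Here is a minimal sketch of that idea; the class and method names are illustrative, not Delphix's actual implementation:

```python
class SharedBlockStore:
    """One physical copy of each unique block, shared by all clones."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block_id -> block data

class VirtualCopy:
    """A clone that reads shared blocks and keeps its writes private."""
    def __init__(self, store):
        self.store = store
        self.private = {}           # only this clone's modified blocks

    def read(self, block_id):
        # Prefer this clone's private version, else fall back to shared.
        return self.private.get(block_id, self.store.blocks[block_id])

    def write(self, block_id, data):
        # Copy-on-write: the change stays private to this clone.
        self.private[block_id] = data

store = SharedBlockStore({0: "jan orders", 1: "feb orders"})
dev = VirtualCopy(store)    # development copy
qa = VirtualCopy(store)     # QA copy, sharing the same blocks

dev.write(1, "feb orders (test fixture)")
print(dev.read(1))  # the private, modified block
print(qa.read(1))   # still the one shared block: "feb orders"
```

However many virtual copies exist, unmodified blocks are stored exactly once; storage grows only with the blocks each copy actually changes.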