Archive for May, 2014

Database Thin Cloning: Allocate on Write (ZFS)

May 31st, 2014

Allocate on Write Thin Cloning

Three challenges specifically stand out when considering Copy on Write filesystem snapshots described in the previous section:

  • The number of snapshots you can take of source database LUNs is limited
  • The size of the snapshots is limited
  • Difficulties arise sharing the base image of source databases at multiple points in time. In some cases this is not possible; in others it is difficult or resource-heavy.

These challenges highlight a specific need: to create thin provisioned clones of a source database from multiple points in time simultaneously, without additional space consumption. This requirement is important, as it allows one base image to serve as the foundation for all subsequent clones and imposes no unplanned storage or refresh requirements on users of the target (cloned) systems.

With a filesystem storage technology called Allocate on Write, these challenges can be met. In allocate on write filesystems, data blocks are never modified in place. When modifications are requested to a block, the block with the new changes is written to a new location, so after a request to modify a block has completed there are two versions of the block: the version that existed prior to modification and the modified block. The location of the blocks and the versioning information for each block is kept in a metadata area that is in turn managed by the same allocate on write mechanism. When a new version of a block has been written to a new location, the metadata has to be modified; however, instead of modifying the contents of the relevant metadata block, a new metadata block is written to a new location. These allocations of new metadata blocks with pointers to the new block ripple up the metadata structures all the way to the root block of the metadata. Ultimately, the root metadata block is allocated in a new place pointing to the new versions of all blocks, meaning that the previous root block points to the filesystem at a previous point in time, while the current, recently modified root block points to the filesystem at the current point in time. Through this mechanism an allocate on write system is capable of holding the complete version history of not only a block, but of all blocks involved in that block’s tracking.


Figure 10. When a datablock in the bottom left is modified, instead of modifying the current block a new block is allocated with the modified contents. The metadata pointing to this new location has to be modified as well, and again instead of modifying the current metadata block, a new metadata block is allocated. These changes ripple up the structure such that the current root block points to the filesystem at the current point in time while the previous root block points to the filesystem at the previous point in time.
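The rippling update shown in Figure 10 can be sketched in a few lines of Python. This is a toy model only (the two-level tree, addresses, and block contents are invented for illustration; real ZFS metadata trees are deeper), but it shows the key property: the old root keeps an old point in time readable at no extra cost.

```python
# Toy model of allocate on write: blocks are never overwritten.
# The "disk" is an append-only list; a write returns the new block's address.

disk = []  # append-only block store

def alloc(block):
    """Write a block to a fresh location and return its address."""
    disk.append(block)
    return len(disk) - 1

def write_file(data_blocks):
    """Build a two-level tree: data blocks plus one metadata (root) block."""
    addrs = [alloc(b) for b in data_blocks]
    return alloc(addrs)              # root block holds pointers to the data

def modify(root_addr, index, new_data):
    """Change one data block: allocate a new data block AND a new root."""
    pointers = list(disk[root_addr])      # copy the old metadata
    pointers[index] = alloc(new_data)     # new location for the modified data
    return alloc(pointers)                # the ripple: a brand-new root

def read_all(root_addr):
    return [disk[a] for a in disk[root_addr]]

root_v1 = write_file(["A", "B", "C"])
root_v2 = modify(root_v1, 1, "B'")

print(read_all(root_v1))  # ['A', 'B', 'C']  -- old root = previous point in time
print(read_all(root_v2))  # ['A', "B'", 'C'] -- new root = current state
```

Nothing referenced by the old root was touched, so retaining the previous version required no copying at all; only the modified data block and its metadata path were newly allocated.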


Allocate on write shares many properties with EMC’s VNX copy on write and NetApp’s WAFL systems, but the way allocate on write has been implemented in ZFS eliminates the boundaries found in both. With ZFS there is no practical size limitation on snapshots, no practical limit to the number of snapshots, and snapshots are created almost instantaneously and consume practically zero space (on the order of a few kilobytes).

ZFS was developed by Sun Microsystems to address the limitations and complexity of filesystems and storage. Storage capacity is growing rapidly, yet filesystems have many limitations on how many files can be in a directory or how big a volume can be. Volume sizes are predetermined and have to be shrunk or expanded later depending on how far off the original calculation was, making capacity planning an incredibly important task. Any requirement to change filesystem sizes could cause hours of outages while filesystems are remounted and fsck is run. ZFS has no need for filesystem checks because it is designed to always be consistent on disk. The filesystems can be allocated without size constraints because they are allocated out of a storage pool that can easily be extended on the fly. The storage pool is a set of disks or LUNs. All disks are generally assigned to one pool on a system, and thus all ZFS filesystems using that pool have access to the entire space in the pool. More importantly, they have access to all the I/O operations for the spindles in that pool. In many ways, it completely eliminates the traditional idea of volumes.

On a non-ZFS filesystem the interface is a block device. Writes are done per block and there are no transaction boundaries. In the case of a loss of power or other critical issue there is also a loss of consistency. While the inconsistency issues have been addressed by journaling, that solution impacts performance and can be complex.

In a ZFS filesystem all writes are executed via allocate on write, and thus no data is overwritten. Writes are written in transaction groups such that all related writes succeed or fail as a whole, alleviating the need for fsck operations or journaling. On-disk states are always valid and there are no on-disk “windows of vulnerability”. Everything is checksummed and there is no silent data corruption.


Figure 11. Comparison of non-ZFS filesystems on top and ZFS filesystems on the bottom. The ZFS filesystems are created in a storage pool that has all the available spindles, giving filesystems access to all the storage and IOPS from the entire pool. On the other hand, the non-ZFS filesystems are created on volumes and those volumes are attached to a specific set of spindles, creating islands of storage and limiting the IOPS for each filesystem.

Excepting certain hardware or volume manager specific software packages, the general comparison between non-ZFS and ZFS filesystems is as follows:

Filesystem (non-ZFS)

  • One filesystem per volume
  • Filesystem has limited bandwidth
  • Storage is stranded on the volume

ZFS Filesystem

  • Many filesystems in a pool
  • Filesystems grow automatically
  • Filesystems have access to all bandwidth

Along with many filesystem improvements, ZFS has essentially moved the size barrier beyond any hardware yet created and has no practical limitation on the number of snapshots: the maximum number of snapshots is 2^64 (over 18 quintillion) and the maximum size of a filesystem is 2^64 bytes (18.45 exabytes).

A ZFS snapshot is a read-only copy of a filesystem. Snapshot creation is basically instantaneous and the number of snapshots is practically unlimited. Each snapshot takes up no additional space until original blocks become modified or deleted. As snapshots are used for clones and the clones are modified, the new modified blocks will take up additional space. A clone is a writeable copy of a snapshot. Creation of a clone is practically instantaneous and for all practical purposes the number of clones is unlimited.
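The space accounting can be made concrete with a toy Python sketch (invented structures, not ZFS internals): taking a snapshot or creating a clone allocates nothing, and writing to the clone allocates only the changed data block plus new metadata.

```python
# Toy sketch: a snapshot is just a saved root pointer (near-zero space),
# and a clone shares every block with the snapshot until it writes.

disk = []  # append-only block store

def alloc(block):
    disk.append(block)
    return len(disk) - 1

# A filesystem with 3 data blocks: root -> [addr_A, addr_B, addr_C]
fs_root = alloc([alloc("A"), alloc("B"), alloc("C")])
blocks_before = len(disk)

snapshot = fs_root        # "taking a snapshot": save the root address
clone_root = fs_root      # "creating a clone": start from the same root
assert len(disk) == blocks_before   # neither consumed any new blocks

# Writing to the clone allocates only the changed block plus a new root.
ptrs = list(disk[clone_root])
ptrs[0] = alloc("A-modified")
clone_root = alloc(ptrs)

print(len(disk) - blocks_before)            # 2: one data + one metadata block
print([disk[a] for a in disk[snapshot]])    # ['A', 'B', 'C'] -- snapshot unchanged
```

This is why the number of snapshots and clones is practically unlimited: each one is little more than a retained root pointer until divergence occurs.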

Snapshots can be sent to a remote ZFS array via a send and receive protocol. Either a full snapshot or incremental changes between snapshots can be sent. Incremental snaps generally send and receive quickly and can efficiently locate modified blocks.

One concern with allocate on write technology is that a single block modification can set off a cascade of block allocations. First, the datablock to be modified is not overwritten; a new block is allocated and the modified contents are written into the new block (similar to copy on write). The metadata that points to the new datablock location has to be modified; but again, instead of overwriting the metadata block, a new block is allocated and the modified data is written into it. These changes cascade all the way up the metadata tree to the root block, or uber block (see Figure 10). Thus for one data block change there can be 5 new blocks allocated. These allocations are quick as they take place in memory, but what happens when they are written out to disk? Blocks are written out to disk in batches every few seconds for non-synchronous writes. On an idle or low-activity filesystem a single block change could create 5 writes to disk, but on an active filesystem the total number of metadata blocks changed will be small compared to the number of datablocks: for every metadata block written there will typically be several datablocks that have been modified. On an active filesystem a single metadata block typically covers the modifications of 10 or 20 datablocks, so the extra blocks written to disk usually amount to something on the order of 10% of the datablocks written.
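The arithmetic in the paragraph above can be made explicit. The numbers below are the illustrative figures from the text (a 5-block path, 10 datablocks per metadata block), not measurements:

```python
# Back-of-envelope arithmetic for allocate-on-write write amplification,
# using the illustrative figures from the text.

tree_depth = 5          # 1 data block + 4 levels of metadata up to the root

# Idle filesystem: a single block change rewrites the whole path to the root.
writes_idle = tree_depth
print(writes_idle)      # 5 disk writes for 1 logical change

# Active filesystem: a transaction group batches many data block changes,
# and each metadata block covers the changes of many data blocks beneath it.
data_blocks = 100
data_per_metadata = 10  # the text cites roughly 10 to 20
metadata_blocks = -(-data_blocks // data_per_metadata)  # ceiling division
overhead = metadata_blocks / data_blocks
print(metadata_blocks, f"{overhead:.0%}")  # 10 10%
```

The batching of transaction groups is what turns a worst-case 5x amplification into roughly a 10% overhead on a busy filesystem.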


Figure 12. The flow of transaction data through in-memory buffers and disk.

But what happens for sync writes that can’t wait for block write batches that happen every few seconds? In those cases the sync writes must be written out immediately. Sync writes depend on another structure called the ZFS Intent Log (ZIL). The ZIL is like a database change log or redo log. It contains just the change vectors and is written sequentially and continuously such that a synchronous write request for a datablock change only has to wait for the write to the ZIL to complete. There is a ZIL per filesystem, and it is responsible for handling synchronous write semantics. The ZIL creates log records for events that change the filesystem (write, create, etc.). The log records will have enough information to replay any changes that might be lost in memory in case of a power outage where the block changes in memory are lost. Log records are stored in memory until either:

  • Transaction group commits
  • A synchronous write requirement is encountered (e.g. fsync() or O_DSYNC)

In the event of a power failure or panic, log records are replayed. Synchronous writes will not return until ZIL log records are committed to disk.
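A minimal Python model of these recovery semantics (the structures are invented for illustration): synchronous writes append a record to a persistent intent log before returning, and after a crash the lost in-memory transaction group is reconstructed by replaying that log over the last on-disk state.

```python
# Toy model of the ZIL: synchronous changes append a log record before the
# batched transaction group reaches the disk; after a crash, the on-disk
# state plus the replayed log recovers the lost changes.

on_disk = {}      # state as of the last committed transaction group
in_memory = {}    # pending changes waiting for the next txg flush
zil = []          # intent log: persisted immediately for sync writes

def sync_write(key, value):
    zil.append((key, value))   # change vector hits stable storage right away
    in_memory[key] = value     # applied in memory; txg flush comes later

sync_write("block7", "new-contents")
sync_write("block9", "other-contents")

# Power failure: the pending transaction group in memory is lost...
in_memory = {}

# ...but replaying the ZIL over the on-disk state recovers everything.
for key, value in zil:
    on_disk[key] = value
print(on_disk)  # {'block7': 'new-contents', 'block9': 'other-contents'}
```

The synchronous caller only ever waits for the small sequential ZIL append, not for the full metadata ripple of the transaction group.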

Another concern is that blocks initially written sequentially next to each other may end up spread over the disk after modification, because each update allocates a new block in a different location. This fragmentation has little effect on random read workloads, but multiblock reads can suffer: a simple request for a contiguous range of blocks may turn into several individual reads by ZFS.

ZFS also introduced the concept of hybrid storage pools where both traditional spinning disks and modern flash-based SSDs are used in conjunction. In general, disks are cheap and large in size but are limited in both latency and throughput by their mechanics. Flash devices, on the other hand, serve I/O requests with latency that is only a small fraction of that of disks; however, they are very expensive per gigabyte. So while it may be tempting to achieve the best possible performance by putting all data on SSDs, this is usually too cost-prohibitive. ZFS allows mixing these two storage technologies in a storage pool, so that, for example, the ZIL can be placed on a mirror of flash devices to speed up synchronous write requests where latency is crucial.

Another use for SSDs in ZFS is for cache devices. ZFS caches blocks in a memory area called the Adaptive Replacement Cache—also the name of the algorithm used to determine which blocks have a higher chance of being requested again. The ARC is limited in size by the available system memory; however, a stripe of SSD devices for a level 2 ARC can be configured to extend the size of the cache. Since many clones can be dependent on one snapshot, being able to cache that snapshot will speed up access to all the thin clones based off of that snapshot.


Figure 13. A storage pool with an SSD caching layer and ZFS Intent Log for syncing.

With these capabilities in mind, there are several methods available to use this technology for database thin clones:

  • Open Source ZFS snapshots and clones
  • ZFS Storage Appliance from Oracle with RMAN
  • ZFS Storage Appliance from Oracle with Dataguard

(Open) Solaris ZFS

ZFS is available in a number of operating systems today. It was released in Solaris 10 and has gained even more features and importance in Solaris 11. After the acquisition of Sun by Oracle, the OpenSolaris project was abandoned, but the community forked a number of open source projects, the most notable of which are Illumos and OpenIndiana. These releases are still actively developed and maintained, and many commercial products are built on them.

Any one of these systems can be used to build your own ZFS based storage system to support thin cloning:

  • Database storage on local ZFS
  • ZFS storage as an NFS filer
  • ZFS storage as an iSCSI/block storage array

When a database is already running on Solaris with local disks, a ZFS filesystem can be used to hold all database files. Creating snapshots and clones on that filesystem is a simple matter of a few ZFS commands, and one does not have to bother with storage protocols like NFS. If Solaris is in use and datafiles are on ZFS anyway, it may also be a good idea to automate regular snapshots as an extra layer of security and to enable a “poor man’s flashback database”.

When a database is not running locally on a Solaris server, you can still benefit from ZFS features by building your own ZFS storage server. You can share ZFS volumes via iSCSI or fibre channel and use ASM on the database server for datafiles but instead we will focus on the easier setup with ZFS filesystems and the NFS protocol to share the volumes.

On a Solaris Storage server

  • Create a zpool (ZFS pool)
  • Create a ZFS filesystem in the pool
  • Export that filesystem via NFS

On the source database server

  • Mount the NFS filesystem
  • Put datafiles on the NFS mount as one of:
    • “live” data (this may have performance implications)
    • backup image copies (or an RMAN clone)
    • a replication target

On the Solaris Storage server

  • Take snapshots whenever necessary
  • Create clones from the snapshots
  • Export the clones via NFS

On the target database server

  • Mount NFS clones
  • Use this thin clone

ZFS Storage Appliance with RMAN

Oracle sells a ZFS storage appliance preconfigured with disks, memory, the ZFS filesystem, and a powerful monitoring and analytics dashboard. One of these appliances can be used to create database thin clones; in fact, Oracle has published a 44-page white paper outlining the steps. In brief, the steps involved are:

On the ZFS Appliance

  • Create a “db_master” project
  • Create a “db_clone” project
  • For both the “db_clone” and “db_master” project, create 4 filesystems:
    • datafile
    • redo
    • archive
    • alerts

On the Source Database

  • Mount a directory from the ZFS Appliance via NFS
  • Backup the source database with RMAN to the NFS mount directory

On the ZFS Appliance

  • Select the “db_master” project
  • Snapshot the “db_master” project
  • Clone each filesystem on “db_master” to the “db_clone” project

On the Target Host

  • Mount the 4 filesystems from the db_clone project via NFS
  • Start up the clone database on the target host using the directories from the db_clone project, mounted via NFS from the ZFS storage appliance


Figure 14. A diagram of the procedure used to clone databases using the ZFS storage appliance and RMAN. First a directory is mounted on the source machine from the ZFS storage appliance via NFS. Then an RMAN backup is taken of the source database onto the NFS mounted directory. The snapshot can be taken off the RMAN backup on the ZFS storage appliance and then used to create thin clones.

ZFS Storage Appliance with DataGuard

One way to efficiently address getting changes from a source database onto a ZFS storage appliance is by using Dataguard, as outlined in Oracle’s white paper on Maximum Availability Architecture (MAA) DB Cloning.

The concept revolves around using Dataguard to host the datafiles from a Dataguard instance on the ZFS storage appliance. With the datafiles hosted in ZFS, all changes from the source database will be propagated to the ZFS Storage Appliance via the Dataguard instance. Once the Dataguard datafiles are hosted on the ZFS storage appliance, the snapshots of the datafiles can easily be taken at desired points in time and clones can be made from the snapshots. The ZFS clones can be used to start up database thin clones on target database hosts by mounting those datafiles via NFS to the target hosts.


Figure 15. Using Dataguard, files can be shared with a ZFS storage appliance via NFS to use for thin cloning of a target database.


Database Thin Cloning: WAFL (Netapp)

May 30th, 2014

Write Anywhere File Layout (WAFL)

With EMC, thin cloning can only be achieved by using backup technology; in essence, the process has to be architected manually in order to support databases. How can the same goals be achieved but with database thin cloning specifically in mind?

A more seamless approach to database thin cloning is SnapManager for Oracle (SMO) and SnapManager for SQL Server, offered by NetApp. NetApp employs a technology called Write Anywhere File Layout (WAFL) that sounds on the surface like EMC VNX copy on write but is different. WAFL has been around far longer and has a track record of being used for database thin cloning. WAFL allows quick, easy, and efficient snapshots to be taken of a filesystem. New writes don’t overwrite previous blocks with WAFL; instead, the new writes go to a new location. With this architecture it is easy to snapshot files, filesystems or LUNs in minutes.

Up to 255 snapshots can be created per volume. An entire LUN can be the source of a snapshot, or snapshots can be made of specific sets of files. Along with the quick and easy snapshot technology, NetApp provides a feature called SnapMirror that will propagate snapshots to a secondary filer. The secondary filer in turn can use a feature called FlexClone to create clones.

Clones created in this manner will share duplicate blocks and thus can be used to create database thin clones on a secondary filer. The snapshots on the source array can be managed specifically for databases with NetApp Snapshot Manager for Oracle (SMO), or Snapshot Manager for SQL Server. SMO connects to the database, and in the case of Oracle will put all tablespaces in hot backup mode before taking snapshots then take them out of hot backup mode when the snapshot is complete. Information about the snapshots is tracked and managed within SMO inside an Oracle database that serves as a repository.

The technology involved with snapshot cloning in WAFL is solid but very component heavy. On top of the components already listed is a required installation on the target array called NetApp SnapDrive for UNIX. Snapshots are propagated to the secondary array with SnapMirror, but a feature called Protection Manager manages the process. A critical step in cloning operations is correctly synchronizing the snapshot schedule of SMO with the transfer schedule of Protection Manager so that the same retention class is maintained on the source and target arrays. On the destination array it is important to manage and track how many clones are made and which snapshot is used as the basis of each clone. If more than 255 clones are made of a single LUN, the next clone will no longer be a logical (virtual) clone sharing duplicate data blocks but a physical clone with a completely new copy of the datafiles.


Figure 8. Using NetApp filer technologies including WAFL, SnapMirror, SMO, and FlexClone to create thin provisioned database clones.

An important consideration on WAFL volumes on NetApp is the aggregate pool. The aggregate pool defines which LUNs will be included in a snapshot. The size limitation on this pool varies between 16TB and 100TB depending on the model of the NetApp array. The limits on the size of this pool and the limit of 255 snapshots should be considered when evaluating the capabilities of SMO and FlexClone on NetApp.





Database Thin Cloning: Copy on Write (EMC)

May 29th, 2014

Copy on Write

Copy on write is a storage or filesystem mechanism that allows storage or filesystems to create snapshots at specific points in time. Whereas Clonedb is a little known and rarely used option, storage snapshot technologies are widely known and used in the industry. These snapshots maintain an image of the storage at a specific point in time. If the active storage makes a change to a block, the original block will be read from disk in its original form and written to a save location. Once the block save is completed, the snapshot will be updated to point to the new block location. After the snapshot has been updated, the active storage datablock can be written out, overwriting the original version.


Figure 4. This figure shows storage blocks in green. A snapshot will point to the datablocks at a point in time as seen on the top left.


Figure 5. When the active storage changes a block, the old version of the block has to be read and then written to a new location and the snapshot updated. The active storage can then write out the new modified block.
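The sequence in Figures 4 and 5 can be sketched as a toy Python model (the block numbers and the save-area layout are invented for the example). Note how one logical write costs an extra read and an extra write the first time a snapshotted block is modified:

```python
# Toy model of copy on (first) write. Real arrays track this in snapshot
# metadata and a reserved save area; this is an illustration of the concept.

storage = {0: "A", 1: "B", 2: "C"}               # active storage blocks
save_area = {}                                    # original blocks, saved on demand
snapshot = {n: ("storage", n) for n in storage}   # snapshot starts by pointing at storage
io_ops = {"reads": 0, "writes": 0}

def write_block(n, data):
    if snapshot[n][0] == "storage":   # first change to this block since the snapshot
        io_ops["reads"] += 1          # read the original block from disk...
        save_area[n] = storage[n]
        io_ops["writes"] += 1         # ...and write it to the save location
        snapshot[n] = ("save", n)     # repoint the snapshot at the saved copy
    storage[n] = data                 # only now may the active block be overwritten
    io_ops["writes"] += 1

write_block(1, "B'")
print(io_ops)                       # {'reads': 1, 'writes': 2} -- 1 logical write, 3 I/Os
print(save_area[snapshot[1][1]])    # 'B' -- the snapshot still sees the old contents
```

A second write to the same block would skip the save step, so the overhead applies only to the first modification of each block after a snapshot.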

Using storage snapshots, an administrator can snapshot the storage containing datafiles for the database and use the snapshot to create a clone of a source database. With multiple snapshots, multiple clones with shared redundant blocks can be provisioned.

On the other hand, if the source database is an important production environment then creating clone databases on the same storage as the production database is generally not a good practice. A strategy that allows the cloned database files to be stored off of the production storage environment will be more optimal for performance and stability.

EMC Snapshot with BCV

EMC has a number of technologies that can create database thin clones. In the simplest case the clone databases can share the same storage as the source databases using snapshots of the storage. The storage snapshot can be taken and used to make a thin clone. EMC supports up to 16 writeable storage snapshots, allowing up to 16 thin clones of the same source datafiles (while sharing the same storage as the source database). If the source database consists of several LUNs, then consistent snapshots must be taken of all LUNs at the same point in time, which requires the EMC TimeFinder product.

Taking load off of production databases and protecting production databases from possible performance degradation is an important goal of cloning. By taking snapshots of the production LUNs one incurs an extra read and extra write for every write issued by the production database. This overhead will impact both production and the clone. On top of the extra load generated by the snapshots, the clones themselves create load on the LUNs because of the I/O traffic they generate.

In order to protect the performance of the production database, clones are often provisioned on storage arrays that are separate from production. In the case where production LUNs are carved out of one set of isolated physical disk spindles and another set of LUNs are carved out of a separate set of physical spindles on the same array, it may be acceptable to run the clones within the same array. In this case, Business Continuance Volumes (BCV) can be used to mirror production LUNs onto the LUNs allocated for the clones. Then snapshots can be taken of the mirrors and used for thin clones; or, in order to protect the production LUNs from the overhead generated by snapshots, the BCV mirrors can be broken and the LUNs allocated for cloning can be used to start up thin clone databases. Filesystem snapshots can be used to clone up to 16 thin clone databases using the LUNs mirrored from production.

More often than not, however, snapshots are taken of BCVs, or the BCVs are broken and then copied to a second non-production storage array where snapshots can be taken and clones provisioned off of the snapshots. In this case, though, the EMC environment is limited to only 16 clones, and if those clones are from yesterday’s copy of production, then a whole new copy of production has to be made to create clones of today’s copy. This ends up taking more storage and more time, which goes against the goal of thin cloning.

EMC’s goal has been backup, recovery, and high availability as opposed to thin cloning; however, these same technologies can be harnessed for thin cloning.

The steps to set this configuration up on EMCs system are:

  1. Create BCVs and then break the BCVs
  2. Zone and mask a LUN to the target host
  3. Perform a full copy of the BCV source files to target array
  4. Perform a snapshot operation on target array
  5. Start up the database and recover it using the target array


Figure 6. Timefinder is used to snapshot multiple LUNs from the production filer to the non-production filer to be used for thin provision clones.

EMC is limited to 16 writeable snapshots, and snapshots of snapshots (also known as branching) are generally not allowed. On some high-end arrays it may be possible to take a single snapshot of a snapshot, but not to branch any deeper.


While copy on write storage snapshots are limited to 16 snapshots, there are other options available to increase the number and to enable branching of clones. EMC has a newer technology, VNX Snapshots, which improves upon the previous SnapView snapshots. The VNX technology:

  • requires less space
  • has no read+write overhead of copy on first write (COFW)
  • makes snapshot reads simpler
  • supports clones of clones (branching)

When the older SnapView snapshots were created they required extra storage space at creation time. The newer VNX Snapshots don’t require any extra storage space when they are created. The older COFW mechanism incurred extra reads and writes for every first modification after a snapshot. With the newer VNX Snapshots, writes become Redirect on Write (ROW): each new modification is written to a different location with no extra read or write overhead.
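The overhead difference can be summed up with simple per-modification I/O counts, based on the mechanisms described above (illustrative counts, not benchmark results):

```python
# I/O cost of the FIRST modification to a snapshotted block, per mechanism.

# Copy on first write (older SnapView): read the original block,
# write it to the Reserve LUN Pool, then write the new data in place.
cofw_ios = 1 + 1 + 1

# Redirect on write (VNX Snapshots): just write the new data elsewhere;
# the original block is left alone and still serves the snapshot.
row_ios = 1

print(cofw_ios, row_ios)  # 3 1
```

In other words, ROW turns a 3-I/O first write into a single write, which is why VNX Snapshots impose no read+write penalty on the production LUN.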

Another benefit of VNX is how blocks are read from the source LUNs: in the older SnapView, reads from a snapshot had to merge data from the storage with the Reserve LUN Pool (RLP), where the original data blocks that have been modified are kept. With the newer VNX the snapshot data is read directly from the snapshot source LUN.

EMC’s TimeFinder capability is also no longer necessary with VNX. Up to 256 snapshots can be taken in a VNX environment, and snapshots can be made of multiple LUNs simultaneously without needing additional software to create a consistent copy.

Despite all these improvements, VNX is still considered a lower-end storage solution compared to Symmetrix arrays, which still have all the shortcomings described above.

VNX relaxes some of the constraints of the older Snapview clones; however, in both cases the problem of efficiently bringing new changes from a source array to arrays used for development still exists. After a copy is brought over to a target array from source database LUNs, changes on the source (fresh data) cannot easily be brought over to the target array without a full new copy of the source database. Multiple point in time snapshots are also difficult, as having a target database on the development array share duplicate blocks with another version of the target database (different point in time) is impossible with this architecture. Instead, multiple copies will take up excess space on the target array, and none of the benefits of block sharing in cache or on disk will apply if multi-versioned clone databases are required.

EMC Snapshots with SRDF and Recover Point

A major challenge of both BCVs and VNX is keeping the remote storage array used for clones up to date with the source database. EMC has two solutions to this challenge; each provides a way of continuously pulling in changes from the source database into the second storage array in order to keep it up to date and usable for refreshed databases:

  • Symmetrix Remote Data Facility (SRDF)
  • RecoverPoint

SRDF streams changes from a source array to a destination array, on Symmetrix storage arrays only.

RecoverPoint is a combination of a RecoverPoint Splitter and a RecoverPoint appliance. The splitter splits writes, sending one write to the intended destination and the other to a RecoverPoint appliance. The splitter can live in the array, be fabric based, or be host based. Host based splitting is implemented by installing a device driver on the host machine and allows RecoverPoint to work with non-EMC storage; however, because the drivers are implemented at the OS level, availability depends on which operating systems the driver has been ported to. The fabric based splitters currently work with Brocade SAN switches and Cisco SANTap, opening up the usage of RecoverPoint with non-EMC storage. The RecoverPoint appliance can coalesce and compress the writes and send them back to a different location on the array, or send them off to a different array either locally or in another datacenter.

One advantage of RecoverPoint over SRDF is that SRDF will immediately propagate any changes from the source array to the destination. As with all instant propagation systems if there is a logical corruption on the source (for instance, a table being dropped), it will immediately be propagated to the destination system. With RecoverPoint changes are recorded and the destination can be rolled back to before the point in time of the logical corruption.

SRDF could be used in conjunction with Timefinder snapshots to provide a limited number of consistent point-in-time recovery points for groups of LUNs. RecoverPoint on the other hand can work with consistency groups to guarantee write order collection over a group of LUNs, and provides continuous change collection. RecoverPoint tracks block changes and journals them to allow rolling back target systems in the case of logical corruption or the need to rewind the development system.


Figure 7. EMC SRDF or RecoverPoint can propagate changes from source filer LUNs to the target filer dynamically, allowing better point in time snapshotting capabilities.

Using SRDF or RecoverPoint allows propagation of changes from a source array to a target array. On the target array, clones can be made from the source database at different points in time while still sharing duplicate blocks between the clones no matter which point in time they came from.

In all these cases, however, there are limits to the snapshots that can be taken as well as technical challenges trying to get the source changes to the target array in an easy and storage-efficient manner.



With EMC, thin cloning can only be achieved by using backup technology; in essence, the process has to be architected manually in order to support databases. How can the same goals be achieved but with database thin cloning specifically in mind? See the following blogs on Netapp, ZFS and Delphix.


I’ve been getting questions about how EMC compares with Delphix. Delphix offers technology that is completely missing from EMC arrays.

Screen Shot 2014-05-30 at 12.07.53 PM

EMC historically supports only 16 snapshots and no branching. EMC has no tools to transfer the changes of a database from production storage to development storage. In theory one could use SRDF, which only works between compatible Symmetrix arrays, to send changes from one array to the other, or one could use RecoverPoint, which requires two additional appliances to capture changes on the wire and then play them onto different storage. Neither is set up for databases specifically, taking into account things like coordinating filesystem snapshots with putting the database in hot backup mode. I haven’t met anyone at EMC who thinks EMC could do much of what Delphix does once we explained what we do.
We have 3 parts:
  1. Source sync
    • initial full copy
    • forever-incremental change collection
    • rolling window of saved changes, with older superseded data purged
  2. DxFS storage on Delphix
    • storage agnostic
    • compression
    • memory sharing of data blocks (the only technology, AFAIK, to do this)
  3. VDB provisioning and management
    • self-service interface
    • roles, security, quotas, access control
    • branching, refresh, rollback
Of these, EMC only has limited snapshots, which covers part of point 2 above; for point 2 we also have unlimited, instantaneous snapshots that work on any storage, be it EMC, NetApp or JBODs. Also, if one is considering a new SSD solution like Pure Storage, Violin or Fusion-io, only Delphix can provide snapshots on top of them. We also typically compress data to about 1/3 of its size, along data block lines. No one else, AFAIK, is data-block aware and capable of this kind of compression with fast access. There is no detectable overhead for compression on Delphix.
No one in the industry does point 1 above, keeping the remote storage in sync with the changes. NetApp tries with a complex set of products and features, but even with all of that they can’t capture changes down to the second.
Finally, point 3: provisioning. No one has a full solution except us. Oracle tries with EM 12c, but it does nothing without ZFS or NetApp storage, plus their provisioning is extremely complicated. Installation takes between 1 week and 1 month, and because it’s brand new in 12c there are bugs. And it doesn’t provide provisioning down to any second, nor branching, etc.

Delphix goes way beyond just data

  • SAP endorsed business solution
  • EBS automated thin cloning of full stack – db, app, binaries
  • Application stack thin cloning

Delphix customers have seen an average 2x increase in application development throughput.

One SAP customer was able to expand their development environments from 2 to 6 and increased project output from 2 projects every 6 months to over 10.

Points to consider

• Storage Flexibility: EMC cloning solutions only work with EMC storage – increasing
lock-in at the storage tier. In contrast, Delphix is storage vendor agnostic and can be
deployed on top of any storage solution. As companies move towards public clouds,
influence over the storage tier vendor diminishes. Unlike EMC, Delphix remains
relevant on-premise and in the cloud (private or public).

• Application Delivery: Database refresh and provisioning tasks can take days to weeks
of coordinated effort across teams. The sheer effort becomes an inhibitor to
application quality and a barrier to greater agility. Delphix is fundamentally designed
for use by database and application teams, enabling far greater organizational
independence. Delphix fully automates various functions like refreshing and
promoting database environments, re-parameterizing init.ora files, changing SIDs, and
provisioning from SCNs. As a result, with Delphix, database provisioning and refresh
tasks can be executed in 3 simple clicks. The elimination of actual labor as well as
process overhead (i.e. organizational inter-dependencies) has allowed Delphix
customers to increase application project output by up to 500%. In contrast, EMC
cloning products increase cross-organizational dependencies and are primarily
designed for storage teams.

• Storage Efficiency: While EMC delivers storage efficiency simply through copy on write
cloning, Delphix adds intelligent filtering and compression to deliver up to 2-4x greater
efficiency (even on EMC storage!). Additionally, most customers realize more value
from other Delphix benefits (application delivery acceleration; faster recovery from
downtime etc.) that EMC does not offer or enable.

• Data Protection and Recovery: While EMC only allows for static images or snapshots
of databases at discrete points in time, Delphix provides integrated log shipping and
archiving. This enables provisioning, refresh, and rollback of virtual copies to any point
in time (down to the second or SCN) with a couple of clicks. It also enables an
extended logical, granular recovery window for edge-case failures and far better RPO
and RTO compared to disk, tape or EMC clones. Many Delphix customers have wiped out
the cost of backup storage as well as 3rd party backup tools for databases with this
Delphix “Timeflow” capability.

• 2nd Level Virtualization: Delphix can create VDBs (virtual databases) from existing
VDBs, which is extremely valuable given the natural flow of data in application
lifecycles from development to QA to staging etc. For example, a downstream QA
team may request a copy of the database that contains recent changes made by a
developer. EMC cloning tools can only create first generation snapshots of production
databases and do not reflect the real need or data flow within application
development lifecycles.

• Integrated Data Delivery: Many enterprise applications (ex: Oracle EBS, SAP ECC etc.)
are comprised of multiple modules and databases that have to be refreshed to the
same point in time for data warehousing, business intelligence, or master data
management projects. Delphix uniquely supports integrated and synchronized data
delivery to the exact same point in time or to the same transaction ID.

• Resource Management: Delphix offers resource management and scheduling
functionality such as retention period management, refresh scheduling, and capacity
management per VDB that is lacking in EMC’s cloning products. For example, some
VDBs for a specific source database may be retained for a few weeks while specific
quarter ending copies can be retained for extended durations (for compliance).
Delphix also supports prioritizing server resources allocated to process IO requests per
VDB. This is important in environments where DBA teams must meet SLAs that vary by
lines of business or criticality of applications.

• Security and Auditability: Physical database copies and EMC clones alike constantly
proliferate and increase the risk of audit failures and data breaches when sensitive
data is involved. Delphix delivers a full user model, centralized management, retention
policies (for automated de-provisioning), and complete auditing for VDBs. Delphix also
integrates with homegrown and 3rd party masking tools so virtual copies can be
centrally obfuscated – avoiding tedious masking steps per copy.

• V2P (Virtual to Physical): In the event that customers experience downtime across
primary and standby databases, Delphix can quickly convert a VDB (from any point in
time) from virtual to physical form to minimize the cost and risk of downtime. This
provides an extended recovery layer and also a quick path to creating physical copies
for other purposes like performance testing.



Database Thin Cloning: clonedb (Oracle)

May 28th, 2014

A production database is full of data that makes sense for its purpose, whether the database is 10GB or 10TB.
Now if you take that database and clone it for QA or development, suddenly the data is cumbersome and unnecessary. Terabytes of disk are provisioned simply to hold a replica for the purpose of testing a small subset of it. An entire architecture with its myriad support structures, historical data, indexes, and large objects is cloned and ready, just to be partially used, trashed, and rebuilt. This is a waste of both time and storage resources. To the business, having a duplicate environment makes absolute sense; however, from an IT point of view the repeated duplication of storage space and the time drain it causes makes no sense at all.

When database copies are made from the same master database (the source environment, or master database, being used for cloning), typically 95% or more of the blocks are duplicated across all copies. In QA, development, and reporting environments there will almost always be some changes exclusive to the cloned system; however, the amount is usually extremely small compared to the size of the source database. The unchanged blocks are redundant and take up massive amounts of disk space that could be saved if the blocks could somehow be shared.
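The arithmetic behind that claim is worth making concrete. A back-of-the-envelope sketch in Python (the sizes and the 5% change rate are hypothetical, taken from the figure above):

```python
def full_copy_storage(db_gb, clones):
    # Every clone is a complete physical copy of the source database.
    return db_gb * clones

def thin_clone_storage(db_gb, clones, changed_fraction=0.05):
    # Clones share the ~95% of blocks that never change; each clone
    # stores only its own private modifications.
    return db_gb * changed_fraction * clones

db_gb = 10_000  # a 10 TB source database
print(full_copy_storage(db_gb, 5))   # 50000  (GB for five full copies)
print(thin_clone_storage(db_gb, 5))  # 2500.0 (GB when blocks are shared)
```

Even with one shared base image added back in, five thin clones of a 10 TB source consume a fraction of what five full copies would.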

Yet sharing duplicate blocks is not an easy feat in most database environments. It requires a technology that can act as a foundational cornerstone that coordinates and orchestrates access to duplicate blocks. At the same time, it requires the database copies to be writable with their own private modifications that are hidden from the source or other clones made from the source.

There are several technologies available in the industry that can accomplish block sharing across database clones. The primary technology involves filesystem snapshots that can be deployed across multiple clones. This concept is known as thin cloning, and it allows filesystems to share original, unmodified blocks across multiple snapshots while keeping changes on the target clones private to the clone that made the changes.

But as with all new technologies, there are many methods that accomplish the same task. Likewise with the task of thin cloning: multiple vendors and methods provide this technology.

Thin Cloning Technologies

The main requirement of database cloning is that the database files and logs must be in a consistent state on the copied system. This can be achieved either by having datafiles in a consistent state (via a cold backup) or with change logs that can bring the datafiles to a consistent state. The cloning process must also perform prerequisite tasks like producing startup parameter files, database definition files, or other pre-creation tasks for the target database to function. For example, Oracle requires control files, password files, pfiles/spfiles, and other pre-created components before the target can be opened for use. Controlfiles, logfiles, and datafiles together constitute a database that can be opened and read by the appropriate DBMS version.

In order to share data between two distinct, non-clustered instances of a database, the two instances must believe they have sole access to the datafiles. Modifications to the datafiles in one instance (a production database, for example) cannot be seen by the other instance (a clone) as it would result in corruption. For thin clones, datafiles are virtual constructs that are comprised of the shared common data and a private area specific to the clone.

There are a number of technologies that can be leveraged to support thin cloning. These technologies can be broken down into

  • Application software made to manage access to shared datafile copies
  • Copy-on-write filesystem snapshots
  • Allocate-on-write filesystem snapshots

The difference in these technologies is substantial and will determine how flexible and manageable the final thin cloning solution will be.

Software Managed Thin Cloning

Because it is part of the existing stack, one approach to sharing databases is to have the database software itself act as the thin cloning technology. In this scenario the DBMS software orchestrates access to a shared set of database files while modifications are written to an area that is private to each database clone. In this way the clones are managed by the DBMS software itself. This is the approach Oracle has taken with Clonedb, introduced in Oracle 11gR2 patchset 2.


Clonedb manages the combination of shared and private data within an Oracle environment by maintaining a central set of read-only datafiles and a private area for each clone. The private area is visible only to its own clone, guaranteeing that cross-clone corruption cannot occur. The Oracle RDBMS software orchestrates access to the read-only datafiles and the private areas maintained by the clones.

Oracle’s Clonedb is available starting with the Oracle 11gR2 patchset 2 release. The clonedb option takes a set of read-only datafiles from an RMAN backup as the basis for the clone. Clonedb then maps a set of ‘sparse’ files to the actual datafiles. These sparse files represent the private area for each clone where all changes are stored. When the clone database needs a datablock, it looks in the sparse file first; if the block is not found there, the cloned database reads it from the underlying read-only backup datafile. Through this approach, many clones can be created from a single set of RMAN datafile backups.
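That read path can be sketched in a few lines of Python. This is a conceptual model, not Oracle's implementation; the block numbers and contents are hypothetical:

```python
class CloneDatafile:
    """Toy model of a Clonedb virtual datafile: a private sparse
    layer over a shared, read-only backup datafile."""

    def __init__(self, backup_blocks):
        self.backup = backup_blocks   # shared read-only RMAN backup
        self.sparse = {}              # this clone's private changes

    def read(self, block_no):
        # Look in the sparse file first; fall back to the backup.
        if block_no in self.sparse:
            return self.sparse[block_no]
        return self.backup[block_no]

    def write(self, block_no, data):
        # All modifications land in the clone's private sparse file;
        # the shared backup is never touched.
        self.sparse[block_no] = data

backup = {1: "hdr", 2: "emp-row", 3: "dept-row"}
clone_a = CloneDatafile(backup)
clone_b = CloneDatafile(backup)
clone_a.write(2, "emp-row-modified")
print(clone_a.read(2))  # emp-row-modified (private change)
print(clone_b.read(2))  # emp-row (still sees the shared backup)
```

Because writes never reach the shared backing store, any number of clones can layer their private changes over the same read-only backup.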

The greatest benefit of this technology is that by sharing the underlying set of read-only datafiles, the common data shared between the clones does not have to be replicated. Clones can be created easily and with minimal storage requirements. With time and space no longer a constraint, cloning operations become far more efficient with minimal resource requirements.

Many of the business and operational issues summarized in chapters 1 and 2 of this book can be alleviated with this technology. For instance, DBAs can use Clonedb to provision multiple developer copies of a database instead of forcing developers to share the same data set. By using multiple developer copies, many delays and potential data contamination issues can be avoided, which speeds development and improves efficiency during application development. Developers will be able to perform their test work, validate it against their private datasets, and then commit their code and merge it into a shared development database. The cloned environment can be trashed, refreshed, or handed over to another developer.

On the other hand, Clonedb requires that all clones be made from the source database at the same point in time. For example, if Clonedb is used to provision databases from an RMAN backup taken a day ago and developers want clones of the source database as it is today, then an entire new RMAN backup must be made. This dilutes the storage and time savings that Clonedb originally brought. While it is possible to use redo and archive logs to bring the previous day’s RMAN backup up to date (all changes from the last 24 hours would be applied to yesterday’s datafile copies), the strategy only works efficiently in some cases. The farther the clone is from the original RMAN datafile copies, the longer and more arduous the catch-up process becomes, resulting in wasted time and resources.

Clonedb functionality is effective and powerful in some situations, but it is limited in its ability to be a standard in an enterprise-wide thin cloning strategy.

 Screen Shot 2013-06-03 at 10.28.11 AM

Figure 1. Sparse files are mapped to actual datafiles behind the scenes. The datafile backup copy is kept in a read only state. The cloned instance (using Clonedb) first looks for datablocks in the sparse file. If the datablock is not found, it will then read from the RMAN backup. If the Clonedb instance modifies any data, it will write the changes to the sparse file.

In order to implement Clonedb, the Oracle 11gR2 patchset 2 release or higher is required. Additionally, Direct NFS (dNFS) must be used for the sparse files. The sparse files are implemented in a central NFS-mounted directory, with files that can be accessed via Oracle’s Direct NFS implementation.

To create this configuration, the following high-level steps must be taken:

  • Recompile the Oracle binaries with the Oracle dNFS code
  • Run the clone setup script, available through MetaLink Document 1210656.1
  • Start up the cloned database with the startup script generated by the setup script

The syntax for the setup script is relatively simple, taking the source pfile and the clone creation script to generate as arguments: initSOURCE.ora create_clone.sql

Three environment variables must be set for the configuration:


Once the setup script is run, running the output file it generates will create the database clone.

sqlplus / as sysdba @create_clone.sql

The create clone script does its work in four basic steps:

  1. Starts the database in nomount mode with a generated pfile (initclone.ora in this case).
  2. Runs a custom create controlfile command that points to the datafiles in the RMAN backup location.
  3. Maps the sparse files on a dNFS mount to the datafiles in the RMAN backup location. For instance: dbms_dnfs.clonedb_renamefile('/backup/file.dbf', '/clone/file.dbf');
  4. Brings the database online in resetlogs mode: alter database open resetlogs;

Screen Shot 2013-06-03 at 10.28.22 AM

Figure 2. This image shows that multiple Clonedb instances can share the same underlying RMAN backup. Each Clonedb instance writes its changes to its own private sparse files.

Screen Shot 2013-06-03 at 10.28.31 AM
Figure 3. A graphical outline of the process. An RMAN backup is taken of the source database and placed in a location where the Clonedb instances can access them (in this case, an NFS mount). A Clonedb instance can be set up on any host that has access to the NFS filer via dNFS. The Clonedb instances will create sparse files on the NFS filer. The sparse files map to the datafiles in the RMAN backup.

NOTE:    If Clonedb instances are going to be created from two different points in time, then a new RMAN backup has to be taken and copied to the NFS server before it can be used as the source for the new clones, as shown in Figure 3.

Because Clonedb adds an extra layer of code that requires reads of both the sparse files over dNFS and the RMAN datafile backups, there is a performance hit for using Clonedb. The biggest drawback is the requirement for multiple copies of the source database in order to create clones from different points in time, which diminishes the storage savings. Clonedb is a powerful option that should be in the back pocket of any Oracle DBA, but it has limited use in an automated thin cloning strategy.


The best write-up on clonedb is Tim Hall’s blog post at


Which to tune ? Application, Database or Hardware ?

May 27th, 2014

I. “The Database is hanging!” AKA “the application has problems, good grief”

I wonder if you can imagine, or have had the experience of, the application guys calling with anger and panic in their voices: “the database is sooo slow, you’ve got to speed it up.”
What’s your first reaction? What tools do you use? How long does it take to figure out what’s going on?
When I get a call like this, I take a look at the database with the “Top Activity” screen in Oracle EM or a tool like Lab 128 or DB Optimizer. All of these tools have a screen that shows Average Active Sessions (AAS) over time, grouped into wait classes, and they all use the same color coding for the wait classes:

Screen Shot 2014-05-21 at 8.08.47 AM

From the above chart, I can clearly see that the database is not bottlenecked and there must be a problem in the application.
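The AAS figure these charts plot is simple arithmetic: sample the active sessions at regular intervals, count them, and average over each bucket. A minimal sketch in Python, using hypothetical sample data:

```python
from collections import Counter

# One sample per second: the (session, wait_class) pairs active at
# that instant; a session on CPU counts as active too.
samples = [
    [("s1", "CPU")],
    [("s1", "Commit"), ("s2", "Commit")],
    [("s1", "Commit")],
    [("s2", "CPU")],
    [],
]

aas = sum(len(s) for s in samples) / len(samples)  # average active sessions
by_class = Counter(wc for s in samples for _, wc in s)

print(aas)       # 1.0 -> well under a 2-CPU red line
print(by_class)  # most of the activity is commit (log file sync) waits
```

An AAS of about 1 on a 2-CPU box, dominated by the commit wait class, is exactly the "plenty of free CPU, mostly waiting on commits" picture described next.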
Why do I think it’s the application and not the database? The database is showing plenty of free CPU in the load chart, the largest chart at the top of the image above. In the load chart there is a horizontal red line, which represents the number of CPUs on the system, in this case 2. The red line is rarely crossed by the bars, which represent the load on the database measured in average active sessions (AAS). The session activity is averaged over 5 samples taken across 5 seconds, so each bar is 5 seconds wide. The bars above hover around 1 AAS and are rarely green. Green represents CPU load; any other color indicates sessions waiting. The main wait in this case is orange, which is log file sync, i.e. waits on commits. Why is the database more or less idle, and why are most of the waits we do see for commits? I look at the code coming into the database and see something like this:
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
insert into foo values ('a');
commit;
Doing single-row inserts and committing after each one is very inefficient. A lot of time is wasted on network communication, which is why the database is mainly idle. While the application thinks it’s running full speed ahead, it is actually waiting mainly on network round trips and commits. It’s like going to the grocery store with a long list of things to buy, but buying only one item, driving home to drop it off, and then going back to the store for the next one. In this database example, if we commit less often and batch the work we send to the database, reducing network communication, we will run much more efficiently. Changing the code to
begin
  for i in 1..1000 loop
    insert into foo values ('a');
  end loop;
  commit; -- a single commit for the whole batch
end;
reduces the communication delay; now we get a fully loaded database, but we run into database configuration issues.
II. It *is* the database (i.e. DBAs, get to work)

 Screen Shot 2014-05-21 at 8.08.47 AM

In the above DB Optimizer screen, the same workload was run 4 times, creating 4 large colored areas in the chart labeled “Run 1”, “Run 2”, “Run 3” and “Run 4”. We can see that the elapsed time (the width of each load area) decreased, and the percentage of activity on CPU increased, with each succeeding run. What happened differently in each of these runs?
  1. In run 1, “log file sync” (the orange color) is the biggest area, which means users are waiting on commits, even though we are now committing less in the code. In this case we moved the log files to a faster device. (You can see the checkpoint activity in light blue just after run 1, where we moved the log files.)
  2. In run 2, “buffer busy wait” (the burnt red) is the biggest area. We drilled down on the buffer busy wait event in the Top Event section, and the details instructed us to move the table from a normal tablespace to an Automatic Segment Space Managed (ASSM) tablespace.
  3. In run 3, “log file switch (checkpoint incomplete)” (the dark brown) is the largest area, so we increased the size of the log files. (You can see the I/O time spent creating the new redo logs just after run 3.)
  4. In run 4, the run time is the shortest and all the time is spent on CPU, which was our goal: to take advantage of all the processors and run the batch job as quickly as possible.
III.  It’s the machine (rock paper scissors)
Now that the application is tuned and the database is tuned let’s run a bigger load:
Screen Shot 2014-05-21 at 8.08.47 AM
We can see that the CPU load is constantly over the max CPU line. How can the CPU load be bigger than the number of CPUs on the machine? This actually means that the demand for CPU is higher than the CPU available. In the image above there are 2 CPUs on the machine, but an average of 3 sessions think they are on the CPU, which means that on average 1 session is not actually running but is ready to run and waiting for a CPU.
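The arithmetic is simply demand versus capacity: anything above the core count is run-queue time. A one-liner sketch (the demand figure is the hypothetical 3 AAS from the chart above):

```python
cpus = 2
cpu_demand_aas = 3.0  # average sessions wanting the CPU (hypothetical)

running = min(cpu_demand_aas, cpus)       # sessions actually on a core
queued = max(0.0, cpu_demand_aas - cpus)  # sessions waiting for a core

print(running, queued)  # 2 1.0 -> on average one session is queued
```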
At this point we have two options: either add more CPUs or tune the application code. In this case we are only running one kind of load, i.e. the insert.
If we look at the bar chart at the bottom right of the load chart, we see that the top bar is much larger than the ones below it. Each bar represents a SQL statement, and the top one carries the largest load. In this case our insert statement is the highest-load SQL, so we can concentrate on tuning that statement.
For inserts we can actually go even further and use Oracle’s bulk-load commands:
-- my_idx is a PL/SQL collection pre-populated with 8000 values
FORALL indx IN 1..8000
  INSERT INTO foo ( dummy )
  VALUES ( my_idx(indx) );
Using the above code, the load runs almost instantaneously and creates almost no visible load in the load chart!

Screen Shot 2014-05-21 at 8.08.47 AM

What if, on the other hand, there was no outstanding SQL taking up most of the load, and instead the SQL load was well distributed across the system? Then we’d have a case for adding more hardware. Making the decision to add more hardware can be difficult, because in general the information needed to make it is unknown, unclear or just plain confusing, but the “Top Activity” chart along with the top SQL bar chart makes it easy and clear, which can save weeks and months of wasteful meetings and debates. For example:

Screen Shot 2014-05-21 at 8.08.47 AM

If we look in the bottom left, there is no SQL that takes up a significant amount of the load, i.e. there is no outlier SQL that we could tune to gain back a lot of wasted CPU. We’d have to tune many, many SQL statements and make improvements on most of them to get our load down below the max CPU line. In this case, adding CPUs to the machine might be the easiest and most cost-effective solution.
With the load chart we can quickly and easily identify the bottlenecks in the database, take corrective actions, and see the results. In part I we had an application problem, in part II we had 3 database configuration issues, and in part III we had a hardware sizing issue. In all 3 cases the load chart provides a clear and easy presentation of the data and issues, making the solutions clear.


Data masking made easy with agile data

May 23rd, 2014


How long does it take you to make a copy of production, find the sensitive data, mask that data, and then make multiple copies for development and QA? Hours? Days? Weeks? Months? How about a few minutes?

Check out the new data masking module in the Delphix agile data appliance.



Graphing Oracle v$sysstat

May 23rd, 2014

One thing I’ve failed to clearly document is that my tool ASHMON not only graphs ASH data but graphs v$sysstat as well.

To access statistics in ASHMON, click on the “functions” menu and choose “stats”. (The other choices are “ash init”, which goes to the ASH data graph, and “waits”, which graphs data from v$session_event.)

Graphing v$sysstat can be overwhelming, as the number of statistics runs into the hundreds, so what should one look at? I concentrate on the important statistics, but that list is another discussion and up for debate. On the other hand, when one wants to explore the stats over time, how can one do it in a manageable way?

For manageability in ASHMON, I grouped the stats into 20 groups. Twenty groups is still too much to display, so none are shown by default; one has to click on a group’s button to see it. One can click on as many groups as one wants: the first click shows the group and the second click hides it.

Inside a group it can be hard to see which line is which statistic, so passing the mouse over a stat highlights its line.

Also, a statistic can have high values which hide the activity in stats with lower values. To address this there are three options. To access them, right-click on the statistic to get a popup menu, then choose one of:

  • hide
  • log scale
  • resize

I typically use “hide” to get rid of any values that are too big or not of interest. Once they are gone, a right click in the legend brings the menu back up and I can pick

  • show

which will redisplay any hidden stats.

Another option is “log scale”, which makes it easier to see movement across different orders of magnitude. A final option is “resize”, which offers the option of changing the Y-axis scale, but I rarely use it as the graph autosizes to the maximum point.

Stats with “table”, “cursor” and “enqueue” selected (the groups selected are highlighted in red at the top)

A different set of stats selected, in this case  session (ses), transaction (trnx),  CPU, log file I/O (log_io) and physical I/O (phys_io).

The right click menu shown with log scale, resize, hide and show.

and the ASH screen

The “waits” screen, which I never use these days. This is the way I used to show load on the database 10 years ago, before ASH data. This screen could be really nice if I changed the centi-secs per sec to average wait time per event; maybe I’ll do that soon!


Virtual Cloning Oracle EBS stack in minutes

May 22nd, 2014

How does one clone the entire EBS stack, including the database, the EBS application stack and the Oracle binaries, in a few minutes and with almost no storage, no matter the size?

Watch this quick easy demonstration of doing just that with Delphix:


5 Database issues that are costing you time and money

May 21st, 2014


Companies rely on applications to supply the information that business uses to make decisions that generate revenue.

Companies are constantly improving applications and creating new applications. Application development depends upon provisioning environments for developers and QA.

The slower and more costly it is to provision development and QA environments, the more delays and bugs there will be in application development, and the less revenue the business will generate.

Businesses are constantly trimming IT budgets, which leads to slower provisioning of fewer environments, which means more delays in the applications the business depends on to generate revenue, which in turn puts more pressure on IT budgets.

As examples:

  • Costly and slow environment provisioning means most QA spend goes to provisioning instead of actually QA’ing applications
  • Slow QA environment provisioning means it takes longer to find bugs, and the longer it takes to find bugs the costlier they are to fix (and sometimes the bugs actually go to production because of time pressures)
  • Developers sharing environments means developers block each other over those shared resources, slowing down development
  • Developers who have their own synthetic environments, non-representative of production, create bugs that are not caught until late in the development cycle or even in production
  • Slow environment builds in general mean delays
In an interview, Gene Kim, author of The Phoenix Project, said:

“One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it”

The Delphix agile data platform transforms environment provisioning from an expensive, slow process that requires days or weeks, teams of people, and terabytes of disk into an automated, self-service process that can be done in minutes with almost no storage overhead.
With Delphix, all developers can be given full-size copies of the production database and can make changes as fast as they want.
The trouble is: how does one integrate all those changes?
DBmaestro tracks all the changes and will merge them and highlight conflicts.
DBmaestro and Delphix have just announced a partnership to bring these two powerful and complementary products closer together.
Learn more about DBmaestro and Delphix on this webinar recording:




Businesses today face a difficult conundrum: industry giants such as Google, Facebook, and others have taught people to expect service that’s reliable and fast, yet budgets are getting tighter by the fiscal year. Instead of being able to provide quality, on-time applications that are bug free and high performance, struggling IT departments are forced to cut back on systems and resources for developers.

The most vital resource for applications can also be one of the most cumbersome: the database. Data is the backbone of our most important applications, and as such it is stockpiled and maintained in a robust database engine—yet the contents can become quite large and unruly. This makes data difficult to manage, particularly when mobility is required. A lack of data mobility can cause severe issues in the lifecycle of a product as it impacts good QA and development.

This problem has become so widespread in fact that development teams have turned to new measures in order to complete their work despite bottlenecks in data management. These measures are collectively summed up as the DevOps movement.

From Wikipedia: DevOps (a portmanteau of development and operations) is a software development method that stresses communication, collaboration and integration between software developers and information technology (IT) professionals. DevOps is a response to the interdependence of software development and IT operations. It aims to help an organization rapidly produce software products and services.

One of the most important aspects of DevOps is release management. Standardized development environments are becoming more common and tools like git and SVN are almost universally usable to ensure a quality release and deployment process. This facilitates the goals of fast development, quality assurance, and rapid feature/maintenance release schedules. There’s only one problem with this: the database doesn’t play along. Yet.

Databases have no release-based data control like developers have for code. With code it is easy to have a trunk with branches, merge in changes, and preserve integrity with a check-out and check-in process. In fact, most databases used for development purposes are not only woefully behind production in terms of data freshness, but also are used for development across multiple releases without any possibility of identifying how the data looked or should look at the time of release. The database is just there—a giant glob of data that doesn’t follow the same rules as the code and often gets in the way of an otherwise efficient process.

In the database world, it’s common to hear the idiom “Cheap, fast, or good – pick two.” Some might even say you can only pick one. In all, the outlook is very bleak.

Here is what we at Delphix are seeing in the industry:

  • Inefficient QA ends up costing companies thousands or millions as bugs make their way into production.
  • QA delays mean extra work for project managers, QA teams, and IT teams as people scramble to find and fix issues and manage project deadlines.
  • Developers and QA teams sharing databases causes bottlenecks that result in difficult coding and testing practices and slower time to delivery.
  • Creating development or QA databases from subsets of production data (usually due to size) results in bugs due to impractical data volumes.
  • Slow provisioning and infrastructure rollout means delays that impact an application from the very foundation on up and result in massive schedule overruns.

Let’s look at these issues in a bit more detail.

 1. Inefficient QA


One Delphix customer we were talking to spent 94% of every QA cycle building the QA environment and only 6% actually running the QA testing suite. This cycle was repeated every single time a set of features required testing and had to happen for every QA suite. The long pole in the environment build was provisioning the database the code had to run on. The burden was so overwhelming it completely monopolized their allotted time.

2. QA Delays make fixing bugs more expensive


When building a QA environment takes a long time, the integrity of the application suffers. In the above case, if only 6% of QA time is spent on actual testing it is very easy for bugs to go unnoticed. When this happens there are only two possible outcomes: either the bug finally does get caught in QA but, due to the delay in capturing it, requires more code and testing to roll back and fix, or the bug doesn’t get noticed and makes its way into production where it can cost customer confidence or large sums of revenue. The longer it takes for a problem to be found and fixed, the more the cost goes up.


3. Using subset databases causes bugs


For most companies it is impossible to give all developers their own full-size copy of the production database (at some companies, they can’t even provide one full-size copy). As such, they create subset copies of production for development and QA purposes. Writing and maintaining the scripts that create data subsets and verify data coherency and referential integrity takes a huge amount of time; in fact, one customer of ours estimates that 50% of their DBA time is dedicated to creating subset database clones. And because the data is not representative of production data, it often leads to the introduction of bugs. When the QA environment is similarly crippled, it may be impossible to isolate and verify a fix once a bug is caught. And with a subset, there will be bugs—another customer of ours estimated that 20% of their production bugs were due to using subsets.
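A sketch of what those subset scripts have to do—sample one table, then chase every foreign key so nothing dangles—using SQLite in Python with hypothetical tables (customers, orders). Real schemas multiply this across hundreds of relationships:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id));
""")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(i, "cust%d" % i) for i in range(100)])
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(i, i % 100) for i in range(1000)])

# Step 1: keep 10% of the parent table.
keep = [r[0] for r in db.execute("SELECT id FROM customers WHERE id % 10 = 0")]
db.execute("DELETE FROM customers WHERE id NOT IN (%s)"
           % ",".join("?" * len(keep)), keep)

# Step 2: the child rows must follow, or they would reference deleted parents.
db.execute("DELETE FROM orders WHERE customer_id NOT IN (SELECT id FROM customers)")

# Verify referential integrity of the subset: no order may dangle.
dangling = db.execute("""SELECT COUNT(*) FROM orders o
                         LEFT JOIN customers c ON o.customer_id = c.id
                         WHERE c.id IS NULL""").fetchone()[0]
print(len(keep), dangling)   # 10 0
```

With two tables this is manageable; with a real production schema, every added relationship is another delete-and-verify step to maintain, which is where the 50% of DBA time goes.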

Queries that run fine on subsets either run into new anomalies on production data, or hit the wall as performance of a query tanks when run on a full dataset size.

4. Development teams sharing one full copy leads to bottlenecks


An alternative to subsets (or sadly, sometimes used in combination) is having the development team share a single copy of the production database. Sharing a full copy requires that any changes to the database schema, content, or metadata pass through code review to ensure that the changes won’t break the work of other developers. Instead of developers being able to perform their own work quickly and efficiently and merge it into the final release, they are forced to work in a synchronous manner that wastes time. A large online auction site we talked to said this review cycle took one to two weeks before a developer could get changes into a shared database copy. In today’s fast paced world that is simply unacceptable.

Because so many people are using the shared resource and because that resource is so big and time consuming to build, it can be a very long time between refreshes of the data. This leads to old unrepresentative data, which incurs its own set of risks every bit as dangerous as data subsets.


 5. Slow environment builds and the culture of “NO”


A huge IT hurdle to tackle is creating development environments in the first place. Data is growing faster than our infrastructures are improving, and the size of our databases means that cloning or refreshing environments can take days or even weeks to accomplish. Because development environments are incredibly dependent on real-time data, this results in lost productivity and reliability. One large global bank told us that it generally took 6 months to build their environments out. Another customer who used NetApp storage snapshots said it still took them 2-4 days due to their system of entering tickets for requests and then trying to schedule time and resources from the storage admin, system admin, and DBA teams. All of this hassle leads to a culture of “NO”, where developers and other front-line product creators are simply told that meeting their needs is not possible. Sadly, many developers are used to the idea that if they need a copy of the data they will be told “NO” in most cases.

“Can I get a copy?” “No!” Result: developer motivation takes a hit.

Data is the Problem

Getting the right data to the right people at the right time and in the right form is the problem. The problem is data agility—or more to the point, a lack thereof.

A lot of people think they understand the extent of their data problems. But let’s go through what we’ve learned from our customers in the industry:

  • 94% of QA time spent building environments instead of running QA tests
  • 90% of lost development days attributed to waiting for development environments
  • 50% of DBA time spent making database copies
  • 20% of bugs caused by using subset data instead of full copies

Budget cuts put pressure on IT teams, which limits system resources, storage, and personnel. These limits reduce the number of systems available and increase the time to provision them, impacting the developers and contractors who work on these systems and leading to project delays and reduced quality.

The Total Cost of Data

Data is expensive. In most calculations, the total cost of data only includes storage costs; however, storage costs are just the tip of the iceberg. For every GB or TB of storage there are teams of storage admins, system admins, and DBAs who have to manage the storage, move data, backup data, and provision clones for development and other needs. The more data there is to move, the longer it takes and the more expensive it is due to the personnel involved, schedule overruns, and time spent discussing the requirements and tasks in meetings. The total cost of data goes far beyond simple storage costs.

Delphix Solves the Data Problem

I’ve written extensively on Delphix elsewhere. You can see some of the more pertinent cases here:

In a nutshell, Delphix makes creating clone copies so fast that it is nearly unimaginable to even the most modern of IT teams. These clones can be provisioned in only a few minutes no matter what the source database size and require almost no storage overhead because duplicate blocks are shared across all clone copies. Because the clones are fast and almost free they can be provisioned for all the developers; in fact, a developer can easily be given multiple copies of the full production database. The clones can also be made fresh, just a few minutes behind production, in a matter of minutes.
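The storage math behind clones like these can be pictured with a toy block-sharing model (purely illustrative, not Delphix internals): a clone is just a map of pointers into the source's physical blocks, and only blocks a clone rewrites consume new space—the same allocate-on-write idea described at the top of this page.

```python
class BlockStore:
    """A pool of physical blocks shared by all clones."""
    def __init__(self):
        self.blocks = []

    def write(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1          # physical address of new block

class ThinClone:
    """A 'full-size' database that is really a map of block pointers."""
    def __init__(self, store, block_map):
        self.store = store
        self.block_map = list(block_map)     # logical index -> physical address

    def read(self, i):
        return self.store.blocks[self.block_map[i]]

    def write(self, i, data):
        # Allocate on write: never modify the shared block, point at a new one.
        self.block_map[i] = self.store.write(data)

store = BlockStore()
source_map = [store.write("block%d" % i) for i in range(1000)]

# Two clones: created instantly, zero new physical blocks.
dev1 = ThinClone(store, source_map)
dev2 = ThinClone(store, source_map)
dev1.write(0, "dev1 change")                 # space is consumed only now

print(len(store.blocks))        # 1001: 1000 shared blocks + 1 modified
print(dev2.read(0))             # block0 -- dev2 is unaffected by dev1
```

Creating a clone costs only the pointer map, which is why provisioning time and storage overhead stay near zero regardless of source database size.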

When this technology was explained to one company we work with, the overwhelming response was “No way.” When the first clone of a 7TB database was created in 7 minutes and a second clone (from a time period in the past, no less) was created in an additional 10 minutes the response was, “No freaking way.” The implications of having two full-sized fresh clones to any point in time in a matter of minutes with no additional storage overhead were almost impossible to comprehend.

Database Clone Copies with Delphix: Full, Fresh, Fast, Free


What the Industry can accomplish with Delphix

  1. Efficient QA: Low cost, high utilization
  2. Quick QA: Fast bug fixes and problem resolution
  3. Developer Agility: Everyone gets a database for parallelized development
  4. Full DBs: Fewer bugs in development, QA, and production
  5. Fast builds: Fast development and the culture of Yes

1. Efficient QA


With Delphix, QA environments can be spun up in minutes and branched directly from the development database.

2. Quick QA


Because QA environments can be spun up quickly, QA can test immediately at the end of a sprint to find bugs quicker and reduce the time and work needed to fix those bugs.

3. Developer Agility


With Delphix, each developer can have one or more full copies of a database and can refresh with new data in minutes. Developers will no longer block each other waiting on a single shared resource.

4. Full DBs


With Delphix, the development databases are full copies so there are no longer bugs due to small and inadequate representations of real data.

5. Fast Builds


With Delphix, developers can provision their own clone copies in minutes without having to coordinate with storage admins, system admins and DBAs.

DevOps Management with Delphix

A big question is: If each developer has their own database, how do we manage the merging of their changes into a release? This is where DBmaestro comes into play. For example, consider this process:

  1. Delphix: Create 3 Virtual Databases
    1. DEV1
    2. DEV2
    3. TRUNK
  2. Have two developers perform their work
    1. Developer 1 works on DEV1
    2. Developer 2 works on DEV2
  3. DBmaestro performs merges
    1. DEV1 changes move into TRUNK
    2. DEV2 changes move into TRUNK
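The merge step can be pictured with a toy model (an illustration of the idea, not DBmaestro's actual behavior): each virtual database carries its list of schema changes, and merging into TRUNK applies them while flagging conflicts on the same object for human review.

```python
def merge(trunk, branch):
    """Apply a branch's schema changes to trunk; return conflicting objects."""
    conflicts = []
    for obj, change in branch.items():
        if obj in trunk and trunk[obj] != change:
            conflicts.append(obj)        # same object, different change: flag it
        else:
            trunk[obj] = change          # clean change: merge it in
    return conflicts

trunk = {}
dev1 = {"customers": "ADD COLUMN email", "orders": "ADD INDEX ix_date"}
dev2 = {"customers": "ADD COLUMN phone"}

print(merge(trunk, dev1))   # []: DEV1 merges cleanly into TRUNK
print(merge(trunk, dev2))   # ['customers']: DEV2 collides with DEV1's change
```

Because each developer worked on an isolated virtual database, the conflict surfaces at merge time rather than blocking both developers on a shared copy for weeks.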

From a timeline point of view, the process is as simple as it is agile.


Quality Assurance can easily be a part of this process. The steps are more robust, but with near-instant cloning the process remains trivial while staying thorough.



See also our webinar with DBmaestro from last year.



How to get dreadful database performance

May 21st, 2014


I love this video on “getting dreadful database performance” by Stephane Faroult.
The video is hilarious and informative.

He tackles the top 5 keys to dreadful performance:


  1. EAV “schema” 2:10
  2. Unusable indexes 6:00
  3. Hard parsing 7:36
  4. Unitary processing 10:40
    • user functions (instead of join, ex: currency conversion) 12:19
  5. Views of views 14:46
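To see why the EAV “schema” tops the list, here is a small SQLite-in-Python illustration (the table and attribute names are made up): fetching one logical row requires a self-join per attribute, and nothing in the schema stops a misspelled attribute name or a price stored as text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The EAV anti-pattern: one generic table instead of real columns.
db.execute("CREATE TABLE eav (entity INT, attr TEXT, value TEXT)")
db.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name",  "widget"),
    (1, "price", "9.99"),
    (1, "color", "red"),
])

# One entity, three attributes -> three joins just to reconstruct
# what a normal table would return with SELECT * ... WHERE id = 1.
row = db.execute("""
    SELECT n.value, p.value, c.value
    FROM eav n
    JOIN eav p ON p.entity = n.entity AND p.attr = 'price'
    JOIN eav c ON c.entity = n.entity AND c.attr = 'color'
    WHERE n.entity = 1 AND n.attr = 'name'
""").fetchone()
print(row)   # ('widget', '9.99', 'red') -- note the price is a string
```

Every attribute you add to the query adds another self-join, which is exactly how EAV designs grind an optimizer (and a DBA) down at scale.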

Conclusion: most developers are young and inexperienced with databases (Java programmers?), and code must be refactored by the developers themselves, not by DBAs in production databases. Hopefully your developers will get guidance from an experienced (older) expert.

In my Experience


The first topic, EAVs, was a huge problem at 3 of the last 4 companies I was at, including Oracle’s OEM 10g Grid Control, despite solid analysis of the pros and cons by John Beresniewicz.

Check out more on the

Origins of Stephane’s Video Style

Emailing Stephane, I found out that his video was inspired by this fun video (though it is less related to UI or databases):
Identity 2.0
I like this kind of presentation. Fun and informative as well as efficiently re-iterating the important points.


My Approach to Tuning SQL


One of the main things I do to tune SQL is to encourage the optimizer to take a particular path.
I often determine the most important steps to do first and put them in a sub-select. I’ve been meaning to write something up about this for years. Unfortunately all the great examples have been lost at customer sites, happily left behind after my work was done; only later did I kick myself for not having saved more, and I have not taken the time or energy to come up with good examples since. All is not lost, though, because here is another great video from Stephane Faroult on this exact subject.
The ideas he lays out:


  • Take the highest selectivity steps and do them first (this is where I often use “inline views”)
  • Sort the least amount of data, i.e. sort the key column and not all the data columns
  • Join in non-key columns as late as possible
To summarize, I look for the central data (what Stephane calls the “key”) and put this in an inline view, then sort, then finally join in any missing columns.
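A minimal sketch of that rewrite, using SQLite from Python purely for illustration (the table, columns, and data are hypothetical): the inline view filters and sorts the key column alone, and the wide non-key column joins in only for the surviving rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY,
              placed_at TEXT, notes TEXT)""")
# 2000 rows; 'notes' is a wide column we would rather not drag around.
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(i, "2014-05-%02d" % (i % 28 + 1), "x" * 500)
                for i in range(2000)])

# Naive form: the 500-byte notes column rides along through filter and sort.
naive = """SELECT id, placed_at, notes FROM orders
           WHERE placed_at >= '2014-05-27'
           ORDER BY placed_at, id LIMIT 10"""

# Rewritten: inline view sorts only the key columns; notes joins in late,
# for just the 10 surviving rows.
rewritten = """
    SELECT k.id, k.placed_at, o.notes
    FROM (SELECT id, placed_at FROM orders
          WHERE placed_at >= '2014-05-27'
          ORDER BY placed_at, id LIMIT 10) k
    JOIN orders o ON o.id = k.id
    ORDER BY k.placed_at, k.id"""

# Both forms return the same rows; only the work profile differs.
assert db.execute(naive).fetchall() == db.execute(rewritten).fetchall()
```

On a 2000-row toy table the difference is invisible; on a full-size production table, sorting narrow keys instead of wide rows is exactly where this pattern pays off.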