Database Thin Cloning: Copy on Write (EMC)

Copy on Write

Copy on write is a storage or filesystem mechanism for creating snapshots of data at specific points in time. Whereas Clonedb is a little known and rarely used option, storage snapshot technologies are widely known and used in the industry. A snapshot maintains an image of the storage at a specific point in time. When the active storage changes a block, the original block is first read from disk and written to a save location. Once the block has been saved, the snapshot is updated to point to the saved copy. Only after the snapshot has been updated can the active storage write out the new version of the block, overwriting the original.


Figure 4. This figure shows storage blocks in green. A snapshot will point to the datablocks at a point in time as seen on the top left.


Figure 5. When the active storage changes a block, the old version of the block has to be read and then written to a new location and the snapshot updated. The active storage can then write out the new modified block.
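The copy-on-first-write sequence in Figures 4 and 5 can be sketched in a few lines of Python. This is a simplified illustrative model, not EMC's implementation: block storage is modeled as a dict of block IDs to data, and the save location as a per-snapshot dict.

```python
# Minimal sketch of copy-on-first-write (COFW) snapshot logic.
# All names and data structures here are illustrative.

class CowVolume:
    def __init__(self, blocks):
        self.blocks = dict(blocks)      # active storage: block_id -> data
        self.snapshots = []             # one save-area dict per snapshot

    def snapshot(self):
        # A new snapshot starts empty: it initially just points at the
        # active blocks and consumes no space until blocks change.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_id, data):
        # Before overwriting, copy the original block to every snapshot
        # that has not yet saved its own version of this block.
        for save_area in self.snapshots:
            if block_id not in save_area and block_id in self.blocks:
                save_area[block_id] = self.blocks[block_id]  # extra read + write
        self.blocks[block_id] = data    # now the active block can be overwritten

    def read_snapshot(self, snap_id, block_id):
        # Snapshot reads merge the save area with unchanged active blocks.
        save_area = self.snapshots[snap_id]
        return save_area.get(block_id, self.blocks.get(block_id))

vol = CowVolume({0: "A", 1: "B"})
s = vol.snapshot()
vol.write(0, "A2")                      # triggers a copy of original block 0
print(vol.read_snapshot(s, 0))          # "A"  (served from the save area)
print(vol.read_snapshot(s, 1))          # "B"  (still shared with active storage)
```

Note the cost the model makes explicit: every first write to a block after a snapshot is taken pays an extra read and an extra write, which is exactly the COFW overhead discussed later in this post.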

Using storage snapshots, an administrator can snapshot the storage containing datafiles for the database and use the snapshot to create a clone of a source database. With multiple snapshots, multiple clones with shared redundant blocks can be provisioned.

On the other hand, if the source database is an important production environment then creating clone databases on the same storage as the production database is generally not a good practice. A strategy that allows the cloned database files to be stored off of the production storage environment will be more optimal for performance and stability.

EMC Snapshot with BCV

EMC has a number of technologies that can create database thin clones. In the simplest case the clone databases share the same storage as the source database, using snapshots of that storage: a storage snapshot is taken and used to make a thin clone. EMC supports up to 16 writeable storage snapshots, allowing up to 16 thin clones of the same source datafiles (while sharing the same storage as the source database). If the source database spans several LUNs, the snapshots of those LUNs must be taken at the same point in time; taking consistent snapshots across multiple LUNs requires the EMC Timefinder product.

Taking load off of production databases and protecting production databases from possible performance degradation is an important goal of cloning. By taking snapshots of the production LUNs one incurs an extra read and extra write for every write issued by the production database. This overhead will impact both production and the clone. On top of the extra load generated by the snapshots, the clones themselves create load on the LUNs because of the I/O traffic they generate.

In order to protect the performance of the production database, clones are often provisioned on storage arrays that are separate from production. In the case where production LUNs are carved out of one set of isolated physical disk spindles and another set of LUNs is carved out of a separate set of physical spindles on the same array, it may be acceptable to run the clones within the same array. In this case, Business Continuance Volumes (BCV) can be used to mirror production LUNs onto the LUNs allocated for the clones. Then snapshots can be taken of the mirrors and used for thin clones; or, in order to protect the production LUNs from the overhead generated by snapshots, the BCV mirrors can be broken and the LUNs allocated for cloning used to start up thin clone databases. Filesystem snapshots can be used to clone up to 16 thin clone databases from the LUNs mirrored from production.

More often than not, however, snapshots are taken of BCVs, or the BCVs are broken and then copied to a second non-production storage array where snapshots can be taken and clones provisioned from them. Even in this case the EMC environment is limited to 16 clones, and if those clones are from yesterday's copy of production, a whole new copy of production has to be made to create clones of today's production. This ends up taking more storage and more time, which goes against the goal of thin cloning.

EMC’s goal has been backup, recovery, and high availability as opposed to thin cloning; however, these same technologies can be harnessed for thin cloning.

The steps to set up this configuration on EMC systems are:

  1. Create BCVs and then break the BCVs
  2. Zone and mask a LUN to the target host
  3. Perform a full copy of the BCV source files to target array
  4. Perform a snapshot operation on target array
  5. Start up the database and recover it using the target array


Figure 6. Timefinder is used to snapshot multiple LUNs from the production filer to the non-production filer to be used for thin provision clones.

EMC is limited to 16 writeable snapshots, and snapshots of snapshots (also known as branching) are generally not allowed. On some high-end arrays it may be possible to take a single snapshot of a snapshot, but not to branch any deeper.

EMC VNX

While copy on write storage snapshots are limited to 16 snapshots, other options are available to increase that number and to enable branching of clones. EMC's newer VNX Snapshots technology improves upon the previous Snapview snapshots. The VNX technology:

  • requires less space
  • has no read+write overhead of copy on first write (COFW)
  • makes snapshot reads simpler
  • supports clones of clones (branching)

When the older Snapview snapshots were created, they required extra storage space at creation time; the newer VNX snapshots don't require any extra space when created. The older COFW approach also incurred more writes for the storage than before the snapshot was in place. With the newer VNX Snapshots, storage writes become Redirect on Write (ROW): each modification to active storage is written to a different location, with no extra read or write overhead.

Another benefit of VNX is how blocks are read from the source LUNs: with the older Snapview, a read from a snapshot had to merge data from the source LUN with the Reserve LUN Pool (RLP), where the original versions of modified blocks are kept. With the newer VNX Snapshots, data is read directly from the snapshot source LUN.
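The difference is easy to see in a small model. Below is a Python sketch of redirect-on-write (purely an illustrative model; real VNX Snapshots manage physical extents, not Python dicts): a snapshot freezes the block pointer map, new writes are redirected to fresh locations, and snapshot reads follow the frozen pointers straight to the original blocks.

```python
# Sketch of redirect-on-write (ROW) snapshot logic. Illustrative only:
# "locations" stand in for physical extents on the array.

class RowVolume:
    def __init__(self, blocks):
        self.store = dict(blocks)             # physical locations: loc -> data
        self.active = {b: b for b in blocks}  # logical block -> location
        self.next_loc = max(blocks) + 1 if blocks else 0
        self.snapshots = []

    def snapshot(self):
        # A snapshot is just a frozen copy of the pointer map: no data
        # moves and no extra space is consumed at creation time.
        self.snapshots.append(dict(self.active))
        return len(self.snapshots) - 1

    def write(self, block_id, data):
        # One write, no extra read: new data is redirected to a new
        # location, and only the active volume's pointer moves.
        self.store[self.next_loc] = data
        self.active[block_id] = self.next_loc
        self.next_loc += 1

    def read_snapshot(self, snap_id, block_id):
        # Snapshot reads go straight through the frozen pointer map to
        # the original, never-copied blocks.
        return self.store[self.snapshots[snap_id][block_id]]

vol = RowVolume({0: "A", 1: "B"})
s = vol.snapshot()
vol.write(0, "A2")
print(vol.read_snapshot(s, 0))   # "A": the original block was never copied
print(vol.store[vol.active[0]])  # "A2": the active volume sees the new write
```

Contrast this `write` with the COFW version earlier in the post: there is no save-area copy at all, which is why ROW removes the extra read-plus-write overhead on the first change to each block.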

EMC’s Timefinder capability is also no longer necessary with VNX. Up to 256 snapshots can be taken in a VNX environment, and snapshots can be made of multiple LUNs simultaneously without needing additional software to create a consistent copy.

Despite all these improvements, VNX is still considered a lower-end storage solution compared to the Symmetrix arrays, which retain all the shortcomings described above.

VNX relaxes some of the constraints of the older Snapview clones; however, in both cases the problem of efficiently bringing new changes from a source array to arrays used for development still exists. After a copy is brought over to a target array from source database LUNs, changes on the source (fresh data) cannot easily be brought over to the target array without a full new copy of the source database. Multiple point in time snapshots are also difficult, as having a target database on the development array share duplicate blocks with another version of the target database (different point in time) is impossible with this architecture. Instead, multiple copies will take up excess space on the target array, and none of the benefits of block sharing in cache or on disk will apply if multi-versioned clone databases are required.

EMC Snapshots with SRDF and RecoverPoint

A major challenge of both BCVs and VNX is keeping the remote storage array used for clones up to date with the source database. EMC has two solutions to this challenge; each provides a way of continuously pulling in changes from the source database into the second storage array in order to keep it up to date and usable for refreshed databases:

  • Symmetrix Remote Data Facility (SRDF)
  • RecoverPoint

SRDF streams changes from a source array to a destination array, but works on Symmetrix storage arrays only.

RecoverPoint is a combination of a RecoverPoint splitter and a RecoverPoint appliance. The splitter splits writes, sending one copy to the intended destination and the other to a RecoverPoint appliance. The splitter can live in the array, be fabric based, or be host based. Host based splitting is implemented by installing a device driver on the host machine and allows RecoverPoint to work with non-EMC storage; however, because the drivers are implemented at the OS level, availability depends on which operating systems the driver has been ported to. The fabric based splitters currently work with Brocade SAN switches and Cisco SANTap; fabric splitters also open up the use of RecoverPoint with non-EMC storage. The RecoverPoint appliance can coalesce and compress the writes and send them back to a different location on the array, or send them off to a different array either locally or in another datacenter.

One advantage of RecoverPoint over SRDF concerns logical corruption: SRDF immediately propagates any change from the source array to the destination, so, as with all instant propagation systems, a logical corruption on the source (for instance, a dropped table) is immediately propagated to the destination system. With RecoverPoint, changes are recorded and the destination can be rolled back to a point in time before the logical corruption.

SRDF can be used in conjunction with Timefinder snapshots to provide a limited number of consistent point-in-time recovery points for groups of LUNs. RecoverPoint, on the other hand, works with consistency groups to guarantee write ordering across a group of LUNs, and provides continuous change collection. RecoverPoint tracks block changes and journals them, allowing target systems to be rolled back in the case of logical corruption or when the development system needs to be rewound.
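The journaling idea can be illustrated with a minimal Python sketch. This is purely a conceptual model (RecoverPoint's actual journal format and interfaces are not shown here): every replicated write carries a timestamp, so an image of the target can be materialized as of any moment before a corruption.

```python
# Conceptual model of a write journal enabling point-in-time rollback.
# Names and structures are illustrative, not RecoverPoint's internals.

class WriteJournal:
    def __init__(self, initial_blocks):
        self.initial = dict(initial_blocks)
        self.entries = []                 # (timestamp, block_id, data), in order

    def record(self, ts, block_id, data):
        # Each split write arriving at the appliance is journaled.
        self.entries.append((ts, block_id, data))

    def image_at(self, ts):
        # Replay journaled writes up to (and including) timestamp ts to
        # produce the target image as of that moment.
        blocks = dict(self.initial)
        for t, block_id, data in self.entries:
            if t > ts:
                break
            blocks[block_id] = data
        return blocks

journal = WriteJournal({"emp": "rows-v1"})
journal.record(100, "emp", "rows-v2")
journal.record(200, "emp", "")        # logical corruption: table dropped
print(journal.image_at(150))          # {'emp': 'rows-v2'}: before the drop
print(journal.image_at(250))          # {'emp': ''}: after the corruption
```

With plain SRDF there is no equivalent of `image_at`: only the latest state exists on the destination, which is why a dropped table propagates instantly and irreversibly.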


Figure 7. EMC SRDF or RecoverPoint can propagate changes from source filer LUNs to the target filer dynamically, allowing better point in time snapshotting capabilities.

Using SRDF or RecoverPoint allows propagation of changes from a source array to a target array. On the target array, clones can be made from the source database at different points in time while still sharing duplicate blocks between the clones no matter which point in time they came from.

In all these cases, however, there are limits to the snapshots that can be taken as well as technical challenges trying to get the source changes to the target array in an easy and storage-efficient manner.

More information on EMC snapshot technologies can be found on EMC's website.

Summary

With EMC, thin cloning can only be achieved by using backup technology; in essence, the process has to be architected manually in order to support databases. How can the same goals be achieved but with database thin cloning specifically in mind? See the following blogs on Netapp, ZFS and Delphix.

Addendum

I’ve been getting questions about how EMC compares with Delphix. Delphix offers technology that is completely missing from EMC arrays.


EMC historically supports only 16 snapshots and no branching. EMC has no tools to transfer the changes of a database from production storage to development storage. In theory one could use SRDF, which only works between compatible Symmetrix arrays, to send changes from one to the other, or one could use RecoverPoint. RecoverPoint requires two additional appliances to capture changes on the wire and then play them onto different storage. Neither is set up specifically for databases, to take into account things like coordinating filesystem snapshots with putting the database in hot backup mode. No one I've met at EMC thinks EMC could do much of what Delphix does, once we explained what we do.
Delphix has three parts:
  1. Source sync
    • initial full copy
    • forever-incremental change collection
    • rolling window of saved changes, with older replaced data purged
  2. DxFS storage on Delphix
    • storage agnostic
    • compression
    • memory sharing of data blocks (the only technology, AFAIK, to do this)
  3. VDB provisioning and management
    • self-service interface
    • roles, security, quotas, access control
    • branching, refresh, rollback
Of these, EMC has only limited snapshots, which is one part of point 2 above; for point 2 Delphix also has unlimited, instantaneous snapshots that work on any storage, be it EMC, Netapp or JBODs. If one is considering a new SSD solution such as Pure Storage, Violin, Fusion-io etc., only Delphix can support them for snapshots. We also compress data, typically to about 1/3 of its size, along data block lines. No one else, AFAIK, is data-block aware and capable of this kind of compression and fast access. There is no detectable overhead for compression on Delphix.
No one in the industry does point 1 above, keeping the remote storage in sync with changes. Netapp tries with a complex set of products and features, but even with all of that they can't capture changes down to the second.
Finally, point 3, provisioning: no one has a full solution except us. Oracle tries with EM 12c, but it is nothing without ZFS or Netapp storage, and its provisioning is extremely complicated. Installation takes between a week and a month, and since the feature is brand new in 12c there are bugs. And it doesn't provide provisioning down to any second, nor branching, etc.

Delphix goes way beyond just data

  • SAP endorsed business solution
  • EBS automated thin cloning of full stack – db, app, binaries
  • Application stack thin cloning

Delphix customers have seen an average application development throughput of 2x.

One SAP customer was able to expand their development environments from 2 to 6 and increased their project output from 2 projects every 6 months to over 10.

Points to consider

• Storage Flexibility: EMC cloning solutions only work with EMC storage – increasing
lock-in at the storage tier. In contrast, Delphix is storage vendor agnostic and can be
deployed on top of any storage solution. As companies move towards public clouds,
influence over the storage tier vendor diminishes. Unlike EMC, Delphix remains
relevant on-premise and in the cloud (private or public).

• Application Delivery: Database refresh and provisioning tasks can take days to weeks
of coordinated effort across teams. The sheer effort becomes an inhibitor to
application quality and a barrier to greater agility. Delphix is fundamentally designed
for use by database and application teams, enabling far greater organizational
independence. Delphix fully automates various functions like refreshing and
promoting database environments, re-parameterizing init.ora files, changing SIDs, and
provisioning from SCNs. As a result, with Delphix, database provisioning and refresh
tasks can be executed in 3 simple clicks. The elimination of actual labor as well as
process overhead (i.e. organizational inter-dependencies) has allowed Delphix
customers to increase application project output by up to 500%. In contrast, EMC
cloning products increase cross-organizational dependencies and are primarily
designed for storage teams.

• Storage Efficiency: While EMC delivers storage efficiency simply through copy on write
cloning, Delphix adds intelligent filtering and compression to deliver up to 2-4x greater
efficiency (even on EMC storage!). Additionally, most customers realize more value
from other Delphix benefits (application delivery acceleration; faster recovery from
downtime etc.) that EMC does not offer or enable.

• Data Protection and Recovery: While EMC only allows for static images or snapshots
of databases at discrete points in time, Delphix provides integrated log shipping and
archiving. This enables provisioning, refresh, and rollback of virtual copies to any point
in time (down to the second or SCN) with a couple of clicks. It also enables an
extended logical, granular recovery window for edge-case failures and far better RPO
and RTO compared to disk, tape or EMC clones. Many Delphix customers have wiped out
the cost of backup storage as well as 3rd party backup tools for databases with this
Delphix “Timeflow” capability.

• 2nd Level Virtualization: Delphix can create VDBs (virtual databases) from existing
VDBs, which is extremely valuable given the natural flow of data in application
lifecycles from development to QA to staging etc. For example, a downstream QA
team may request a copy of the database that contains recent changes made by a
developer. EMC cloning tools can only create first generation snapshots of production
databases and do not reflect the real need or data flow within application
development lifecycles.

• Integrated Data Delivery: Many enterprise applications (ex: Oracle EBS, SAP ECC etc.)
are comprised of multiple modules and databases that have to be refreshed to the
same point in time for data warehousing, business intelligence, or master data
management projects. Delphix uniquely supports integrated and synchronized data
delivery to the exact same point in time or to the same transaction ID.

• Resource Management: Delphix offers resource management and scheduling
functionality such as retention period management, refresh scheduling, and capacity
management per VDB that is lacking in EMC’s cloning products. For example, some
VDBs for a specific source database may be retained for a few weeks while specific
quarter ending copies can be retained for extended durations (for compliance).
Delphix also supports prioritizing server resources allocated to process IO requests per
VDB. This is important in environments where DBA teams must meet SLAs that vary by
lines of business or criticality of applications.

• Security and Auditability: Physical database copies and EMC clones alike constantly
proliferate and increase the risk of audit failures and data breaches when sensitive
data is involved. Delphix delivers a full user model, centralized management, retention
policies (for automated de-provisioning), and complete auditing for VDBs. Delphix also
integrates with homegrown and 3rd party masking tools so virtual copies can be
centrally obfuscated – avoiding tedious masking steps per copy.

• V2P (Virtual to Physical): In the event that customers experience downtime across
primary and standby databases, Delphix can quickly convert a VDB (from any point in
time) from virtual to physical form to minimize the cost and risk of downtime. This
provides an extended recovery layer and also a quick path to creating physical copies
for other purposes like performance testing.

 


Comments

Noons | #1

    One word of caution with SRDF snapshots and COFW: if the source DB gets lots of updates and the snapshot gets them too, the I/O performance will tank for both storage areas. This is regardless of using same or different storage for the snapshot (not filesystems: they don’t exist in the SRDF storage universe – only LUNs).
    I’d say (although it goes without saying! :) ) that for normal mixed use, they’d have to be different: “copy” is not a free and infinitely fast operation in ANY storage system, be it on first write or not!
    I found that out the hard way when I first had to restore a backup to a file system mounted on a snapshot… From 200MB/s write speed down to 20!

khailey | #2

    @Noons: thanks for the first-hand experiences and throughput numbers. I'd heard some clients complain about the impact of snapshots with COFW but didn't know the actual extent of the impact.
    Yes, I should have said storage subsystem instead of filesystem in the case of EMC. When it comes to ZFS, Btrfs and DxFS I think in terms of filesystems.

