
Archive for January, 2015

Data Center of the Future – now

January 30th, 2015


 

photo by youflavio

In a recent blog post Eric D. Brown defined an Agile Data Center as

An Agile Data Center is one that allows organizations to efficiently and effectively add, remove and change services at the speed of the business, not the speed of technology – Eric D. Brown

In a follow-up post he said that an Agile Data Center could be implemented as a Software Defined Data Center (SDDC), for example by using machine virtualization to spin environments up and down.

With SDDC, it is possible for companies to replace their data center’s infrastructure with a virtualized environment and then deliver services and software as a service – Eric D. Brown

The question arises: what technologies constitute an agile SDDC? What software should be leveraged to succeed at having an agile data center, an SDDC? The most important software to look at is software that addresses the top constraints in the data center. As the theory of constraints says, any improvement not made at the constraint is an illusion. So what are the top constraints in the data center? The top constraint, found after working with hundreds of companies and surveying thousands more, is provisioning environments for development and QA. Why is that? It's because almost every industry now finds itself to be more and more a software industry, from stock trading to booksellers to taxi companies to hotels. The competitive advantage is increasingly about the software used to provide and sell the service. Building that software requires development and QA, and thus development and QA environments. Thanks to machine virtualization, the hard part of an environment to provision is no longer the machine. The hardest part of the environment to provision is the data. Data that represents the production system is required to develop applications that use, display and manage that data, and that data is typically kept in large, complex databases such as Oracle, SQL Server, Sybase, Postgres and DB2.

Provisioning development and QA environments that rely on databases can be an expensive, slow endeavor. But as with machine virtualization, there is a new technology, data virtualization, that instead of making full physical copies makes one copy of each unique data block on the source, along with a stream of changed blocks. With this "time flow" of unique blocks from the source database, data virtualization can provide copies in minutes, not by actually making copies but by providing pointers back to the existing blocks. These existing blocks are read/writeable thanks to redirect-on-write technology, which saves modified blocks in a different location than the originals. It all sounds a bit complex, but that's the beauty of data virtualization solutions: they take the complexity, wrap it up in an automated software stack, and provide simple interfaces and APIs to provision full developer environments, from the binaries to the code files to the most complex and difficult part of environment provisioning, which is provisioning full running copies of the data. Most data virtualization solutions also include masking, since sensitive data is often required to be masked in development environments. A software defined data center (SDDC) therefore depends on both machine virtualization and data virtualization.
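To make the redirect-on-write idea concrete, here is a toy sketch in Python (purely illustrative, not how Delphix or any particular product is actually implemented): clones share the source blocks by reference, and a write to a clone lands in a private overlay instead of touching the shared blocks.

# Toy illustration of redirect-on-write thin cloning.
# Not a real storage engine: it just shows why a new clone costs almost
# no space and why writes to a clone never disturb the shared source blocks.

class ThinClone:
    def __init__(self, base_blocks):
        self.base = base_blocks      # shared, read-only blocks (by reference)
        self.overlay = {}            # blocks rewritten by this clone only

    def read(self, block_no):
        # Redirected blocks win; everything else comes from the shared base.
        return self.overlay.get(block_no, self.base[block_no])

    def write(self, block_no, data):
        # Redirect on write: the modification is stored separately,
        # visible only to this clone.
        self.overlay[block_no] = data

source = ["blk0", "blk1", "blk2", "blk3"]   # the one physical copy
dev = ThinClone(source)
qa = ThinClone(source)

dev.write(2, "blk2-modified-by-dev")

print(dev.read(2))   # blk2-modified-by-dev
print(qa.read(2))    # blk2  (unchanged, still shared with the source)
print("%d block(s) of extra storage used by the dev clone" % len(dev.overlay))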

What other technologies are also required?

Check out  Software defined everything: Throwing more software into the mix 


10046.pl by Clive Bostock

January 28th, 2015

Note:

This is a reposting of an old blog post that was on dboptimizer.com but is no longer accessible

More trace file analyzer tools at  http://ba6.us/node/177

Related blog post: Oracle “Physical I/O” – not always physical, with a 10046 parser specifically for I/O: parsetrc.pl and readme


Often when I have a 10046 trace file, especially when looking at I/O issues, I want a histogram of I/O response times. To get I/O response times I've hacked out incomplete awk scripts from time to time, always meaning to write a more complete one; well, now I don't have to. It's already been done!
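For the curious, a minimal sketch of that kind of quick-and-dirty parse might look like the following (Python rather than awk, and only binning a single wait event into millisecond buckets; it assumes the 10g-and-later trace format where WAIT lines carry ela= in microseconds, and the real scripts below do far more):

#!/usr/bin/env python
# Minimal sketch: bucket 10046 WAIT elapsed times into power-of-two
# millisecond buckets. Assumes a 10g+ trace where "ela=" is in microseconds.
# 10046.pl and orasrp do this properly; this only shows the basic idea.
import re, sys
from collections import Counter

wait_re = re.compile(r"WAIT #\S+: nam='([^']+)' ela= *(\d+)")
buckets = Counter()

with open(sys.argv[1]) as trc:
    for line in trc:
        m = wait_re.search(line)
        if not m:
            continue
        event, ela_us = m.group(1), int(m.group(2))
        if event != 'db file sequential read':
            continue
        ms = ela_us / 1000.0
        b = 1                      # find the power-of-two ms bucket
        while ms >= b:
            b *= 2
        buckets[b] += 1

for b in sorted(buckets):
    print("< %5d ms : %d" % (b, buckets[b]))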

Here is a cool perl script from Clive Bostock: README.TXT    10046.pl

(Also check out orasrp, which produces a more in-depth report in HTML. I like both: 10046.pl is a short, easy, portable script that I can modify, whereas orasrp is a binary and only works on some ports.)

For example, if I trace a session with 10046, and retrieve the tracefile, then I can run:

$ 10046.pl -t  mytrace.trc

and it will output a header and three sections:

Header

  • Summary of all events for tracefile
  • Events by object summary
  • Events by object histogram

This looks like

Header

Trace file mytrace.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u02/oracle
System name:    SunOS
Node name:      toto
Release:        5.10
Version:        Generic_142900-12
Machine:        sun4u
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 177
Unix process pid: 16553, image: oracle@toto
Trace input file : mytrace.trc

Wait summary

EVENT AGGREGATES
================
Wait Event              Count Elapsed(ms)   Avg Ela (ms)  %Total
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~~~ ~~~~~~~~~~
db file sequential read  2715      11593              4    3.74
       direct path read  4484       4506              1    1.45
 db file scattered read   141        898              6    0.29
          log file sync     3          8              2    0.00
                              ~~~~~~~~~~~~
             Total Elapsed:       309821

Wait Summary by object

Object Id  : Wait Event                  Count Tot Ela (ms) %Total Avg Ela (ms)
~~~~~~~~~~ : ~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~ ~~~~~~~~~~~~ ~~~~~ ~~~~~~~~~~~~~
28581      : direct path read            4484         4506   1.45            1
1756433    : db file sequential read      725         1891   0.61            2
764699     : db file sequential read      332         1762   0.57            5
37840      : db file sequential read      200         1044   0.34            5
38018      : db file sequential read      108         1009   0.33            9
81596      : db file scattered read       140          887   0.29            6

Wait histogram by object

EVENT HISTOGRAM BREAKDOWN
===========================
This section splits the event counts into elapsed time
buckets so that we can see if there are any suspicious
or anomalous response time / frequency patterns.
Object Id : Wait Event              <1ms <2ms <4ms <8ms <16ms <32ms <64ms <128ms <256ms <512ms >=1024ms
~~~~~~~~~ : ~~~~~~~~~~~~~~~~~~~~~~~ ~~~~ ~~~~ ~~~~ ~~~~ ~~~~~ ~~~~~ ~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~~~
28581     : direct path read        7680   87  148  221  144    40     4     0      0       0        0
1756433   : db file sequential read  606  268   45   35   66     6     2     0      0       0        0
764699    : db file sequential read   74  119   11   78   78     9     0     0      0       0        0
37840     : db file sequential read   50   72    6   45   47     5     0     0      0       0        0
38018     : db file sequential read   12   38    7   10   30    12     5     0      0       0        0
81596     : db file scattered read    64    4   13   62   18     8     3     0      0       0        0
41995     : db file sequential read   20   39    0    7   16     8     4     0      0       0        0
108718    : db file sequential read   74   54    5   12   24     4     0     0      0       0        0
33490     : db file sequential read    0    5   11   25   19     4     0     0      0       0        0


Put Delphix on your laptop at Oracle Jan 28!

January 27th, 2015

 

#CloneAttack

Create an army of clone databases and applications in minutes


Tomorrow, Jan 28, we will be installing Delphix on people’s laptops at the BIWA conference at the Oracle conference center at Oracle headquarters in Redwood Shores.

Prerequisites

  • Laptop, either
    • Mac: VMware Fusion or VirtualBox
    • Linux: VMware Workstation or VirtualBox
    • Windows: VMware Workstation or VirtualBox
  • at least 8 GB RAM
  • at least 50 GB free disk space, but preferably 100 GB free
  • at least 2 GHz CPU, preferably dual-core or better

We’ll provide a USB stick with 3 virtual machine OVA files. Just start up the VMs and in a few minutes you will be thin cloning Oracle databases, Postgres databases and web applications.

Example of the installation

Example of provisioning a database with web application using #CloneAttack


Succeeding with Test Data Management

January 27th, 2015

 

 

Test data management is difficult, time consuming and expensive, and when it is done poorly the result is significant losses in revenue, from high QA costs to bugs reaching production. Fortunately there is a technology that eliminates most of the huge resource, time and planning overhead of conventional test data management, and that technology is called data virtualization.
 
Dependable QA testing requires that code be tested on data that represents the data it will encounter on the final production system. The data has to respect the business rules, and the data has to correlate from one business rule to another. For example, a customer order record has to correlate to existing order items and an existing customer. The records also have to cover the date ranges being searched; if the test code searches date ranges outside of the test data, then the test code won't even touch any data. It is difficult, nearly impossible, to create from scratch a full set of data that represents all the possible combinations of data that will be encountered in production.
 
 
If test data management is done incorrectly, the lack of proper data in testing leads to bugs making their way into production, with a significant impact on the bottom line.
 

The absence of proper test data causes nearly one-third of the incidents we see in QA/nonproduction environments and is a major reason why nearly two-thirds of business applications reach production without being properly tested. The resulting application failures cause significant amounts of downtime, with an average price tag of $100,000 per hour for mission-critical applications, according to industry estimates. – Cognizant

After talking to hundreds of companies about their test data management, we have found that QA systems typically use data that is either partial, representing a subset of the data in production, or synthetic, generated to simulate data in production. In both cases the data isn't sufficient to cover all the data possibilities that are encountered in the production system(s). To address these missing cases, the testing plan typically includes a QA cycle on a full copy of the production system near the end of the development cycle. At this point many bugs are found, forcing the project either to delay the release or to release on time but with known bugs.
 
The question arises: “why isn’t the code tested on a full copy of production earlier in the development cycle?” If code testing were run on a full copy of production earlier in the cycle, then bugs would be found earlier, fixed earlier and fixed with less code rework.
 
The problem is that production systems usually run on large, complex databases, and making copies of these databases is difficult, time consuming and resource intensive. Even when a copy is made, if the test cycles modify data, as is typically the case where data is created, modified and deleted, then the same test data will change from one QA run to another, leading to diverging results. This creates the need to refresh the data to the state it was in before the QA cycle started. With large, complex data sets the refresh process can be prohibitively expensive and time consuming. Thus the industry has come to think that testing on full-size production data is not feasible.
 
In fact, using production data is possible, but it requires a technology called data virtualization. Data virtualization is a technology that allows almost instant cloning of data for almost no storage. There is a storage requirement for the original source data, and even that requirement can be eased through compression, such that the storage requirement is on the order of one third of the original source data. Once data virtualization is linked to the source data, i.e. there is an initial copy of the source data, new clones can be made for almost no storage because the clones aren't actual copies but point to the already existing data. The beauty is that these clones are full read and write copies: when a clone modifies data, those modifications are stored separately from the original data and are only visible to the clone that made the changes. These clones are called thin clones because they don't initially take up additional storage. The whole process is called data virtualization, and databases that use this technology are called virtual databases.

On top of the storage savings, data virtualization technologies come with automated collection of changes on the source data, creating a time flow of changes. The time flow means that a full copy of the source is never taken again; only changes are collected, and changes older than the time window, which is usually a couple of weeks, are purged. Virtual databases can be provisioned from any point in time, down to the second, within this time flow. Most data virtualization technologies come with automated provisioning of virtual databases, making provisioning an up-and-running database on a target machine a matter of a few mouse clicks and a few minutes. Data virtualization also generally includes options for data masking, improving security coverage in the testing cycles.
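As a rough mental model of the time flow (again a toy sketch, not any vendor's actual implementation): the engine keeps the initial copy of the blocks plus a timestamped stream of changed blocks, and a virtual copy provisioned "as of" time t simply sees the initial blocks overlaid with every change recorded at or before t.

# Toy model of a "time flow": one initial copy of the blocks plus a
# timestamped stream of changed blocks. Provisioning "as of" time t never
# copies the data again; it only selects which block versions are visible.

initial_blocks = {0: "b0@t0", 1: "b1@t0", 2: "b2@t0"}

# (timestamp, block_no, new_contents) collected continuously from the source
change_stream = [
    (10, 1, "b1@t10"),
    (25, 2, "b2@t25"),
    (40, 1, "b1@t40"),
]

def provision_as_of(t):
    """Return the block image a virtual copy provisioned at time t would see."""
    view = dict(initial_blocks)
    for ts, block_no, contents in change_stream:
        if ts <= t:
            view[block_no] = contents
    return view

print(provision_as_of(5))    # all blocks as of t0
print(provision_as_of(30))   # block 1 as of t10, block 2 as of t25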
 

A few companies we have worked with have had test cycles that repeat over and over, and between test runs the database had to be refreshed. In one specific case the refresh took 8 hours for only 20 minutes of actual code testing. Going from this architecture to an architecture of virtual databases, we were able to use full, fresh copies of production data, catching bugs earlier, and reduce the refresh time down to a few minutes, drastically reducing the overhead of QA and increasing test coverage:

 


 

 


5 ways to boost your career with Social Media

January 26th, 2015


Photo by Kevin Dooley

If you are in the Bay Area tomorrow, Jan 27, come see me, Yann Ropars and Yury Velikanov talk about how to leverage social media to boost your career. We will be talking at Oracle headquarters at the NoCOUG/BIWA conference at 2:30 pm.

Why use social media as an IT technician? Because

vibrant social networks are key to landing jobs, moving forward in your career, and securing personal happiness

– Richard Florida, professor at the Rotman School of Management, University of Toronto

 Success no longer comes from possessing knowledge; instead, you have to participate with others in creating a flow of knowledge. 

– Walter Isaacson, President and CEO, the Aspen Institute, and author of Einstein: His Life and Universe

Networks matter – who do you turn to for help? Who can you trust? How do you get a new job? How do you know what products to buy? What are the latest discoveries? How do you know if 12c is ready for prod?

Networks of people can accomplish things an order of magnitude faster than a lone individual. Look at Wikipedia or open source software.

My first big social network “aha” came 15 years ago with a group of Oracle technicians that Mogens Nørgaard had created an email list for. With help from several different people on this list, such as Anjo Kolk, James Morle and Jonathan Lewis, I was able to put together a program in a week to read Oracle’s SGA directly with C instead of SQL. Really cool stuff! Without each of their input I might have spent a year instead of a weekend.

That was 15 years ago, and I’m convinced more than ever that social networks are hugely important.

My second big social network “aha” came during the world economic meltdown in 2009. My company at the time wasn’t strong, but I was happy there because I was running my own project and was heads-down working on it, ignoring the rest of the world. Then when the economy slipped I realized that my company was in a bad position: they cut salaries across the board, laid people off, cut my project team down to a handful and dumped 4 other software projects on my desk that I had no interest in. With almost no time to work on the project I was passionate about, hardly any team left, more work than I could handle on other weak software packages and a lower salary, I was chomping at the bit to get out, but I realized I didn’t have anyone to turn to. All my contacts were at Oracle, which I didn’t want to go back to, or at my current company. I realized I needed a strong LinkedIn network to leverage to find a new job. Ultimately it was the network that Mogens had set up that saved me and connected me to Delphix. Before this experience I had thought LinkedIn was just a place to put your resume online. I didn’t realize it was a strong social network that could actually help me get a job.

Nowadays, if I meet a DBA who is not on LinkedIn, I assume that the person is a 9-to-5 “lifer” at the company they are currently at. It leads me to think they are happy where they are, have no plans to ever leave, and no ambition to move. My assumptions are probably often wrong; maybe the person just doesn’t know the power of social networks. But there are so many people on social media who are easy to connect with, why bother with people who are hard to connect with? In-person connections were the only way to connect 20 years ago. In-person, or local, connections are still important, which is why Silicon Valley is so powerful: there are so many smart, motivated, passionate IT technicians who can easily hop from one job to another, and meet at bars, meetups, hackathons, conferences, etc. But nowadays online social connections are becoming more and more important. When communication is rapid, things happen faster.

Social media for me is

  • email groups, like Oracle-L
  • forums, like OTN
  • Twitter
  • LinkedIn
  • Facebook

 

5 ways to boost your career with social media:

  1. Find jobs
    • become more well known by your presence
    • create a network of people you can turn to and who can validate your skills
    • share your knowledge and generate
  2. Find employees
    • Someone sends you a resume? See who they are connected to. Are they connected to a strong network?
  3. Solve technical problems
    • have an error, ask if others have seen it
  4. Create projects together
    • github – get input and help on scripts and projects
  5. Get feedback on technology
    • what’s new
    • what’s useful
    • what can you ignore
    • what problems should you watch out for

 

How do you get started?

Hop on Twitter. Find someone important who only follows a few people. For example, in the Oracle world, go to Tom Kyte’s Twitter profile. Tom is only following 22 people, and you can bet those 22 people are probably pretty good people to follow, so go through his list, pick out most or all of them, and follow them. Then you will be off to a good start in creating a nice Oracle Twitter feed.

Retweet other people’s tweets. It will get you noticed and help you build a following.

On Twitter it can be hard to sort the wheat from the chaff, so I use a couple of tools.

Prismatic will go through tweets, pull just the tweets with links to articles, and then serve up the articles in a news reader. I love it.

TweetDeck allows me to track words or hashtags. For example, I can monitor all tweets with my company name, Delphix, or during a conference like Oracle Open World I can track all the tweets with #OOW and #OOW15.

Start a blog if you can. I often use my blog as just a way to document stuff I’ll forget and will want to find later.

Comment on other people’s blogs.

Get on technical forums and participate. My favorite Oracle forum is the Oracle-L list.

If you write scripts to help you, share them on github and help other people with their scripts on github.

On LinkedIn, connect with everyone you can. If your company is small enough, just connect with everyone at your company. Connect with friends, colleagues, customers, everyone you can.

 


photo by  Daniel Iversen


photo by neliO

 


Free Version of Delphix!

January 13th, 2015

Delphix is now available as a 30-day trial direct download! (If you would like a longer trial, please contact me at kyle@delphix.com; year-long trials and even indefinite trials are potentially possible for partners, bloggers, Oracle ACEs, etc.)

Just go to the download page to get started.

The Delphix download trial consists of 3 pre-configured virtual machines, downloadable as OVA files:

  • source machine with Oracle XE and Postgres databases (1.3 GB)
  • Delphix engine (1.2 GB)
  • target machine with Oracle XE and Postgres binaries (1.9 GB)

Just start up the source, target and Delphix VMs and you are ready to go. After starting up the source, target and Delphix, the lab will automatically link Delphix to the source databases. After a few minutes the source databases will show up in the Delphix console in a browser. Access the Delphix console by simply typing the IP address of Delphix into a browser. Once the sources are visible in the Delphix console you can start creating thin clones, i.e. virtual databases, on the target machine. The thin clones only take a couple of minutes to make and take up almost no space.



Prerequisites on the machine where the lab is installed:

  • Mac, Linux or Windows (laptop, desktop or workstation)
  • O/S virtualization, either
    • Virtualbox or
    • VMware
      • Mac: VMware Fusion (free trial version download)
      • Linux or Windows: VMware Workstation (free trial version download)
  • at least 8 GB RAM
  • at least 50 GB free disk space, but preferably 100 GB free
  • at least 2 GHz CPU, preferably dual-core or better

There is a Vimeo channel for videos of the lab at   https://vimeo.com/channels/landshark

For example the lab setup video for VMware Workstation:

Example of provisioning a virtual database and a virtual application

There is also a full online community where you can find answers and ask questions

 https://community.delphix.com/delphix/categories/delphix_landshark

The Delphix demo lab is nicknamed “landshark” and the hands-on lab given at conferences is called “#CloneAttack”.

Get one-on-one help with installing and running the demo at the nearest Oracle conference to you that has a Clone Attack event, such as RMOUG, Collaborate, UKOUG, DOAG, OUGN, etc.



Top 3 criteria to choose a virtual data solution 

January 6th, 2015


photo by Thomas Hawk

Data virtualization solutions, also known as Copy Data Management (CDM), Virtual Copy Data (VCD) and Virtual Data Appliances (VDA), are rising rapidly: over 100 of the Fortune 500 have adopted data virtualization solutions between 2010 and the end of 2015. Adoption is hardly surprising given that virtual data reduces the time to provision copies of large data sets from days down to minutes and eliminates most of the space required for copies of data. How many copies of large data sets do companies have? Database vendor Oracle claims that, on average, a customer has 12 copies of production databases in non-production environments such as development, QA, UAT, backup, business intelligence, sandboxes, etc., and Oracle expects the number of copies to double by the time their latest version of Oracle, 12c, is fully adopted. With Fortune 500 companies often having thousands of databases, and these databases reaching multiple terabytes in size, the downstream storage costs of these data copies can be staggering.

There are a number of virtual data solutions coming onto the market and several already in the marketplace, such as Oracle, Delphix and Actifio. Delphix and Actifio are listed in The 10 Coolest Virtualization Startups Of 2014, and Delphix is listed in TechTarget's Top Ten Virtualization Companies in 2014 as well as in Forbes Magazine's America's Top 25 Most Promising Companies of 2014. Oracle, for its part, is flooding its product offerings with data virtualization solutions such as Clone DB, Snap Clone, the Snapshot Management Utility for ZFSSA and ACFS thin cloning in Oracle 12c, and new vendors will be coming to market over the next year.
 
Questions to ask when looking at data virtualization solutions are:
  • What unique features does each vendor provide to help achieve my business goals?
  • Does the solution support my full IT environment, or is it niche/vendor specific?
  • How much automation, self-service and application integration is pre-built vs. requires customization?
  • Are customers similar in size and nature to my organization using the solution?
  • Is the solution simple and powerful or just complicated?
Picking between the available solutions is further complicated by the common claims made by all the solutions in the market, so we've come up with a list of the top 3 criteria for choosing between them.
 
Top 3 criteria for choosing a virtual data solution
 
The top 3 questions to ask when looking at a virtual data solution are:
  1. Does the solution address your business goals?
  2. Does the solution support your entire IT landscape?
  3. Is the solution automated, complete and simple?
1. Address business goals
The first step is to identify the business problems and clarify if the solutions meet your business goals. The top use cases for data virtualization in the industry are:
    • Storage savings
    • Application development acceleration
    • Data protection & production support
Deciding which of the above use cases apply will help in determining the best solution.

Storage savings

All data virtualization solutions offer storage savings by the simple fact that virtual data provides thin clones of data meaning that each new copy of data initially takes up no new space. New space is only used after the data copies begin to modify data. Modified data requires additional storage.

Comparing storage savings

To compare the storage savings of various solutions, find out how much storage is required to store new modifications and how much storage is required to initially link to a data source. Of the solutions we've looked at, the initial required storage ranges from 1/3 the size of the source data up to 3x the size of the source data. Some solutions can store newly modified data in 1/3 of the actual space thanks to compression; others don't have compression, and some have to store redundant copies of changed data blocks.
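As a back-of-the-envelope comparison (hypothetical numbers, purely to illustrate how the factors above interact): suppose a 1 TB source, ten downstream copies, and roughly 5% of the blocks modified in each copy.

# Back-of-the-envelope storage comparison: full physical copies versus
# thin clones. All inputs are hypothetical, not vendor figures.

source_tb   = 1.0    # size of the source database
copies      = 10     # dev, QA, UAT, reporting, ...
change_rate = 0.05   # fraction of blocks each copy ends up modifying

# Full physical copies: every copy stores the whole database.
physical_tb = copies * source_tb

# Thin clones: one shared baseline plus only the modified blocks per copy.
# Solutions differ on the baseline cost (roughly 1/3x with compression up
# to 3x without), so compare both ends of that range.
for baseline_factor in (1 / 3.0, 1.0, 3.0):
    virtual_tb = baseline_factor * source_tb + copies * change_rate * source_tb
    print("baseline %.2fx -> %.2f TB virtual vs %.1f TB physical"
          % (baseline_factor, virtual_tb, physical_tb))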

Data agility more important than storage savings

 
Storage savings can be massive, but surprisingly enough, of the hundreds of virtual data adopters we've talked to, most mention that data agility is by far more important than storage savings. Agility means that virtual copies can be made in minutes, whereas traditional full physical copies of large databases can take hours, days or even weeks to make.

Application development acceleration

 
The agility that virtual data provides, such as provisioning a full read-writable copy of a multi-TB database in minutes, can improve the efficiency of many different aspects of a company, but the area where we see the biggest improvement is application development. Companies report 20-80% improvement in application development timelines after moving to data virtualization solutions. Application development typically requires many copies of source data when developing and/or customizing an application. These copies of data are required not only by developers but also by QA.

User friendly self service interface
When it comes to identifying the best data virtualization solution for application development, look for solutions that provide user-friendly, self-service, developer-specific interfaces. Some solutions only provide interfaces for DBAs or storage administrators. Administrator-specific interfaces will continue to impede developers, as developers will have to request copies from these administrators, incurring wait time, especially when those administrators are already busy. The improvements to application development come when the solution gives users self-service interfaces where they can directly make copies of data, eliminating the costly delays of waiting for data.
Developer Centric Interface
When looking at application development acceleration, make sure the solutions have a developer-centric interface with per-developer logins that can enforce the correct security level, limiting what data developers have access to, how many copies they can make and how much extra storage they can use when modifying data. Data typically has sensitive content that should be masked before giving the data to developers; in the case of sensitive data, look for solutions that include data masking. Important as well is looking for developer interfaces that give developers standard development functionality such as data versioning, refreshing, bookmarking and rollback. Can one developer bookmark a certain version of a database, and can another developer branch a copy from that bookmark to look at a certain use case or bug?
Branching of virtual data copies crucial for QA support
The most important feature for application development acceleration is the ability of the solution to branch data copies. Branching data copies means making a new thin clone copy from an existing thin clone copy. Some solutions have this feature and some do not. Why is branching important? Branching is important for a number of reasons, such as being able to branch a developer's copy of data from a time before they made an error in data changes, such as dropping a table. More importantly, branching is essential for being able to spin up copies of data for QA directly from development. One of the biggest bottlenecks in development is supplying QA with the correct version of the data or database to run the QA tests. If there is a development database with schema changes and/or data modifications, then instead of having to build up a new copy for QA to use, with data virtualization and branching one can branch a new clone, or many clones for that matter, and give them to QA in minutes, all while development continues to use the data branch they were working on.
Data protection for developer virtual copies
Finally, some data virtualization solutions offer data protection for development databases by default. Development databases are often not backed up, as they are considered “just development,” but we see an order of magnitude more incidents of developers inadvertently corrupting data on development databases than of production DBAs accidentally damaging data on production databases. Ask the data virtualization vendors whether they can provide branches of a damaged development database, down to the second, from a point in time before the developer accidentally damaged it. Some solutions offer no protection, others offer manual snapshots of points in time, and the best simply and automatically provide a time window of multiple days into the past from which a virtual database can be branched off if there were any mistakes or data corruption.
Data protection & production support

Data virtualization solutions can provide powerful data protection. For example, if someone corrupts data on production, such as dropping a table, or a batch job only half completes, modifying some data but not all before erroring out, a virtual database can be spun up in minutes and the uncorrupted data exported from the virtual database and imported into the production database. We have heard numerous stories of the wrong table being dropped on production, or a batch job deleting and/or modifying the wrong data, with the changes propagated immediately to the standby and thus unrecoverable from the standby.
Data virtualization can save the day, recovering the data in minutes. Data virtualization can offer impressively fine-grained and wide time windows for Recovery Point Objectives and fast Recovery Time Objectives.
Time window size and granularity
When looking at data virtualization solutions for data protection, make sure the solution provides a time flow, i.e. a time window of changes from the source data from which virtual copies can be made. Some solutions have no time window, other solutions have occasional snapshots of past states of data, and the best solutions offer recovery to any point in time, down to the second, within a time window.
Time window storage requirements
The larger the time window of changes collected from the past, the more storage will be required. Find out how much storage is required to maintain the time window. Some solutions require significant storage for this time window, and some can store an entire multi-week time window in roughly the size of the original data source thanks to compression.
Time and ease of provisioning

Finally, look into how easy or difficult it is to provision the data required. If the data required is a database, then provisioning the data can be a complicated task without automation. Does the solution offer point-and-click provisioning of a running database down to the second at a past point in time? How easy or difficult is it to choose the point in time from which the data is provisioned? Is choosing a point in time a simple UI widget, or does it require manual application of database logs or manual editing of scripts?

2. Support your entire IT landscape

Is the solution a point solution or does it expand to all the needs of the IT department?
Is the solution specific to a few use cases or does it scale to the full Enterprise requirements?
Is the solution a single data type solution such as only Oracle databases?

Is the solution software that runs on any hardware, or does it require specialized hardware? Does the solution use any storage system in your IT landscape, or is it restricted to specialized storage systems? Will the solution lock you into a specific storage type, or will it allow full flexibility to use new storage types as they become market leaders, such as new, better and more affordable flash storage systems? Does your IT landscape use the cloud, and does the solution support your IT department's cloud requirements?

Does the solution support all of your data types and operations systems? For example does your IT landscape use any of the  following databases and does the solution automate support for these databases?
    • Oracle
    • Oracle RAC
    • SQL Server
    • MySQL
    • DB2
    • Sybase
    • Postgres
    • Hadoop
    • Mongo

Does your IT landscape require data virtualization for any of the following, and does the solution automate support for these data types?

    • web application tiers
    • Oracle EBS
    • SAP
    • regular files
    • other datatypes

Does your IT landscape use, and does the solution support, all of your operating system types?

    • Linux
    • HP/UX
    • Solaris
    • Windows
    • AIX

3. Fully Automated, Complete and Simple

Automated 

How automated is the solution? Can an end user provision data, or does it require a specialized technician such as a storage admin or DBA? When provisioning databases such as Oracle, SQL Server or MySQL, does the solution fully and automatically provision a running database, or are manual steps required? For example, some solutions only provision data from a single point in time from the data source. What if a user requires a different point in time? How much manual intervention is required? Some solutions only support provisioning data from specific snapshots in the past. What if a user requires a specific point in time that falls between snapshots? How much manual intervention is required? Does the solution collect changes automatically from the data source, or does it require other tools or manual work to collect changes from the source or get newer copies of the source data?

Complete
How complete is the solution?
Is the solution a point solution for a specific database like Oracle or does it support multiple database vendors as well as application stacks and other file types?
Does the solution include masking of data?
Does the solution include replication or other backup and fail over support?
Does the solution sync with a data source and collect changes, or is it simply an interface to manage storage array snapshots?
Does the solution offer point in time recovery down to the second or is it limited to occasional snapshots?
Does the solution provide interfaces for your end user self-service?
Does the solution offer performance monitoring and analytics?
Does the solution provide data sharing on disk only, or does it share data at the caching layer as well?

Simple

How long does it take to install the solution? We’ve seen systems set up in 15 minutes and others take 5 days.
How easy or hard is it to manage the solution? Can the solution be managed by a junior DBA or junior IT person or does it require expert storage admins and DBAs?

Does the solution come with an alerting framework to make administration easier?

Does the interface come with a “single pane of glass” that can scale to thousands of virtual data copies across potentially hundreds of separate locations in your IT landscape?

Is it easy to add more storage to the solution? Is it easy to remove unneeded storage from the solution?

 

In Summary

Find out how powerful, flexible and complete the solution is.

Is the solution a point solution or a complete solution?
Some solutions are specific point solutions, for example only for Oracle databases. Some solutions are point solutions tied to specific hardware or storage systems, while others are complete software solutions. Complete, flexible solutions sync automatically with source data, collect all changes from the source, provide data provisioning down to the second from anywhere within that time window, support any data type or database on any hardware, and support the cloud.

Does the solution provide self service and user functionality?

  • Point-in-time provisioning
  • Reset, branch and rollback of environments
  • Refresh parent and child environments with the latest data
  • Provision multiple source environments to the same point in time
  • Automation / self-service / auditing capabilities

Some simple technical differentiators: does the solution

  • support your data and database types on your systems and OS?
  • support your data center resources, or does it require specialized hardware or storage?
  • sync automatically with source data, or does it leave syncing as a manual exercise or require other solutions?
  • provision data copies down to the second from an extended time window into the past?
  • branch virtual data copies?
  • include cloud support?

But when it comes down to it, even after asking all these questions, don't believe the answers alone. Ask the vendor to prove it. Ask the vendor to provide in-house access to the solution and see how easy or hard it is to install, manage and execute the functionality required.

 

For more information also see

 

 
