
Archive for November, 2014

Is Continuous Integration compatible with database applications?

November 10th, 2014

Continuous integration and continuous delivery offer huge efficiency gains for companies, but is continuous integration even possible when the application’s backbone is a massive relational database? How can one spin up database copies for developers, QA, integration testing, and delivery testing? It’s not like Chef or Puppet can spin up a 10TB database copy in a few minutes the way one can spin up a Linux VM.

There is a way, and that way is called data virtualization, which allows one to spin up that 10TB database in minutes as well as branch a copy of that 10TB database from Dev to QA, or for that matter branch several copies, all for very little storage.


Old methods of application project development and rollout have a solid history of failure to meet deadlines and budget goals.

Repeating the old methods and expecting different results is what Einstein would call insanity:

“Insanity: doing the same thing over and over again and expecting different results.” – Einstein

Continuous Integration (CI), Continuous Delivery (CD) and Agile offer an opportunity to hit deadlines on budget, with tremendous gains in efficiency for companies, as opposed to waterfall methods. With waterfall methods we try to get all the requirements, specifications and architecture designed up front, set the development teams working, and then tackle integration and deployment near the end of the cycle. It’s impossible to precisely target the dates when the project will be completed and sufficiently QA’ed. Even worse, during integration, problems and bugs start to pour in, further exacerbating the problem of meeting release dates on time.

Agile, CI and CD fix these issues, but there is one huge hurdle for most shops, and that hurdle is getting the right data, especially when the data is large, into the Agile, CI and CD lifecycle and flowing through that lifecycle.

With Agile, Continuous Integration and Continuous Delivery we are constantly getting feedback on where we are and how fast we are going, and we are altering our course. Our course is also open to change as new information comes in on customer requirements.

Agile development calls for short sprints, among other things, for adding features and functions, and with continuous integration those features can be tested daily or multiple times a day. Further enhancing continuous integration is continuous delivery, for those systems where it makes sense: new code that has passed continuous integration is rolled into continuous delivery, meaning the code deployment is tested in a test environment. For some shops where it makes sense, such as, famously, Flickr, Facebook and Google, the code can be passed on to continuous deployment into production.

By using agile programming methods and constantly doing integration testing one can get constant feedback, do course correction, reduce technical debt and stay on top of bugs.

Compare the two approaches in the  graphs below.

In the first graph we kick off the projects. With waterfall we proceed until we are near completion and then start integration and delivery testing. At this point we come to realize how far from the mark we are. We hurriedly try to get back to the goal, but time has run out. Either we release with less than the targeted functionality, or worse the wrong functionality, or we miss the release date altogether.

[Figure: a waterfall project drifting off target, with a late, rushed course correction]

With Agile and CI it’s much easier to course correct, with small iterations and the flexibility to modify the designs based on incoming customer and market requirements.

[Figure: an Agile/CI project course-correcting in small iterations toward the goal]

With Agile and CI, code is tested in an integrated manner as soon as the first sprint is done, so bugs are caught early and kept in check. With waterfall, since it takes so much longer to get a working, integrated set of code and integration isn’t even started until near the end of the cycle, bugs accumulate significantly toward the end of the cycle.

[Figure: bug count over time, waterfall vs. Agile/CI]

In waterfall, deployment doesn’t even happen until the end of the cycle because there isn’t an integrated, deployable set of code until the end. The larger the project gets and the more time goes by, the more expensive and difficult the deployment is. With Agile and CI the code is constantly deployable, so the cost of deployment stays low and constant.

[Figure: cost of deployment over time, waterfall vs. Agile/CI]

A waterfall project can’t even start to bring in revenue until it’s completely finished, but with Agile there is usable code early on, and with continuous deployment that code can be leveraged for revenue early.

With all these benefits, more and more shops are moving towards continuous integration and continuous delivery. With tools like Jenkins, TeamCity and Travis CI to run continuous integration tests, virtualization technologies such as VMware, AWS, OpenStack, Vagrant and Docker, and tools like Chef, Puppet and Ansible to run the setup and configuration, many shops have moved closer and closer to continuous integration and delivery.
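
Provisioning the code and OS side of an environment really is that quick these days. As a rough sketch (the box and image names below are just examples, not tied to any particular project), a disposable Linux environment for a CI run can be stood up with a couple of commands:

$ vagrant init ubuntu/trusty64                    # example box name; writes a Vagrantfile
$ vagrant up                                      # boots the throwaway VM

$ docker run -it --rm ubuntu:14.04 /bin/bash      # or a throwaway container, removed on exit

There is no equivalent one-liner for the 10TB production database behind the application, and that is exactly the road block.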

But there is one huge road block.

Gene Kim lays out the top two bottlenecks in IT:

  1. Provisioning environments for development
  2. Setting up test and QA environments

and goes on to say

One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it.

From Contino’s recent white paper:

Having worked with many enterprise organisations on their DevOps initiatives, the biggest pain point and source of wastage that we see across the software development lifecycle is around environment provisioning, access and management.

From an article published today in Computing [UK]  we hear the problem voiced:

“From day one our first goal was to have more testing around the system, then it moves on from testing to continuous delivery,” 

But to achieve this, while at the same time maintaining the integrity of datasets, required a major change in the way Lear’s team managed its data.

“It’s a big part of the system, the database, and we wanted developers to self-serve and base their own development in their own controlled area,” he says.

 Lear was determined to speed up this process, and began looking for a solution – although he wasn’t really sure whether such a thing actually existed.

This road block has been voiced by experts in the industry more and more as the industry moves towards continuous integration.

When performing acceptance testing or capacity testing (or even, sometimes, unit testing), the default option for many teams is to take a dump of the production data. This is problematic for many reasons (not least the size of the dataset) …

Humble, Jez; Farley, David (2010-07-27). Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler)) (Kindle Locations 7285-7287). Pearson Education. Kindle Edition.

What can we do about this enormous obstacle to continuous integration: providing environments that rely on databases that are too big and complex to copy for development, QA and continuous integration?

Fortunately for us, there is data virtualization technology. Just as virtual machine technology opened the door to continuous integration, data virtualization swings it wide open for enterprise-level application development that depends on large databases.

Data virtualization is an architecture (which can be encapsulated in software, as Delphix has done) that connects to a source database or data set, takes an initial copy, and from then on collects only the changes from the source (much like EMC SRDF, NetApp SMO or an Oracle standby database). The data is kept on storage with snapshot capabilities (such as NetApp or ZFS, or software like Delphix that maps a snapshot filesystem onto any storage, even JBODs). The data is managed as a timeline on the snapshot storage. For example, Delphix saves 30 days of changes by default; changes older than 30 days are purged, meaning that a copy can be provisioned down to the second anywhere within that 30-day window. (Some other technologies that address part of the virtual data stack are Oracle’s Snap Clone and Actifio.)
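
To make the snapshot and clone mechanics concrete, here is a minimal sketch using plain ZFS (the pool and dataset names are made up for illustration; Delphix layers the change collection, 30-day timeline and management on top of this kind of mechanism rather than exposing raw ZFS commands):

$ zfs snapshot pool/proddb@monday                         # point-in-time snapshot, takes seconds
$ zfs clone pool/proddb@monday pool/devcopy               # writable copy sharing all unchanged blocks
$ zfs list -o name,used,refer pool/proddb pool/devcopy    # the clone shows near-zero USED until it diverges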

Virtual data improves businesses’ bottom line by eliminating the enormous infrastructure, bureaucracy and time drag it takes to provision databases and data for business intelligence groups and development environments. Development environments and business intelligence groups depend on having copies of production data and databases, and data virtualization allows provisioning in a few minutes with almost no storage overhead by sharing duplicate blocks among all the copies.

 

As a side note, but an important one, development and QA often require that data be masked to hide sensitive information such as credit cards or patient records, so it is important that a solution come integrated with masking technology. Data virtualization combined with masking can vastly reduce the surface area (the amount of potentially exposed data) that has to be secured, by eliminating full copies. The data virtualization structure also includes a chain of authority, recording who had access to what data at what time.
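
As a trivial illustration of what masking means at the data level, a development copy could have its card numbers overwritten before developers get access. The schema, table and column names below are hypothetical, and real masking tools do much more (format-preserving, consistent, irreversible masking across tables); this is only a sketch of the idea:

$ sqlplus -s app_owner/app_pass <<EOF
-- hypothetical table and column: keep only the last four digits visible
UPDATE customers SET credit_card_no = 'XXXX-XXXX-XXXX-' || SUBSTR(credit_card_no, -4);
COMMIT;
EOF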

The typical architecture before data virtualization looks like the following, where a production database is copied to:

  1. backup
  2. reporting
  3. development

In development the copies are further propagated to QA and UAT, but because of the difficulty in provisioning these environments, which takes a number of teams and people (DBA, storage, system, network, backup), the environments are limited due to resource constraints and often the data is old and unrepresentative.

[Figure: typical architecture before data virtualization, with full physical copies for backup, reporting and development]

 

With data virtualization, there is a time flow of data states stored efficiently on the “data virtualization appliance”, and provisioning a copy takes only a few minutes and little storage, and can be done by a single administrator or even as self service by the end users such as developers, QA and business analysts.

[Figure: architecture with data virtualization, with virtual copies provisioned from the appliance timeline]

 

With the ease of provisioning large data environments quickly, easily and with few resources, it becomes easy to provision copies of environments for development and to branch those copies in minutes into multiple parallel QA lanes to enable continuous integration:

[Figure: a development copy branched into multiple parallel QA lanes]

The duplicate data blocks, which are the majority in the case of large data sets, can even be shared across development versions:

[Figure: duplicate data blocks shared across development versions]
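
Sticking with the plain-ZFS sketch from earlier (again, the names are invented and this only illustrates the block-sharing idea, not the Delphix implementation), branching a development copy into several parallel QA lanes is just more snapshots and clones, all referencing the same unchanged blocks:

$ zfs snapshot pool/devcopy@sprint12                 # freeze the dev copy at the end of the sprint
$ zfs clone pool/devcopy@sprint12 pool/qa_lane1      # each QA lane gets its own writable copy
$ zfs clone pool/devcopy@sprint12 pool/qa_lane2
$ zfs clone pool/devcopy@sprint12 pool/qa_lane3
$ zfs list -o name,used,refer -r pool                # the lanes start at near-zero additional space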

Go from

Waterfall Deployment

[Image: waterfall]

to Continuous Integration

[Image: surfers riding small waves]

 

 


#CloneAttack online! (and in person at #ECO14 & #DOAG14)

November 5th, 2014

I’ll be at #ECO14 today in Raleigh, NC. Catch me in person to get a copy of #CloneAttack. I’ll also be at #DOAG14 in Nuremberg Nov 18-20 running an official #CloneAttack alongside #RacAttack and #RepAttack. I will also be at #BGOUG in Sofia, Bulgaria, Nov 14-16.

For online #CloneAttack info read on:


We are starting a limited, publicly accessible download of a trial version of Delphix and a lab.

If interested, connect with me on LinkedIn at www.linkedin.com/in/kylehailey/ and I will send you the download info via LinkedIn messages.

Delphix trial version and lab:
The lab consists of three virtual machines, available from the download link as OVA files. You can start up the virtual machines by simply clicking on the OVA files once you have VMware Fusion installed.
The virtual machines are:
  1. Source Linux machine with an Oracle database
  2. Target Linux machine with just the Oracle binaries. We will create thin clone databases, i.e. virtual databases, here
  3. Delphix machine already linked to the source database on the source Linux virtual machine
Because we cannot distribute Oracle 11gR2 software, we ask that you download it from Oracle, and we provide a script to automate the install of the Oracle software on the Source and Target Linux virtual machines.
The lab requires Oracle 11.2.0.1 (other versions will fail the automated install using our scripts, but you are free to install them manually on the VMs and then use those versions with Delphix).
Steps
  1. Import the OVAs into VMware (right click on the OVA, choose “open with VMware Fusion”)
  2. Note the IP address on the boot screen of each VM (via the console)
  3. Download the Oracle 11.2.0.1 install for “Linux x86-64”
  4. Execute the automated Oracle install script from the directory with the Oracle install zip files (a sketch of this step follows the list)
    • On Mac: lsde_install_mac_v1.2.sh
    • On PC: lsde_install_v1.2.sh
    • On Linux: lsde_install_v1.2.sh
  5. Log in to the Delphix admin portal (type the IP address of the Delphix VM into a browser)
  6. Go to Manage->Environments
  7. Choose the appropriate environment and then change the Host Address to the IP address noted in step 2
  8. Refresh the environment (click on the blue & green double arrow on the Target and then the Source)
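
As a rough sketch of steps 3 and 4 on a Mac (the zip names shown are the usual names of the Oracle 11.2.0.1 “Linux x86-64” download; adjust the path to wherever you saved the files and the install script):

$ cd ~/Downloads                         # directory holding the Oracle zips and the install script
$ ls linux.x64_11gR2_database_*.zip      # expect ...1of2.zip and ...2of2.zip
$ sh lsde_install_mac_v1.2.sh            # on PC/Linux use lsde_install_v1.2.sh instead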
 

Prerequisites

Here is the download URL and login info. Please let me know your experiences. We are actively improving the lab experience and depend on your  feedback to improve it.
 
Account: delphixdemokit
Username: lsde_shared
Password: contact me via linkedin at www.linkedin.com/in/kylehailey/
Path: All Buckets /landshark/DE/1.5.6
 
There is an online community for posting and asking questions here:
My previous blog posts on the labs, with general info, are at:
NOTE: this trial version is for functional testing and not for performance testing. The trial version is trimmed down for a laptop install and thus is not valid for performance tests. For performance tests Delphix requires a minimum of 32GB of RAM, 8 vCPUs and the vmxnet3 network driver. This same trial version can be used successfully for performance tests by installing it on an ESX machine and configuring it for performance, but high-end performance setups should ideally include assistance from Delphix pre-sales. Also, the performance of the underlying disk subsystem that Delphix relies on will, obviously, be a major factor in any performance. Finally, for performance work, ALL router hops between Delphix and the target machines that run the virtual databases should be eliminated.

Reference


Here is a video showing the lab setup. A later video will show some exercises we can do with the lab setup.

Here is a video of an exercise to do on the lab VMs after install

Linux notes

NOTE: when booting the source/target VMs, the boot process occasionally hangs at:

Bringing up interface eth0:

Determining IP information for eth0…init: vmware-tools pre-start process (779) terminated with status 1

A simple VM restart solves the problem.

NOTE Linux

Adobe Flash is needed to run the Delphix console in a browser.

Installing Flash on Fedora (assumes the Adobe yum repository is already configured):

$ yum install flash-plugin

NOTE Fedora Linux install

Use the Linux script lsde_install_v1.2.sh, but change “ping -n ” to “ping -c ” and “ grep TTL ” to “grep rtt”. I will work on a new Linux install script.
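
A quick way to apply those two substitutions in place, keeping a backup of the original script first (assuming the script is in the current directory):

$ cp lsde_install_v1.2.sh lsde_install_v1.2.sh.orig
$ sed -i -e 's/ping -n /ping -c /g' -e 's/grep TTL/grep rtt/g' lsde_install_v1.2.sh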

Fedora VMware Workstation install

download

 VMware-Workstation-Full-10.0.4-1379776.x86_64.bundle

Following notes from http://www.linux.com/community/forums/cloud-management/how-to-install-vmware-1001-on-fedora-20-x86-64

Install GCC, dev tools and kernel headers

# yum -y install gcc kernel-headers kernel-devel
# ln -s /usr/src/kernels/$(uname -r)/include/generated/uapi/linux/version.h /usr/src/kernels/$(uname -r)/include/linux/version.h

 

Start the VMware installation

# chmod +x VMware-Workstation-Full-10.0.1-1379776.x86_64.bundle
# ./VMware-Workstation-Full-10.0.1-1379776.x86_64.bundle

Apply patch for a bug with netfilter

WARNING: see https://wiki.archlinux.org/index.php/VMware#3.17_kernels to get correct version of patch for your kernel
$ curl http://pastie.org/pastes/8672356/download -o /tmp/vmware-netfilter.patch
$ cd /usr/lib/vmware/modules/source
# tar -xvf vmnet.tar
# patch -p0 -i /tmp/vmware-netfilter.patch
# tar -cf vmnet.tar vmnet-only
# rm -r vmnet-only
# vmware-modconfig --console --install-all

 

Start VMware

$ vmware

 


Jonathan Lewis explains Delphix

November 4th, 2014

OOW 2014 was the best so far for me, and a whirlwind. After not having had a presentation accepted since I left Oracle 10 years ago, I got not just one presentation accepted but three. Woohoo! Two of my presentations are available on YouTube at:

On top of that, with the awesome support of the Oaktable, Delphix, Pythian and Enkitec, I was able to secure a great venue for the 3rd annual Oaktable World at OOW, where we had the leading Oracle performance and internals experts speak for two solid days. You can see videos of the Oaktable World presentations at:

Then, with the help of Tim Gorman and Steve Karam, also of the Oaktable, along with colleague and mastermind Adam Bowen, we ran a hands-on lab where we installed Delphix on people’s laptops at the OTN lounge and called it #CloneAttack. We collaborated the next day with DBvisit and Solarwinds as well to hold a full day of hands-on labs: #RepAttack and #MonitorAttack.

The evenings were packed, between events like the fun Oracle ACE dinner, the awesome Pythian party at the W, and the bloggers’ get-together at Jillian’s, with customer meetings and dinners scattered in between.

The Delphix booth was exciting: a huge 20×40 between Cisco and Cognizant. It’s been fun watching Delphix go from a 10×10 slot on the outskirts of the show floor, to 10×20, to 20×20, to 20×40 right in the middle of the industry movers and shakers.

Finally, I had the honor of having Jonathan Lewis explain how Delphix works better than I can, and Jonathan has only worked with Delphix for a few days. Incredible. Here is Jonathan’s presentation at the Delphix booth at Oracle Open World:

[Video: Jonathan Lewis explaining Delphix at the Delphix booth at Oracle Open World]
