Archive for November, 2015

Docker Data Containers

November 20th, 2015


The appeal of Docker is clear.
The Docker platform enables multiple applications to run concurrently on a single copy of an OS, either deployed directly onto a physical server or as a virtual machine (VM).

One challenge with Docker is providing persistent storage for a container, especially when that container is restarted on another VM host and we want it to point to the same data.

Here is a video explaining a little of the problem and possible future solutions.


But what if you want a solution now? One is available with Delphix. Combining Delphix with Docker, we can easily move both the persistent data and a Docker container to a new host.

For the Docker container we will use WordPress, which uses a MySQL database as its persistent datastore.

In this example I will be using Delphix Express with the Landshark source and target environments, and a WordPress Docker container that points to a persistent MySQL data store.

The source and target machines already have Docker and MySQL on them, so all we need to do is start MySQL, create a wordpress schema, start Docker, pull the WordPress image, then start a container from it and point it at the MySQL database.

On the Source
ssh  delphix@source
# password is delphix
su -
# password is delphix
vi /etc/my.cnf
# add the line "user=root", then save the file
service mysqld start
mysql -u root -p
CREATE USER wordpressuser;
SET PASSWORD FOR wordpressuser = PASSWORD('password');
GRANT ALL PRIVILEGES ON wordpress.* TO wordpressuser IDENTIFIED BY 'password';

Then start Docker and download the “wordpress” image

service docker start
docker pull wordpress

Start the WordPress Docker container

docker run -p 80:80 --name wordpress  \
           -e WORDPRESS_DB_HOST= \
           -e WORDPRESS_DB_USER=wordpressuser \
           -e WORDPRESS_DB_PASSWORD=password \
           -d wordpress

Now, on the source machine, we have a WordPress Docker container using a persistent MySQL database for storage.

Now, to move this Docker container to another host, we can link the MySQL database into Delphix. Once it is linked, we can provision a copy out to any registered host.
In Delphix, I just click + in the top left to add a data source. The MySQL database should show up at the top of the sources.
Once linking is finished, we can provision the MySQL database out to another target machine and start up a new Docker container pointing to this new virtual MySQL database (VDB).
I just click on the dSource on the left, then on the right click on provision and choose the target machine and a port. I’ll use 3306 for the port.

Now I have a thin clone of the MySQL database, a virtual database (VDB), running on the target machine. I just need to create a WordPress Docker container to use it.

service docker start
docker pull wordpress

docker run -p 80:80 --name wordpress \
           -e WORDPRESS_DB_HOST= \
           -e WORDPRESS_DB_USER=wordpressuser \
           -e WORDPRESS_DB_PASSWORD=password \
           -d wordpress

Now I can access my WordPress blog on my target machine and modify it separately from the source. If the target machine goes down, I can migrate the VDB to another host, start up the container there, and point the WordPress container to the same VDB now running on the new host.
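Between the source, the target, and any later host, the only part of the container start-up that changes is the database host. As a minimal sketch of that idea, the helper below builds the same `docker run` argument list used in this post from a database host parameter. The function name and the two addresses are hypothetical and only for illustration; the `host:port` form for `WORDPRESS_DB_HOST` is an assumption about how the host is supplied.

```python
def wordpress_run_args(db_host, db_user="wordpressuser", db_password="password"):
    """Build the argument list for the `docker run` command shown in this post."""
    return [
        "docker", "run",
        "-p", "80:80",
        "--name", "wordpress",
        "-e", f"WORDPRESS_DB_HOST={db_host}",
        "-e", f"WORDPRESS_DB_USER={db_user}",
        "-e", f"WORDPRESS_DB_PASSWORD={db_password}",
        "-d", "wordpress",
    ]


# Hypothetical addresses: the source MySQL database and the VDB on the target.
source_cmd = wordpress_run_args("192.0.2.10:3306")
target_cmd = wordpress_run_args("192.0.2.20:3306")

# Collect the arguments that differ between the two commands.
changed = [(a, b) for a, b in zip(source_cmd, target_cmd) if a != b]
print(changed)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where Docker is running. Moving the VDB to yet another host then means calling the helper again with the new address.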

One other change I made on the target VDB is changing the siteurl and home to be the new IP of the target machine:

mysql -u wordpressuser -ppassword -h -P 3306 wordpress << EOF
update wp_options set option_value='' where option_id<3;
select option_value,option_id from wp_options where option_id < 4;
EOF



From here we can set up architectures where the source is hosted directly on Delphix and Delphix manages version control of the source MySQL database and WordPress application.

We can spin up multiple VDBs in minutes, using almost no additional storage, and give them to multiple developers to try out modifications and merges.




Delphix versus Storage Snapshots

November 11th, 2015

photo by Gonzalo Iza

This article lists some of the key capabilities that Delphix provides over and above Storage Snapshot based cloning solutions to meet the increasing business demand for Agile Development.

I’ve blogged about this before in

First it is useful to contrast the distinct goals and implementation behind Storage Snapshots and Delphix.


Storage Snapshots


The primary use for storage snapshots is to enable backups of active database systems. A backup takes a long time and read more …



DB2 virtualized on Delphix

November 10th, 2015


Delphix is tightly integrated with PostgreSQL, Oracle, Sybase and SQL Server, but here is an example of an impressively easy workflow with DB2, which can be set up with a few extra scripts.



Second wave of cloud migration

November 9th, 2015
A second wave of cloud migration is now happening after the initial hiccups of

  • Reaction to Shadow IT
  • Misaligned Expectations
  • Increased Data Exposure and Administrative Surface Area of Risk
  • Choice of newer better cloud options now available after initial “wrong choice”

Reaction to Shadow IT: A significant amount of public cloud use over the past few years was done without IT oversight and approval. As companies develop their formal cloud strategies, it often means moving away from the cloud provider that was initially chosen by a line of business or an individual group and moving to a different public/private cloud or even back onsite. 

Misaligned Expectations: Many companies are realizing that they had not accurately estimated things like cost savings when they selected their public cloud environment. For example, while migrating data into a public cloud may be inexpensive, many companies have been surprised by the hidden cost of extracting data from these environments or moving data between regions. Also, without constant monitoring and management of public cloud environments, many enterprise companies failed to efficiently decommission under-utilized workloads and therefore never realized any significant savings from the pay-as-you-go cost model.

Another area of misaligned expectations is in technical support. Simply put, most large public cloud providers do not offer the level of technical support expected by the majority of enterprise companies. This issue becomes critical given that the typical enterprise company may only have a handful of internal employees that have a deep understanding of public cloud architectures. When you combine limited cloud expertise with the inability to get enterprise-level phone and tech support, we often see the “scaling or phase two” portion of cloud projects hit an impassable roadblock.

Increased Data Exposure and Administrative Surface Area of Risk: While network-layer and physical security in public cloud data centers has proven to be adequate for most enterprise companies, many CISOs and security execs are very concerned about the risk of data exposure. This risk can be greatly increased in public clouds for two reasons. One, there are new administrative accounts that will have access to data and workloads. These accounts must now be managed and monitored by the IT security team, and they represent both a data exposure risk and a data protection risk (for example, if an admin accidentally deletes a workload that hasn’t been backed up).

Two, multiple copies of sensitive data can be replicated in a cloud environment. This may be due to the public cloud provider’s underlying replication technology, or simply due to a company’s lack of data security controls in a cloud environment. For example, if a company had planned to implement business analytics in the cloud, it may require multiple copies of sensitive data to be sent to the cloud in order to complete the analysis reports. Even if this data is protected with encryption and identity management, a simple identity breach (the most common type of breach today) could expose that data in cleartext.

This increased risk, especially given today’s threat climate, is causing many companies to abandon certain cloud projects unless more comprehensive data protection technologies (such as tokenization or data masking) can be implemented.

Choice: One of the simplest reasons for these statistics is the recent availability of choice. While it may be hard to believe, computing giants such as Google, Microsoft and VMware had no public cloud offering for enterprises just a couple years ago – and even companies like  IBM had only very limited private and hybrid cloud options. As enterprises develop a more mature cloud strategy, they are now able to select the best cloud environment for their individual cloud projects.


Second wave of cloud migration


As these initial experiences and hiccups have been encountered, understood and dealt with, we are seeing a more serious and consolidated move to the cloud – a second wave of cloud migration.


Say DevOps one more time …

November 6th, 2015


Credit: Matthias Weinberger (CC2)

What is DevOps?

As has been noted many times, DevOps is a term that is hard to define.

DevOps is a term that is so easily misapplied in the enterprise that its popularity threatens adoption.

If you can’t define it for the bean counters – it might as well not exist at all.

 Tim O’Brien, Gleanster

Thus being able to clearly define DevOps is crucial to its success.

DevOps is a goal 

One of the problems in defining DevOps is that no two DevOps teams are doing the same thing, so defining DevOps by what they do is currently impossible. On the other hand, all DevOps teams have the same goal. That goal has been clearly articulated by Gene Kim as

DevOps enable fast flow of features from development to IT operations to the production.  

One could expand a bit more and say

DevOps enables Development, QA, IT Operations, Information Security and Product Management to work together to enable the fast flow of features from the business, to development, to QA, to IT to the customer, while preserving the stability, availability, security, and manageability of the production environment.


How is the DevOps goal attained?

Everyone is in agreement as to the goal, but the 6 million dollar question is how does one attain it? With what tools and processes? There is a gnawing lack of discussion on which tools and processes are best for achieving the DevOps goal of fast flow of features. In order to increase flow in any system, the only productive action to take is to improve the constraint. The constraint is the slowest point in the flow. Any improvement not made at the constraint is an illusion, as demonstrated by the works of Eli Goldratt. Thus the first step in DevOps is to identify the top constraint(s) and optimize them.
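The point about the constraint can be illustrated with a small sketch. It models the flow of features as a chain of stages, where overall throughput is the capacity of the slowest stage. The stage names and the features-per-week numbers are hypothetical, chosen only to show the effect.

```python
def throughput(stages):
    """Features per week that get through every stage: the slowest stage decides."""
    return min(stages.values())


def constraint(stages):
    """Name of the slowest stage, i.e. the constraint."""
    return min(stages, key=stages.get)


# Hypothetical capacities, in features per week.
stages = {"provision_env": 2, "code": 10, "qa": 5, "deploy": 8}

baseline = throughput(stages)
# Doubling coding speed does not touch the constraint, so nothing changes.
faster_coding = throughput({**stages, "code": 20})
# Doubling provisioning speed improves the constraint, so the whole flow speeds up.
faster_env = throughput({**stages, "provision_env": 4})

print(constraint(stages), baseline, faster_coding, faster_env)
```

In this toy model, speeding up coding leaves throughput where it was, while the same effort spent on provisioning doubles it. Once provisioning is no longer the slowest stage, the constraint moves and the exercise repeats.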

Gene Kim, who has worked with hundreds of CIOs and surveyed 14,000 companies, lists the top 5 constraints in the fast flow of features as

  1. Provisioning environments for development
  2. Setting up test and QA environments
  3. Architecting code in development to facilitate easy changes in code
  4. Development speed of coding
  5. Product management  feature input


The only way to optimize the fast flow is to take the top constraint, tune it, then move to the next constraint. The first two constraints, which are the most commonly seen, both center on environment provisioning. As Gene Kim puts it

One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it. – Gene Kim

Thus the question for most companies is how to improve the provisioning of environments for development and QA.

Why is environment provisioning such a bottleneck?

Environment provisioning is such a bottleneck because historically environments have taken a long time and many resources to create and provision, and everyone depends on them: developers depend on these environments to code the applications, QA depends on them to test the code, and IT depends on them to verify deployment processes. When environment provisioning can be done quickly, repeatedly and efficiently, one can accomplish the core of DevOps: Continuous Integration and Continuous Delivery. Continuous Delivery is an automated process where developers’ code check-ins are integrated multiple times a day, run through QA and verified in IT for correct deployment. Some companies take it even further with Continuous Deployment, where the changes that pass testing are rolled via automation into production.

Continuous Delivery enables the fast flow of features from development, to QA, to UAT, to IT (and even to production in the case of Continuous Deployment).

What tools enable Continuous Delivery?

Continuous Delivery depends on automation and efficient environment provisioning to work, thus tools that automate code management, automate environment provisioning and improve efficiencies are key.

Minimum prerequisites for continuous integration

    • version control system like Git
    • test automation like Jenkins
    • environment provisioning

Version Control

Version control can be implemented with any number of tools, of which one of the most popular now is Git.

Test automation

Test automation tools such as Jenkins and TeamCity can orchestrate running tests when changes are introduced in source control repositories.

But moving to continuous integration, where code changes can be QA’ed and verified for deployment multiple times a day, requires efficient provisioning of QA and verification environments.

Environment provisioning

Reaching the CI/CD goal of multiple deployment tests a day requires not only integrating code check-ins in source control with automated test management tools, but also fast, efficient access to environments to run those tests.

Machine virtualization

One of the first decisive technologies to support efficient environment provisioning was machine virtualization, such as VMware. With machine virtualization, virtual machines (VMs) could be spun up or down quickly and easily, providing the efficiency gains needed for DevOps and opening up machine provisioning to automation now that it was controlled by software.

Configuration tools automate VM provisioning

Configuration tools like Chef, Puppet and Ansible are programmed to automate and manage the provisioning of VMs with the desired configurations, packages and software.

Continuous Integration (CI/CD) Landscape

Thus implementing CI/CD requires

  1. version control system like Git
  2. test automation like Jenkins and TeamCity
  3. efficient environment provisioning
    • Machine virtualization (VM): VMware, KVM, Docker etc.
    • VM provisioning tools: Chef, Puppet, Ansible



Biggest Obstacle to  CI/CD 

Now that code changes are automatically QA’ed and VMs can be spun up in the right configuration, there is still the problem of “how do we get data to run QA tests on?” For example, what if the application depends on data managed by a large Oracle database? How can one efficiently produce copies of this database for use in the flow from development to testing to deployment?

Fortunately there is a technology called data virtualization. As virtual machine technology opened the door to continuous integration, data virtualization swings it wide open for enterprise-level application development that depends on large databases.

Data virtualization is an architecture (which can be encapsulated in software, as Delphix has done) that connects to source data or a source database, takes an initial copy, and from then on collects only the changes from the source (as EMC SRDF, NetApp SMO and Oracle standby databases do). The data is saved on storage that either has snapshot capabilities (as in NetApp and ZFS) or is managed by software like Delphix that maps a snapshot filesystem onto any storage, even JBODs. The data is managed as a timeline on the snapshot storage. For example, Delphix saves 30 days of changes by default. Changes older than 30 days are purged, meaning that a copy can be made of the data as of any second within this 30-day window.
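The timeline idea can be sketched in a few lines. This is only a toy model under stated assumptions, not how Delphix is implemented: the `Timeline` class, its method names and the key/value "database" are all hypothetical. It takes one initial copy, collects changes afterwards, folds changes older than the retention window into the base copy, and can rebuild the data as of any point still inside the window.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the default window mentioned above


class Timeline:
    def __init__(self, initial_copy, taken_at):
        self.base = dict(initial_copy)   # the one full copy
        self.window_start = taken_at     # oldest point we can still provision
        self.changes = []                # (timestamp, key, value), in time order

    def collect(self, ts, key, value):
        """Record a single change received from the source."""
        self.changes.append((ts, key, value))

    def purge(self, now):
        """Fold changes older than the retention window into the base copy."""
        cutoff = now - RETENTION
        if cutoff <= self.window_start:
            return
        for ts, key, value in self.changes:
            if ts < cutoff:
                self.base[key] = value
        self.changes = [c for c in self.changes if c[0] >= cutoff]
        self.window_start = cutoff

    def provision(self, point):
        """Return the data as it looked at `point`."""
        if point < self.window_start:
            raise ValueError("point is older than the retention window")
        copy = dict(self.base)
        for ts, key, value in self.changes:
            if ts <= point:
                copy[key] = value
        return copy


t0 = datetime(2015, 10, 1)
timeline = Timeline({"row": "v0"}, taken_at=t0)
timeline.collect(t0 + timedelta(days=5), "row", "v1")
timeline.collect(t0 + timedelta(days=40), "row", "v2")

now = t0 + timedelta(days=45)
timeline.purge(now)  # the day-5 change is now part of the base copy

recent = timeline.provision(t0 + timedelta(days=20))
latest = timeline.provision(now)
print(recent, latest)
```

After the purge, a request for day 10 is rejected because that point has fallen out of the window, while any point from day 15 onward can still be rebuilt from the base copy plus the retained changes.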

Thus to implement CI/CD requires efficient environment provisioning. Efficient environment provisioning depends upon tools and technology such as:

  1. Configuration Management: Chef, Puppet, Ansible
  2. Machine virtualization  : VMware, KVM, Docker etc
  3. Data virtualization :  Delphix

Once environment provisioning is streamlined, the next step is automating the flow of developer commits through QA and deployment testing with test automation tools like Jenkins and TeamCity.

There are examples coming out that document how to wire these tools together, such as a recent one on Jenkins and Ansible.

Efficient environment provisioning and test automation are just the core.  From there we can build out more of the processes to attain the goals of DevOps.


The ambiguous descriptions of DevOps are undermining the movement, but DevOps is defined as the tools and culture that support a continuous delivery value chain. Now with a stake in the ground to rally around, the goal is no longer saying what is or is not DevOps, but instead mapping out the best tools, processes, methods and cultural changes to support the DevOps definition, and mapping these to the various situations encountered in the industry. Everyone is doing DevOps, i.e. software development, integration, QA and release; the question is how well are you doing it, and can you improve it?

Virtual data improves businesses’ bottom line by eliminating the enormous infrastructure, bureaucracy and time drag involved in provisioning databases and data for business intelligence groups and development environments. Development environments and business intelligence groups depend on having copies of production data and databases, and data virtualization allows provisioning in a few minutes with almost no storage overhead by sharing duplicate blocks among all the copies.
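The "almost no storage overhead" claim comes from block sharing, which a short copy-on-write sketch can make concrete. The `BlockStore` and `VirtualCopy` classes are hypothetical names for illustration, and the block counts are made up. A copy is just a list of pointers to shared blocks, and only a block that a copy modifies gets stored again.

```python
class BlockStore:
    """Holds every distinct block exactly once."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def put(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id


class VirtualCopy:
    """A database copy that only stores pointers to shared blocks."""

    def __init__(self, store, block_ids):
        self.store = store
        self.block_ids = list(block_ids)

    def clone(self):
        # Cloning copies the pointer list, not the data.
        return VirtualCopy(self.store, self.block_ids)

    def read(self, index):
        return self.store.blocks[self.block_ids[index]]

    def write(self, index, data):
        # Copy-on-write: store one new block and repoint only this copy.
        self.block_ids[index] = self.store.put(data)


store = BlockStore()
source = VirtualCopy(store, [store.put(f"block-{i}") for i in range(1000)])

clones = [source.clone() for _ in range(5)]
clones[0].write(7, "changed by developer 1")

print(len(store.blocks))
```

Five clones of a 1000-block source plus one modified block leave 1001 blocks in the store rather than 6000, and the source and the other clones still see the original block 7.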




Here is an episode from Arrested DevOps talking about the problem with data and databases in DevOps



IDC report: Virtual Data 461% ROI over 5 years

November 5th, 2015

Read the full IDC report on Virtual Data here.



Here are some key sound bites from the report:

“Data has turned IT departments into business enablers. Data acts as a tax on IT operations when IT departments cannot do what businesses need them to do because they are too busy ‘keeping the lights on.’ ”

“When we bought Delphix, the original reason was all of the terabytes we were avoiding. But now, the main benefit for us is the agility that it gives us — the increase in productivity. We didn’t realize when we bought it that the agility would be so great.”

Companies using virtual data “have increased the number of copies per database they maintain from 6 to 17.”

“Within a year, we were running three times the number of testing environments, but without expanding our storage needs.”

“ ‘We would have had to provision another five to six physical environments at about a million dollars cost’ to approach the functionality unleashed by Delphix.”

“Companies say that their application development teams are much more productive with Delphix. This productivity stems from their spending far less unproductive time waiting for fresh databases and having to resolve fewer data-related issues.”

“Now, these developers are probably saving about two weeks of time in terms of getting their databases per testing cycle.”

“Staff reallocated due to time savings with Delphix gets to focus on other things that need work, such as automation and innovation.”

“These companies are reducing downtime instances by putting more recently refreshed and higher-quality data into their development and testing databases as a result of using Delphix.”

The report says that “because they can do database refreshes much faster, companies need less time to fix data-related problems. One company noted that it runs 40 regression tests per database during development efforts with Delphix instead of only 5 or 6 without taking any additional time.”

“The companies interviewed for this study also report that they are achieving significant business productivity gains by using Delphix’s data virtualization software, including higher revenue.”

“Delphix is helping eliminate data’s drag on business operations for these companies and driving business value for them by:

  • Reducing the time needed to push applications through to end users and customers
  • Allowing companies to undertake more data-driven projects
  • Enabling productivity gains for employees across these organizations”

“With Delphix, database refresh time has gone from an average of four days to minutes, making the companies’ use of data more agile and productive.”

“Faster refreshes mean that more testing is occurring, data quality is higher and more accurate, and employees of all types are spending less unproductive time waiting for data to be provided.”

“As one customer explained, ‘We’ve gone from three days to provision an environment where the data was two days stale when it arrived to minutes. That’s a complete paradigm shift.’ ”

“The companies reported that they have reduced the average time it takes them to deploy a customer-facing application from more than two months to less than one month because they are getting the needed data much faster with Delphix.”

“Delphix is also helping these companies complete internal upgrades 34% faster, meaning that they are only taking five months on average to complete such deployments with Delphix, down from eight months before.”

“Interviewed companies report completing 29.4% more data-driven projects per year with Delphix and a strong 8.5% employee productivity gain on projects impacted by Delphix.”

“According to one company, Delphix has helped a team of 100 employees achieve up to a 25% productivity gain by making possible the automation of more than 1,000 reports.”

“His company estimated that Delphix was helping it realize at least an additional million dollars in revenue per year by speeding up projects through improved data provisioning.”

“The chief cause of slow database application development, inefficient database testing, and resultant application failures and interruptions in business is not storage per se but the complex tasks associated with making and refreshing test and development copies of databases, including resource deployment, staff time, and recurring tasks for testing. These things don’t change regardless of how cheap the storage becomes or how easy it is to use.”

“A substantial roadblock to the agility of the enterprise is its inability to evolve database applications fast enough to meet the changing requirements of a rapidly evolving business climate.”

“The dramatic simplification of database administration tasks saved the time usually spent in defining development and test databases and getting them up and running so that database administrators could concentrate on the more high-value tasks of supporting the applications teams in adjusting and perfecting both the database and its application.”

“There’s a simple fact preventing most enterprises from transitioning to rapid development iterations. It just takes too much time to test critical systems against a snapshot of production data.”

The report says “93% of top-performing companies view the need to increase the frequency of software development deployments as a top reason to invest in a virtual database.”


Make it easy or die (software is eating the world)

November 5th, 2015




Software is eating the world

  • We see taxis being beaten out by Uber and Lyft.
  • We see hotels being undercut by Airbnb.
  • We see brokerage firms undercut by Ameritrade and Etrade.
  • We see retailers undercut by Amazon.
  • We see video stores supplanted by Netflix.

Today it’s all about developing software that makes access to your product easier.

We are also seeing the same thing happen in IT, where cloud providers such as AWS are undercutting industry hardware vendors like Oracle, NetApp and EMC.

NetApp lost 25% revenue. IBM storage revenues dropped 19%.  It’s like the collapse of SGI, Sun and DEC

It’s all about how fast you can get ease of use to the market.

That’s why DevOps and Cloud are so important. It’s about speeding up the flow of feature requests through development to QA to production.




Photo by angelo Yap


photo by Joel Kramer


photo by Robert S


photo by Yaniv Yaakubovich



photo by F33


Just today this article came out on the growth of AWS.

From inside a data center, especially in an old, crusty IT department of a large corporation, it may seem like any significant usage of the cloud could never happen, but looking at the growth rate, it’s hard to say the impact won’t be huge.



Related Article:  The Uber effect: Digital disruption a growing concern for companies



Steve Jobs : the journey of simplicity

November 4th, 2015

Steve Jobs sets a great perspective on the journey of simplicity. It starts from simple, goes through complexity and ends up in simplicity. 

“When you start looking at a problem and it seems really simple, you don’t really understand the complexity of the problem. Then you get into the problem, and you see that it’s really complicated, and you come up with all these convoluted solutions. That’s sort of the middle, and that’s where most people stop. But the really great person will keep on going and find the key, the underlying principle of the problem — and come up with an elegant, really beautiful solution that works.” – Steve Jobs



Diff’ing AWR reports

November 4th, 2015

I don’t know if you are ever asked to compare two AWR periods. AWR period comparison is pretty easy if you have access to the two periods in the same AWR repository. Two periods in the same repository can be compared with

select * from table(
    dbms_workload_repository.awr_diff_report_text(
         [db_id],
         [instance id],
         120, -- start snapshot id
         121, -- end snapshot id
         [db_id of target],
         [instance id],
         122, -- start snapshot id
         123  -- end snapshot id));

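and it can be run for a single instance as

select * from table(
       dbms_workload_repository.awr_diff_report_text(
              (select dbid from v$database),
              1,   -- instance number, typically 1 on a single instance
              120, -- start snapshot id
              121, -- end snapshot id
              (select dbid from v$database),
              1,   -- instance number, typically 1 on a single instance
              122, -- start snapshot id
              123  -- end snapshot id));

This produces a somewhat messy but useful report.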

Here is an example of using it from Doug Burns

A similar but cleaner and simpler report, which I partially designed, can be run from OEM

But what if someone sends you two AWR reports? How can they be compared? These days I’m receiving at least a couple a week to compare, so I put together a compare script.
usage: [type] file1 file2

where type

  • sevt = system events , ie wait events
  • stats = system statistics
  • load = load profile section
  • init = init.ora

for example sevt awr1.txt awr2.txt
... Statistics requested is load
... 1st report.txt
... 2nd report.txt

============================= load_psec ==============================
Name                               Ratio 1/2   Value1     Value2     Delta
Physical_reads:                   :    0.29:    266.20:    905.33:    639.13
Physical_writes:                  :    0.70:    585.32:    836.75:    251.43
Logons:                           :    0.86:      1.27:      1.48:      0.21
Logical_reads:                    :    1.04: 747342.68: 718259.28:  -29083.4
Redo_size:                        :    1.17:3516126.09:2995591.47:   -520535
Sorts:                            :    1.31:   3981.16:   3027.78:   -953.38
User_calls:                       :    1.38:  16476.53:  11948.71:  -4527.82
Parses:                           :    1.39:   4541.51:   3279.06:  -1262.45
Executes:                         :    1.44:  10619.75:   7350.55:   -3269.2
Hard_parses:                      :    1.89:      0.17:      0.09:     -0.08
Block_changes:                    :    2.38:  18936.62:   7942.27:  -10994.3

============================= load_ptrx ==============================
Name                               Ratio 1/2   Value1     Value2     Delta
Logons:                           :    0.00:      0.00:      0.01:      0.01
Physical_reads:                   :    0.11:      0.43:      3.94:      3.51
Physical_writes:                  :    0.26:      0.95:      3.64:      2.69
Logical_reads:                    :    0.39:   1218.11:   3123.70:   1905.59
Redo_size:                        :    0.44:   5730.99:  13027.80:   7296.81
Sorts:                            :    0.49:      6.49:     13.17:      6.68
User_calls:                       :    0.52:     26.86:     51.96:      25.1
Parses:                           :    0.52:      7.40:     14.26:      6.86
Executes:                         :    0.54:     17.31:     31.97:     14.66
Block_changes:                    :    0.89:     30.87:     34.54:      3.67
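The arithmetic behind the columns above is simple enough to sketch. This is not the script from this post, only an illustration of the comparison, assuming the statistics have already been parsed out of each report into a dictionary. From the sample output, the ratio column is value 1 divided by value 2, the delta is value 2 minus value 1, and rows are sorted by ratio. The function name is hypothetical; the three sample values are taken from the per-second table above.

```python
def diff_stats(first, second):
    """Compare two {statistic: value} dicts; return rows sorted by ratio."""
    rows = []
    for name, value1 in first.items():
        if name not in second:
            continue  # only compare statistics present in both reports
        value2 = second[name]
        ratio = value1 / value2 if value2 else float("inf")
        rows.append((name, ratio, value1, value2, value2 - value1))
    return sorted(rows, key=lambda row: row[1])


# Values copied from the load_psec section shown above.
report1 = {"Physical_reads": 266.20, "Logons": 1.27, "Block_changes": 18936.62}
report2 = {"Physical_reads": 905.33, "Logons": 1.48, "Block_changes": 7942.27}

rows = diff_stats(report1, report2)
for name, ratio, value1, value2, delta in rows:
    print(f"{name + ':':<34}:{ratio:8.2f}:{value1:10.2f}:{value2:10.2f}:{delta:10.2f}")
```

Sorting by ratio puts the statistics that grew the most between the first and second report at the top and the ones that shrank the most at the bottom, which is what makes the report quick to scan.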

Of course, if your AWR report is an html file, then the current script won’t work. One workaround is to run the html through a text converter like

Again the script is available here:

This script was originally written back before statspack and was based on utlstat. If you look closely you will even see that the code was actually modified by Connie Dialeris, aka the writer of statspack. Before Connie put together statspack, she was looking at the usability of my scripts. I had written a couple of scripts, and the idea of these scripts was to loop continuously, collecting database statistics to flat files. Flat files were used to avoid the extra overhead of inserting data into the database. The data could then be formatted into a utlstat-like report. Instead of writing a diff report on the raw data, I wrote a diff report that could be used for two different utlstat reports from customers as well as the raw data. This strategy was lucky because it made it easy to update the diff script for statspack and AWR.



November 4th, 2015

Here is a short video

D3 Show Reel from Mike Bostock on Vimeo.

Here is a longer tutorial video

Data-driven Documents from London Web Standards on Vimeo.

You can go through the actual presentation slides at
NOTE: these “slides” are active pages. You can click on the pages and interact with them.
They are in live d3. Try this page, for example, and click on one of the points in the graph

This page is a fun one:

Drag your mouse around the page – it eventually draws a picture


Get started yourself