Archive

Archive for December, 2013

automate or die – the revolution will be automated

December 27th, 2013


photo by David Blackwell.

The worst enemy of companies today is the belief that they already have the best processes, that their IT organizations are using the latest and greatest technology and that nothing better exists in the field. This mentality will be the undoing of many companies.

The status quo is pre-ordained failure.

Innovation is, and always has been, what separates the victors in industry, and innovation is constantly pushing what can be automated. Most tech companies today are essentially where American auto companies were pre-Ford.

 

 

                  Other guys        Ford Model T
Price             $2000-$3000       $850
Production        made by hand      assembly line
Time to build     12.5 hours        1.5 hours

By 1930, 250 car companies had died.

What we do manually is constantly being automated. Why? Because automation saves time, improves efficiency and enables feedback control. Tasks done by hand are slower and more error prone.

People make mistakes, call in sick, forget to perform tasks, leave and get things wrong. A computer, when properly programmed, gets it right all the time. – Hilton Lipschitz

If a task involves multiple steps done by different people, then the total time to completion rises sharply. Why does time to completion rise when different people handle different steps of a task? Because of the queueing time between people.

The book, The Phoenix Project, lays out the impact of passing a task to another person or group in the following graphic:

 

[Graph from The Phoenix Project: wait time vs. how busy the resource is]

What the graphic shows is that the busier the person responsible for the next step is, the longer it takes them to get to the task, and the wait time climbs steeply as they approach full utilization. At one major customer of mine on Wall St., a single request to provision a copy of a database required 34 approvals. No wonder that customer typically took 3-6 months to provision a copy of a database.
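The Phoenix Project models this as, roughly, wait time ∝ percent busy / percent idle. A quick back-of-the-envelope shell loop shows how the wait factor blows up as the person handling the next step gets busier (the numbers are illustrative, not taken from the book):

# back-of-envelope: relative wait time ~ %busy / %idle
$ for busy in 50 80 90 95 99; do echo "${busy}% busy -> wait factor $(echo "scale=1; $busy/(100-$busy)" | bc)"; done
50% busy -> wait factor 1.0
80% busy -> wait factor 4.0
90% busy -> wait factor 9.0
95% busy -> wait factor 19.0
99% busy -> wait factor 99.0

Going from 90% busy to 99% busy multiplies the queueing delay roughly tenfold, which is part of why a request that needs 34 sign-offs from busy people stretches into months.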

If you are the CIO of a company and you want to improve the productivity and agility of your development teams, what is the most important improvement you can make? That depends on what the bottleneck is in your development organization. The only way to improve the efficiency of an organization is to improve the efficiency of the main constraint, the bottleneck. Improving efficiency elsewhere is an illusion at best or detrimental at worst. Only by tuning the bottleneck will you increase productivity. Once you find the bottleneck, automate it if possible.

Considering IT as a data factory, we could apply the DFA principles to data and enumerate them as:

  • Reduce the number of people, steps, and parts to get data clones provisioned.
  • Simplify and standardize the steps that must remain.
  • Automate provisioning wherever possible.
  • Use standard Data Management Processes across data clones of all flavors.
  • Simplify the process to connect data clones to the hosts and applications that need it.
  • Encapsulate or eliminate the need for special knowledge to get a data clone to operate; Make it Dead Simple.

The first task is to find out what the bottleneck is. The most common bottleneck is provisioning environments for development and test.

Operations can never deliver environments upon demand. You have to wait months or quarters to get a test environment. When that happens, terrible things happen. People actually hoard environments. They invite people to their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn’t actually achieve the goal. – Gene Kim

“The most powerful thing that orgs can do is to enable dev and testing to get environments when they need them.” – Gene Kim

Much environment provisioning has already been automated with Puppet, Chef and virtual machines. What has, until recently, not been automated is the provisioning of database copies. What is the impact of not automating database provisioning? What is the cost to companies of being constrained by the enormous bureaucracy of provisioning QA, development and reporting environments?

The impacts we’ve seen include:

  • 96% of QA cycle time spent building QA environments instead of testing
    • single-threaded QA work because of the limited ability to provision concurrent environments
  • 95% of data storage spent on duplicate data, with storage limits constraining and impeding what data can be copied
  • 90% of developer time lost waiting for data in development environments
  • 50% of DBA time spent making database copies, constraining availability for other important work
  • 20% of production bugs slipping in because of the use of data subsets in development and QA

A clear indication of the impact comes from comparing efficiency before and after automating database provisioning with virtual databases. After companies have implemented Delphix automated virtual database provisioning, we see:

  • QA has gone from 4% efficiency to 99% efficiency meaning 99% of a QA cycle is actually running the QA suite instead of waiting for a QA environment build
    • Accelerated QA work with the ability to provision many environments concurrently
  • Petabytes of storage freed and little to no limit on number of environments that can be provisioned
  • Companies have doubled or more development team output
  • DBAs have gone from 8,000 hours/year of database copying to 8 hours
  • Elimination of bugs slipping into production due to using old or subset data for QA

Delphix  accelerates application releases  driving revenue growth while driving costs down.

This is why Delphix is used by

  • Fortune #1 Walmart
  • #1 pharmaceutical Pfizer
  • #1 social Facebook
  • #1 US bank  Wells Fargo
  • #1 networking  Cisco
  • #1 cable provider Comcast
  • #1 auction site eBay
  • #1 insurance New York Life
  • #1 chip manufacturer Intel

The list goes on.

“What is so provocative about that notion is that any improvement not made at the constraint is an illusion. If you fix something before the constraint, you end up with more work piled up in front of the constraint. If you fix something after the constraint, you will always be starved for work. In most transformations, if you look at what’s really impeding flow, the fast flow of features, from development to operations to the customer, it’s typically IT operations. … Operations can never deliver environments upon demand. You have to wait months or quarters to get a test environment. When that happens, terrible things happen. People actually hoard environments. They invite people to their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn’t actually achieve the goal. … One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them.” – Gene Kim

 

 

Uncategorized

Continuous Integration for Database Development

December 26th, 2013

Below is a good presentation from Vladimir Bakhov that Dominic Delmolino pointed me to.
( Dominic just did a good presentation with Redgate on CI and TDD available at http://youtu.be/smkb4Sa54d0 )

[Embedded presentation: Database CI (ENG)]

Uncategorized

Christo Kutrovsky on Oracle, memory & Linux at Oaktable World 2013

December 23rd, 2013

 

 

 

 

 

[Embedded video of Christo Kutrovsky’s talk]

Christo Kutrovsky, from Pythian, gives an awesome talk at Oaktable World 2013 (see http://oaktableworld.com )

 

Uncategorized

Dear Santa

December 20th, 2013

 


 

Dear Santa,

It’s no secret that there are some tasks that overwhelm existing database architectures, making projects consistently come in over budget and overdue.  This holiday season, on behalf of Database Administrators everywhere, we’d like to ask Santa for:

  1. An end to the constant struggle for more and more disk space for databases and copies

  2. An army of smart elves to run and test backups

  3. Less dependence on physical/virtual administration and storage teams to create working environments for development and QA

  4. A better monitoring tool (say, EM on steroids that runs on all databases: SQL Server, DB2, etc.)

  5. Self-service tools for developers that can take some of the load off DBAs

  6. Less tedium to common tasks like cloning databases or providing work areas for other teams to use

  7. A team of developers who love to write both elegant SQL code and efficient database interfaces

  8. An evolved role as the central hub for all data-related needs, so that the experienced DBA is protected from being a bottleneck

  9. A pair of Google glasses (just because they are cool)

  10. A new flash storage array

 

And Santa, if you can’t help with the beleaguered DBAs’ wish-list this year – we know it’s a lot – Delphix might be the answer.

 

Sincerely,

DBAs Everywhere

 

Uncategorized

Data IS the constraint.

December 20th, 2013


photo by David Blackwell

by Woody Evans

You are paying a data tax. You are paying it every time you move and copy data, over and over again. Moving the data IS the big gorilla. This gorilla of a data tax is hitting your bottom line hard. When moving data is too hard, the data in non-production systems such as reporting, development or QA becomes older, and the older the data, the less actionable intelligence your BI or analytics can give you. The less intelligence, the more missed revenue. The longer it takes you to match data from system A and system B, the more opportunities for your customer to get frustrated with the fact that the left hand hasn’t talked to the right hand – that you have no intimacy with them – that you don’t seem to even know them. The longer it takes your systems to be tested (because it takes so long to reload the data), the fewer real features make it to market, and the more you put your market share at risk.

Business skeptics are saying to themselves that data processes are just a rounding error in most of their project timelines, and that surely their IT has developed processes to fix that. That is the fundamental mistake. The very large and often hidden data tax lies in all the ways that we’ve optimized our software, data protection, and decision systems around the expectation that data is simply not agile. The belief that there is no agility problem is part of the problem.

How big is the data tax? One way to measure it is to look at the improvements in project timelines at companies that have eliminated this data tax by implementing a data virtualization appliance (DVA) and creating an agile data platform (ADP). Agile data is data that is delivered to the exact spot it’s needed, just in time, and with much less time, cost and effort. By comparing productivity rates after implementing an ADP to those before, we can get an idea of the price of the data tax without an ADP. IT experts building mission-critical systems for Fortune 500 companies have seen real project returns averaging 20-50% productivity increases after implementing an ADP. That’s a big data tax to pay without one. The data tax is real, and once you understand how real it is, you realize how many of your key business decisions and strategies are affected by the agility of the data in your applications.

It took us 50 days to develop an insurance product … now we can get a product to the customer in 23 days with Delphix

Data Agility by the Numbers

Let’s take a high-level look at the kinds of cost, revenue, and risk impact that agile data can have on your business in four key areas: Business Level Initiatives, Application Operations, IT Operations, and IT Systems. In each of these cases, we are incurring cost we could avoid, missing revenue we could be capturing, or accepting business risk when we don’t have to.

Business Level

At the business level, we’ve lived with the constraint of slow data for a long time. We may offshore the data management task of our nightly BI refresh. We may only allow business users to run their reports at night because the cost to copy data off production is too high, and the need for fresh data never goes away. We may live with week-old data in our BI because our network is built for data protection and can only handle a full copy during off hours. To get features out the door, we may spend less time testing, or simply accept the fact that there will be more errors post-production and ignore that cost because it is borne by operations. But what if data were agile?

If data were agile then, instead of paying for two full-time offshore resources to get data that is already a day old when we receive it the next day, we could have minutes-old data in minutes and get it automatically. With agile data, margins go up and revenue opportunities increase (for example, wouldn’t it be good for Walmart in California to know that Lego Batman sold like hotcakes as soon as it hit the shelves in New York that morning?). Multiply that by 100 applications, and you’re talking about real money and competitive advantage. Instead of running 5 tests in two weeks (because it takes me 2 days to roll back after each of my 1-hour tests) and paying the cost of bugs slipping into production, what if I could run 15 tests in those same two weeks and have no bugs at all in production? Costs fall, quality rises, customer satisfaction rises, competitive position strengthens. Even better, what if I could get more features into my release because I knew my quality testing would be robust enough to handle it, and I had enough time in my schedule? How much is having new features faster worth?

What about really big problems like consolidating data center real estate, or moving to the cloud? If you can non-disruptively collect the data, and easily and repeatedly present it in the target data center, you take huge chunks out of these migration timelines. Moreover, with data being so easy to move on demand, you neutralize the hordes of users who insist that there isn’t enough time to do this, or that it’s too hard, or too risky. Moving the data IS the big gorilla. Eliminating the data tax is crucial to the success of your company. And, if huge databases can be ready at target data centers in minutes, the rest of the excuses are flimsy. We know from our experience that there are some $1B+ data center consolidation price tags. Taking even 30% of the cost out of that, and cutting the timeline, is a strong and powerful way to improve margin.

When the cost to get fresh data falls precipitously, better decisions and faster delivery mean better margins, higher profitability, better revenue growth, and faster time to market. And, for the ones capable and willing to change the way they do business to take maximum advantage, it means better EPS, higher EBITDA, and a business ready to succeed.

Application Operations

Forrester estimates that companies will spend $97.5B on application outsourcing and management in 2013 (Forrester publication 83061). For large ecosystems, such as Oracle E-Business Suite or SAP, the complexity of data management can be sprawling, with each landscape consuming dozens of databases, and landscapes being built not only for lifecycle copies for current development (dev, test, QA, etc.) but for multiple streams of development (Release 2.1, 2.2, 2.3, 2.4, etc.).

For application teams, data constraints are often masked as something else. For example, maybe the application is only allocated so much storage, so there can only be so many full-size copies, so the developers have to make do with stripped-down copies (which can produce unexpected results in production) or shared copies (which often involve all sorts of extra reset/rollback operations as well as freezes and holds so that one developer doesn’t clobber another). We trade productivity for cost in this phase. But the primary cost sink is the data – storing it, moving it, copying it, waiting to be able to use it. So the business responds. Sometimes the business lives with the risk of using subsets and slower test cycles by pushing the timeline: saving cost at the expense of delivering to market quickly. Sometimes the business invests in a lot more hardware (and for many of our customers, a runaway storage bill): delivering quickly at the expense of higher cost and lower margin. Sometimes the business just squeezes testing: delivering lower-quality applications sooner at the cost of higher remediation and lower customer satisfaction. The point is that data is the big culprit in all of these.

Agile data – virtualized data – uses a small footprint. A truly agile data platform can deliver full size datasets cheaper than subsets. A truly agile data platform can move the time or the location pointer on its data very rapidly, and can store any version that’s needed in a library at an unbelievably low cost. And, a truly agile data platform can massively improve app quality by making it reliable and dead simple to return to a common baseline for one or many databases in a very short amount of time. Applications delivered with agile data can afford a lot more full size virtual copies, eliminating wait time and extra work caused by sharing, as well as side effects. With the cost of data falling so dramatically, business can radically increase their utilization of existing hardware and storage, delivering much more rapidly without any additional cost. An agile data platform presents data so rapidly and reliably that the data becomes commoditized – and servers that sit idle because it would just take too long to rebuild can now switch roles on demand.
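To make the "return to a common baseline" point concrete, here is a minimal sketch using plain ZFS snapshots (Delphix’s own machinery and interface differ; the dataset name below is made up). Resetting a thin clone to a known-good state is a one-line operation rather than a multi-day restore:

# mark the known-good starting point on the (hypothetical) dev clone
$ zfs snapshot pool/devcopy1@baseline

# ... run destructive tests, clobber data ...

# back to the baseline in seconds, since only the changed blocks are discarded
$ zfs rollback pool/devcopy1@baseline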

Next Post: IT Operations, and IT Systems.

Uncategorized

Data Flood

December 16th, 2013

 

It is likely you are familiar with the statement that 71% of the Earth’s surface is water. But did you know that, in the financial services business, 92% of the overall cost of doing business is “data”?

http://www.wsta.org/resources/industry-articles/

 

 

photo by Tom Blackwell

 

 

Production data is clearly critical for the core of businesses, such as

  • Product orders received, opened, processed
  • Bank account deposits, withdrawals, interest accrued
  • Telephone calls, times and duration
  • Quote-to-Cash

but the data is also fundamental to other parts of the business. The data is constantly being copied and pumped across corporate infrastructure. Critical business databases always have a failover database that can be used in case the primary database ever goes down. The failover database constantly pulls in changes from production, and just the changes, but on top of the failover database, businesses incur a triple data-copying tax. For every piece of production data, a copy is also typically made for each of:

  1. Business intelligence and analytics
  2. Project data for development, testing, QA, integration, UAT and training
  3. Data protection backups for recovery

These copies of production data are made repeatedly, which taxes corporate IT infrastructure.

Copying production data also incurs considerable operational expense, requiring work from application administrators, DBAs, system administrators, storage administrators, backup administrators and network administrators.

All of the copies, of course, require considerable capital expenditure, such as storage space, which requires more hardware and datacenter space. The data has to be transferred over company networks, clogging communication pathways. The data has to be read from one set of storage, taxing that storage system, and written to a remote storage system, impacting the performance of that system as well. For example, the recommended method of backing up an Oracle database is to take a full backup every weekend and incremental backups every day in between. Thus every weekend, for the full backup, significant strain is put on disk, network and infrastructure.

The key here is that it’s not just the backup copy that is created; other systems require the data as well, and the data is copied to all those systems. Data warehouses require the data and either go through data warehouse refreshes or are populated by running extract, transform and load (ETL) jobs on the production data. ETL jobs and data warehouse refreshes are limited to nightly batch windows, but as data sets grow these jobs are pushing up against those windows. If the jobs overrun the batch window and have to be killed, it can mean a business decision group working on old data for the next week. With old data, the business risks making decisions that are incorrect and have negative financial impact. Furthermore, corporations are going global, which eliminates any quiet windows to run data warehouse refreshes or ETL jobs, further complicating the system and in some cases forcing businesses to run these jobs at times when some part of the global corporation has to suffer the performance overhead.
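Going back to the Oracle backup example above, that weekend-full / weekday-incremental cadence might be scripted roughly as follows (a sketch only; connection details, retention policy and scheduling via cron or the database scheduler are left out):

# weekend job: level 0 (full) incremental backup -- reads the entire database
$ rman target / <<EOF
BACKUP INCREMENTAL LEVEL 0 DATABASE;
EOF

# weekday job: level 1 backup -- copies only blocks changed since the last backup
$ rman target / <<EOF
BACKUP INCREMENTAL LEVEL 1 DATABASE;
EOF

Even with incrementals during the week, the level 0 pass still reads every block of the database each weekend, which is exactly the disk and network strain described above.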

Due to the considerable capital expense, operational expense, infrastructure tax and work required to make copies of production data, the copies are often slow to make or hit delays. These delays and wait times impact the business analysts who have to make the decisions the business depends on to guide the company. The delays impact the application development teams who build the applications that companies depend on to generate revenue. The delays reduce productivity and quality and ultimately hurt revenue.

All three of these data copy classes, BI, backup and application development, cross organizational division boundaries, adding dependencies and management complexity. The more complex the management and the more teams involved, the more delays these copies suffer.

All of the infrastructure, operational and capital expenses, as well as the management overhead, add up. For example, in financial services, one study calculated that the cost of data alone is responsible for over 90% of the overall business cost. Much of the cost of handling data is due to acquiring and, of course, processing the data, but surprisingly more than 60% of the cost is due to the more pedestrian tasks of storing, retrieving, distributing and delivering it. (http://www.wsta.org/resources/industry-articles/)

Of course, as anyone who has been reading my blog over the last 2 or 3 years knows, the majority of this data flood can be eliminated with Delphix. With Delphix there is only one copy of each unique data block, but that one copy can be shared by multiple databases. Each database copy shares the duplicate blocks; any blocks that are modified by a database are kept private to that database. As for new changes from the source, or production, database, only the changes are captured into Delphix, putting only a light load on infrastructure. Delphix is fully automated, taking care of all the steps of capturing changes from the production database, and it makes the creation of data copies a simple self-service exercise that anyone can do in a few minutes.
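The block-sharing idea itself is easy to see with plain ZFS snapshots and clones (Delphix’s implementation and interface are its own; the dataset names below are made up): snapshot the source once, create clones from it instantly, and each clone consumes new space only for the blocks it changes.

# snapshot the source dataset once
$ zfs snapshot pool/proddb@refresh_20131216

# thin clones for dev and QA -- created in seconds, sharing all unchanged blocks
$ zfs clone pool/proddb@refresh_20131216 pool/devcopy1
$ zfs clone pool/proddb@refresh_20131216 pool/qacopy1

# a fresh clone's USED is near zero; REFER shows the full logical size it presents
$ zfs list -o name,used,refer

A dozen dev and QA copies can be backed by the same set of unique blocks without a dozen times the storage.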

 

Uncategorized

I/O Benchmarking tools

December 2nd, 2013

This blog post will  be a place to park ideas and experiences with I/O benchmark tools and will be updated  on an ongoing basis.

Please feel free to share your own experiences with these tools or others in the comments!


There are a number of tools out there to do I/O benchmark testing such as

  • fio
  • IOZone
  • bonnie++
  • FileBench
  • Tiobench
  • orion

My choice for best of breed is fio
(thanks to Eric Grancher for suggesting fio).

Orion

For Oracle I/O testing, Orion from Oracle would be the normal choice, but I’ve run into install errors (which were solved) and, more importantly, runtime bugs.

IOZone

IOZone, available at http://linux.die.net/man/1/iozone, is the tool I see referenced most often on the net and in Google searches. The biggest drawback of IOZone is that there seems to be no way to limit the test to 8K random reads (see the fio job sketch for that case at the end of this post).

Bonnie++

http://www.googlux.com/bonnie.html

Bonnie++ is close to IOZone, but not quite as flexible, and less flexible than fio.

FileBench

I haven’t investigated FileBench yet, though it looks interesting.

Tiobench

http://sourceforge.net/projects/tiobench/

Not much information available.

Fio

flexible I/O tester

Here is a description from the fio project page:
“fio is an I/O tool meant to be used both for benchmark and stress/hardware verification. It has support for 13 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate I/O, forked or threaded jobs, and much more. It can work on block devices as well as files. fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information. Fio is in wide use in many places, for both benchmarking, QA, and verification purposes. It supports Linux, FreeBSD, NetBSD, OS X, OpenSolaris, AIX, HP-UX, and Windows.”
With fio, you set up the benchmark options in a job file, for example:

 

# job name between brackets (except when value is "global" )
[read_8k_200MB]

# overwrite if true will create file if it doesn't exist
# if file exists and is large enough nothing happens
# here it is set to false because file should exist
overwrite=0

#rw=
#   read        Sequential reads
#   write       Sequential writes
#   randwrite   Random writes
#   randread    Random reads
#   rw          Sequential mixed reads and writes
#   randrw      Random mixed reads and writes
rw=read

# ioengine=
#    sync       Basic read(2) or write(2) io. lseek(2) is
#               used to position the io location.
#    psync      Basic pread(2) or pwrite(2) io.
#    vsync      Basic readv(2) or writev(2) IO.
#    libaio     Linux native asynchronous io.
#    posixaio   glibc posix asynchronous io.
#    solarisaio Solaris native asynchronous io.
#    windowsaio Windows native asynchronous io.
ioengine=libaio

# direct If value is true, use non-buffered io. This is usually
#        O_DIRECT. Note that ZFS on Solaris doesn't support direct io.
direct=1

# bs The block size used for the io units. Defaults to 4k.
bs=8k

directory=/tmpnfs

# fadvise_hint if set to true fio will use fadvise() to advise the kernel
#               on what IO patterns it is likely to issue.
fadvise_hint=0

# nrfiles= Number of files to use for this job. Defaults to 1.
nrfiles=1
filename=toto.dbf
size=200m

Then run

$ fio config_file

read_8k_200MB: (g=0): rw=read, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=1
fio 1.50
Starting 1 process
Jobs: 1 (f=1): [R] [100.0% done] [8094K/0K /s] [988 /0  iops] [eta 00m:00s]
read_8k_200MB: (groupid=0, jobs=1): err= 0: pid=27041
  read : io=204800KB, bw=12397KB/s, iops=1549 , runt= 16520msec
    slat (usec): min=14 , max=2324 , avg=20.09, stdev=15.57
    clat (usec): min=62 , max=10202 , avg=620.90, stdev=246.24
     lat (usec): min=203 , max=10221 , avg=641.43, stdev=246.75
    bw (KB/s) : min= 7680, max=14000, per=100.08%, avg=12407.27, stdev=1770.39
  cpu          : usr=0.69%, sys=2.62%, ctx=26443, majf=0, minf=26
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=25600/0/0, short=0/0/0
     lat (usec): 100=0.01%, 250=2.11%, 500=20.13%, 750=67.00%, 1000=3.29%
     lat (msec): 2=7.21%, 4=0.23%, 10=0.02%, 20=0.01%

Run status group 0 (all jobs):
   READ: io=204800KB, aggrb=12397KB/s, minb=12694KB/s, maxb=12694KB/s, mint=16520msec, maxt=16520msec
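As a follow-up to the IOZone note above, here is a sketch of a job file for the 8K random read case, reusing the same placeholder file and size as the job above; iodepth is worth raising to probe concurrency.

[randread_8k_200MB]
# random 8K reads against the same pre-existing test file
rw=randread
ioengine=libaio
direct=1
bs=8k
directory=/tmpnfs
filename=toto.dbf
size=200m
overwrite=0
# number of I/Os libaio keeps in flight; try 1, 8, 16, 32
iodepth=1

Run it the same way, passing the job file name to fio.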

 

Uncategorized