Theory of Constraints: the rise of NoSQL



Why did NoSQL arise?  The rise of NoSQL makes no sense if you read an article like “Why You Should Never Use MongoDB”  and the ycombinator replies . Then I read something like this “Why You Should Use MongoDB” and don’t find any reasons just  ranting about why the first article was shoddy.

One theory proposed by Martin Fowler is that NoSQL arose as a way to get around the constraint of the DBA bottleneck. When developers want to change the schema or get copies of the source database the DBAs are just seen as an obstacle. The obstacle of provisioning environment is told in nail biting prose in Gene Kim’s “The Phoenix Project”

Long before NoSQL and The Phoenix project, about 13 years ago, I was the DBA at a startup, back in the first dot com bubble. As a startup we were trying to get our project to market ASAP. The development team was working crazy hours and the schema was changing all the time.  As the DBA, it was my responsibility to validate all the schema changes and apply the schema changes to the development database. Such validation and application could take a couple hours considering all the other work that was on my plate. This couple hour delay was considered such a hinderance to developers that they unilaterly decided to dump the structured schema and go to an EAV  data store. EAV stands for “Entity Attribute  Values” model. From wikipedia

There are certain cases where an EAV schematic is an optimal approach to data modelling for a problem domain. However, in many cases where data can be modelled in statically relational terms an EAV based approach is an anti-pattern which can lead to longer development times, poor use of database resources and more complex queries when compared to a relationally-modelled data schema.


The EAV “schema” (more like schema-less) meant that perfromance was dreadful and the SQL was impossible to debug as it was completely obfuscated. An example of an EAV query might look ike

Select f.symbol
from strings_eav  eav1,
       integer_eav  eav2,
       strings_eav  eav3,
       fundmstr f
where =
 and =
 and =
 and eav1.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_00")
 and eav1.attr_value=:"SYS_B_01"
 and eav1.act_date <= SYSDATE and eav1.end_date > SYSDATE
 and eav2.attr_num = (select attar_num from attributes where attr_id = :"SYS_B_02")
 and eav2.attr_value=:"SYS_B_03"
 and eav2.act_date <= SYSDATE and eav2.end_date > SYSDATE
 and ((f.def_attr_templ = :"SYS_B_04"
         and eav3.attr_num=(select attar_num from attributes where attr_id = :"SYS_B_05")
         and eav3.attr_value >= to_date(:"SYS_B_06")
         and eav3.act_date <= SYSDATE and eav3.end_date > SYSDATE 
         and exists ( select null from integer_eav eav4 
                           where =
                                     and in ( select attr_num 
                                                             from attributes 
                                                             where attar_id = :"SYS_B_07"
                                                                  or  attar_id = :"SYS_B_08")
                                     and eav4.attr_value = :"SYS_B_09"
                                     and eav4.act_date <= SYSDATE and eav4.end_date > SYSDATE)
       and not exists ( select null from integer_eav eav5
                              where =
                                     and = ( select attr_num 
                                                             from attributes 
                                                             where attar_id = :"SYS_B_10")


Eventually as development stabilized and performance was abominable we were able to get everyone on board and move the EAV back into a structured schema. Once we went to a structured relational schema the load on the database fell from completely saturated to almost idle.

Moral of the story

The ironic thing is that many of the NoSQL setups are basically EAVs and/or databases that allow users to make “schema” type changes with out the intervention of a DBA. For example in Mongo or Couchdb, since they are JSON based, you can just add new fields to the JSON and the database deals with it. If you are using a database like Riak, it’s just a key  value store, which is basically an EAV.

EAV can be good for quick development but for production it’s completely unreadable thus impossible to debug and the performance is atrocious when it comes to relational type database operations of selecting all the fields in a row and joining rows from different “tables” together.

Instead of using EAV one could give each developer a virtual relational database, like using Delphix with Oracle, and let them work as fast as they can and then let them test out merging there changes in virtual databases before merging them in to trunk which is itself a virtual database that can be branched from and merged into.

Virtual databases take up almost no storage and can be made in minutes with a developer self service interface.

Screen Shot 2014-07-03 at 10.18.02 AM






  1. Trackbacks

  2. No trackbacks yet.

  2. | #1

    The funny thing of course for those of us who learned data management well before SQL existed is that SQL and its relational framework was created EXACTLY AND PRECISELY to avoid the effects of prima-donna duhvelopers who kept wrecking projects with their constant changes, lack of ANY technical discipline and no IT background whastoever. And the horrendous subsequent disasters.

    Then they invented the modern IT manager. Who essentially hasn’t got a clue but dispenses “authority” as if that was proof enough of knowledge and experience.

    And the development cycles blew up again and we see the horrendous results day to day in just about every IT discipline involving programming.

    Ah well, some people love spirals!

nine + 4 =