The NoSQL moniker, coined circa 2009, marked a move away from the “traditional” relational model. There were quite a few non-relational databases around prior to 2009, but in the last few years we’ve seen an explosion of new offerings (you can see, for example, the “NoSQL landscape” in a previous post I made). Generally speaking (and everything here is a wild generalization, since not all solutions are created equal and there are many types of solutions), NoSQL mostly means some relaxation of ACID constraints and, as the name implies, the removal of the “Structured Query Language” (SQL), both as a data definition language and, more importantly, as a data manipulation language, in particular SQL’s query capabilities.
ACID and SQL are a lot to lose, and NoSQL solutions offer a few benefits to compensate, mainly:
- Scalability – either relative scalability, meaning scaling more cheaply than a comparable RDBMS at the same scale point, or absolute scalability, meaning scaling beyond what an RDBMS can. Scalability is usually achieved by preferring availability and partition tolerance over consistency in Eric Brewer’s CAP theorem and relying on “eventual consistency” (more on this later).
- Simpler models – i.e. the mapping of programming structures to storage structures is straightforward, which avoids the whole “object/relational mapping quagmire” (or, as Ted Neward called it, the “Vietnam of computer science”). I have to say that in my experience this is only a half-truth: it only holds up to a point, and when you need to scale and/or have high-performance requirements you need to design your schemas carefully, and that isn’t always “simple”.
- Late binding schemas – This is a real flexibility boon, as you can store data in forms that are close to the original form and apply the schemas on read, so you can deliver poly-structured data and handle semi-structured data easily.
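To make the late binding idea a bit more concrete, here is a minimal Python sketch of applying a schema on read. The sample records, the `PURCHASE_SCHEMA` mapping and the `read_with_schema` helper are all made up for illustration and do not come from any particular NoSQL product.

```python
import json

# Raw events stored as-is, close to their original form (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "currency": "USD"}',
    '{"user": "bob", "amount": 7}',
    '{"user": "carol", "note": "no purchase"}',
]

# A schema chosen at read time: field name -> (converter, default value).
PURCHASE_SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
    "currency": (str, "USD"),
}


def read_with_schema(raw_line, schema):
    """Parse one stored record and project it onto the given schema."""
    record = json.loads(raw_line)
    projected = {}
    for field, (convert, default) in schema.items():
        value = record.get(field)
        # Missing fields fall back to the default; unknown fields are ignored.
        projected[field] = convert(value) if value is not None else default
    return projected


purchases = [read_with_schema(line, PURCHASE_SCHEMA) for line in RAW_EVENTS]
```

The stored records never change; a different reader could apply a different schema to the same lines, which is where the flexibility (and some of the later complexity) comes from.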
Eventual consistency and simple query mechanisms can work for a while and for some use cases, but as adoption of NoSQL solutions becomes more widespread we can see that the market needs more.
Eventual consistency
Eventual consistency means that if new updates stop flowing in, then after a while all reads will return the last updated value. As new updates rarely stop, and as “after a while” is not well defined, this is a rather weak guarantee, and we see some efforts to make stronger guarantees. Peter Bailis and Ali Ghodsi published a good paper called “Eventual Consistency Today: Limitations, Extensions, and Beyond” where they go over some of the options. The NoSQL landscape is too wide to say this is happening everywhere, but some solutions are moving in this direction. For example, in HBase (the NoSQL I’ve used most in the past few years) we’ve seen the addition of “Multi-Version Concurrency Control”, which provides ACID guarantees for single-row operations (and which can be tuned down for performance).
Nevertheless, providing real guarantees under real conditions can prove to be rather tricky. I highly recommend reading Kyle Kingsbury’s great series of posts on Jepsen, where he looks at how Postgres, MongoDB, Redis and Riak handle writes under network partitioning.
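As a rough illustration of the eventual consistency definition above, here is a toy Python sketch of two replicas that accept writes independently and converge once they exchange state. It assumes a simple last-write-wins rule based on caller-supplied timestamps; the `Replica` class is hypothetical, and real systems use a variety of other conflict-resolution schemes.

```python
class Replica:
    """A toy replica holding key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Last-write-wins: keep only the version with the highest timestamp.
        # Equal timestamps are not resolved here; the first one seen stays.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Anti-entropy step: pull every version the other replica holds.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)

stale = a.read("cart")  # replica a has not seen the newer write yet
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart")
```

Between the second write and the sync, a reader hitting replica `a` gets the older cart. Nothing in the sketch bounds how long that window lasts, which is exactly the “after a while” weakness.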
Queries
When we look at the NoSQL space we see that a lot of the technologies are getting better, more advanced query languages, e.g. MongoDB’s find has some nice features and Cassandra’s query language is at its third version. But one technology where introducing queries in general, and SQL specifically, is turning from a trend into a stampede is Hadoop. Hadoop has a multi-vendor, multi-distro ecosystem (not unlike Linux), and it seems each and every one of them wants to introduce its own SQL solution: Cloudera offers Impala, Hortonworks is working on the Stinger initiative to enhance Hive, Pivotal (née EMC Greenplum) has HAWQ, IBM is working on BigSQL, and even Salesforce.com (which does not offer a distro) offers an SQL skin for HBase called Phoenix. The last Hadoop Summit had a panel where some of these players debated the merits of their respective platforms, which is worth listening to.
The examples I’ve given above are mainly around Hadoop. Naturally, as this is the environment I’ve been working with, I am more familiar with it, but more importantly it seems Hadoop has managed to position itself as the main large-scale (a.k.a. big data) NoSQL solution. As such, this reSQL trend is more apparent there, and it will (and already does) affect other NoSQL offerings as well.
The thing is that NoSQL dropped SQL capabilities for simplicity, and wider adoption draws all the capabilities and complexity back. I guess the main problem is that the situation is even more complicated when we’re also dealing with big data and its implications (e.g. late binding schemas vs. the schema needs of the *structured* query language; immovable or hard-to-move data vs. joins; etc.).
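The tension between late binding schemas and SQL can be sketched in a few lines of Python. The example uses the standard library’s in-memory SQLite purely as a stand-in for any SQL engine (it is not one of the Hadoop tools mentioned above), and the table names, column names and sample records are invented for illustration.

```python
import json
import sqlite3

# Semi-structured records as they might sit in storage (hypothetical data).
RAW_ORDERS = [
    '{"order_id": 1, "user": "alice", "total": 30}',
    '{"order_id": 2, "user": "bob"}',
]
RAW_USERS = [
    '{"user": "alice", "country": "IL"}',
    '{"user": "bob", "country": "US", "tags": ["vip"]}',
]

conn = sqlite3.connect(":memory:")
# SQL needs the columns declared before any query can run.
conn.execute("CREATE TABLE orders (order_id INTEGER, username TEXT, total REAL)")
conn.execute("CREATE TABLE users (username TEXT, country TEXT)")

for line in RAW_ORDERS:
    r = json.loads(line)
    # A missing "total" becomes NULL.
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (r.get("order_id"), r.get("user"), r.get("total")),
    )
for line in RAW_USERS:
    r = json.loads(line)
    # The extra "tags" field has no column, so it is dropped.
    conn.execute("INSERT INTO users VALUES (?, ?)", (r.get("user"), r.get("country")))

# The join needs both data sets loaded into the same engine.
rows = conn.execute(
    "SELECT o.order_id, u.country, o.total "
    "FROM orders o JOIN users u ON o.username = u.username "
    "ORDER BY o.order_id"
).fetchall()
```

Even at this toy scale, the fixed columns force decisions about missing and extra fields, and the join only works once both data sets sit in the same place. With big data, both of those steps are where the cost shows up.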