I’ve been working with Hadoop for a few years now, and the platform and its ecosystem have been advancing at an amazing pace, with new features and additional capabilities appearing almost daily. Some changes are small, like better scheduling in Oozie; some are still in progress, like support for NFS; some are cool, like full support for CPython in Pig. In my opinion, though, the most important change is the introduction of YARN in Hadoop 2.0.
Hadoop was created with HDFS, a distributed file system, and the Map/Reduce framework, a distributed processing platform. With YARN, Hadoop moves from being a distributed processing framework to being a distributed operating system.
“Operating system”: that sounded a little exaggerated when I wrote it, so just for fun I picked up the copy of Tanenbaum’s “Modern Operating Systems”* that I have lying around from my days as …
Every now and then I get questions by email. I usually just answer them directly, but considering that I got two such questions this week and that I haven’t blogged for a while (I do have a post about YARN which I hope to finish soon), I thought I’d also publish my replies here.
Question #1 from Simon:
In your very interesting article “Bridging the Impedance Mismatch Between Business Intelligence and Service-Oriented Architecture” you highlight the challenges for BI and SOA to co-exist – that was 6 or so years ago – have you seen any advances that would cause you to revise that view?
I think the gap and dissonance between SOA needs and BI needs is still there. However, in addition to the event publishing mentioned in the article, I see the approach to doing BI on top of SOA becoming more standardized. …
The NoSQL moniker, coined circa 2009, marked a move away from the “traditional” relational model. There were quite a few non-relational databases around prior to 2009, but in the last few years we’ve seen an explosion of new offerings (you can see, for example, the “NoSQL landscape” in a previous post I made). Generally speaking (and everything here is a wild generalization, since not all solutions are created equal and there are many types of solutions), NoSQL mostly means some relaxation of ACID constraints and, as the name implies, the removal of the “Structured Query Language” (SQL), both as a data definition language and, more importantly, as a data manipulation language, in particular SQL’s query capabilities.
ACID and SQL are a lot to lose, and NoSQL solutions offer a few benefits to compensate, mainly:
Scalability – either as relative scalability, …