Skip to content

Hadoop and the OpenDataPlatform

hadoop-logo-square

Pivotal, IBM and Hortonworks announced today the “Open Data Platform” (ODP) – an attempt to standardize Hadoop. This move seems to be backed up by IBM, Teradata and others that appear as sponsors on the initiative site.

This move has a lot of potential and a few possible downsides.

ODP promises standardization – Cloudera’s Mike Olson downplays the importance of this “Every vendor shipping a Hadoop distribution builds off the Hadoop trunk. The APIs, data formats and semantics of trunk are stable. The project is a decade old, now, and the global Hadoop community exercises its governance obligations responsibly. There’s simply no fundamental incompatibility among the core Hadoop components shipped by the various vendors.”

I disagree. While it is true that there are no “fundamental incompatibility” there is a lot of non-fundamental ones. Each release by each vendor includes backport of features that are somewhere on the main trunk but far from the stable release. This means, that as a vendor, we have to both test our solutions on multiple distributions and work around the  subtle incompatibilities. We also have to limit ourselves to the lowest common denominator of the different platforms (or not support a distro) – for instance, until today, IBM did not support Yarn or Spark on their distribution

Hopefully standardization around common core will also mean that the involved vendors will deliver their value-add on that core unlike today where the offerings are based on proprietary extensions (this is true for Pivotal, IBM etc. not so much for Hortonworks). Today, we can’t take Impala and run it on Pivotal can we take Hawk and run it on HDP . With ODP we would, hopefully,  be able  mix-and-match and have installations where we can, say,  use IBM’s BigSQL with GemFire HD running on HDP and other such mixes. This can be good news for these vendors by enlarging their addressable market and for us a users by increasing our choice and reducing lock-in.

So what are the downsides/possible problems?

Well, for one we need to see that the scenarios I described above will actually happen and this isn’t just a marketing ploy. Another problem, the elephant in the room if you will,  is that the move is not complete –  Cloudera, a major Hadoop player, is not part of this move and as can be seen in the post referenced above, are against it. This is also true for MapR. With these two vendors out we still have multiple vendors to deal with and the problems ODP sets to solve will not disappear. I guess if ODP was led by the ASF or some other more “impartial” party it would have been easier to digest but as it is now all I can do is hope that both ODP will live to its expectations and that in the long run Cloudera and MapR will also join this initiative .

 

 

Published inBig DataBlogFeatured Posts