Using PySpark’s pandas integration via apply_batch and transform_batch is very powerful, but the lack of documentation can cause hard-to-trace bugs – hopefully my experience (below)…
The nexus of technology, business & people
I gave a general overview of Apache Spark to our R&D teams. You can find the slides below.
I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images…
Back in ancient history (2004), Google’s Jeff Dean & Sanjay Ghemawat presented their innovative idea for dealing with huge data sets – a novel idea…
A couple of weeks ago I presented at BDX2016. The slides are available on SlideShare. Big data in the cloud – welcome to cost oriented…