Apache Spark, ETL and Parquet

Posted on September 14th, by Arnon Rotem-Gal-Oz in Big Data, Blog, Featured Posts. 1 Comment

One of the projects we’re currently running in my group (Amdocs’ Technology Research) is an evaluation of the current state of different options for reporting on top of, and near, Hadoop (I hope I’ll be able to publish the results when we have them). Anyway, part of the preparations for the benchmark includes ingesting a lot of events (CDRs) into the system and creating different aggregations on top of them. For instance, for voice call billing events we create yearly, monthly, weekly, daily and hourly aggregations at the subscriber level, which include measures like count of calls, average duration, sum of pricing, median balance, hourly distribution of calls, popular destinations etc.

We are using Spark to do the ingestion, and I thought there are two interesting aspects I can share which I haven’t seen too many examples of on the …
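For illustration, here is a minimal sketch of the kind of per-subscriber daily aggregation described above, written against the modern Spark DataFrame API. The CDR schema (subscriberId, callTime, durationSec, price) and the paths are hypothetical, not taken from the actual post:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CdrDailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cdr-daily-aggregation")
      .getOrCreate()

    // Hypothetical CDR schema: subscriberId, callTime (timestamp),
    // durationSec, price -- stand-ins for the real fields.
    val cdrs = spark.read.parquet("/data/cdrs")

    // Daily, per-subscriber aggregation with a few of the measures
    // mentioned above: count of calls, average duration, sum of pricing.
    val daily = cdrs
      .groupBy(col("subscriberId"), to_date(col("callTime")).as("day"))
      .agg(
        count(lit(1)).as("callCount"),
        avg("durationSec").as("avgDurationSec"),
        sum("price").as("totalPrice")
      )

    // Write the aggregation back as Parquet, partitioned by day.
    daily.write.partitionBy("day").parquet("/data/cdr_daily_agg")

    spark.stop()
  }
}

The weekly, monthly and yearly rollups would follow the same pattern with a different grouping key.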



Introduction to big data presentation

Posted on August 8th, by Arnon Rotem-Gal-Oz in Big Data, Blog, Featured Posts. No Comments

I presented big data to Amdocs’ product group last week. One of the sessions I did was recorded, so I might be able to add it here later. Meanwhile, you can check out the slides.

Note that, in trying to keep the slides visual, I put some of the information in the slide notes rather than on the slides themselves.

Slides: Big data Overview, by Arnon Rotem-Gal-Oz


Is there a future for Map/Reduce?

Posted on June 3rd, by Arnon Rotem-Gal-Oz in Big Data, Blog, Featured Posts. 3 comments

Google’s Jeffrey Dean and Sanjay Ghemawat filed the patent request and published the map/reduce paper 10 years ago (2004). According to Wikipedia, Doug Cutting and Mike Cafarella created Hadoop, with its own implementation of Map/Reduce, one year later at Yahoo – both of these implementations were done for the same purpose: batch indexing of the web.

Back then, the web began its “web 2.0” transition: pages became more dynamic and people began to create more content, so an efficient way to reprocess and rebuild the web index was needed, and map/reduce was it. Web indexing was a great fit for map/reduce since the initial processing of each source (web page) is completely independent of any other – i.e. a very convenient map phase – and you then need to combine the results to build the reverse index. That said, even the core Google algorithm – the famous PageRank – is …
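As a toy illustration of why indexing fits map/reduce so well, here is a minimal inverted-index sketch in plain Scala (the corpus is made up, and this deliberately ignores distribution, partitioning and fault tolerance): each page is tokenized independently (the map phase), and the per-word results are then merged across all pages (the reduce phase):

object InvertedIndexSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical corpus: (url, page text) pairs.
    val pages = Seq(
      "http://a.example" -> "spark makes etl simple",
      "http://b.example" -> "map reduce makes indexing simple"
    )

    // Map phase: each page is processed completely independently,
    // emitting (word, url) pairs.
    val emitted = pages.flatMap { case (url, text) =>
      text.split("\\s+").map(word => (word, url))
    }

    // Reduce phase: group by word to combine the results from all
    // pages into the reverse (inverted) index.
    val invertedIndex: Map[String, List[String]] =
      emitted.groupBy(_._1).map { case (word, pairs) =>
        word -> pairs.map(_._2).distinct.toList
      }

    invertedIndex.foreach { case (word, urls) =>
      println(s"$word -> ${urls.mkString(", ")}")
    }
  }
}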
