The Perfect AI Storage: Trino From Facebook And Iceberg From Netflix? - The Next Platform

When it comes to solving data analytics problems at scale, it is tough to beat the hyperscalers. And that is why a combination of technologies that were originally developed at Facebook (now Meta Platforms) and Netflix could end up being the perfect pairing to create a “lakehouse” underpinning AI training and other applications. Not surprisingly, everyone who builds a high performance...
Show HN: Polytomic Connect – API for two-way ETL and data syncs with customers

Hello HN, we’re Ghalib and Nathan from Polytomic. Today we’re launching our second product, Polytomic Connect: https://www.polytomic.com/connect (documentation here: https://apidocs.polytomic.com). Connect is an API you can use in your own products to either pull your customers’ data into your own systems, or push data from your own systems to your customers’, or both. We have first-class support for data warehouses too. You...
Scale your relational database for SaaS, Part 2: Sharding and routing

This post is a continuation of our series on scaling your relational database for software as a service (SaaS). SaaS providers commonly use relational databases, such as Amazon Relational Database Service (Amazon RDS) and Amazon Aurora, in their solutions. In Part 1, we looked at some common ways to scale or optimize your relational database architecture. Those methods focused on...
Understanding Delta Lake's consistency model

submitted by /u/ketralnis
Investing in the Future: The Case for Hiring and Mentoring Junior Tech Talent

submitted by /u/fractalfellow
Traefik Proxy v3.0.0 Released

Category: Featured Posts

Software architecture workshop (slides)

Published by Arnon Rotem-Gal-Oz on November 29, 2023

The title says it all – these are slides from a session I was working on to explain the basics of software architecture based on…

pandas on spark apply_batch/transform_batch broken? (tl;dr; No – but it isn’t well documented)

Published by Arnon Rotem-Gal-Oz on October 16, 2022

Using PySpark’s pandas integration via apply_batch and transform_batch is very powerful, but the lack of documentation can cause hard-to-trace bugs – hopefully my experience (below)…

Intro to Apache Spark (slides)

Published by Arnon Rotem-Gal-Oz on December 16, 2020

I gave a general overview of Apache Spark to our R&D teams. You can find the slides below.

Where is Apache Spark heading?

Published by Arnon Rotem-Gal-Oz on December 4, 2020

I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks presented the following two images…

Big data isn’t – well, almost

Published by Arnon Rotem-Gal-Oz on March 23, 2019

Back in ancient history (2004) Google’s Jeff Dean & Sanjay Ghemawat presented their innovative idea for dealing with huge data sets – a novel idea…