Overview

Apache Spark provides a useful framework for building data pipelines, which form the core of any data infrastructure. Often the need is for incremental data processing pipelines; for example, for transactional data ingested from online transaction processing (OLTP) databases. In this workshop, we walk you through the steps of building and optimizing Spark-based pipelines for upserts and incremental data processing using Apache Hudi (Incubating). We start with a basic pipeline and introduce each new concept step by step with practical examples. Bring your laptop so that you can build these pipelines yourself.