Apache Spark Internals: As Easy as Baking a Pizza!

Pizza-making shop floor (Source: Getty Images)

DAG Scheduler (Planning)

Pizza Job’s DAG
DAG Scheduler’s output is a set of tasks

Task Scheduler & Cluster Manager (Execution)

Worker rooms of the pizza shop floor, with executors coming in when summoned

How many tasks can an executor execute?

Repartition and Coalesce

Shuffle Spill, Shuffle Write & Read

Client, Spark Driver & Spark Session

Driver / Application is a series of jobs

Dynamic Allocation vs Static Allocation


Spark Master + Cluster Manager

Putting it all together — Who is the Master Chef?

Cluster Mode vs Client Mode (Driver)

The pizza order is ready!



