Apache Spark Internals: As Easy as Baking a Pizza!

Pizza-making shop floor (Source: Getty Images)

DAG Scheduler (Planning)

Pizza Job’s DAG
DAG Scheduler’s output is a set of tasks
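To make the idea of "DAG in, tasks out" concrete, here is a toy planner. It is not Spark's real DAGScheduler. It walks a linear lineage and starts a new stage wherever an operation needs a shuffle (a wide dependency such as groupByKey). It then emits one task per partition for each stage. The `Op` class and `plan_stages` function are hypothetical names. As a simplification, the shuffling op is placed entirely in the next stage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Op:
    name: str
    wide: bool = False                    # True if the op needs a shuffle (e.g. groupByKey)
    num_partitions: Optional[int] = None  # partition count after the shuffle, if it changes


def plan_stages(ops: List[Op], input_partitions: int) -> List[Tuple[List[str], int]]:
    """Split a linear lineage into stages at shuffle boundaries.

    Returns (op names in the stage, number of tasks) pairs, where the
    number of tasks equals the number of partitions the stage processes.
    """
    stages: List[Tuple[List[str], int]] = []
    current: List[str] = []
    partitions = input_partitions
    for op in ops:
        if op.wide:
            # Shuffle boundary: everything planned so far becomes one stage.
            if current:
                stages.append((current, partitions))
                current = []
            if op.num_partitions is not None:
                partitions = op.num_partitions
        current.append(op.name)
    if current:
        stages.append((current, partitions))
    return stages
```

For example, read → map → groupByKey (3 partitions) → mapValues over 4 input partitions yields two stages. The first stage has 4 tasks and the second has 3, for 7 tasks in total.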

Task Scheduler & Cluster Manager (Execution)

Worker rooms of the pizza shop floor, with executors coming in when summoned

How many tasks can an executor execute?
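The answer is a small piece of arithmetic. An executor runs at most `spark.executor.cores // spark.task.cpus` tasks at the same time. The sketch below computes that slot count. It also computes an idealized number of "waves" for a stage, assuming every task takes the same time. The function names are hypothetical.

```python
import math


def slots_per_executor(executor_cores: int, task_cpus: int = 1) -> int:
    """Concurrent tasks one executor can run: spark.executor.cores // spark.task.cpus."""
    if task_cpus < 1 or executor_cores < task_cpus:
        raise ValueError("need executor_cores >= task_cpus >= 1")
    return executor_cores // task_cpus


def waves(num_tasks: int, num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Rounds needed to run num_tasks if every task takes the same time (idealized)."""
    total_slots = num_executors * slots_per_executor(executor_cores, task_cpus)
    return math.ceil(num_tasks / total_slots)
```

For example, 200 tasks on 5 executors with 4 cores each fill 20 slots, so the stage runs in about 10 waves. Setting `spark.task.cpus=2` halves the slots and doubles the waves.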

Repartition and Coalesce
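The key difference is data movement. `coalesce(n)` merges existing partitions without a full shuffle, so it can only lower the partition count. `repartition(n)` shuffles every row and can raise or lower the count. The toy functions below model both on plain Python lists. `coalesce` groups neighbouring partitions; real Spark also takes data locality into account. `repartition` deals rows out round-robin. Both are illustrations, not Spark's implementation.

```python
from typing import List


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Merge whole partitions into n groups; no row is routed individually."""
    if n <= 0:
        raise ValueError("n must be positive")
    m = len(partitions)
    if n >= m:
        # Without a shuffle, the partition count cannot grow.
        return [list(p) for p in partitions]
    groups: List[list] = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        groups[idx * n // m].extend(part)  # neighbouring partitions share a group
    return groups


def repartition(partitions: List[list], n: int) -> List[list]:
    """Full shuffle: every row is reassigned, here round-robin across n partitions."""
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[list] = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out
```

With `[[1, 2], [3], [4, 5, 6], [7]]`, `coalesce(..., 2)` keeps partitions intact and returns `[[1, 2, 3], [4, 5, 6, 7]]`. `repartition(..., 2)` scatters the rows into `[[1, 3, 5, 7], [2, 4, 6]]`.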

Shuffle Spill, Shuffle Write & Read
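The three shuffle terms can be pictured with a toy model of a single map task. Records are buffered in memory. Whenever the buffer reaches a threshold, it is "spilled", represented here by a list standing in for a file on disk. At the end, the spills and the leftover buffer are bucketed by target reducer (shuffle write). Each reduce task then fetches its bucket from every map output (shuffle read).

This sketch makes several assumptions. The partitioner uses `zlib.crc32` as a deterministic stand-in for Spark's hash partitioning. The threshold counts records, whereas Spark's is memory-based. Spark's sort-based shuffle also sorts records by partition ID before spilling and merging.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def partition_for(key: str, num_reducers: int) -> int:
    # Deterministic stand-in for hash(key) mod numPartitions.
    return zlib.crc32(str(key).encode()) % num_reducers


def shuffle_write(records: List[Record], num_reducers: int,
                  spill_threshold: int) -> Tuple[Dict[int, List[Record]], int]:
    """Return (per-reducer segments, spill count) for one map task."""
    buffer: List[Record] = []
    spill_files: List[List[Record]] = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= spill_threshold:
            spill_files.append(buffer)  # buffer full: spill it "to disk"
            buffer = []
    segments: Dict[int, List[Record]] = defaultdict(list)
    for chunk in spill_files + [buffer]:  # merge spills and leftover buffer
        for key, value in chunk:
            segments[partition_for(key, num_reducers)].append((key, value))
    return dict(segments), len(spill_files)


def shuffle_read(map_outputs: List[Dict[int, List[Record]]], reducer_id: int) -> List[Record]:
    """A reduce task fetches its own segment from every map task's output."""
    fetched: List[Record] = []
    for segments in map_outputs:
        fetched.extend(segments.get(reducer_id, []))
    return fetched
```

Because the partitioner is the same on every map task, all records with a given key end up at the same reducer. That property is what makes a group-by or join after the shuffle correct.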

Client, Spark Driver & Spark Session

The driver (application) is a series of jobs
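The "series of jobs" comes from lazy evaluation. Transformations such as `map` and `filter` only extend the lineage. Each action, such as `collect` or `count`, makes the driver submit a job. The classes below are hypothetical stand-ins, not the Spark API: `ToyContext` plays the driver and only records jobs, and `ToyDataset` mimics the transformation/action split.

```python
from typing import Callable, List, Tuple


class ToyContext:
    """Hypothetical driver stand-in: records one entry per submitted job."""

    def __init__(self) -> None:
        self.jobs: List[Tuple[str, List[str]]] = []


class ToyDataset:
    def __init__(self, ctx: ToyContext, source, steps: Tuple = ()) -> None:
        self.ctx = ctx
        self.source = list(source)
        self.steps = tuple(steps)  # (name, function) pairs, not yet executed

    # Transformations are lazy: they only return a dataset with a longer lineage.
    def map(self, f: Callable) -> "ToyDataset":
        return ToyDataset(self.ctx, self.source, self.steps + (("map", f),))

    def filter(self, pred: Callable) -> "ToyDataset":
        return ToyDataset(self.ctx, self.source, self.steps + (("filter", pred),))

    # Actions submit a job to the driver, then actually run the lineage.
    def _run(self, action: str) -> list:
        self.ctx.jobs.append((action, [name for name, _ in self.steps]))
        rows = self.source
        for name, f in self.steps:
            rows = [f(r) for r in rows] if name == "map" else [r for r in rows if f(r)]
        return rows

    def collect(self) -> list:
        return self._run("collect")

    def count(self) -> int:
        return len(self._run("count"))
```

In this model, building `pizzas.map(...).filter(...)` records no job. Calling `collect()` and then `count()` on the result records two jobs, one per action.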

Dynamic Allocation vs Static Allocation

Dynamic allocation:
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=<N>

Static allocation:
spark.dynamicAllocation.enabled=false
spark.executor.instances=<>
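With dynamic allocation on, Spark sizes the executor pool from the task backlog and keeps it between `minExecutors` and `maxExecutors`. With static allocation, the pool stays at `spark.executor.instances` no matter how much work is queued. The function below is a simplified sketch loosely modeled on that target calculation. The name `target_executors` is hypothetical. It ignores the gradual ramp-up of requests and the idle-timeout release of executors that the real manager performs.

One practical note: dynamic allocation also needs shuffle files to outlive removed executors. That requires either the external shuffle service (`spark.shuffle.service.enabled=true`) or, in Spark 3.0+, `spark.dynamicAllocation.shuffleTracking.enabled=true`.

```python
import math


def target_executors(running_or_pending_tasks: int, executor_cores: int, task_cpus: int,
                     min_executors: int, max_executors: int,
                     allocation_ratio: float = 1.0) -> int:
    # Tasks one executor can run at once (spark.executor.cores // spark.task.cpus).
    tasks_per_executor = executor_cores // task_cpus
    # Executors needed to run the whole backlog at once, scaled by the
    # allocation ratio (cf. spark.dynamicAllocation.executorAllocationRatio).
    needed = math.ceil(running_or_pending_tasks * allocation_ratio / tasks_per_executor)
    # Clamp to spark.dynamicAllocation.minExecutors / maxExecutors.
    return max(min_executors, min(max_executors, needed))
```

For example, with 4-core executors and bounds of 1 and 10, an idle application keeps 1 executor. A backlog of 18 tasks asks for 5 executors, and a backlog of 1000 tasks is capped at 10.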

Spark Master + Cluster Manager

Putting it all together — Who is the Master Chef?

Cluster Mode vs Client Mode (Driver)

The pizza order is ready!
