SelectFrom

A vocal community of enthusiastic developers. We speak all things data, code and engineering.

Building Pipelines for Serverless Spark

Mrudula Madiraju
Published in SelectFrom
7 min read · Jun 13, 2022

Table Of Contents

Overview

Sample Workflow: Weekly Sales Data Analysis by Region, followed by overall trend analysis

Introduction To Apache Airflow

See the beautiful historical runs by date and time, with a breakdown of tasks for each run

Serverless Spark Submission — How does Airflow fit?

Serverless Pipeline Sample DAG

P.S.: This code does not handle the 'failed' case, but you get the idea…
Order of execution runs from left to right
You can see the progress of the application via the different color legends
Also see the duration and other details for each task

A Couple of Beginner Tips

A Realistic Workflow — Sales Data Analysis

Tasks Sequencing

DAG Sequencing for parallel tasks and join

Python Operator/Sensor

Tracking Application ID for Parallel Submissions

Branching

Side Note

Source Code

References
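The outline above covers parallel task sequencing, a join, and branching in an Airflow DAG. As a hypothetical sketch of that shape (all task and function names here are illustrative assumptions, not taken from the article's source code), a weekly sales pipeline with per-region tasks fanning in to a join and then branching might look like:

```python
# Hypothetical sketch: parallel region tasks -> join -> branch.
# Task names (region_east, region_west, trend_analysis, skip_report)
# are assumptions for illustration, not from the original article.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def analyze_region(region, **_):
    # Placeholder for the per-region Serverless Spark submission.
    print(f"analyzing weekly sales for {region}")


def choose_path(**_):
    # Return the task_id of the branch to follow downstream.
    return "trend_analysis"


with DAG(
    dag_id="weekly_sales_pipeline",
    start_date=datetime(2022, 6, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    # Fan out: one task per region, run in parallel.
    regions = [
        PythonOperator(
            task_id=f"region_{name}",
            python_callable=analyze_region,
            op_kwargs={"region": name},
        )
        for name in ("east", "west")
    ]
    # Fan in: join waits for all upstream region tasks to succeed.
    join = EmptyOperator(task_id="join", trigger_rule="all_success")
    branch = BranchPythonOperator(task_id="branch", python_callable=choose_path)
    trend = EmptyOperator(task_id="trend_analysis")
    skip = EmptyOperator(task_id="skip_report")

    regions >> join >> branch >> [trend, skip]
```

The list-to-task dependency (`regions >> join`) is how Airflow expresses the parallel-then-join pattern; the `BranchPythonOperator` then picks one downstream path at runtime by returning its `task_id`.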

Written by Mrudula Madiraju

Dealing with Data, Cloud, Compliance and sharing tidbits of epiphanies along the way.
