Orchestrating data and ML workflows with Apache Airflow | Tamara Janina Fingerlin | Conf42 ML 2023

Conference: Conf42 Machine Learning 2023

Year: 2023

Read the abstract ➤ https://www.conf42.com/Machine_Learning_2023_Tamara_Janina_Fingerlin_orchestrating_workflows_apache_airflow
Other sessions at this event ➤ https://www.conf42.com/ml2023
Join Discord ➤ https://discord.gg/DnyHgrC7jC
Project ➤ https://github.com/TJaniF/airflow-ml-pipeline-image-classification

Chapters
0:00 intro
0:22 preface
0:31 overview
1:53 ml orchestration ∈ [ml ops]
3:21 automatable components
4:47 airflow crash course
4:55 what is apache airflow?
5:59 airflow ui
6:58 dags - tasks - operators
10:02 dags complex as you want
10:26 why airflow?
13:01 the data
14:20 sometimes it is (relatively) easy
15:37 sometimes it is harder
15:53 the pipeline
16:12 the tools
17:11 8 dags, 6 datasets
19:59 @continuous
20:39 two dags waiting for new train/test data
21:18 deferrable operators can save resources!
23:30 dynamic tasks
25:00 2 dags handling preprocessing
27:04 astro sdk - part 1
28:36 train the model
29:13 wrapping model fine-tuning into a custom operator
30:37 get a baseline
31:11 wrapping model testing into a custom operator
31:35 test fine-tuned model
32:02 airflow notifiers
33:29 customized slack alerts
34:11 deploy the best model - astro sdk part 2
36:08 demo
42:12 the results
43:17 what is next?
44:56 airflow ♥ ml - resources
48:38 thank you