Robson Junior - Mastering a data pipeline with Python: 6 years of learned lessons from mistakes

Conference: EuroPython 2020

Year: 2020

"Mastering a data pipeline with Python: 6 years of learned lessons from mistakes EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Robson Junior Building data pipelines are a consolidated task, there are a vast number of tools that automate and help developers to create data pipelines with few clicks on the cloud. It might solve non-complex or well-defined standard problems. This presentation is a demystification of years of experience and painful mistakes using Python as a core to create reliable data pipelines and manage insanely amount of valuable data. Let's cover how each piece fits into this puzzle: data acquisition, ingestion, transformation, storage, workflow management and serving. Also, we'll walk through best practices and possible issues. We'll cover PySpark vs Dask and Pandas, Airflow, and Apache Arrow as a new approach. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "