List of videos

DBT & Python - How to write reusable and testable pipelines — Florian Stefan

[EuroPython 2024 — North Hall on 2024-07-11] DBT & Python - How to write reusable and testable pipelines by Florian Stefan https://ep2024.europython.eu/session/dbt-python-how-to-write-reusable-and-testable-pipelines The "data build tool" (DBT) was designed to unlock software engineering best practices for SQL-based data pipelines: pipelines as version-controlled directed acyclic graphs (DAGs) consisting of testable and reusable nodes. With the increasing number of cloud data warehouses and data lakehouses that allow the native execution of Python code, DBT also added support for Python models. In this talk, I will explain how Flatiron Health uses DBT to improve and extend lives by learning from the experience of every person with cancer. We will discuss an example project setup that uses SQL as well as Python models. I will share our experiences with unit and data testing as well as with writing a reusable variable library. The talk is well-suited for anyone with prior data warehouse or data lakehouse experience who is curious how they can leverage DBT to write test-driven and reusable data pipelines. The example project will use SQL, Python and Snowflake. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Don't fix bad data, do this instead — Martina Ivanicova

[EuroPython 2024 — North Hall on 2024-07-11] Don't fix bad data, do this instead by Martina Ivanicova https://ep2024.europython.eu/session/don-t-fix-bad-data-do-this-instead At a time when GenAI is quickly growing in popularity, along with prescriptive analytics and online ML models, the question arises whether we still need to care about data quality. We strongly believe that the answer is yes, and even more so than before! Our expectations of data are high, and this often leads to frustration when reality does not meet these expectations. In the pursuit of data quality, expectations must be grounded in reality. It is often the case that a gap exists between anticipated outcomes and the actual data reality, which leads to frustration and mistrust. This talk delves into pragmatic strategies that can be employed to bridge this gap. The talk will discuss both the technical (hard) and cultural (soft) measures implemented to uphold these standards. Key Takeaways: 1. Integration tests serve as a proactive barrier, preempting the violation of data contracts, unlike reactive data quality checks. 2. Prioritisation is crucial; a product-centric mindset is key when evaluating the balance between resource investment and potential gain. 3. Data quality management requires both hard and soft measures. Are you a data scientist, software engineer, product manager, or data engineer? Join us in this discussion; data quality concerns us all. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Scikit-LLM: Beginner Friendly NLP Using LLMs — Iryna Kondrashchenko, Oleh Kostromin

[EuroPython 2024 — North Hall on 2024-07-11] Scikit-LLM: Beginner Friendly NLP Using LLMs by Iryna Kondrashchenko, Oleh Kostromin https://ep2024.europython.eu/session/scikit-llm-beginner-friendly-nlp-using-llms The instruction-following and in-context learning capabilities of LLMs make them suitable for tackling many NLP tasks. In this talk, we will introduce Scikit-LLM (https://github.com/iryna-kondr/scikit-llm), a rapidly growing, beginner-friendly library that abstracts the complexity of working with LLMs by providing a scikit-learn compatible API. We will showcase how Scikit-LLM can be used for solving text classification and text-to-text tasks, and will delve deeper into various methods to improve model performance, such as prompting strategies and fine-tuning. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

PySyft: Data Science on data you are not allowed to see — Valerio Maggio

[EuroPython 2024 — North Hall on 2024-07-11] PySyft: Data Science on data you are not allowed to see by Valerio Maggio https://ep2024.europython.eu/session/pysyft-data-science-on-data-you-are-not-allowed-to-see In today's data-driven world, privacy stands as an essential requirement for the ethical and effective practice of data science. Moreover, the implementation of robust privacy guarantees in data analysis not only protects sensitive information, but also unlocks the potential for unprecedented democratisation of models and datasets. PySyft (https://github.com/OpenMined/PySyft) is a stack of open source tools designed to help organisations collaborate securely with external (untrusted) individuals. By using PySyft, organisations can enable external auditors (e.g. data scientists) to use their assets, such as datasets or models, in order to conduct studies with a specific, known purpose. Data scientists can run their analysis using those assets through PySyft, without seeing or obtaining a copy of the assets themselves. We call this process Remote Data Science. PySyft is a framework for Remote Data Science. In the first part of my talk I will introduce the problem of privacy in Data Science, PETs (Privacy Enhancing Technologies), and OpenMined's mission to democratise access to data and information. Afterwards, I will demonstrate how PySyft works, and how it can be used to run machine learning experiments with privacy guarantees. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Python’s Journey: From Upstream to Enterprise — Lumír Balhar

[EuroPython 2024 — North Hall on 2024-07-11] Python’s Journey: From Upstream to Enterprise by Lumír Balhar https://ep2024.europython.eu/session/pythons-journey-from-upstream-to-enterprise Have you ever wondered how Python gets from the first alpha version upstream to years of stability in your enterprise Linux systems? And what products and useful components are created for you along the way? In this talk, Lumír will take you through the incredible journey of Python delivery from the first alpha version shipped to Fedora Linux a couple of days after the official upstream release, through containers developers can use for testing with many old and new Python releases in their CI, to Red Hat Enterprise Linux and its main and alternative Python application streams and containers with various Python versions ready to be deployed to production environments with years of required stability. Lumír will talk about: * Python maintainers’ focus on speed of delivery in Fedora and on stability and reliability in RHEL. * How to use containers based on Fedora for early adoption of new Pythons in CI/CD pipelines. * What challenges we face during ten years of maintenance of old Python interpreters. Come and learn how you can benefit from our efforts, use a modern development environment, and deploy your apps with guaranteed stability. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Mastering Generative AI: Tools and Techniques with VS Code, GitHub, Azure — Leo Yao

[EuroPython 2024 — North Hall on 2024-07-11] Mastering Generative AI: Tools and Techniques with VS Code, GitHub, Azure by Leo Yao https://ep2024.europython.eu/session/mastering-generative-ai-tools-and-techniques-with-vs-code-github-azure With the rise of Generative AI, developers are now able to create a wide range of applications that can generate content from simple prompts and context. In this presentation, we will explore how you can leverage the power of Visual Studio Code, GitHub, and Azure to develop, test, and deploy generative AI applications. We will discuss the latest tools and techniques for building and training generative models, and demonstrate how to build a sample application using GPT-4o, VS Code and its extensions. Additionally, we will showcase how to use GitHub for version control and collaboration, and how to deploy and manage your applications using Azure. For both beginners and veterans, join us to learn how you can master the power of generative AI to create innovative applications. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Earth Observation through Large Vision Models — Mayank Khanduja

[EuroPython 2024 — North Hall on 2024-07-11] Earth Observation through Large Vision Models by Mayank Khanduja https://ep2024.europython.eu/session/earth-observation-through-large-vision-models Ever wondered how location planning is done to build city infrastructure? Or, when there is a disaster, how we determine the possibly affected areas and send reinforcements there? We require overhead imagery for that, which we mainly obtain from satellites. The European Space Agency has launched various satellites; however, the datasets from these satellites are huge and may even contain multiple bands of the electromagnetic spectrum. Large AI models have huge potential in this domain if they are developed to work well with this data. There are a lot of pre-trained generative and large vision models on platforms like HuggingFace, Kaggle, etc., but these models do not integrate well with a specific domain like satellite datasets, hence the need to train or fine-tune them. In this talk, we are going to see where we can access open satellite datasets, fine-tune various vision models and multimodal models on them, and examine the following applications: - Perform zero-shot classification and object detection on satellite images with human language input using multimodal models. - Image-to-image translation on satellite imagery using generative vision models. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Unlock the Power of Dev Containers: Consistent Environments in Seconds! — Thomas Fraunholz

[EuroPython 2024 — South Hall 2A on 2024-07-11] Unlock the Power of Dev Containers: Consistent Environments in Seconds! by Thomas Fraunholz https://ep2024.europython.eu/session/unlock-the-power-of-dev-containers-consistent-environments-in-seconds In this talk, we will explore the basic concepts of Dev Containers and demonstrate how they can support your everyday development as a Python programmer, data scientist, or machine learning engineer. With Dev Containers, you can build a consistent development environment in seconds, no matter where you are or what tools you use. And you know what? The Development Container Specification is even open source. Say goodbye to the hassle of setting up your development environment from scratch every time you start a new project! We will start with a basic example and discuss how to set up a consistent Python development environment, including best practices for package management and GPU support. After this talk, you will be able to leverage the advantages of Dev Containers, allowing you to work from anywhere and be ready in seconds. If you're tired of wasting time setting up your development environment and want to unlock the power of Dev Containers, then this talk is a must-attend for you! --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Containerize your Python apps like it's 2024 — Jan Smitka

[EuroPython 2024 — South Hall 2A on 2024-07-11] Containerize your Python apps like it's 2024 by Jan Smitka https://ep2024.europython.eu/session/containerize-your-python-apps-like-it-s-2024 There are a lot of resources on containerizing Python applications with Docker, but most are basic and outdated. Following them results in slow builds and potentially insecure applications. Let's see how we can build better containers using recent Docker features! This talk will show how to speed up your builds and make your images smaller and more secure. We'll use features such as multi-stage builds or cache mounts to build containers with Python apps. We will also discuss how to improve the security of your container. Tips from the talk are valid for applications of all sizes and kinds: from hobby projects to enterprises, from CLI tools to web applications and APIs. You will be able to apply them immediately after the talk. Basic knowledge of Docker and its key concepts (images, layers, Dockerfile commands) is required. You'll learn something new even if you have used Docker for some time. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/
