List of videos

Talks - Valerio Maggio: Pythonic `functional` (`iter`)tools for your data challenges

Nowadays Python is very likely to be the first choice for developing machine learning or data science applications. The reasons are manifold, but they most likely come down to the fact that the Python language is amazing (⚠️ opinionated), and the open source community in the PyData ecosystem is absolutely fantastic (💙 that's a fact). In this context, one of the most remarkable features of the Python language is its ability to support multiple programming styles (from imperative to OOP and also functional programming). Thanks to this versatility, developers are free to choose whichever programming style they prefer. Functional programming is indeed very fascinating, and it is great for in-demand tasks such as data filtering or data processing. Of course, this doesn't say anything about other paradigms, but sometimes the solution to a data problem can be expressed more naturally using a functional approach. In this talk, we will discuss Python's support for functional programming, the meaning of pure functions (and why mutable function parameters are always a bad idea), and the Python classes and modules that help with this style, namely itertools, functools, and the map-reduce data processing pattern. As reference data challenges, we will discuss functional-style solutions to Advent of Code coding puzzles, to make it fun and interactive.
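As a rough illustration of the ingredients this abstract names (not code from the talk), the sketch below contrasts an impure function with a pure one and chains `filter`, `map`, and `functools.reduce` into a small map-reduce pipeline. It reads the "mutable function parameters" warning as the mutable default argument pitfall, which is an assumption; the function names and sample numbers are made up for illustration.

```python
from functools import reduce
from itertools import accumulate
from operator import add


def append_bad(item, bucket=[]):
    # Impure: the default list is created once and shared by every call.
    bucket.append(item)
    return bucket


def append_pure(item, bucket=()):
    # Pure: returns a new tuple and leaves its argument untouched.
    return (*bucket, item)


def total_of_even_squares(numbers):
    # Map-reduce pipeline: filter, then map, then reduce to one value.
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(add, squares, 0)


def running_totals(numbers):
    # itertools.accumulate yields the partial sums of its input.
    return list(accumulate(numbers))


if __name__ == "__main__":
    append_bad(1)
    print(append_bad(2))                        # state from the first call leaks in
    print(append_pure(2))                       # no hidden state
    print(total_of_even_squares(range(1, 7)))   # 2*2 + 4*4 + 6*6
    print(running_totals([1, 2, 3, 4]))
```

Because `append_pure` and `total_of_even_squares` depend only on their arguments, they can be tested and composed without worrying about call order.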

Watch

Talks - Łukasz Langa: Working Around the GIL with asyncio

You've heard it many times: the GIL is a problem for using all your CPU cores in one program. Among the generally accepted solutions there's multiprocessing, a way to orchestrate a group of worker processes to spread CPU load over many cores. This solves the problem for many use cases, but if you have a lot of data to pass there and back again, it's much less efficient. In this short talk we'll go through two examples of data processing with Python 3.11 and how asyncio with shared memory helps speed things up. To cover all bases, one example will run on macOS, the other on Windows Subsystem for Linux. You'll see how the built-in building blocks of Python allow you to compose scalable systems. Our focus is on the base programming language. We won't be reimplementing data pipelines or covering any MLOps best practices.
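The abstract does not show the speaker's code, so the following is only one possible shape of "asyncio with shared memory", using standard-library pieces: `multiprocessing.shared_memory` holds the data once, and `loop.run_in_executor` hands index ranges to a `ProcessPoolExecutor`. The names `sum_chunk` and `parallel_sum` are hypothetical, and the sketch assumes non-empty input.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import shared_memory


def sum_chunk(shm_name, start, stop):
    # Runs in a worker process: attach to the existing block by name, so
    # only the name and two integers cross the process boundary.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        with shm.buf[start:stop] as view:  # release the view before close()
            return sum(view)
    finally:
        shm.close()


async def parallel_sum(data: bytes, workers: int = 4) -> int:
    # Assumes len(data) > 0; a zero-sized shared block is not allowed.
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        loop = asyncio.get_running_loop()
        step = -(-len(data) // workers)  # ceiling division
        with ProcessPoolExecutor(max_workers=workers) as pool:
            tasks = [
                loop.run_in_executor(
                    pool, sum_chunk, shm.name, lo, min(lo + step, len(data))
                )
                for lo in range(0, len(data), step)
            ]
            return sum(await asyncio.gather(*tasks))
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # about 1 MiB of sample bytes
    print(asyncio.run(parallel_sum(payload)))
```

The point of the design is that the payload is written to shared memory once, instead of being pickled and copied to every worker.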

Watch

Talks - Maria Jose Molina Contreras: Next level Machine Learning with TinyML and Python

We usually associate the future of computing with large clusters able to perform tasks in a fraction of a second, but is that really the only scenario for how computational hardware will evolve? Machine learning has become an important component of our societies. We see people, communities, and global companies focusing their resources on improving their technological stack and on becoming leaders in the next generation of AI. At the same time that we see clusters getting larger, GPUs getting more powerful, and our phones becoming practically computers capable of doing almost everything, we also see that some smart devices are becoming smaller. The Internet of Things has been flourishing for many years, and Python has been playing an important role in the “easy to automate” topic for many devices, but can Python help us in all scenarios? One of the challenges for next-generation ML is to think small. You read that right: “thinking small”. It’s time to start running super well-trained ML models on small devices: ML on microcontrollers. We are going to dive into TinyML and evaluate different setups to interact with sensors on microcontrollers. We will discuss the different hardware options and frameworks to start with, while checking different use cases that TinyML can solve, such as agriculture, conservation, health issue detection, ecology monitoring, autonomous vehicles, etc. In this talk, you will learn about Tiny Machine Learning (TinyML), an approach that explores machine learning deployed in embedded systems on microcontrollers. I will also talk about MicroPython and CircuitPython, and how they have been conquering the microcontroller scene. Lastly, we will discuss a real use case: a machine learning model that predicts anomalies for predictive maintenance.

Watch

Talks - Dave Aronson: Kill All Mutants! (Intro to Mutation Testing)

How good is your test suite? Would it all still pass if the tested code was changed? If so, there may be problems with your code, your tests, or both! Mutation Testing reveals these cases. It makes lots of slightly altered versions of your code, called "mutants." If any mutants let all of the code's tests pass, you probably have gaps in your test suite, ineffective code, or both. This talk will tell you what mutation testing is, how it works, how to use it, and its benefits, drawbacks, inner workings, and history. There will be several examples, and a list of tools for many popular languages. You will come away equipped with a powerful new technique for making sure your tests are strict and your code is meaningful!
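To make the idea concrete, here is a toy mutation tester built on the standard `ast` module. It is a sketch, not one of the tools the talk lists: it knows a single kind of mutation (swapping a comparison for its near miss), and the function under test, the `SWAPS` table, and both test functions are invented for illustration.

```python
import ast

SOURCE = '''
def is_adult(age):
    return age >= 18
'''

# Hypothetical mutation table: each comparison becomes its near miss.
SWAPS = {ast.GtE: ast.Gt, ast.Gt: ast.GtE, ast.LtE: ast.Lt, ast.Lt: ast.LtE}


class SwapComparison(ast.NodeTransformer):
    """Mutate only the `target`-th swappable comparison in the tree."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.target:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def make_mutants(source):
    # Yield one executed namespace per mutation site in the source.
    index = 0
    while True:
        mutator = SwapComparison(index)
        tree = mutator.visit(ast.parse(source))
        if index >= mutator.seen:
            return
        namespace = {}
        exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
        yield namespace
        index += 1


def survivors(source, test):
    # A mutant survives when the test still passes against it.
    alive = 0
    for namespace in make_mutants(source):
        try:
            test(namespace["is_adult"])
        except AssertionError:
            continue  # the test failed, so the mutant was killed
        alive += 1
    return alive


def weak_test(is_adult):
    assert is_adult(30)
    assert not is_adult(5)


def strict_test(is_adult):
    weak_test(is_adult)
    assert is_adult(18)  # boundary case


if __name__ == "__main__":
    print("weak test survivors:", survivors(SOURCE, weak_test))
    print("strict test survivors:", survivors(SOURCE, strict_test))
```

The only mutant here turns `>=` into `>`. The weak test never checks the boundary value, so that mutant survives; adding the `is_adult(18)` assertion kills it.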

Watch

Talks - A. Jesse Jiryu Davis: Consistency and isolation for Python programmers

When you use a SQL database like Postgres, you have to understand the subtleties of isolation levels from "read committed" to "serializable." And distributed databases like MongoDB offer a range of consistency levels, from "eventually consistent" to "linearizable" and many options in between. Plus, non-experts usually confuse "isolation" with "consistency"! If we don't understand these concepts we risk losing data, or money, or worse. So what's the bottom line? Isolation: in a simple world, your database runs on one machine and executes each request one-at-a-time. In reality, databases execute requests in parallel, leading to weird phenomena called "anomalies". To see why anomalies happen, we'll look at Python code that simulates how a database executes operations. The various isolation levels make different tradeoffs between the anomalies they allow and the parallelism they can achieve. Consistency: distributed databases keep copies of your data on several machines, but these copies go out of sync. This leads to new anomalies: weird phenomena that reveal the out-of-sync data, and make your application feel like it's in a time warp. The various consistency levels make tradeoffs between anomalies and latency. It depends on how long you're willing to wait for your data changes to be synced across all the machines. Again, we'll look at a Python simulation to understand these anomalies. You don't need to know all the names and details of every consistency and isolation level. You can refer to this handy chart. And you don't need to read all the academic papers, but I'll name four or five that are worth your time. Now, make informed decisions about consistency and isolation, and use your database with confidence!
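The talk's own simulation is not reproduced here. As a minimal stand-in, the sketch below models each transaction as a generator whose `yield` marks a point where the database may switch to another transaction, and shows one well-known isolation anomaly, the lost update. The account, amounts, and schedules are invented for illustration.

```python
def deposit(db, key, amount):
    # A transaction as a generator: the `yield` is a point where the
    # database may run a step of some other transaction.
    balance = db[key]            # step 1: read
    yield
    db[key] = balance + amount   # step 2: write


def run(schedule, transactions):
    # Advance transactions one step at a time, in the order given.
    for index in schedule:
        next(transactions[index], None)


def simulate(schedule):
    db = {"account": 100}
    txns = [deposit(db, "account", 10), deposit(db, "account", 20)]
    run(schedule, txns)
    return db["account"]


if __name__ == "__main__":
    print("one at a time:", simulate([0, 0, 1, 1]))
    print("interleaved:  ", simulate([0, 1, 0, 1]))
```

In the interleaved schedule both transactions read the balance before either writes, so the second write overwrites the first deposit. A serializable isolation level rules that outcome out, since the result must match some one-at-a-time order.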

Watch

Talks - Rob de Wit: Transforming a Jupyter Notebook into a reproducible pipeline for ML experiments

Jupyter Notebooks are part of every data scientist's arsenal, and for good reason. But while they're great for prototyping in data science projects, they are not ideal for experimenting with different configurations. I have been guilty of running experiments with changing parameters while keeping track on a notepad, and the result has always been messy. In this session, we will explore how we can transform our notebook prototype into a reproducible pipeline. We will discuss what goes wrong without proper experiment tracking, why reproducibility is the key to solving this, and how we can achieve that with Git and DVC. I will discuss this topic using a text2image project with Stable Diffusion. I'll show how to break up a notebook into modules, create a pipeline from them, run experiments through the pipeline, and compare their results to find the best possible outcomes. The target audience is data scientists who don't have a strong engineering background but would like to move beyond messing about in notebooks. Much like myself a year or two ago.

Watch

Talks - Dan Craig: Testing Spacecraft with Pytest

Much of the industry discussion around software testing over the last couple of decades has been focused on web services, but there are lots of different types of software systems that have different testing needs. This talk will first explore the differences and similarities between testing web services and testing safety-critical and mission-critical software systems, such as those used on spacecraft. We will then consider a rubric for thinking about the verification needs of different types of software based on attributes of the software and the environments in which it runs. Finally, we will examine a real-world example of using pytest to test Varda Space Industries' spacecraft software, showcasing many of pytest's power features, such as its fixtures and extensive hook system, as well as Python language features such as generators, context managers, and threading, that enable easy-to-use tools for testing against real-time telemetry streams and generating rich test report output.
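Varda's actual test tooling is not shown in the abstract. The sketch below only illustrates how the language features it names (generators, context managers, threading) can combine into a helper for asserting on a live stream; the telemetry source, the sample format, and every function name are hypothetical, and pytest itself is not imported so the block runs standalone.

```python
import queue
import threading
import time
from contextlib import contextmanager


def fake_telemetry(count):
    # Hypothetical source: yields (sequence number, temperature) samples.
    for seq in range(count):
        yield seq, 20.0 + 0.5 * seq


@contextmanager
def telemetry_stream(source):
    # Pump samples into a queue on a background thread and make sure the
    # thread is stopped and joined when the `with` block exits.
    samples = queue.Queue()
    stop = threading.Event()

    def pump():
        for sample in source:
            if stop.is_set():
                break
            samples.put(sample)

    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    try:
        yield samples
    finally:
        stop.set()
        worker.join(timeout=1.0)


def wait_for(samples, predicate, timeout=1.0):
    # Block until a sample satisfies `predicate` or the deadline passes.
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no matching telemetry sample")
        try:
            sample = samples.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("no matching telemetry sample") from None
        if predicate(sample):
            return sample


def test_temperature_reaches_threshold():
    with telemetry_stream(fake_telemetry(10)) as samples:
        seq, temp = wait_for(samples, lambda s: s[1] >= 22.0)
        assert seq == 4


if __name__ == "__main__":
    test_temperature_reaches_threshold()
    print("ok")
```

In a pytest suite, a yield-style fixture could wrap `telemetry_stream` so that each test receives the queue and teardown happens automatically; pytest would collect the `test_` function by its name.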

Watch

Talks - Cheuk Ting Ho: Trying No GIL on Scientific Programming

Last year, Sam Gross, the author of the nogil fork of Python 3.9, demonstrated that the GIL can be removed. For scientific programs that run heavy CPU-bound workloads, this could be a huge performance improvement. In this talk, we will see whether this is true by comparing the no-gil version to the original, looking at what no-gil Python is and how it may improve the performance of some scientific calculations. First of all, we will touch upon the background of the Python GIL: what it is and why it is needed. We will also see why it stops multi-threaded, CPU-bound programs from taking advantage of multi-core machines. After that, we will have a look at no-gil Python, a fork of CPython 3.9 by Sam Gross, and how it provides a way of using Python with no GIL, demonstrating what could be the future of newer versions of Python. With that, we will try out this version of Python on some popular yet calculation-heavy algorithms in scientific programming and data science, e.g. PCA, clustering, categorization, and data manipulation with scikit-learn and pandas. We will compare the performance of this no-gil version with the standard CPython distribution. This talk is for Pythonistas who have intermediate knowledge of Python and are interested in using Python for scientific programming or data science. It may shed some light on a more efficient way of using Python in their tasks and spark interest in trying the no-gil version of Python.
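The talk benchmarks scikit-learn and pandas workloads; the sketch below is a much smaller stand-in that uses only the standard library. It times the same pure-Python, CPU-bound function run serially and on a thread pool, so the two numbers can be compared on whichever interpreter runs it. The workload (`count_primes`) and the default sizes are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    # Deliberately CPU-bound, pure-Python work: trial division.
    found = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found


def timed(fn):
    # Return the function's result together with its wall-clock time.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(limit=20_000, jobs=4):
    # Run the same jobs one after another, then on a pool of threads.
    serial, t_serial = timed(lambda: [count_primes(limit) for _ in range(jobs)])
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        threaded, t_threaded = timed(
            lambda: list(pool.map(count_primes, [limit] * jobs))
        )
    assert serial == threaded  # both strategies must agree on the answer
    return t_serial, t_threaded


if __name__ == "__main__":
    t_serial, t_threaded = compare()
    print(f"serial:   {t_serial:.3f}s")
    print(f"threaded: {t_threaded:.3f}s")
```

On a standard CPython build the GIL lets only one thread execute Python bytecode at a time, so the threaded run is generally not expected to beat the serial one for this kind of work; a build without the GIL is where a difference could show up.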

Watch

Talks - Iván Pulido: Reproducible molecular simulations with Python

In this talk, the audience will be briefly introduced to the field of molecular dynamics simulations and its challenges. Special attention will be given to how the features found in Python and its scientific ecosystem are boosting research in the area, especially at a time when Machine Learning and AI methods are revolutionizing the field. Examples using OpenMM and its ecosystem (openmmtools, perses, among others) will be featured.

Watch