List of videos

Talks - Kevin Kho, Han Wang: Speed is Not All You Need for Data Processing

Pandas was the dominant local data processing framework for the majority of the last decade. Now, there are many other options available, like Polars and DuckDB. Is it worth switching to them? One of the main reasons developers switch is the supposed speed. TPC-H benchmarks show Polars and DuckDB are an order of magnitude faster than Pandas (and Dask) because of their Rust-based or C++ implementations. For large-scale data, we are often told to use pure native Spark whenever possible. Pandas UDFs are often discouraged because they are deemed a bottleneck. The optimizer works best when it can see the entire query plan, but Pandas UDFs are a black box. But as practitioners, we have to ask two related questions: 1. Are these assumptions true? Is it universally true that Pandas and Pandas UDFs are slower? 2. Even if it is slower, is it worth the development overhead to avoid using Pandas? In this talk, we'll show benchmarks across data of various sizes to show that these common assumptions are not always true. In fact, we'll see that Pandas UDFs can actually be faster than native Spark in some cases. With this result in mind, data practitioners should just focus on the tools that serve them best rather than adjusting to the tools.

Watch
Talks - Mridul Seth, Erik Welch: NetworkX is Fast Now: Graph Analytics Unleashed

Have you ever wondered how to find connections in your data and gain insights from them? Come discover how NetworkX makes this easy (and fast!). This talk is broadly divided into two parts. First we will talk about the power of graph analytics and how you can use tools like NetworkX to extract information from your data, and then we will talk about how we made the machinery behind NetworkX work with heterogeneous backends like GraphBLAS (CPU optimized) and cuGraph (GPU optimized).

Part I

NetworkX is the most popular library in Python for graph theory and applied network science thanks to its extensive API and beginner-friendly documentation. NetworkX is used "everywhere", because graphs are everywhere. Don't believe me? We surveyed more than 300 Python packages to understand how they use NetworkX in domains ranging from geoscience, neuroscience, genomics, biology, chemistry, quantum computing, text and language, machine learning, causal inference, optimization, and more. We will summarize what we learned to help you apply graph analytics to your data. Once you start using NetworkX, you will soon realize that the pure-Python implementation becomes a roadblock to scalable graph analytics. This takes us to the second part of the talk...

Part II

What should you do when your graph data becomes too large or NetworkX becomes too slow? Simple: use an accelerated NetworkX backend! NetworkX 3.0 added the ability to dispatch to other implementations. This means you can use other highly tuned libraries from NetworkX to achieve speedups of 100 to 10_000+ times! As "the API for graphs", NetworkX now makes it easy to accelerate your graph workflows on CPUs with GraphBLAS and NVIDIA GPUs with nx-cugraph. Other backends are welcome, and we plan to support distributed graphs soon for extreme scalability 🚀

Watch
Talks - William Woodruff: Building a Rusty path validation library for PyCA Cryptography

The Python ecosystem has historically relied on OpenSSL (and its myriad forks) to provide an implementation of X.509 path validation, a little-known but essential component of every secure HTTPS connection made on the modern Internet. This has brought technical debt, developer frustration (due to OpenSSL's poorly documented implementation quirks), and a mottled security history. This talk introduces an alternative, developed over the past year: a new implementation of X.509 path validation, written from the ground up in a memory-safe language with standards conformance as a priority, newly integrated into PyCA Cryptography, Python's most popular cryptographic library. We'll cover the work's implementation details, strategies applied for reducing complexity, technical decisions and tradeoffs made in its Rust components, as well as the work's impact on the millions of Python developers that depend on PyCA Cryptography and PyCA pyOpenSSL. Particular attention will be dedicated to the work's critical security scope and accompanying testing philosophy, including developed strategies for reaching perfect test coverage and avoiding vulnerability classes that have historically afflicted X.509 path validation implementations. The audience is expected to have an intermediate familiarity with general Python development, including a high-level familiarity with SSL/TLS and HTTPS (but not X.509 or X.509 path validation). Audience members will leave the talk with a more complete understanding of the modern Internet's security model, as well as how the Python ecosystem is maturing to accommodate modern cryptographic best practices in networked settings.

Watch
Talks - Irit Katriel: CPython's Compilation Pipeline

Over the last couple of years, CPython's compiler was refactored. In version 3.13, we will have access from Python scripts to more of the compilation stages: instead of the old 4-stage pipeline (source -- tokens -- AST -- code object), we will have a more refined pipeline (source -- tokens -- AST -- optimized AST -- pseudo bytecode -- optimized pseudo bytecode -- bytecode -- code object). This talk describes the new compilation pipeline of CPython 3.13 and the possibilities that it creates for CPython users, maintainers and educators. It presents Codoscope, a new visualization tool that displays CPython's process of translating Python source code into an executable code object.
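
As a rough illustration of the older 4-stage view mentioned above (source, tokens, AST, code object), the sketch below walks one line of source through the stages using only long-standing standard-library modules. It is not taken from the talk and does not use Codoscope or the newer 3.13 intermediate stages; the sample source string and variable names are assumptions made for the example.

```python
import ast
import dis
import io
import tokenize

# Hypothetical one-line program used only for this illustration.
source = "x = 1 + 2\n"

# Stage 1: source -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
# Keep only tokens with visible text (drops NEWLINE and ENDMARKER).
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# Stage 2: source -> AST (unoptimized by default)
tree = ast.parse(source)
assignment = tree.body[0]

# Stage 3: AST -> code object
code = compile(tree, filename="<example>", mode="exec")

# Inspect the bytecode held by the code object.
# Exact opcode names vary between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Finally, execute the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(token_strings)
print(ast.dump(assignment))
print(opnames)
print(namespace["x"])
```

Printing each stage side by side makes it easier to see where a given source construct ends up, which is the kind of view the talk's visualization tool is described as providing in a richer form.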

Watch
Talks - Sarah Kaiser: Eternal sunshine of the spotless development environment

“It says the package is not installed?” is a common refrain when working on software projects, especially in Python. Creating and configuring reproducible environments is a major part of modern software development and has led to the popularity of tools like Docker to specify where and how code runs. Development (Dev) Containers are an open specification that extends Docker images to make it easy to configure not only where the code runs, but also the developer workspace in your preferred editors or toolchains like VS Code, PyCharm, or DevPod. Setting up Dev Containers for your projects can reduce maintainer overhead of OSS projects, bootstrap contributors, and make running events like workshops or sprints go more smoothly. In this talk, we will briefly cover why setting up container infrastructure can be useful for isolating your project environments and dig into how you can extend that with Dev Containers to configure a complete development experience using VS Code. We will look at two common OSS project situations, onboarding and workshops, to see how workflows for using Dev Containers and other supporting tools make things easier. No container experience is required; brief familiarity with VS Code is helpful, but not necessary. Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/123/2024-05-18T16%3A59%3A27.169760/pycon-2024-eternal-sunshine.pptx

Watch
Talks - Jeff Epler: Connecting Old to New with CircuitPython: Retrocomputer input devices on...

Full title: Connecting Old to New with CircuitPython: Retrocomputer input devices on modern PCs
Presented by: Jeff Epler

The input devices of decades past hold nostalgic value for many folks. But they don't need to merely sit on a shelf as museum objects: they can be reverse engineered and then adapted to modern computers without modifying the original hardware. CircuitPython, an implementation of the Python language for microcontrollers, is an excellent language for projects like these, thanks to native USB Human Interface Device (HID) support and the ability to ‘bitbang’ archaic interfaces combined with the fast development cycle of an interpreted language, as you'll learn in case studies adapting these keyboards & mice. No previous experience with CircuitPython is necessary. Some knowledge of electronics will enhance your enjoyment of this talk, though there will be a quick summary of key electronics concepts as the talk proceeds.

Watch
Talks - Tim Paine: Building FPGA-based Machine Learning Accelerators in Python

In this talk, we will demo a simple machine learning accelerator deployed on a commodity FPGA and developed using a Python-based toolchain. The FPGA platform is based on an entry-level Xilinx FPGA, with a total cost of materials under $200. The toolchain uses a combination of open source software, including PyTorch and ONNX for modeling, and Migen and LiteX for the construction of the system-on-chip. We will also survey the wide array of both open source and proprietary vendor tools necessary to build this project, and discuss the broader open source silicon landscape.

Watch
Talks - Krishi Sharma: Trust Fall: Three Hidden Gems in MLFlow

AI research is more important now than ever. Trust in AI is critical, but it’s hard to build trust without metrics and documentation. How can we make documentation as easy as possible in order to maintain trust in the results from our research? Is there a way to organize our models so that we can ensure reproducibility? How can we save ourselves precious development time by automating parts of the metric tracking process? In this talk, we’ll give a brief introduction to a popular metric tracking tool, MLFlow, before going into a deep dive on three lesser known features that can enhance collaboration, increase transparency and reduce the time wasted reproducing results. The three features that we’ll talk about are autologging, MLFlow system tags and the MLFlow model registry. We’ll see how using these three features can save you tons of time that would have otherwise been wasted writing lines of code, looking for old code or finding the right model version. By the end of this talk, you’ll have all the knowledge you need to successfully use MLFlow to your best advantage. You’ll be able to automatically log every parameter and metric according to your framework of choice, link the version of code to the metrics that version produced for faster reproducibility and have a process that you can reliably use to write helpful documentation quickly. Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/74/2024-05-15T18%3A13%3A55.322433/Trust_Fall_-_Hidden_Gems_in_MLFlow.pdf

Watch
Talks - Alastair Stanley: Computational Origami

What's the best thing you can do with a piece of paper? I'm not talking about paper planes or dragons (or even Mr. Napkin Head). The elegant art of paper folding can be harnessed to perform surprisingly powerful calculations. From a handful of basic folding axioms, we will construct computational systems to solve a wide range of problems. Starting with basic arithmetic operations, we will build up to tackling cubic equations and even proofs of irrationality. I'll be simulating the fold sequences in a custom Python library, but feel free to bring a sheet of paper to follow along. Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/28/2024-05-17T19%3A00%3A59.950470/Computational_Origami.pdf
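
To give a flavor of how a fold can act as a calculation, here is a minimal sketch of one standard folding axiom: folding one point onto another produces a crease along the perpendicular bisector of the two points, so folding 0 onto 1 "computes" 1/2. This is not the speaker's library; the function names `fold_point_onto_point` and `reflect` are hypothetical and chosen for this example only.

```python
import math


def fold_point_onto_point(p1, p2):
    """Return the crease that places p1 onto p2.

    The crease is the perpendicular bisector of p1 and p2, represented
    as (a point on the crease, the unit normal of the crease).
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("the two points must be distinct")
    midpoint = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
    normal = (dx / length, dy / length)
    return midpoint, normal


def reflect(point, crease):
    """Return where `point` lands after folding the paper along `crease`."""
    (mx, my), (nx, ny) = crease
    # Signed distance from the point to the crease line.
    d = (point[0] - mx) * nx + (point[1] - my) * ny
    return (point[0] - 2 * d * nx, point[1] - 2 * d * ny)


# Fold the point at 0 onto the point at 1 along the bottom edge of the sheet.
crease = fold_point_onto_point((0.0, 0.0), (1.0, 0.0))
midpoint, normal = crease
image = reflect((0.0, 0.0), crease)

print(midpoint)  # where the crease crosses the bottom edge
print(image)     # where the point at 0 lands after the fold
```

Repeating the same fold on the resulting segment halves it again, which is one way simple arithmetic can be built from creases before moving on to harder constructions.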

Watch