List of videos

TALK / Eric Arellano / Cuándo usar extensiones nativas en Rust: rendimiento accesible y seguro (When to Use Native Extensions in Rust: Accessible and Safe Performance)
When performance problems arise, Python native extensions make it possible to speed up the critical path while continuing to use Python and avoiding a costly rewrite. However, native extensions are usually written in C and C++, and using them safely is a serious challenge. Rust offers a production-ready alternative to extensions in C and C++. With nearly the same performance, Rust provides memory safety and concurrency safety, along with modern ergonomics and a community that is welcoming to beginners (like Python!). Even if you have no experience with native extensions, C/C++, or Rust, this talk will give you an accessible overview of how native extensions in Rust have enabled the open source Pants project to achieve its performance goals while keeping the expressiveness and flexibility of Python for most of its developers. You will learn when it is worth using native extensions in Rust, based on the Pants community's 5 years of experience, and pick up some resources for learning how to use native extensions in Rust. Slides: https://speakerdeck.com/ericarellano/cuando-usar-extensiones-nativas-en-rust-rendimiento-accesible-y-seguro
TALK / Jeremy Paige / Packaging Python in 2021
Five years after the inception of pyproject.toml, the Python packaging landscape is richer than ever. Even with all the new choices available when setting up a project, setup.py's role is diminishing. Discover why it may soon be absent and what to use in its place. Slides: https://docs.google.com/presentation/d/19K_QccShcWncnFK-6Q4JvD1Ictg1VYTMiL5xEZjA8gc/edit?usp=sharing
TALK / Simon Prickett / No, Maybe and Close Enough: Using Probabilistic Data Structures in Python
Being right all the time isn't necessarily the best idea. This talk examines how to count distinct items from a firehose of data, how to determine if we've seen a given item before, and why absolute accuracy may be impractical when doing so. Probabilistic data structures trade absolute accuracy for speed and economy of resources, returning approximate results. They provide fast, scalable solutions to problems such as counting likes on social media posts, or determining which articles on a website a user has previously read. I'll introduce the HyperLogLog and the Bloom filter, explain how they work at a high level, and demonstrate different ways in which each can be leveraged in Python. A GitHub repo to accompany this talk can be found at https://github.com/simonprickett/python-probabilistic-data-structures Slides: https://simonprickett.dev/no_maybe_and_close_enough_slides.pdf
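To make the "have we seen this item before?" idea concrete, here is a minimal Bloom filter sketch using only the standard library. It is not taken from the talk or its repository: the class name, the default sizes, and the salted SHA-256 hashing scheme are all assumptions chosen for readability rather than efficiency.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are arbitrary assumptions; real deployments size them from
        # the expected item count and the tolerated false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # "Maybe seen" only if every position is set; any unset bit means "no".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for article in ("intro-to-redis", "python-tips"):
    seen.add(article)

print("python-tips" in seen)     # added items are always reported as present
print("unread-article" in seen)  # could be a false positive, though unlikely when so few bits are set
```

The filter stores no items, only bits, which is where the economy of resources comes from. The cost is that a "yes" answer really means "maybe".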
TALK / Sebastiaan Zeeff / The magic of "self": How Python inserts "self" into methods.
A phrase that I hear a lot is "Python is easy to learn, but hard to master". In a way that's true: Python is easy to learn because its high level of abstraction allows you to focus on the business logic of what you're trying to do instead of the lower-level implementation details. At the same time, Python's abstraction isn't magical: Its versatile data model allows you to hook into almost every part of the language to implement objects that behave just as Python's built-in objects do, enabling you to create similarly high-level interfaces for your own objects. That's where "hard to master" comes in: There is so much to learn that you're never done learning. In this talk, I want to entice you to look beyond Python's high-level interface into the wonderful landscape of its data model. I'll do that by explaining one of Python's most "magical" features: The automatic insertion of self into methods. Often, to beginners, the insertion of the instance as the first argument to methods is explained as something that Python just does for you: "Don't worry about it, it just happens!". More intermediate Python programmers typically get so used to self that they hardly notice it anymore in their function signatures, let alone wonder about what's powering it. To explain this bit of Python magic, I’ll give you an informal introduction to something called descriptors. To be sure, this talk isn’t going to be an in-depth discussion of the finer details of the descriptor protocol. Rather, it’s aimed at advanced beginners and intermediate Python developers who are eager to get an idea of what lies beneath the surface of Python. With this talk, I hope to pique your curiosity about the more advanced features of the Python programming language and hopefully give you a glimpse of all the things that are possible. Slides: https://sebastiaanzeeff.nl/pycon
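The mechanism can be seen in a few lines. This is a small sketch, not material from the talk; the Greeter class and its names are hypothetical. It shows that a function stored on a class is a descriptor, and that calling its __get__ by hand produces the same bound method as ordinary attribute access.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return f"{greeting}, {self.name}!"


g = Greeter("PyCon")

# Fetched straight from the class __dict__, greet is a plain function.
plain_function = Greeter.__dict__["greet"]

# Functions implement __get__, so they are descriptors. Calling __get__
# with an instance returns a bound method that remembers that instance.
bound_method = plain_function.__get__(g, Greeter)

assert bound_method.__self__ is g
assert bound_method("Hello") == g.greet("Hello") == "Hello, PyCon!"

# Calling the plain function requires passing the instance explicitly.
assert plain_function(g, "Hello") == "Hello, PyCon!"
```

Attribute lookup on an instance performs the __get__ step automatically, which is why self never has to be passed by hand.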
TALK / Rebecca Bilbro, Daniel Sollis, Mark, Patrick Deziel / PyTesting the Limits of Machine Learning
Despite the hype cycle, each day machine learning becomes a little less magic and a little more real. Predictions increasingly drive our everyday lives, embedded into more of our everyday applications. To support this creative surge, development teams are evolving, integrating novel open source software and state-of-the-art GPU hardware, and bringing on essential new teammates like data ethicists and machine learning engineers. Software teams are also now challenged to build and maintain codebases that are intentionally not fully deterministic. This nondeterminism can manifest in a number of surprising and oftentimes very stressful ways! Successive runs of model training may produce slight but meaningful variations. Data wrangling pipelines turn out to be extremely sensitive to the order in which transformations are applied, and require thoughtful orchestration to avoid leakage. Model hyperparameters that can be tuned independently may have mutually exclusive conditions. Models can also degrade over time, producing increasingly unreliable predictions. Moreover, open source libraries are living, dynamic things; the latest release of your team's favorite library might cause your code to suddenly behave in unexpected ways. Put simply, as ML becomes more of an expectation than an exception in our industry, testing has never been more important! Fortunately, we are lucky to have a rich open source ecosystem to support us in our journey to build the next generation of apps in a safe, stable way. In this talk we'll share some hard-won lessons, favorite open source packages, and reusable techniques for testing ML software components. Slides: https://docs.google.com/presentation/d/1Qrg0C5L6-5uQCtkUdqgw5UZPyFoCNJ07LxHWXVAzx2g
TALK / Dino Viehland / Python Performance at Scale - Making Python Faster at Instagram
Python is used in a large number of websites where the performance of the web tier is a significant cost. There are multiple ways to improve the performance of these applications: improving the Python code itself, moving code out of Python using tools like Cython, and extreme options like directly improving the performance of the Python interpreter. In this talk we’ll explore some of the changes we’ve made to the CPython runtime to improve the performance of our workload. We’ll start with a high-level overview of our architecture, which isn’t atypical for a Python web application, and see the opportunities and challenges it has provided for optimization. Then we’ll go deep down the rabbit hole and look at common hot spots in the Python runtime and the results we’ve had in reducing their overhead. Along the way we’ll look at both targeted optimization opportunities and classic techniques such as inline caching, a JIT compiler, and leveraging type annotations for performance. We’ll cover techniques that have proven successful and ones that are still experimental. We’ll see how these can be applied to the Python runtime and what the performance results are: overall we’ve seen a 20-30% improvement in our production workload and up to a 7x improvement on benchmarks. Slides: https://www.viehland.com/PyCon_2021.pdf
TALK / Adam Breindel / Dask-SQL: Empowering Pythonistas for Scalable End-to-End Data Engineering
Few things are more frustrating -- or inefficient -- than having a team of brilliant Python folks get stuck at the initial "get the data" stage of a project, because that data is "trapped" in a Hive/Spark-based datalake or requires complex SQL queries to assemble. Let's get unstuck, with dask-sql! PyData tooling and Dask are immensely popular in data pipelines, but the beginning stages of those pipelines -- often involving SQL data extraction from enterprise datalakes -- have traditionally required Java/JVM-based tools, such as Apache Spark. That changed in the past year, with the release of dask-sql. Dask-sql empowers Pythonistas with little or no knowledge of the JVM/Hadoop world to create end-to-end data projects. In this talk, we'll explore how we can use Python and dask-sql to perform SQL data/feature extraction from datalakes and Hive tables. We'll see how we can immediately refine and use that data for machine learning, analytics, or transformation workloads with our favorite PyData tools. We'll also discuss the design of dask-sql: an innovative project that combines battle-tested SQL optimization from Apache Calcite, scalable dataframe operations via Dask, and integration with the enterprise-standard Hive metastore data catalog. Slides: https://github.com/adbreind/pycon2021-dask-sql
TALK / Tobias Kohn / The Road to Pattern Matching in Python
Pattern matching is a great and proven tool for programmers. However, can we also assimilate and integrate it into Python? This talk tries to give an answer and discusses the rationale and ideas behind the recent "pattern matching" PEPs. Processing structured data has sparked ever more powerful programming tools. Python's objects and classes, for instance, have proven themselves to be particularly versatile and form part of the backbone of the language. Constructing or building new objects, including built-ins such as lists, tuples or dictionaries, abounds in any Python code. In contrast, testing the structure of data and extracting specific elements is often rather cumbersome, requiring the frequent use of built-in functions like isinstance, len and getattr. Pattern matching addresses this issue by introducing a new paradigm to de-construct data, complementing existing tools. It can be thought of as an extension of Python's iterable unpacking to arbitrary objects. However, it does so in a 'safe' way, ensuring that objects have the necessary structure to proceed with unpacking elements and attributes. The objective of this talk is to give you an overview of why pattern matching matters and what it really is. You will gain a deeper understanding of the core concepts that make up pattern matching, as well as the design decisions and ideas behind the recent "pattern matching" PEPs. However, this talk will not provide an introduction on how to use pattern matching in your code, nor is it about the intricacies of the implementation. If you are a Python programmer, have heard of the new pattern matching features and are wondering what they are all about, then this talk is for you.
TALK / Alexander Hultnér / Intro to Pydantic, run-time type checking for your dataclasses
Want static type checking at run time? Want to use standard Python type annotations? Want compatibility with standard Python dataclasses? Then Pydantic sounds like something for you. Pydantic offers a Pythonic way to validate your user data using standard type annotations enforced at run time. This talk focuses on how Pydantic can be used with web APIs to simplify many aspects of user input validation. Back in early 2018, I built a solution similar to Pydantic, based on standard dataclasses, for a large B2B SaaS application built with Flask. When I left that project I briefly considered rebuilding it as open source, but while doing my research I discovered the power of Pydantic, which I had put on my keep-tabs-on list when it was at a much earlier stage. By this point it had evolved into a polished library and a perfect companion for JSON-based APIs. Slides: https://slides.com/hultner/pycon-us-2021
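The underlying idea, annotations enforced at run time, can be illustrated with the standard library alone. The sketch below is not Pydantic's API or implementation, and the User class and parse_user helper are hypothetical. It only handles plain classes as annotations, whereas Pydantic does considerably more (for example type coercion, nested models, and detailed error reports).

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        # Check every field against its annotation once the dataclass
        # has been initialised. Assumes each annotation is a plain class.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def parse_user(payload: dict) -> User:
    # Hypothetical helper: build a validated User from decoded JSON.
    return User(**payload)


user = parse_user({"name": "Ada", "age": 36})

try:
    parse_user({"name": "Ada", "age": "thirty-six"})
except TypeError as exc:
    error_message = str(exc)
```

With this in place, malformed input is rejected at the boundary of the application instead of surfacing later as an obscure failure deeper in the code.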