PyCon US 2021
List of videos

TUTORIAL / Husni Almoubayyed / Effective Data Visualization
From picking the right plot for a particular type of data, statistic, or result, to pre-processing sophisticated datasets and making important decisions about the aesthetics of a figure, visualization is both a science and an art that requires both knowledge and practice to master. This tutorial is for Python users who are familiar with basic plotting and want to build strong visualization skills that will let them effectively communicate any data, statistic, or result. We will use Python libraries such as seaborn, matplotlib, plotly, and sklearn, and discuss topics such as density estimation, dimensionality reduction, interactive plotting, and making suitable choices for communication. Examples are drawn from datasets in the scientific, financial, and geospatial (mapping) fields, among others.
Watch
TUTORIAL / Geir Arne Hjelle / Introduction to Decorators: Power UP Your Python Code
Python supports functions as first-class objects. This means that functions can be assigned to variables, and passed to and from other functions, just like any other object in Python. One powerful application of this is the decorator syntax, which makes it easy to apply one function to another at the point where it is defined. Decorators offer a simple and readable way of adding capabilities to your code. This tutorial will teach you how decorators work, and how to create your own decorators. Being comfortable with using and creating decorators will make you a more efficient Python programmer.
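To illustrate the idea in the abstract above, here is a minimal sketch of a decorator. The names `count_calls` and `greet` are made up for this example and do not come from the tutorial itself.

```python
import functools


def count_calls(func):
    """Decorator that records how many times the wrapped function is called."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return f"Hello, {name}!"


# The @count_calls line is shorthand for: greet = count_calls(greet)
message = greet("PyCon")
```

After the call, `greet.calls` is 1 and `greet.__name__` is still `"greet"`, because `functools.wraps` copies the metadata onto the wrapper.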
Watch
KEYNOTE / Robert Erdmann
Prior to earning his Ph.D. from the University of Arizona in 2006, Robert Erdmann started a science and engineering software company and worked extensively on solidification and multiscale transport modeling at Sandia National Laboratories. Upon graduation, he joined the faculty at the University of Arizona in the Department of Materials Science and Engineering and the Program in Applied Mathematics, where he worked on multiscale material process modeling and image processing for cultural heritage. In 2014 he moved permanently to Amsterdam to focus full-time on combining materials science, computer science, and imaging science to help the world access, preserve, and understand its cultural heritage. He is Senior Scientist at the Rijksmuseum, and is also Full Professor of Conservation Science in the Faculties of Science and of Humanities at the University of Amsterdam. He has been using Python since 2001 and teaching Python since 2006.
Watch
KEYNOTE / Saron Yitbarek
Saron is the founder of Disco, audio courses on tech topics. She's also the founder of CodeNewbie (acquired), a podcaster, a developer, and speaker.
Watch
KEYNOTE / Akshay Sharma
Akshay Sharma is executive vice president of artificial intelligence (AI) at Sharecare, the digital health company that helps people manage all their health in one place. Sharma joined Sharecare in 2021 as part of its acquisition of doc.ai, the Silicon Valley-based company that accelerated digital transformation in healthcare. Sharma is an entrepreneur, founder, and experienced leader in engineering technology with a focus on healthcare. Since he initially joined doc.ai, he has been integral in the creation of core mobile app offerings including Passport, a privacy-based health-at-work solution; Serenity, a mental health digital offering; NetRunner, an edge computing and inference AI app; and Genewall, a genome app that deals with bioinformatics on the edge. He is passionate about developing and applying poly-omics data combinations to healthcare and life sciences as well as developing AI to assist in medical data understanding. He has built products, technology, and teams focused on edge computing (mobile/MCUs/sensors) and privacy (computations/inferences/learning), and holds several patents in this space. With doc.ai, Sharma previously held various leadership positions including chief technology officer (CTO), and vice president of engineering, a role in which he developed several key technologies that power mobile-based privacy products in healthcare. Prior to joining the company, he also founded and co-founded several businesses including Swast, a startup focused on doctor-efficiency within the Indian healthcare ecosystem, and PixelSimple, a company engineering nextgen media streaming systems in the U.S. and Bangalore. In addition to his role at Sharecare, Sharma serves as CTO of TEDxSanFrancisco and also is involved in initiatives to decentralize clinical trials. Sharma holds bachelor’s degrees in engineering and engineering in information science from Visvesvaraya Technological University.
Watch
KEYNOTE / Python Steering Council / C. Willing, T. Wouters, B. Cannon, P. Galindo Salgado, B. Warsaw
Carol Willing, Thomas Wouters, Brett Cannon, Pablo Galindo Salgado, Barry Warsaw. Elected as prescribed in PEP 8016, the Python Steering Council is a 5-person committee that assumes a mandate to maintain the quality and stability of the Python language and CPython interpreter, improve the contributor experience, formalize and maintain a relationship between the Python core team and the PSF, establish decision making processes for Python Enhancement Proposals, seek consensus among contributors and the Python core team, and resolve disputes in decision making about the language. This keynote will update the community on current and future initiatives. Additionally, the Steering Council will address community questions collected prior to the conference.
Watch
KEYNOTE / Python Software Foundation Community Address / Ewa Jodlowska
Ewa Jodlowska, Executive Director, addresses the community and announces the Community Service Award recipients.
Watch
TALK / Marina Shvartz / Testing stochastic AI models with Hypothesis
Over the years, testing has become one of the main focus areas of development teams: a good feature is a well-tested one. In the field of AI this is often a real struggle. Since most advanced AI models are ultimately stochastic, we can’t manually define all their possible edge cases. This led us to use the Hypothesis library, which does much of that for you so that you can focus on defining the properties and specifications of your system. In this talk, I will briefly cover the theory of property-based testing and then jump into use cases and examples to demonstrate how we used the Hypothesis library to generate random examples of plausible edge cases for our AI model.
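The core idea of property-based testing can be sketched with the standard library alone. This is a hand-rolled stand-in, not the Hypothesis API and not the speaker's code: `normalize`, `check_property`, and `random_scores` are hypothetical names, and the "model" is just a toy post-processing step.

```python
import random


def normalize(scores):
    """Hypothetical post-processing step: scale positive scores to sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def check_property(prop, make_input, runs=200, seed=0):
    """Try prop on many random inputs; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = make_input(rng)
        if not prop(value):
            return value
    return None


def random_scores(rng):
    """Generate a random list of 1 to 20 positive scores."""
    return [rng.uniform(0.01, 10.0) for _ in range(rng.randint(1, 20))]


def sums_to_one(scores):
    """Property: the output is a probability distribution."""
    out = normalize(scores)
    return abs(sum(out) - 1.0) < 1e-9 and all(0.0 <= p <= 1.0 for p in out)


counterexample = check_property(sums_to_one, random_scores)
```

Instead of listing edge cases by hand, the test states a property that must hold for every input and lets a generator look for a counterexample. A library such as Hypothesis adds smarter input generation and shrinking of failing examples on top of this basic loop.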
Watch
TALK / Benjy Weinberger / Creating extensible workflows with off-label use of Python
Workflow-oriented systems have many uses, including data processing and analysis, ETL, CI/CD, and more. But creating a programmatic interface to a workflow system is a delicate balancing act: we want the API to be flexible enough to support useful work, but also constrained enough that tasks run cooperatively within the larger system. We faced this challenge when designing the task API for the Pants build system. We needed to allow custom task code to enjoy the benefits of complex features like caching, concurrency and remote execution, without every task author having to reason about them. In this talk we'll show how we found the right balance through unconventional use of Python's type annotations, coroutines, and dataclasses. Combining these seemingly disparate features in the context of a workflow engine allows you to build elegant extensibility APIs with just the right amount of flexibility. Slides: https://docs.google.com/presentation/d/1aWZjk3tZUp37RDmZZxy0j8OjbatzkuWLfuSh84Lzwfk/edit?usp=sharing
Watch
TALK / Susan Shu Chang / Narrative-focused video games development with Ren'Py an open source engine
Ren'Py is an open source game engine used to make countless interactive fiction games, also known as visual novels (VNs). These include commercial hits with VN elements such as Persona 5, and viral works such as Doki Doki Literature Club (2mil+ downloads as of Jan 2018). I learned to program in Python using this engine, and released my own commercial game with it after working on it on weekends for a few years, selling 6K+ copies in less than half a year. In the daytime, I work as a principal data scientist in fintech. The talk will dig into the source code of the engine, https://github.com/renpy/renpy, covering topics such as how it takes care of OS-level details for game developers, memory optimization, and cross-platform game saves. Outcome: The audience will understand the independent gaming industry and how they can use Python to break into it, as I share my journey. There will also be a source code walkthrough, but it will be more of an overview than a step-by-step tutorial due to the scope of the talk.
Watch
TALK / Alan Yu, Vasu Bhog / What we learned from Papermill to operationalize notebooks
When you hear about beautiful notebook automation, your first thought usually goes to what Netflix is doing with Jupyter Notebooks. Their work is heavily inspired by nteract's papermill, which allows for the parameterization, execution, and analysis of Jupyter notebooks. Notebook operationalization opens many doors for teams' troubleshooting pipelines, and we wanted to learn from the open source community how we can work together to empower developers who are on call. Join us for this code-focused session to hear about our journey of listening to and learning from the open source community, and how we used Python to evolve notebook parameterization.
Watch
TALK / Randall Hunt, Mike Ruberry / From NumPy to PyTorch, A Story of API Compatibility
NumPy has grown to be a vital part of the data science workflow for everyone from astrophysicists to zoologists. This talk is about how PyTorch approaches being “NumPy-compatible,” and why the PyTorch community thinks that’s important, why it can be challenging, and why sometimes it’s necessary to be divergent from NumPy’s behavior. Slides: https://www.slideshare.net/MikeRuberry1/from-numpy-to-pytorch
Watch
TALK / Meredydd Luff / Writing Good Documentation for Developers
If you're building something for developers, you want it to get used. This means your potential users need to find your library, framework, or API. They need to work out whether it's useful for them, learn how to use it, and solve problems they encounter along the way. All these things depend on your developer docs! Docs aren't just docs: They're your UI, your marketing, and they - not your code - define what your product is. This talk covers important functions of your docs that you might not think about, and then some particular pitfalls of documenting things for developers. Slides: https://drive.google.com/file/d/1K93TsQ4s39X70vvzckPpdke7srzMTXZ6/view
Watch
TALK / Yetunde Dada / Reproducible and maintainable data science code with Kedro
Code produced by data scientists is under attack! There is a growing series of conference talks, Medium blog posts and business stakeholders telling a story of how changing business objectives are driving interest in production-level code. Production-level code is considered time-consuming to produce and limiting for the experimentation process needed to create amazing models. You're going to follow a workflow that deconstructs your experimentation workflow in a Jupyter notebook and helps you create production-ready ML pipelines. The talk is focused on an open source Python framework called Kedro, which emphasises creating reproducible, maintainable and modular data science code. Documentation: https://kedro.readthedocs.io/en/stable/ GitHub Repository: https://github.com/quantumblacklabs/kedro Slides: https://speakerdeck.com/yetudada/reproducible-and-maintainable-data-science-code-with-kedro?slide=18
Watch
TALK / Terri Oda / pyKnit: math tools for knitters
Knitting patterns are effectively code that gives you a physical object if you execute them. Customizing and designing patterns takes a lot of math to get sizing and shapes right, but not all knitters love math, and even those who do don't want to do it by hand all the time. Every year at PyCon I meet a few more knitters, so I thought maybe this was the year we could put our heads together and build an open source knitting toolkit and maybe make customizing your knitting a little easier for everyone. The pyKnit toolkit will hopefully make it easier for people to adjust garment patterns to fit, for pattern designers to be size inclusive, or for any knitter to adjust patterns to make the most out of a special ball of yarn. Slides: https://docs.google.com/presentation/d/1Kr7Nmzgs5RCqx3kxyMDXwGNGe9Skq8E4bquLQhI3fdo/edit?usp=sharing
Watch
TALK / James Murphy / From 3 to 300 fps: NES Emulation in Python and Cython
It is sometimes asserted that “Python is a real bad choice for any kind of real-time system”. When I first got my Python NES emulator to boot, only for it to run at 2 frames per second, I felt like agreeing. But is it really true? After just a few days reworking the emulator’s operational core into Cython, the framerate is now above 300fps, proving that Python is a viable choice for emulator development and other performance-dependent projects. In this talk, I will outline the advantages and some challenges of using Cython to achieve realtime performance from an existing Python codebase. Slides: https://docs.google.com/presentation/d/e/2PACX-1vTj4Y7KEy5hhOlYKqbSdUDbFLol4tw91Qvn7otLdFTb-rCwhP4kLKYieYUzhcDnV9OGtgxuraQV3-ep/pub?start=false&loop=false&delayms=3000
Watch
TALK / SangBin Cho / Data Processing on Ray
Machine learning and data processing applications continue to drive the need to develop scalable Python applications. Ray is a distributed execution engine that enables programmers to scale up their Python applications. This talk will cover some of the challenges we faced and key architectural changes we made to Ray over the past year to support a new set of large scale data processing workloads. Slides: https://docs.google.com/presentation/d/15a6-6Smdu9FldWU8S21925i7wV5Y625QXduXlxBBscw/edit?usp=sharing
Watch
TALK / Shemra Rizzo / Learning python during lockdown: a surprising bonding experience with my child
Do you remember what it was like when you first started to code? Do you remember feeling frustrated and discouraged or even wanting to give up? What would you say to yourself if you had a chance to go back in time and talk to your beginner self? Unfortunately, traveling back in time is not possible, but we can always share our experiences and advice with those who are just getting started. The pandemic lockdown coincided with my teenager taking his first Python courses remotely. I had the opportunity to witness the beginning of his path as a Python programmer and to serve as an encouraging mentor. I demystified the programming process and created realistic expectations. Meeting four times a week to discuss and practice Python for many months proved to be a surprising and amazing bonding experience with my child. In this talk I will cover the main topics we discussed during our meetings, which were all the things I wish I had known when I was first starting to code.
- Programming is not memorization. It is problem solving.
- Don’t jump straight into coding; use lots of scratch paper and pseudocode to understand your problem and design your solution.
- Make sure your code does what it’s supposed to do. Test it.
- Getting stuck and making mistakes is normal. Debugging is not a sign of failure.
- Don’t learn in isolation. Find a mentor and join a community.
I hope that after this talk you will be inspired to become a mentor, support others and share your experience with those who are starting their programming journey in Python.
Watch
TALK / Itamar Turner-Trauring / 0 to production-ready: a best-practices process for Docker packaging
You know the basics of packaging your Python application for Docker, but do you know enough to run that image in production? Bad packaging can result in security and production problems, not to mention time wasted trying to debug unreproducible errors. And even if you figure out the best practices, there's still a huge number of details to get right, many of which interact with each other in unexpected ways. My personal list includes over 60 Docker packaging best practices, and it keeps growing. So where do you start? What should you do first? To help you quickly package your application in a production-ready way, this talk will give you a process to help you prioritize and iteratively implement these best practices, starting with the highest-priority ones (security, automation), moving on to correctness and reproducibility, and finally focusing on optimization.
Watch
TALK / Francesco Tisiot / Event-driven applications: Apache Kafka and Python
Code and data go together like tomato and basil; not many applications work without moving data in some way. As our applications modernise and evolve to become more event-driven, the requirements for data are changing. In this session we will explore Apache Kafka, a data streaming platform, to enable reliable real-time data integration for your applications. We will look at the types of problems that Kafka is best at solving, and show how to use it in your own applications. Whether you have a new application or are looking to upgrade an existing one, this session includes advice on adding Kafka using the Python libraries and includes code examples (with bonus discussion of pizza toppings) to use. With Kafka in place, many things are possible so this session also introduces Kafka Connect, a selection of pre-built connectors that you can use to route events between systems and integrate with other tools. This session is recommended for engineers and architects whose applications are ready for next-level data abilities. Notebook link: https://github.com/aiven/kafka-python-notebooks Slides: https://speakerdeck.com/ftisiot/event-driven-applications-apache-kafka-and-python
Watch
TALK / Emery Berger / Scalene: A high-performance, high-precision CPU+GPU+memory profiler for Python
Scalene is a high-performance CPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. This talk will present case studies of using Scalene, and describe some of the technical advances that make it work. Slides: https://www.cs.umass.edu/~emery/scalene-pycon2021.pdf
Watch
TALK / Mariatta Wijaya / Oops! I Became an Open Source Maintainer!
I consider myself relatively new to the open source world; my first open source contribution was in 2016. Since then, I’ve continued to actively contribute to open source and specifically to core Python and Python libraries. Pretty soon I found myself being given commit rights to other people’s open source projects. It’s been quite a journey. Being a new open source contributor has its own challenges, and being a new open source maintainer brings another set of unique challenges. In this talk, I will share my journey and the things I’ve learned along the way, and some advice for other aspiring open source maintainers and contributors. Slides: https://speakerdeck.com/mariatta/oops-i-became-an-open-source-maintainer
Watch
TALK / Simon Mo / Patterns of ML Models in Production
You trained a ML model, now what? The model needs to be deployed for online serving and offline processing. This talk walks through the journey of deploying your ML models in production. I will cover common deployment patterns backed by concrete use cases which are drawn from 100+ user interviews for Ray and Ray Serve. Lastly, I will cover how we built Ray Serve, a scalable model serving framework, from these learnings. Slides: https://drive.google.com/file/d/1iLY5Tw7Sq3Hik4Fy0LXRnclyfXBbsJUj/view?usp=sharing
Watch
TALK / Anthony Shaw / Restarting Pyjion, a general purpose JIT for Python- is it worth it?
In this talk you'll see an update to the Pyjion project, a JIT compiler for CPython byte-code. This project was started 5 years ago but stopped after making no gains in performance. Recent changes to CPython have made optimisations more viable, so now it has been restarted and is showing big performance gains vs. standard CPython with 100% compatibility. Many attempts have been made to build a general purpose JIT for Python and few have succeeded. Is it worth it and what are the gains to be made? This talk will cover the design ideas of a JIT for CPython, optimisations, and future potential. Website: https://pyjion.readthedocs.io Source code: https://GitHub.com/tonybaloney/pyjion Book: https://realpython.com/products/cpython-internals-book/
Watch
CHARLAS / María José Meneses / Restauración de imágenes multiespectrales con GAN
Multispectral images contain useful information, but processing them can lose valuable data. We therefore propose a GAN-based restoration method that produces visually acceptable results from multispectral images with corruptions of random size and location. Slides: https://drive.google.com/file/d/1SR7kFPduibOtFwwBE9w-XfFi1N9lTEg7/view?usp=sharing
Watch
CHARLAS / Mauricio Vásquez / Tracing Distribuido con OpenTelemetry
OpenTelemetry was born from the merger of OpenTracing and OpenCensus, two similar projects that provide a set of APIs for distributed tracing and metrics. Distributed tracing lets us monitor distributed systems by collecting monitoring information generated by their different components. To generate this information, developers instrument applications and libraries by inserting small pieces of code that send information to a collector system, which analyzes and assembles all the data to give an overview of the interactions between the system's components. OpenTelemetry also provides an automatic instrumentation mechanism that makes it possible to generate tracing information in applications that do not include OpenTelemetry support. This mechanism relies on the fact that most applications use a set of standard libraries that is instrumented with OpenTelemetry. This talk explains the concept of distributed tracing, gives an overview of OpenTelemetry, and presents a demonstration of a simple application and of how the automatic instrumentation system can be used in applications without native OpenTelemetry support. Slides: https://tinyurl.com/pyconus-2021
Watch
CHARLAS / Rafael Santos / El Zen de Python en español
Understanding, in our own language (Spanish), the philosophy behind Python, putting into context each of the principles that make up the "Zen" of the language. Slides: https://objectstorage.us-ashburn-1.oraclecloud.com/p/yXzK_sUn7Qag9ihL7kGHv8KXlcRYONIkWXgnwptCf-4LRZdIG3zvIIrmsBYTb2ds/n/idzlqojegvvo/b/pycon21/o/ElZenDePython_Pycon21.pdf
Watch
CHARLAS / Luis Conejo / GitHub Classroom: Una herramienta simple para la enseñanza de Python
This presentation will cover our previous use of a single GitHub repository, its advantages and disadvantages, our experiences during the migration to GitHub Classroom, and its positive impact on our Python students' learning.
Watch
CHARLAS / Eric Arellano / Cuándo usar extensiones nativas en Rust: rendimiento accesible y seguro
When there are performance problems, Python native extensions make it possible to improve the performance of the "critical path" while continuing to use Python and avoiding a costly rewrite. However, native extensions are usually written in C and C++, and using them safely is a serious challenge. Rust offers a production-ready alternative to C and C++ extensions. With nearly equal performance, Rust offers memory and concurrency safety, along with modern ergonomics and a community that is welcoming to beginners (like Python's!). Even if you have no experience with native extensions, C/C++, or Rust, this talk will give you an accessible overview of how native extensions in Rust have enabled the open source Pants project to achieve performance while keeping the expressiveness and flexibility of Python for most of its developers. You will learn when native extensions in Rust are worth using, based on the Pants community's 5 years of experience, along with some resources for learning how to use them. Slides: https://speakerdeck.com/ericarellano/cuando-usar-extensiones-nativas-en-rust-rendimiento-accesible-y-seguro
Watch
TALK / Jeremy Paige / Packaging Python in 2021
Five years after the inception of pyproject.toml, the Python packaging landscape is now richer than ever. Despite all the new choices when setting up a project, setup.py's role is diminishing. Discover why it may soon be absent and what to use in its place. Slides: https://docs.google.com/presentation/d/19K_QccShcWncnFK-6Q4JvD1Ictg1VYTMiL5xEZjA8gc/edit?usp=sharing
Watch
TALK / Simon Prickett / No, Maybe and Close Enough: Using Probabilistic Data Structures in Python
Being right all the time isn't necessarily the best idea. This talk examines how to count distinct items from a firehose of data, how to determine if we've seen a given item before, and why absolute accuracy may be impractical when doing so. Probabilistic data structures give up exact answers in exchange for speed and economy of resources, returning approximate results instead. They provide fast, scalable solutions to problems such as counting likes on social media posts, or determining which articles on a website a user has previously read. I'll introduce the HyperLogLog and the Bloom filter, explain how they work at a high level, and demonstrate different ways in which each can be leveraged in Python. A GitHub repo to accompany this talk can be found at https://github.com/simonprickett/python-probabilistic-data-structures Slides: https://simonprickett.dev/no_maybe_and_close_enough_slides.pdf
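As a rough illustration of the "have we seen this item before?" problem from the abstract above, here is a minimal Bloom filter sketch using only the standard library. It is not taken from the talk or its repository. The class name, the default sizes, and the use of SHA-256 with an index prefix to derive several hash positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a fixed-size bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # "Maybe seen" only if every position for this item is set.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


seen = BloomFilter()
seen.add("article-1")
seen.add("article-2")
was_read = "article-1" in seen  # always True for an item that was added
```

A "no" from the filter is definite, while a "yes" means "probably". Memory use is fixed by `size_bits` no matter how many items are added, which is the trade the abstract describes.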
Watch
TALK / Sebastiaan Zeeff / The magic of "self": How Python inserts "self" into methods.
A phrase that I hear a lot is "Python is easy to learn, but hard to master". In a way that's true: Python is easy to learn because its high level of abstraction allows you to focus on the business logic of what you're trying to do instead of the lower-level implementation details. At the same time, Python's abstraction isn't magical: Its versatile data model allows you to hook into almost every part of the language to implement objects that behave just as Python's built-in objects do, enabling you to create similarly high-leveled interfaces for your own objects. That's where "hard to master" comes in: There is so much to learn that you're never done learning. In this talk, I want to entice you to look beyond Python's high-level interface into the wonderful landscape of its data model. I'll do that by explaining one of Python's most "magical" features: The automatic insertion of self into methods. Often, to beginners, the insertion of the instance as the first argument to methods is explained as something that Python just does for you: "Don't worry about it, it just happens!". More intermediate Python programmers typically get so used to self that they hardly notice it anymore in their function signatures, let alone wonder about what's powering it. To explain this bit of Python magic, I’ll give you an informal introduction to something called descriptors. To be sure, this talk isn’t going to be an in-depth discussion of the finer details of the descriptor protocol. Rather, it’s aimed at advanced beginners and intermediate Python developers who are eager to get an idea of what lies beneath the surface of Python. With this talk, I hope to pique your curiosity about the more advanced features of the Python programming language and hopefully give you a glimpse of all the things that are possible. Slides: https://sebastiaanzeeff.nl/pycon
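The mechanism described in the abstract above can be observed directly. This is a small sketch, not the speaker's material, and the `Greeter` class is a made-up example. It relies only on the documented fact that plain functions implement the descriptor method `__get__`.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"


g = Greeter("PyCon")

# Stored on the class, greet is just a plain function object.
plain_function = Greeter.__dict__["greet"]

# Functions are descriptors: attribute access on an instance calls __get__,
# which returns a bound method that remembers the instance as "self".
bound = plain_function.__get__(g, Greeter)

same_result = bound() == g.greet()
```

Here `bound.__self__` is `g` and `bound.__func__` is the original function, so calling `bound()` is the same as calling `plain_function(g)`.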
Watch
TALK / Rebecca Bilbro, Daniel Sollis, Mark, Patrick Deziel / PyTesting the Limits of Machine Learning
Despite the hype cycle, each day machine learning becomes a little less magic and a little more real. Predictions increasingly drive our everyday lives, embedded into more of our everyday applications. To support this creative surge, development teams are evolving, integrating novel open source software and state-of-the-art GPU hardware, and bringing on essential new teammates like data ethicists and machine learning engineers. Software teams are also now challenged to build and maintain codebases that are intentionally not fully deterministic. This nondeterminism can manifest in a number of surprising and oftentimes very stressful ways! Successive runs of model training may produce slight but meaningful variations. Data wrangling pipelines turn out to be extremely sensitive to the order in which transformations are applied, and require thoughtful orchestration to avoid leakage. Model hyperparameters that can be tuned independently may have mutually exclusive conditions. Models can also degrade over time, producing increasingly unreliable predictions. Moreover, open source libraries are living, dynamic things; the latest release of your team's favorite library might cause your code to suddenly behave in unexpected ways. Put simply, as ML becomes more of an expectation than an exception in our industry, testing has never been more important! Fortunately, we are lucky to have a rich open source ecosystem to support us in our journey to build the next generation of apps in a safe, stable way. In this talk we'll share some hard-won lessons, favorite open source packages, and reusable techniques for testing ML software components. Slides: https://docs.google.com/presentation/d/1Qrg0C5L6-5uQCtkUdqgw5UZPyFoCNJ07LxHWXVAzx2g
Watch
TALK / Dino Viehland / Python Performance at Scale - Making Python Faster at Instagram
Python is used in a large number of web sites where the performance of the web tier is a significant cost. There are multiple ways to improve the performance of these applications: improving the Python code itself, moving code out of Python using tools like Cython, and extreme options like directly improving the performance of the Python interpreter. In this talk we’ll explore some of the changes we’ve made to the CPython runtime to improve the performance of our workload. We’ll start with a high level overview of our architecture, which isn’t atypical for a Python web application, and see the opportunities and challenges it has provided for optimization. Then we’ll go deep down the rabbit hole and look at common hot spots in the Python runtime and the results we’ve had in reducing their overhead. Along the way we’ll look at both targeted optimization opportunities and classic techniques such as inline caching, a JIT compiler, and leveraging type annotations for performance. We’ll cover techniques that have proven successful, and ones that are still experimental. We’ll see how these can be applied to the Python runtime and what the performance results of doing so are: overall we’ve seen a 20-30% improvement in our production workload and up to 7x improvement on benchmarks. Slides: https://www.viehland.com/PyCon_2021.pdf
Watch
TALK / Adam Breindel / Dask-SQL: Empowering Pythonistas for Scalable End-to-End Data Engineering
Few things are more frustrating -- or inefficient -- than having a team of brilliant Python folks get stuck at the initial "get the data" stage of a project, because that data is "trapped" in a Hive/Spark-based datalake or requires complex SQL queries to assemble. Let's get unstuck, with dask-sql! PyData tooling and Dask are immensely popular in data pipelines, but the beginning stages of those pipelines -- often involving SQL data extraction from enterprise datalakes -- have traditionally required Java/JVM-based tools, such as Apache Spark. That changed in the past year, with the release of dask-sql. Dask-sql empowers Pythonistas with little or no knowledge of the JVM/Hadoop world to create end-to-end data projects. In this talk, we'll explore how we can use Python and dask-sql to perform SQL data/feature extraction from datalakes and Hive tables. We'll see how we can immediately refine and use that data for machine learning, analytics, or transformation workloads with our favorite PyData tools. We'll also discuss the design of dask-sql: an innovative project that combines battle-tested SQL optimization from Apache Calcite, scalable dataframe operations via Dask, and integration to the enterprise-standard Hive metastore data catalog. Slides: https://github.com/adbreind/pycon2021-dask-sql
Watch
TALK / Tobias Kohn / The Road to Pattern Matching in Python
Pattern matching is a great and proven tool for programmers. However, can we also assimilate and integrate it into Python? This talk tries to give an answer and discusses the rationale and ideas behind the recent "pattern matching" PEPs. Processing structured data has sparked ever more powerful programming tools. Python's objects and classes, for instance, have proven themselves to be particularly versatile and form part of the backbone of the language. Constructing or building new objects---including built-ins such as lists, tuples or dictionaries---abounds in any Python code. In contrast, testing the structure of data and extracting specific elements is often rather cumbersome, requiring the frequent use of built-in functions like isinstance, len and getattr. Pattern matching addresses this issue by introducing a new paradigm to de-construct data, complementing existing tools. It can be thought of as an extension of Python's iterable unpacking to arbitrary objects. However, it does so in a 'safe' way, ensuring that objects have the necessary structure to proceed with unpacking elements and attributes. The objective of this talk is to give you an overview of why pattern matching matters and what it really is. You will gain a deeper understanding of the core concepts that make up pattern matching, as well as the design decisions and ideas behind the recent "pattern matching" PEPs. However, this talk will not provide an introduction on how to use pattern matching in your code, nor is it about the intricacies of the implementation. If you are a Python programmer, have heard of the new pattern matching features and are wondering what it is all about, then this talk is for you.
Watch
TALK / Alexander Hultnér / Intro to Pydantic, run-time type checking for your dataclasses
Want static type checking at run time? Want to use standard Python type annotations? Want compatibility with standard Python dataclasses? Then it sounds like pydantic is something for you. Pydantic offers a pythonic way to validate your user data using run-time enforced standard type annotations. This talk focuses on how Pydantic can be used with web APIs to simplify many parts of user input validation. Back in early 2018 I built a solution similar to Pydantic, based upon standard dataclasses, for a large B2B SaaS application built with Flask. When I left that project I briefly considered rebuilding it as open source, but while doing my research I rediscovered Pydantic, which I had put on my list of projects to keep tabs on when it was at a much earlier stage. By this point it had evolved into a polished library and a perfect companion for JSON-based APIs. Slides: https://slides.com/hultner/pycon-us-2021
Watch
TALK / Reuven M. Lerner / When is an exception not an exception? Using warnings in Python
If your code encounters a big problem, then you probably want to raise an exception. But what should your code do if it finds a small problem, one that shouldn't be ignored, but that doesn't merit an exception? Python's answer to this question is warnings. In this talk, I'll introduce Python's warnings, close cousins to exceptions but still distinct from them. We'll see how you can generate warnings, and what happens when you do. But then we'll dig deeper, looking at how you can filter and redirect warnings, telling Python which types of warnings you want to see, and which you want to hide. We'll also see how you can get truly fancy, turning some warnings into (potentially fatal) exceptions and handling certain types with custom callback functions. After this talk, you'll be able to take advantage of Python's warning system, letting your users know when something is wrong without having to choose between "print" and a full-blown exception. Slides: https://speakerdeck.com/reuven/when-is-an-exception-not-an-exception-using-pythons-warnings
Watch
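The warnings abstract above mentions generating warnings, filtering them, and turning them into exceptions. A minimal sketch of those three steps using only the standard `warnings` module might look like this; `parse_age` and its threshold of 120 are hypothetical, chosen only to have a "small problem" to report.

```python
import warnings


def parse_age(text):
    """Return an int age; warn (rather than raise) on a suspicious value."""
    age = int(text)
    if age > 120:
        # stacklevel=2 attributes the warning to the caller of parse_age.
        warnings.warn(f"implausible age: {age}", UserWarning, stacklevel=2)
    return age


# 1. Record warnings instead of letting them print to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = parse_age("130")

# 2. Use a filter to escalate the same warning into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        parse_age("130")
        escalated = False
    except UserWarning:
        escalated = True
```

The function still returns a value in the first case, which is the point of a warning: the caller is told something looks off, and the filter configuration decides whether that is shown, hidden, or fatal.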
TALK / Nina Zakharenko / More Fun With Hardware and CircuitPython - IoT, Wearables, and more!
Learn about programming hardware with Python and advanced uses of CircuitPython by walking through exciting demos of real-world projects in action. Advanced components like buttons, sensors, and screens bump up the fun and the interactivity of your project. Level-up your hardware skills in this fast-paced talk! CircuitPython is the education-friendly fork of MicroPython that's been steadily rising in popularity as new releases increase stability, reliability, and speed. CircuitPython allows Python enthusiasts to quickly learn about hardware projects without having to learn something completely brand new. Given the rise in popularity, the Python community is quickly becoming familiar with the basics of CircuitPython. In fact, all attendees of PyCon US in 2019 were given a CircuitPython-compatible CircuitPlayground Express device with LEDs, speakers, sensors, and more, all usable without the need of learning to solder. If you're interested in doing more with hardware, this talk will point you in the right direction of where to go next. We'll start with choosing the right device for the scope of your project. Next, we'll scratch the surface of working with electronics -- what's a circuit? What are good resources for learning to solder? Lastly, I'll cover topics such as IoT, wearables, and adding interactivity to your projects with sensors, buttons, and switches with live demos of real-world projects I've created, along with sharing the build process and code for each. Viewers will finish the talk feeling confident about continuing their hardware journey across a range of project types. Slides: https://nina.to/pycon2021
Watch
TALK / Dustin Ingram / Secure Software Supply Chains for Python
One of the most powerful parts of Python lies not within the language itself, but within the robust ecosystem of open-source Python packages available to use along with it. The Python Package Index, the canonical repository for Python code, hosts nearly 300,000 different projects. However, integrating software from so many third-parties comes at a cost: how can we be sure it's secure? In this talk, we'll explore the common Python software supply chain, various ways in which such a supply chain can be attacked, as well as protected. We'll examine some tools and methodologies that help improve supply-chain security, and discuss the challenges and benefits these tools provide. Finally, we'll look at what fundamental improvements we can make to the overall ecosystem.
Watch
TALK / Paul Everitt / Static Sites with Sphinx and Markdown
Everybody knows Sphinx for documenting projects, Python and otherwise. But few think of Sphinx for the rest of a website. Why? Because Sphinx traditionally means authoring with reStructuredText instead of Markdown. While RST is very powerful, it's a bit quirky, and nowhere near as popular as Markdown. But with the arrival of full Markdown support via MyST, and with static site generators having a renaissance, it's time to give Sphinx a second look. Sphinx is an "information-rich" static site generator, with rich linking and many other features for authoring a knowledge base. This talk introduces Sphinx for websites, shows how to enable MyST for Markdown, and compares what it has to offer versus other approaches. We’ll do a light treatment of customization. All the material in this talk comes from a published tutorial.
Watch
TALK / Jonathan Striebel / Using Declarative Configs for Maintainable Reproducible Code
Wondering how to keep your application config from getting outdated? Looking for a way to future-proof it in a backwards-compatible manner, keeping previous versions reproducible? Join this talk and we’ll share how declarative configs can be leveraged to make your code maintainable and reproducible at the same time. To that end, we give an overview of the application config landscape, from inputs such as cli-args, env-vars, and config-files, to their representations in code, covering serialization & deserialization, type-safety with config-schemas, and evolutions. We’ll recommend cherries to pick for a maintainable and expressive declarative config system. All code examples are available at https://github.com/jstriebel/declarative-configs 00:18 *Introduction & Problem Domain* https://scalableminds.com https://webknossos.org https://twitter.com/jostriebel 03:02 *Goals: Maintainability & Reproducibility* *Declarative Configurations and their Pythonic Representations* 04:16 Toy Experiment 05:07 Declarative Configuration Extraction 06:08 Input Formats, Representations & Deserialization https://typer.tiangolo.com https://www.attrs.org https://cattrs.readthedocs.io 08:49 Landscape Overview Blog Post comparing attrs, dataclasses & pydantic: https://stefan.sofa-rockers.org/2020/05/29/attrs-dataclasses-pydantic *Code Examples* 10:10 Toy Example 11:08 Split Configuration 13:46 Type Checking https://mypy.readthedocs.io/ https://nbqa.readthedocs.io 15:15 Complex Example with Nested Configurations 18:45 Evolution of Old Configurations *Recap & Summary* 20:15 Schema Versions & Evolutions 21:04 Experiment Tracking 21:34 Summary Slides: https://speakerdeck.com/jstriebel/declarative-configs-for-maintainable-reproducible-code
Watch
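To make the "schema versions and evolutions" idea from the declarative configs abstract above more concrete, here is a small sketch using only standard-library dataclasses. The talk itself works with attrs and cattrs; the `TrainConfig` fields, the `lr` to `learning_rate` rename, and the version numbers below are all assumptions invented for illustration.

```python
from dataclasses import dataclass

CURRENT_SCHEMA_VERSION = 2


@dataclass(frozen=True)
class TrainConfig:
    """Typed, immutable representation of a (hypothetical) config file."""
    learning_rate: float
    epochs: int
    schema_version: int = CURRENT_SCHEMA_VERSION


def evolve(raw):
    """Upgrade an older raw config dict to the current schema."""
    raw = dict(raw)  # never mutate the caller's data
    version = raw.get("schema_version", 1)
    if version == 1:
        # Hypothetical v1 -> v2 change: "lr" was renamed "learning_rate".
        raw["learning_rate"] = raw.pop("lr")
        raw["schema_version"] = 2
    return raw


def load_config(raw):
    """Deserialize a raw dict (e.g. parsed YAML/JSON) into TrainConfig."""
    upgraded = evolve(raw)
    return TrainConfig(
        learning_rate=float(upgraded["learning_rate"]),
        epochs=int(upgraded["epochs"]),
        schema_version=int(upgraded["schema_version"]),
    )
```

Because old files are upgraded on load rather than rejected, an experiment recorded under schema version 1 can still be re-run after the code has moved on, which is the reproducibility property the abstract is after.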
TALK / Kevin Kho / Large Scale Data Validation (with Spark and Dask)
Data validation is checking if data follows certain requirements needed for data pipelines to run reliably. It is used by data scientists and data engineers to preserve the integrity of existing workflows, especially as they get modified. As an example, extreme machine learning predictions can be stopped from being displayed to application users if a new model is bad. Missing data can be flagged if it has the potential to break downstream operations. As data volume continues to increase, we will examine how data validation differs between a single-machine setting and a distributed computing setting. We will show what validations become more computationally expensive in Spark and Dask. For large scale data, there is sometimes also a need to apply different validations on different partitions of data. This is currently not feasible with any single library. In this talk, we will show how we can achieve this by combining the strengths of different frameworks. To demonstrate the data validation journey, we'll go over a fictitious case study. The data will start small, and we'll apply Pandas-based validations with Pandera and Great Expectations while discussing the pros and cons of each. As data size increases, we'll go over in detail the pain points of transitioning to a distributed setting. We'll show one way to reuse the same Pandas-based validations on Spark and Dask by wrapping them with Fugue. Slides: https://drive.google.com/file/d/1x3w4pfk8PVw1dcy1717Qi-_kt67xQj-S/view?usp=sharing
Watch
TALK / Luciano Ramalho / Protocol: the keystone of type hints
The static type system supporting type hints in Python is becoming more expressive with each new PEP, but PEP 544--Protocols: Structural subtyping (static duck typing) is the most important enhancement since type hints were first introduced. The typing.Protocol special class lets you define types in terms of the interface implemented by objects, regardless of type hierarchies, in the spirit of duck typing--but in a way that can be verified by static type checkers and IDEs. Without typing.Protocol, it was impossible to correctly annotate many APIs considered Pythonic, including many functions in the standard library itself. In this talk you will learn the concepts and benefits of static duck typing, through actual examples of increasing complexity taken from type hints of standard library functions in the official typeshed project. Slides: https://speakerdeck.com/ramalho/protocol-keystone-of-python-type-hints
Watch
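As a small illustration of the static duck typing described in the Protocol abstract above, the sketch below defines a protocol for "anything that supports `<`" and uses it to bound a type variable. The names `SupportsLessThan`, `top`, and `Version` are chosen here for illustration and should not be read as the exact definitions used in the talk or in typeshed.

```python
from typing import Any, Iterable, List, Protocol, TypeVar


class SupportsLessThan(Protocol):
    """Structural type: any object that implements __lt__ qualifies."""

    def __lt__(self, other: Any) -> bool: ...


LT = TypeVar("LT", bound=SupportsLessThan)


def top(series: Iterable[LT], length: int) -> List[LT]:
    """Return the `length` largest items, largest first."""
    return sorted(series, reverse=True)[:length]


class Version:
    """Does not inherit from SupportsLessThan, yet satisfies it."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __lt__(self, other: "Version") -> bool:
        return (self.major, self.minor) < (other.major, other.minor)
```

A static type checker accepts `top([Version(1, 0), Version(2, 1)], 1)` because `Version` has a compatible `__lt__`, with no shared base class required; passing a list of objects without `__lt__` would be flagged before the code runs.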
TALK / Maggie Moss / Gradual Typing in Practice
Type coverage in Python improves readability, finds bugs and supports tooling to improve security and improve developer efficiency. However, driving for type adoption on a rapidly changing codebase under active development can pose several challenges. This presentation will focus on how you can get meaningful results from Pyre as you move from just a few annotations to a fully typed codebase, and the guarantees we can make along the way. Then, I will discuss the approaches and tools we use to increase type coverage and “strictify” the Instagram codebase, one of the largest active Python projects. Slides: https://docs.google.com/presentation/d/1CbADIfFJhXIJxwEiQp8VOfJJBRtroQ2M9P9lVpAX0KM/edit?usp=sharing
Watch
TALK / John Belmonte / Your app is async so take advantage of it for development!
So your Python application is running under asyncio or similar framework-- congratulations! But what does that mean to you? More efficient use of compute resources? Simpler program structure and avoiding callbacks? It should mean even more. Cooperative multitasking opens new doors for inspecting the state of a program at runtime, which has valuable development uses. This talk covers how Python's async is useful for "development views"-- visualizing and interacting with the state of your running app-- and gives some working examples that run concurrently and don't require intrusive changes to program structure: (1) remote REPL: open one or more interpreter sessions over HTTP to inspect and modify internal state of your app while it's running; (2) graphical visualizations: view custom graphical representations of state remotely from a web browser. These are written alongside the code being visualized, and have zero overhead when not observed. Keyboard and mouse input is possible too. What kind of visualizations? For a Python app embedded in a home robot, these might include a local map of obstacles; display of orientation, speed, power usage; low res. camera or depth camera feeds; representations of internal state machines deciding behavior; etc. Slides: https://docs.google.com/presentation/d/e/2PACX-1vQqzFgzYqKinBkIBMpe20Jv_6pyYN1iTkKrDrOQRlqoMSBg4SyWQRnkGc0hBgTxQN_UteHdDe_Cge5h/pub
Watch
TALK / Alon Nir / Getting an Edge with Network Analysis with Python
Networks are all around us. While Facebook and Twitter are the obvious examples, every time we shake hands, drive from point A to B, push code to github, check out a meetup or rate a show on IMDB, we’re participating in network activity. People, places, things and even ideas are inter-connected in innumerable networks, and these can have a great (yet sometimes inconspicuous) impact on our lives. The purpose of this talk is to introduce members of the audience to network analysis and its importance, and give them the basic building blocks for applied network analysis with Python. Slides: https://github.com/alonnir/PyCon-Us-2021-Talk/
Watch
TALK / Josh Izaac / What are quantum computers, and how can we train them in Python?
“Let me just go run this on my quantum computer.” Quantum computers aren’t what-ifs anymore — they are available now, and publicly accessible over the internet. And Python is rapidly becoming the language of choice for accessing and programming quantum computers, with Python SDKs available from Google (Cirq), IBM (Qiskit), and others. However, early quantum computers are small, noisy, and error prone. Simultaneously, it has never been easier to perform differentiable programming in Python; simply swap out NumPy for TensorFlow, PyTorch, or JAX, and you have the ability to differentiate and train the program itself. So what would happen if we attempted to combine the two? Using a mixture of real Python examples and illustrated diagrams, we show how to not only evaluate, but also differentiate small quantum programs directly on quantum hardware. By extracting the gradients, we can integrate these quantum programs directly into larger differentiable programs in Python, and train/optimize the full (hybrid quantum-classical!) program. Over the course of this talk, quantum-curious Python developers will see first-hand how quantum programming looks in Python, and get an idea of how (and when) it makes sense to take advantage of these novel hardware devices. Slides: https://iza.ac/pdf/pycon2021.pdf
Watch
TALK / Thomas Jewitt / An Introduction to FastAPI
With the skyrocketing popularity of Python as a language for web development, a wide array of tools now exist for the creation and documentation of REST APIs. Enter FastAPI, a quick, modern and extensible solution for rapidly creating RESTful services. This talk will explain the features, advantages, and utility of the FastAPI framework for developing comprehensive and useful APIs.
Watch
TALK / Meg Ray / Python: The Next Generation
Did you know that Python is being taught to more secondary students than ever before? In this talk you will gain an understanding of the landscape of Python in education, learn practical, evidence-based strategies for teaching Python programming, and find out how to get involved in the Python education community. Slides: http://bit.ly/mray-pycon21
Watch
TALK / Niels Bantilan / Statistical Typing: A Runtime Typing System for Data Science & Machine Learning
Data science and machine learning rely on high quality datasets for visualization, statistical inference, and modeling. However, the barriers to testing data processing, analysis, or model-training code are high, even with the extensive tooling that the python ecosystem offers, such as pandas, pytest, and hypothesis. To address this problem, in this talk I define statistical typing as a general concept describing a runtime typing system, which extends primitive data types like bool, str, and float into the class of statistical data types. By providing additional semantics about the properties held by a collection of data points, statistical typing enables us to naturally express types as multivariate schemas. It also enables us to implement schemas as generative data contracts, which serve to both validate data at runtime and generate valid samples for testing purposes. I'll use pandera, a pandas data testing library, to illustrate how statistical typing makes data testing easier by enabling you to validate real-world data with reusable schemas and isolate units of processing, analysis, and model-training code. Slides: https://pandera-dev.github.io/pandera-presentations/slides/20210515_pycon_statistical_typing.slides.html
Watch
TALK / Graham Bleaney, the_storm / Unexpected Execution: Wild Ways Code Execution can Occur in Python
Every Python user knows that you can execute code using eval or exec, but what about yaml or str.format? This talk will take you on a walk through all the weird and wild ways that you can achieve code execution on a Python server (and trust me, I didn’t spoil the surprise by putting the weirdest ones in the description). The talk should be equal parts practical and entertaining as we work through both real examples of code execution vulnerabilities found in running code as well as absurd remote code execution exploits. The talk will end on a practical note by explaining how Facebook detects and prevents the exploit vectors we discussed, using an open source Python Static Analyzer called Pysa. All demos are available at: https://github.com/gbleaney/python_security Attendees are encouraged to download the demos and follow along at home. To get started using static analysis to detect the vulnerabilities discussed in this talk, check out: https://pyre-check.org/docs/pysa-quickstart/
Watch
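The abstract above names `str.format` as a surprising attack surface. One widely known hazard, shown in the sketch below, is that a format string controlled by an attacker can walk attributes and index into containers of the objects passed to it. This sketch is not taken from the talk's demo repository; it only demonstrates information disclosure, and the `App`, `User`, and `render` names and the secret value are hypothetical.

```python
class App:
    def __init__(self):
        # Hypothetical secret that should never reach a user.
        self.config = {"SECRET_KEY": "not-for-users"}


class User:
    def __init__(self, name, app):
        self.name = name
        self.app = app


def render(template, user):
    """Unsafe when `template` comes from an untrusted source."""
    return template.format(user=user)


user = User("ada", App())

# Intended use: the template only touches user.name.
greeting = render("Hello, {user.name}!", user)

# Abuse: the template walks from the user object to the app's config.
leaked = render("{user.app.config[SECRET_KEY]}", user)
```

A common mitigation is to keep templates under the developer's control and pass untrusted input only as values, for example `"Hello, {}!".format(untrusted_name)`.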
TALK / Jenna Conn, Hannah Cline / Optimizing Data Retrieval with Python Celery
Whether for a CEO in a boardroom or a family creating next month’s budget, people need continual access to data. Problems occur when web applications used to visualize large datasets reach browser limits for the number of open connections that can be created, due to multiple queries. To overcome this limitation, presenters will discuss asynchronous methods of retrieving data, focusing on Python Celery. Celery task queues distribute data queries while the web application polls for results, creating a better user experience. Slides: https://noti.st/hustjl22/n3KqaM/optimizing-data-retrieval-with-python-celery
Watch
SPONSOR WORKSHOP / Eric Zhang / Huawei: Data Pre-processing in MindSpore
MindData - Data Pre-processing in MindSpore
Watch
SPONSOR WORKSHOP / Jérôme Vieilledent, Sümer Cip / Blackfire: Debugging Performance
Performance is a feature of any application. It should be tested, and cared for like any feature. Using the right tools to understand how code consumes resources, and how to match a given performance budget is key. That discipline can become seamless, from development, to test/staging and production. This workshop will give tips and tricks to master performance optimization on an example Django application.
Watch
SPONSOR WORKSHOP / Mike, Joe, Ryan Soley, Sri / Capital One
The machine learning lifecycle is complex and consists of many different stages, one of the most important being the model training process. Model developers must iterate fast in order to efficiently produce the best possible results. Experiment tracking and documentation for analysis, governance, and eventual model approval must not be tedious and time consuming. Rubicon is an open source data science tool that seamlessly captures and stores model training and execution information, like hyperparameters and outcomes, in a repeatable and searchable way. Rubicon’s git integration associates experiments directly with the model pipeline code to ensure full auditability and reproducibility for developers, reviewers, and stakeholders alike. Rubicon offers an integrated dashboard that makes it easy to explore, filter and visualize experiments. Rubicon also exposes a process for highlighting and sharing experiments of interest with collaborators and reviewers.
Watch
SPONSOR WORKSHOP / Angel Riviera / CircleCI - CI/CD 101
Continuous Integration and Continuous Delivery/Deployment (CI/CD) concepts are increasingly adopted by many technology organizations and teams. CI/CD enables teams to establish processes that increase the velocity, collaboration and quality of their codebase. CI/CD enables developer & operations teams to break down unnecessary silos and gain a deeper knowledge of their respective arenas. In this workshop the participants will be introduced to the fundamentals of Continuous Integration and Continuous Delivery/Deployment. Participants will learn the core principles of CI/CD and have the opportunity to reinforce what they’ve learned in a hands-on workshop featuring the CircleCI platform. The workshop will demonstrate CI/CD build configuration, code commits, commit builds, code testing and packaging. The participants will leave with hands-on experience and an understanding of what it takes to adopt CI/CD. Code example repo: https://github.com/datapunkz/python-cicd-workshop
Watch
SPONSOR WORKSHOP / Anthony Shaw / Microsoft
In this workshop, we will talk through scalable Django architecture and how Azure services like load balancing, sharded databases, and functions can be used to scale a Django application from a few to lots of users. Slides: https://aka.ms/pycon-django-workshop
Watch
SPONSOR WORKSHOP / Seth Larson / Elastic
Search is a universal expectation of modern web applications. There are dozens of ways to implement search, but many leave you to implement the next level of features like indexing, tuning, suggestions, and analytics. Elastic App Search is a batteries-included solution for making your web applications searchable in minutes. Workshop Materials: https://github.com/elastic/pycon-2021-workshop-app-search
Watch
SPONSOR WORKSHOP / Zhiyi Ma, Shagun Sodhani / Facebook
Dynabench is a research platform for dynamic data collection and benchmarking. In the first part of this workshop, we introduce how we built the dynamic model evaluation pipeline for Dynabench using a collection of open source Python toolkits, e.g. pytorch, torchserve, boto3, etc, and how anyone in the community can easily submit their models into the pipeline using our open source toolkit, Dynalab. We believe that the evaluation cloud of Dynabench will transform how the community thinks of models and benchmarks, and continuously push the community to innovate beyond the current status quo from a completely new angle of thinking. The two key components in a multi-task Reinforcement Learning codebase are (i) Multi-task RL algorithms and (ii) Multi-task RL environments. Facebook AI developed open-source libraries for both components. Part 2 of this workshop describes how one can use these two libraries to get started with multi-task reinforcement learning. By the end of the talk, the audience should be able to design their own multi-task RL agent using the components from the MTRL library (https://github.com/facebookresearch/mtrl) and run them on environments provided by the MTEnv library (https://github.com/facebookresearch/mtenv). Slides: https://drive.google.com/file/d/1FYRiyEojjGyRP8xifFzsuv9zU0nRx5rX/view?usp=sharing
Watch
SPONSOR WORKSHOP / Paul Prescod / SalesForce
Whenever one builds a new application, there is a challenge in testing it at scale: where do you get sufficient data to generate a realistic “data shape?”. Snowfakery is a Domain Specific Language that builds on and deeply integrates with Python to excel at this task. This talk will describe how to build an “interpreter” for a YAML-based language in Python, including how we leverage Python-specific superpowers such as Jinja templating and easy dynamic loading. Slides: https://docs.google.com/presentation/d/17VmU_mW2lQGsj_ChJHB4nG9qmhGbVwEtqQTvweBFTAQ/edit?usp=sharing
Watch
SPONSOR WORKSHOP / Charlie Engelke / Google
Get an overview of how to approach design of a serverless application architecture and how to create the compute, storage, and messaging parts of it and tie them all together. Examples are shown using several Google Cloud Platform tools: App Engine, Cloud Functions, Cloud Run, Cloud Storage, Firestore, and Pub/Sub, all built using Python. Suggestions on resources useful for going deeper in the topic are also provided. Slides: https://serverlessworkshop.dev/slides/pycon2021.pdf
Watch
SPONSOR WORKSHOP / Shay DeWael and Alissa Renz / Slack
New features and powerful tools make it easier and more intuitive to build custom Slack apps in Python. Slides: https://www.dropbox.com/scl/fi/29ok1tj5h5v4naxd3v1tv/PyCon-21_-Building-on-Slack-Platform.pptx?dl=0&rlkey=3mdypwn3hdq23o04557ut3h80
Watch
TUTORIAL / Mike Müller / Functional Python
I will use JupyterLab for the tutorial because it makes a very good teaching tool. You are welcome to use the setup you prefer, i.e., editor, IDE, or REPL. If you would also like to use JupyterLab, I recommend conda for easy installation. Similarly to virtualenv, conda allows creating isolated environments, but it also allows binary installs for all platforms. There are two ways to install Jupyter via conda: 1. Use Miniconda. This is a small install and (after you have installed it) you can use the command conda to create an environment: conda create -n pycon2021py39 python=3.9. Now you can change into this environment: conda activate pycon2021py39. The prompt should change to (pycon2021py39). Now you can install JupyterLab: conda install jupyterlab. 2. Install Anaconda and you are ready to go, if you don't mind installing lots of packages from the scientific field. 3. Install the dependencies: * JupyterLab 2: conda install jupyterlab * more_itertools: conda install more-itertools * toolz: conda install toolz. 4. Hint: You can do all this in one command: conda create -n pycon2021py39 python=3.9 jupyterlab more-itertools toolz. You can create a comparable setup with virtual environments and pip, if you prefer. WORKING WITH CONDA ENVIRONMENTS: After creating a new environment, the system might still work with some stale settings. Even when the command which tells you that you are using an executable from your environment, this might actually not be the case. If you see strange behavior using a command line tool in your environment, use hash -r and try again.
Watch
TUTORIAL / Eric Ma / Magical NumPy with JAX
The greatest contribution of the decade in which deep learning exploded was not these big models, but a generalized toolkit to train any model by gradient descent. We're now in an era where differential computing can give you the toolkit to train models of any kind. Does a Pythonista well-versed in the PyData stack have to learn an entirely new toolkit, a new array library, to have access to this power? This tutorial's answer is as follows: If you can write NumPy code, then with JAX, differential computing is at your fingertips with no need to learn a new array library! In this tutorial, you will learn how to use the NumPy-compatible JAX API to write performant numerical models of the world and train them using gradient-based optimization. Along the way, you will write loopy numerical code without loops, think in data cubes, get your functional programming muscles trained up, generate random numbers completely deterministically (no, this is not an oxymoron!), and preview how to mix neural networks and probabilistic models together... leveraging everything you know about NumPy plus some newly-learned JAX magic sprinkled in!
Watch
TUTORIAL / Trey Hunner / Hands-On Regular Expressions in Python
What are regular expressions, what are they useful for, and why are they so hard to read? In this tutorial we will break down the regular expression syntax to better understand how they work. We will learn how to dissect regular expressions, how to use regular expressions in Python, and how to make your regular expressions more readable (yes it's possible... sort of). We will learn how to use regular expressions for data validation, data parsing, and data normalization. We'll also discuss when not to use regular expressions.
Watch
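The regular expressions abstract above mentions readability, validation, parsing, and normalization. One sketch that touches all four uses `re.VERBOSE` with named groups; the phone-number format and the `normalize_phone` helper are assumptions made for this example, not material from the tutorial.

```python
import re

# re.VERBOSE lets the pattern span lines and carry comments.
PHONE = re.compile(
    r"""
    ^\s*
    \(?(?P<area>\d{3})\)?    # area code, parentheses optional
    [\s.-]*                  # optional separators
    (?P<prefix>\d{3})        # first three digits
    [\s.-]*
    (?P<line>\d{4})          # last four digits
    \s*$
    """,
    re.VERBOSE,
)


def normalize_phone(text):
    """Validate and parse a US-style number; return NNN-NNN-NNNN or None."""
    match = PHONE.match(text)
    if match is None:
        return None  # validation failed
    return "{area}-{prefix}-{line}".format(**match.groupdict())
```

The same pattern written on one line, `^\s*\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})\s*$`, behaves identically but is much harder to review, which is the readability trade-off the abstract hints at.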
TUTORIAL / Al Sweigart / A Complete Beginner's Guide to Python by Making Simple Games
Excited about programming? Have you heard good things about Python? Now is the time to dive in and start learning how to program. This three-hour tutorial covers the basics of the basics of Python. Programming is a wide and deep field, but you only need a taste. You'll learn about variables, expressions, loops, functions, and most importantly: what those words even mean to begin with. This is a tutorial for complete beginners (or those who want to start over again from the beginning.) This tutorial does not include computer science, machine learning, or brain surgery. By the end, we'll have created a few simple games (Guess the Number, Magic 8 Ball, and a Dice Rolling Simulator) and you'll know how to guide yourself through the next steps on your programming journey.
Watch
TUTORIAL / Ramon Perez / Dashboards for All
Dashboards are useful tools for data professionals from all levels and within different industries. From analysts who want to showcase the insights they have uncovered to researchers wanting to explain the results of their experiments, or developers wanting to outline the most important metrics stakeholders should pay attention to in their applications, these dashboards can help tell a story or, with a bit of interactivity, let the audience pick the story they’d like to see. With this in mind, the goal of this tutorial is to help data professionals from diverse fields and at diverse levels tell stories through dashboards using data and Python. The tutorial will emphasize both methodology and frameworks through a top-down approach. Several of the open source libraries included are bokeh, holoviews, and panel. In addition, the tutorial covers important concepts regarding data types, data structures, and data visualization and analysis. Lastly, participants will also learn concepts from the fields where the datasets came from and build a foundation on how to reverse engineer data visualizations they find in the wild.
Watch
TUTORIAL / Ryan S McCoy / From Spreadsheets to DataFrames
A spreadsheet is a wonderful invention and an excellent tool for certain jobs. All too often, however, spreadsheets are called upon to perform tasks that are beyond their capabilities. It’s like the old saying, 'If the only tool you have is a hammer, every problem looks like a nail.' However, some problems are better addressed with a screwdriver, with glue, or with a Swiss Army Knife. Python is described by some in the programming world as the Swiss Army Knife of programming languages because of its unrivaled versatility and flexibility in use. This allows its users to solve complex problems relatively easily compared with other programming languages and is one of the reasons why Python has become increasingly popular over time. In this tutorial, we’ll briefly discuss spreadsheets and the signs that you might be living in “Excel Hell”, and then we’ll spend the rest of the time learning how to escape it using Python. In the first section, we’ll build on what spreadsheet users already know about cells, rows, columns, and formulas, and map them to their Python equivalents, such as variables, lists, dictionaries, and functions. At the end of this section, we’ll do an interactive exercise and learn how we can perform a simple calculation, similar to one you might do in Excel, but using Python instead. In the second section, we’ll discuss (and attempt) more complex tasks, including web scraping, data processing, analysis, and visualization, by utilizing a few popular third-party libraries, including Requests, Pandas, Flask, Matplotlib, and others. In the last section, we’ll round out our discussion with a few important concepts in data management, including the concept of tidy data, building a data pipeline, and a few strategies (and packages) to use when approaching various data problems, including a demo using Apache Airflow. https://github.com/ryansmccoy/spreadsheets-to-dataframes
Watch
TUTORIAL / Bernát Gabor / Python Packaging Demystified
For most developers, Python packaging feels like a magical (and cryptic) black box. Apps and libraries use a variety of tools and have different packaging challenges. Once you start reading up on this topic, you come across many seemingly random components: setuptools, pip, poetry, wheels, pyproject.toml, MANIFEST.in, virtual environments, zipapp, shiv, pex, and so on. The sheer number of concepts to master can be overwhelming, leading many programmers to conclude that packaging in Python is a mess. Before you despair, join me in this tutorial session where you'll have a chance to learn how to package and publish/deploy your library and/or application through hands-on exercises. Topics include: how and why library packaging differs from application packaging; differences between a source tree, a source distribution, and a wheel; differences between a build back-end and a build front-end (and why we even have this separation); tools used for packaging your library; tools and techniques used to package your application; testing your package for correctness.
Watch
TUTORIAL / Marysia Winkels / (Serious) Time for Time Series
Time to take Time Series seriously! From inventory to website visitors, resource planning to financial data, time-series data is all around us. Knowing what comes next is key to success in this dynamically changing world. And for that we need reliable forecasting models. While complex & deep models may be good at forecasting, they typically give us little insight into the underlying patterns in our data. In this tutorial, we'll cover relatively simple yet powerful approaches for time series analysis and seasonality modeling with Pandas. At the end of this session, you will be familiar with the fundamentals of time series analysis, how to decompose a time series into trend, seasonality, and error components, and how to use these insights to create simple but powerful models for forecasting.
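As a rough illustration of the decomposition mentioned above (not taken from the tutorial materials), the sketch below splits a series into trend, seasonality, and error using only Pandas and NumPy. The function name `decompose`, the additive model, the 7-day period, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Additive decomposition sketch: series = trend + seasonal + error.

    Assumes an odd `period` so the centred rolling window is symmetric.
    """
    # Trend: centred rolling mean over one full seasonal cycle.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonality: mean detrended value at each position within the cycle.
    position = np.arange(len(series)) % period
    seasonal = detrended.groupby(position).transform("mean")
    # Error: whatever trend and seasonality do not explain.
    error = series - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "error": error})


# Synthetic daily data (hypothetical): linear growth plus a weekly cycle.
t = np.arange(70)
index = pd.date_range("2021-01-01", periods=len(t), freq="D")
series = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 7), index=index)

parts = decompose(series, period=7)
# The first and last few rows have no trend estimate, so drop them.
valid = parts.dropna()
```

A simple forecast could then extend the trend and repeat the seasonal pattern; the error column indicates how much variation such a model leaves unexplained.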
Watch
TUTORIAL / Moshe Z / Python Unit Testing with Pytest and Mock
Writing unit tests for your code is widely accepted as a best practice. Learn how to use Pytest, the de facto standard testing tool, and mock, the built-in library for creating mock objects, to write high-quality tests. Slides: https://github.com/dev-skill-up/pycon-2021-testing
Watch
TUTORIAL / James Bourbeau, Julia Signell / Hacking Dask: Diving Into Dask's Internals
Dask is a popular Python library for scaling and parallelizing Python code on a single machine or across a cluster. It provides familiar, high-level interfaces to extend the PyData ecosystem (e.g. NumPy, Pandas, Scikit-Learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. In this tutorial we’ll cover more advanced features of Dask like task graph optimization, the worker and scheduler plugin system, how to inspect the internal state of a cluster, and more. Attendees should walk away with a deeper understanding of Dask’s internals, an introduction to more advanced features, and ideas of how they can apply these features effectively to their own data intensive workloads.
Watch
TUTORIAL / Mariatta / Writing Documentation with Sphinx and reStructuredText
The success of Python and open source libraries is inseparable from the availability of good documentation. Reading documentation is one of the first things a user of an open source library has to do. In the Python open source community, documentation is often written using the reStructuredText markup language and built with Sphinx. The official Python documentation and Python Enhancement Proposals (PEPs) are all written using reStructuredText. Being able to write documentation using reStructuredText is therefore a necessary skill for any aspiring Python open source contributor or maintainer. Yet reStructuredText itself can be seen as a barrier to contributing to open source, since it is not as straightforward as Markdown and is not as widely adopted outside of the Python community. Don’t let this discourage you! Let’s break down this barrier! reStructuredText is not as complicated as you might think. You can learn it! In this tutorial, we'll go through various useful features of reStructuredText. You will learn how to create and build a documentation project using Sphinx. Not only will you learn a new skill, you will also be able to confidently start contributing to open source projects by helping to improve their documentation. Slides: https://sphinx-intro-tutorial.readthedocs.io/
Watch
TUTORIAL / Andrea and Josh / Practical Deep Learning for Data Scientists
This tutorial is a chance to get hands-on with PyTorch and GPU Deep Learning (DL). It is specifically targeted toward attendees who may be familiar with the concepts of DL, but want practical experience. Familiarity with Python and typical ML packages (e.g. pandas, numpy, sklearn) is expected. At the end of this session, you will understand how to: build some common DL architectures in PyTorch; evaluate and improve their performance; take advantage of more compute (and when you should do so). This will set you up to take advantage of interesting developments in the field and maybe even contribute your own!
Watch
TUTORIAL / Eyal Kazin / A Hands-On Introduction To Multi-Objective Optimization
11:10 - Pause video to work through Part 1. Optimising for multiple objectives is a non-trivial task, especially when they are in conflict. For example, how can one best overcome the classic trade-off between quality and cost of production when the monetary value of quality is not defined? In this hands-on Python tutorial you will learn about Pareto Fronts and use them to optimise for multiple objectives simultaneously. Multi-Objective Optimisation, also known as Pareto Optimisation, is a method to optimise for multiple parameters simultaneously. When applicable, this method provides better results than the common practice of combining multiple parameters into a single-parameter heuristic. The reason for this is quite simple. The single-heuristic approach is like horse blinders limiting the view of the solution space, whereas Pareto Optimisation enables a bird’s eye view. Real-world applications range from supply chain management, manufacturing, and aircraft design to land use planning. For example, when developing therapeutics, Pareto optimisation may help a biologist maximise protein properties like effectiveness and manufacturability while simultaneously minimising toxicity. I will provide a git repository with Jupyter notebooks with which you will apply the lessons and tools learned to the simple Knapsack problem. Here you will write a program for filling a bag with packages with the objective of minimising the bag weight while maximising the value of its contents. My objective is for you to gain a basic intuition for the technique and understand its advantages and shortcomings, so that you can assess its applicability to your own projects. Slides: http://bit.ly/pycon21-handson-moo
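To make the idea of a Pareto front concrete, here is a minimal brute-force sketch for a tiny knapsack-style problem (minimise weight, maximise value). It is not taken from the tutorial's notebooks: the package data and the names `all_bags`, `dominates`, and `pareto_front` are hypothetical, and exhaustive enumeration is only practical for a handful of packages.

```python
from itertools import combinations

# Hypothetical packages: name -> (weight, value).
PACKAGES = {"a": (2, 3), "b": (3, 4), "c": (4, 8), "d": (5, 8)}


def all_bags(packages):
    """Yield (names, total_weight, total_value) for every non-empty subset."""
    names = sorted(packages)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            weight = sum(packages[n][0] for n in combo)
            value = sum(packages[n][1] for n in combo)
            yield combo, weight, value


def dominates(a, b):
    """True if bag a is at least as good as b on both objectives
    (no heavier, no less valuable) and strictly better on one."""
    _, weight_a, value_a = a
    _, weight_b, value_b = b
    return (
        weight_a <= weight_b
        and value_a >= value_b
        and (weight_a < weight_b or value_a > value_b)
    )


def pareto_front(bags):
    """Keep only the bags that no other bag dominates, sorted by weight."""
    bags = list(bags)
    front = [b for b in bags if not any(dominates(other, b) for other in bags)]
    return sorted(front, key=lambda bag: bag[1])


front = pareto_front(all_bags(PACKAGES))
for names, weight, value in front:
    print(f"{'+'.join(names):<8} weight={weight:>2} value={value:>2}")
```

Each bag on the front is a different trade-off: no other bag is both lighter and more valuable. Choosing among them is left to the decision maker, rather than being fixed in advance by a single combined score.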
Watch
SUMMIT / Education
Having high quality educational materials and informed educators has never been more important than now. While we cannot run the Summit with all the elements we had originally planned, the Education Summit committee is committed to ensuring that the Summit does happen this year. We will have a 2 hour format, held within Zoom. We will have strict schedules for talks so people can attend portions as able. Education Summit talks will be focused on K-higher education teaching practices, important developments for the Python teaching community, impactful case studies, etc.
EDU SUMMIT SCHEDULE
0:00-5:16: Welcoming and setup
5:16-11:16: Opening remarks about the Education Summit
11:16-34:45: Introducing Friendly (André Roberge)
34:45-54:48: Learner Personas for Domain-Specific Data Science Educational Materials (Daniel Chen)
54:48-1:14:37: Practice makes perfect -- but what kind of practice? (Reuven M. Lerner)
1:14:37-1:37:12: From room to Zoom: Strategies and technologies for teaching Python workshops online (Marley Kalt)
Lightning talks:
1:37:12-1:46:41: Interdisciplinary Education case studies: applying Python to the Humanities, Social Sciences, & Arts (Chiin-Rui Tan)
1:46:41-1:53:28: Martian Math (Kirby Urner)
1:53:28-1:59:49: Closing
Watch
SUMMIT / Trainers
The Python Virtual Training Summit seeks to present how community organizations, trainers, meetup groups, classrooms, and employers have moved Python training and mentorship to a virtual environment. Virtual Training Summit talks will be focused on showcasing the strategies and methods that have been used over the past years to facilitate the formal and informal training needed in highly technical organizations. We are hoping to hear from groups that support aspects of training but may not see themselves within the education sphere. Following the same time notation, the schedule is as follows:
+0:00 - 0:10: Welcoming and setup
+0:10 - 0:30: Keynote: Virtual Teaching the New Way of Life (SherAaron (Sher!) Hurt)
+0:30 - 0:45: Don't make these training mistakes! (Because I already did) (Reuven M. Lerner)
+0:45 - 1:05: Coaching Junior Developers in a Remote World (Sebastiaan Zeeff)
+1:05 - 1:25: Python Emergency Remote Teaching (Fernando Masanori)
+1:25 - 1:45: Tools we need to teach project management (Sumana Harihareswara)
+1:45 - 1:50: Lightning talks:
Distilling Your Examples (Miki Tebeka)
Computational Thinking for Creatives - Decoding Barriers to Entry (Tadeh Hakopian)
Watch
SUMMIT / Typing
Greetings
Type Syntax Simplifications (Maggie Moss) 0:06
Validating JSON with TypedDict, trycast, and TypeForm (David Foster) 29:45
Type Variables for All (Pradeep Kumar Srinivasan) 54:20
Static Python: Types in Bytecode Compilation & Runtime (Carl Meyer) 1:25:16
Incremental Check in Pyre (Jia Chen) 1:50:55
Scaling Typeshed to 1000 Packages (Jukka Lehtosalo) 2:17:01
Catching Tensor Shape Errors Using the Type Checker (Pradeep Kumar Srinivasan, Matthew Rahtz) 2:43:06
Type Arithmetic (Alfonso Castaño) 3:11:10
Watch
SUMMIT / Maintainers
SPEAKERS
Brian Douglas - Getting Traction with GitHub Actions and Python
Cheuk Ting Ho - Oops, I Did It Again! When Your Deploying CI Pipeline Is Broken
Juanita Gomez - Spyder Says: Let's Get Millennial!
Kati Michel - Bringing Pinax Back to Life
May Ireland - Burnout: Identity & Emotion at Work
Melissa Weber Mendonça - NumPy Newcomer's Hour: an Experiment on Community Building
Rose Judge - Improve Your Git Commits in Two Easy Steps
Sumana Harihareswara - Researching the Leadership Gap for Legacy Projects
Thibaud Colas - Building Accessibility into Open Source Projects
SCHEDULE
LIVE panel discussion "Funding open source work"
Alex Clark (Pillow)
Eunice Chendjou (Open Teams)
Gina Häußge (Octoprint)
Sumana Harihareswara (Changeset Consulting)
William Stein (SageMath)
Moderated by David Charboneau (Open Teams)
LIVE Q&A with the presenters of talks
Moderated by: Alexandre de Siqueira (Berkeley Institute for Data Science, scikit-image) and Inessa Pawson (Albus Code, NumPy)
Watch
Lightning Talks 1
Agenda of Lightning Talks - Hosted by Dustin Ingram and Lorena Mesa
André Roberge - Friendlier tracebacks
Bernat Gabor - tox 4 is happening!
Brett Cannon - Introducing the Python Launcher for Unix
Cheuk Ho - What happens when the developer decided that your name is too short
Deepa - A tale of mutability and recursion
Jason C. McDonald - Code Review For Great Good
Jürgen Gmach - How to Maintain Many, Many, Many, Many... Many Git Repositories?
Mfon Eti-mfon - Queer Struggles in Africa: Fighting Hate With Python
Phil Jones - What’s new in Flask
Clint Cameron - How to take ownership of security in your Python code
Daniel J. Dufour - Load Django Settings from Environmental Variables with One Magical Line of Code
Watch
Lightning Talks 2
Agenda of Lightning Talks - Hosted by Dustin Ingram and Lorena Mesa
Aakanksha Chouhan - Moulding Data for ML
Andres & Denny - PyCon Latam the conference you don't want to miss
Cristián Maureira-Fredes - Python Chile and its first PyCon!
Dia-ning Yudono - Parametrizing tests with unittest and pytest
Gregory M. Kapfhammer - Committing to Writing Good Commit Messages: Supporting the Creation of Human and Machine-Readable Commit Messages with Python
Rumanu - Save Sheldon, in 5 minutes!
Sebastian Witowski - 9 Jupyter notebook tricks for your next Advent of Code
Dhananjay Jindal - f-Strings: How cool are they?
Grey Li - FastAPI Seems Good, so Why Don't We Build Something Similar For Flask?
Jeremy Gibson - direnv will change your life... maybe.
Jürgen Gmach - How to Maintain Many, Many, Many, Many... Many Git Repositories?
Watch