PyCon US 2022
List of videos

Tutorial - Zac Hatfield-Dodds: Introduction to Property Based Testing
Has testing got you down? Ever spent a day writing tests, only to discover that you missed a bug because of some edge case you didn’t know about? Does it ever feel like writing tests is just a formality - that you already know your test cases will pass? Property-based testing might be just what you need! After this introduction to property-based testing, you’ll be comfortable with Hypothesis, a friendly but powerful property-based testing library. You’ll also know how to check and enforce robust properties in your code, and will have hands-on experience finding real bugs.

Where traditional example-based tests require you to write out each exact scenario to check - for example, assert divide(3, 4) == 0.75 - property-based tests are generalised and assisted. You describe what kinds of inputs are allowed, write a test that should pass for any of them, and Hypothesis does the rest!

```python
from hypothesis import given, strategies as st

@given(a=st.integers(), b=st.integers())
def test_divide(a, b):
    result = a / b
    assert a == b * result
```

There’s the obvious ZeroDivisionError, fixable with b = st.integers().filter(lambda b: b != 0), but there’s another bug lurking. Can you see it? Hypothesis can!

AUDIENCE

This tutorial is for anybody who regularly writes tests in Python, and would like an easier and more effective way to do so. We assume that you are comfortable with traditional unit tests - reading, running, and writing - as well as familiar with ideas like assertions. Most attendees will have heard "given, when, then" and "arrange, act, assert". You may or may not have heard of pre- and post-conditions - we will explain what "property-based" means without reference to Haskell or anything algebraic.

Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/177/2022-04-26T04%3A34%3A57.242477/PBT-intro-PyCon-2022.pdf
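The idea behind the divide example in this abstract can be approximated with the standard library alone. The following is a minimal sketch, not how Hypothesis works internally: `division_round_trips` and `find_counterexample` are hypothetical names, the generator just draws large random integers, and the zero divisor is assumed to be excluded already.

```python
import random


def division_round_trips(a: int, b: int) -> bool:
    """The property from the abstract: dividing, then multiplying, gives back a."""
    result = a / b  # true division always produces a float
    return a == b * result


def find_counterexample(trials: int = 200, seed: int = 0):
    """Hypothetical helper: try random inputs and return the first failing pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(-2**64, 2**64)
        b = rng.randint(1, 2**64)  # never zero, so no ZeroDivisionError
        if not division_round_trips(a, b):
            return a, b
    return None  # no failing input found within the trial budget
```

Small hand-picked inputs such as `division_round_trips(6, 3)` pass, which is why example-based tests tend to miss this kind of problem; integers too large to be represented exactly as a float generally do not survive the round trip.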
Watch
Keynote - Łukasz Langa
CPython Developer in Residence, Python 3.8 and 3.9 release manager, creator of Black, pianist, dad. ambv on GitHub. Opinions his own.
Watch
Keynote - Sara Issaoun
Sara Issaoun is a NASA Einstein Fellow and observational astronomer at the Center for Astrophysics | Harvard & Smithsonian. She is a member of the Event Horizon Telescope (EHT) collaboration, the global effort to image and study the environment close to supermassive black holes. Her research centers around the collection, calibration, and imaging of millimeter-wave radio observations of supermassive black holes. Supermassive black holes generate the highest energy processes in the known Universe, ejecting jets of plasma affecting galaxy environments on large scales, but their dynamics and emission mechanisms remain shrouded in mystery. She makes use of global networks of radio-telescopes to image and study the immediate surroundings of the supermassive black holes at the centers of our Galaxy and the galaxy M87.
Watch
Keynote - Peter Wang
Peter Wang is the CEO and co-founder of Anaconda, and helped found the PyData conferences and global community. Prior to starting Anaconda, Peter worked as a professional scientific computing and visualization software engineer. He has extensive experience in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. Peter holds a BA in Physics from Cornell University.
Watch
Diversity & Inclusion Workgroup
Diversity & Inclusion Workgroup Panel Georgi Ker, Lorena Mesa, Anthony Shaw, Reuven Lerner
Watch
Keynote - Steering Council Panel
PYTHON STEERING COUNCIL: Pablo Galindo Salgado, Petr Viktorin, Thomas Wouters, Gregory P. Smith, Brett Cannon

Elected as prescribed in PEP 8016, the Python Steering Council is a 5-person committee that assumes a mandate to maintain the quality and stability of the Python language and CPython interpreter, improve the contributor experience, formalize and maintain a relationship between the Python core team and the PSF, establish decision making processes for Python Enhancement Proposals, seek consensus among contributors and the Python core team, and resolve decisions and disputes in decision making among the language. This keynote will update the community on current and future initiatives. Additionally, the Steering Council will address community questions collected prior to the conference.
Watch
Keynote - Naomi Ceder
Naomi Ceder earned a Ph.D in Classics several decades ago, but switched from ancient human languages to computer languages sometime in the last century. Since 2001, she has been learning, teaching, writing about, and using Python. She has attended every PyCon since the first one in 2003 and was one of the originators of the Poster Session, the Education Summit, the Intro to Sprints sessions, the PyCon Charlas, and the Hatchery. An elected fellow of the Python Software Foundation, Naomi is the immediate past chair of its board of directors. She is also co-founder of Trans*Code and speaks internationally about Python as well as community, inclusion, and diversity in technology in general. The author of The Quick Python Book and the Explore Python Fundamentals project series, she has also done corporate training in Python. In her spare time she enjoys sketching, knitting, and deep philosophical conversations with her dog.
Watch
Lightning Talks - Day 1
Lightning talks are ~5 minutes long, on any topic of interest to other Python people. It doesn't have to be about something that you wrote, it can be something that you learned, or a technique you think other people will be interested in.
00:50 - Sameer Wagh - Data Science without Data?
02:25 - Cheuk Ting Ho - Cultural Shock - My 1st
07:25 - Łukasz Langa - COVARIANCE/CONTRAVARIANCE
11:45 - Seth M Larson - Truststore: OS trust stores in Python
15:50 - Pablo Galindo - Memray: hardcore memory profiling
20:17 - Graham Waters - The grief cycle, data security breaches, how we could code the future of America and the world
24:17 - Mason Egger - What is Synthetic Data
29:55 - Sophia Yang - Holoviz
34:00 - Shiray Lamba - Robyn; The fastest rust based python webframework server
39:00 - Chris May - Three steps to elegant code
43:50 - Chris Ariza - Getting to 100% coverage
49:15 - Indra - Jupyter ML model to production ML as a service
Watch
Lightning Talks - Day 2 AM
Lightning talks are ~5 minutes long, on any topic of interest to other Python people. It doesn't have to be about something that you wrote, it can be something that you learned, or a technique you think other people will be interested in.
00:45 - Jeff Weiss - Teaching Python for Community Outreach (Note: Video begins at ~2:40:00)
06:16 - Jessica David - How staying away from one word can change everything
10:59 - Roy m Mezan - Biometric attack
15:28 - Gajendra Deshpande - Security Considerations in Python Packaging
20:32 - Diamond Bishop - Scaling PyTorch Models in Prod
25:44 - Manabu Terada - Our Challenge to spread Python community w/ covid in Japan
30:59 - Jay Miller - DevRel: showing your company skills
36:29 - Jack Lee - Non-trivial applications of binary search
41:11 - Henry Schreiner - Scikit-hep: developer pages a guide for modern package development
46:44 - Chrisjrn - STOP RUNNING YOUR TESTS
Watch
Lightning Talks - Day 2 PM
Lightning talks are ~5 minutes long, on any topic of interest to other Python people. It doesn't have to be about something that you wrote, it can be something that you learned, or a technique you think other people will be interested in.
00:35 - Christian Maureia Fredes - Python en Espanol
05:35 - Mario Munoz - My First Pycon: Reflections
10:20 - Georgi Ker - Open source is a walk in the park
14:30 - Bence Nagy - Lint your code, repo, playlist, and fashion sense
19:49 - Mark Shannon - Help us speed up Python with benchmarks
23:35 - Larry Hastings - Correlate your data with Correlate
27:39 - Rich Taggart - The importance of effective concise communication
35:38 - William Woodruff - Securing your PyPI account
40:18 - Alexa Lindberg - Generating recipes w/ GPT-2 & Python
20:53 - Srinivas Bontula - Managing transitive dependencies for Django
50:25 - Adrian - When to rewrite in rust
Watch
Lightning Talks - Day 3
Lightning talks are ~5 minutes long, on any topic of interest to other Python people. It doesn't have to be about something that you wrote, it can be something that you learned, or a technique you think other people will be interested in.
00:23 - Pandy Knight - How to write a test case
05:09 - Shreya Batra - The Effects of Computational Thinking
09:36 - Patrick Arminio - The fastest way to fetch the latest python version
11:57 - Ray McLendon - Not all data is created equal
16:24 - Geir Arne Hjelle - Reading PEPs
21:30 - Jonathan Helmus - Pip install Python?
26:07 - Jelle Zijlstra - PEP 688: Typing for the buffer protocol
29:30 - Nick Muoh - Post pandemic meetup
33:25 - multiple speakers talking about Regional Python Conferences
Watch
Typing Summit - at PyCon US 2022
Schedule of presentations:
0:00 - “New typing features in Python 3.10 and 3.11”, David Foster
17:51 - “Typing of Tensor Shapes and Type Arithmetic”, Alfonso Castaño
39:15 - “Too small for a PEP: minor new typing features in Python 3.11”, Jelle Zijlstra
1:00:43 - “Extending PEP 647: User-Defined Type Guards”, Rebecca Chen
1:19:07 - “The future of TypedDict” and “Runtime uses for type annotations: A survey of tools”, David Foster
1:50:30 - “Runtime Annotations: PEP 563 & 649 Overview”, Carl Meyer
2:21:05 - “Beyond Subtyping”, Kevin Millikin
2:50:44 - “Panel: Typing-sig and Python Core Dev”, Guido van Rossum, Pablo Galindo Salgado, Thomas Wouters, Jelle Zijlstra, Pradeep Kumar Srinivasan, Matthew Rahtz
Watch
Talk - Brandt Bucher: A Perfect Match The history, design, implementation, and future of Python's...
Python 3.10 was released on October 4th, bringing with it a major new feature: "structural pattern matching". As one of the designers of the feature and its principal implementer, my goal is to introduce you to Python's powerful, dynamic, object-oriented approach to this long-established functional programming construct, and to explore ways that you might use structural pattern matching in your own code. Along the way, we’ll also dive into the history of the match statement, the design process behind it, how it actually works, and what we're already doing to improve it in Python 3.11 and beyond. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/116/2022-04-29T05%3A05%3A28.862819/PyCon_US_2022_Outline.pdf
Watch
Talk - Moshe Zadka: Best Practices for Continuous Integration in Python
It is now accepted that having continuous integration is a best practice for almost all non-trivial projects. But configuring CI for Python correctly is still hard. The solution space is big, many common configurations work around the bugs and limitations that existed in past CI systems, and there are few explanations about how to do it well. A good CI configuration concentrates on giving timely and accurate feedback to the developer. Whether it is using GitHub Actions, GitLab CI/CD, Jenkins, or something else, there are ways to configure the system to be more accurate and faster.
Watch
Talk - Mario Corchero: Finding penguins with a snake Linux features for a Python user
Python has APIs that allow developers to use Linux features that many are often unaware of. If you are a modest Linux/Unix user and want to learn some features of the OS through the APIs that Python offers, this is the perfect talk for you. We will speak about processes, named pipes, fork and exec, inodes, and signals, among others, all whilst seeing how to play with these through the APIs that the Python standard library offers us.
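As a taste of the kind of API the abstract refers to, the sketch below combines `os.pipe` and `os.fork` from the standard library. It is an illustration under assumptions, not material from the talk: it only runs on POSIX systems, and `message_from_child` is a hypothetical helper name.

```python
import os


def message_from_child(message: bytes) -> bytes:
    """Fork a child process that sends `message` back to the parent over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: write the message, then exit without running
        # the parent's cleanup handlers.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent process: close the unused write end so read() can see end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return data
```

Closing the write end in the parent matters: as long as any copy of it stays open, the reader never sees end-of-file.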
Watch
Talk - Calvin Hendryx-Parker: Bootstrapping Your Local Python Environment
There are simple, yet crucial, reminders that can differentiate an expert developer from a hobbyist. In this talk and live demo, developers will learn:
- the importance of abiding by the Zen of Python;
- where (and how) to install Python on your machine;
- three rules to follow when installing Python;
- proper version management with pyenv;
- which Python add-ons (e.g. virtualenv, pipx, piptools, Docker) can be used to make environments both repeatable and simple.

Resources and Links
- ActiveState: https://www.activestate.com/products/python/
- asdf: https://github.com/danhper/asdf-python
- Anaconda: https://www.anaconda.com/products/individual
- Brew: https://brew.sh/
- Chocolatey: https://chocolatey.org/
- Docker’s Python integration: https://hub.docker.com/_/python/
- PDM: https://pypi.org/project/pdm/
- pyenv setup: https://github.com/pyenv/pyenv#installation
- pyenv setup for Windows: https://pyenv-win.github.io/pyenv-win/
- pipenv versions: https://pipenv.pypa.io/en/latest/
- piptools: https://github.com/jazzband/pip-tools/#readme
- pipx setup: https://pypi.org/project/pipx/
- pipx: https://pypa.github.io/pipx/
- poetry: https://python-poetry.org/
- pyproject.toml: https://www.python.org/dev/peps/pep-0621/
- Python.org: https://python.org/
- virtualenv: https://virtualenv.pypa.io/en/latest/
- virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/
- Zen of Python: https://www.python.org/dev/peps/pep-0020/
Watch
Talk - Benjamin "Zags" Zagorsky: Handling Timezones in Python
Does your code use datetimes? There's a chance it has bugs that show up every night after 7pm! Timezones and daylight saving time are problems that plague most systems. Even if your system is designed for use in a single timezone, you still need to be aware of timezones, both figuratively and literally, to avoid bugs (Python datetimes that are correctly instantiated are referred to as "timezone aware"). This talk will cover:
* Common mistakes with dates and datetimes in Python
* How to use timezone aware datetimes in Python
* Recipes for common datetime use cases
* Recipes for Django

Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/90/2022-05-05T20%3A14%3A33.164888/Timezones_in_Python.pdf
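One plausible reading of the "after 7pm" remark is that a local evening in a UTC-5 zone already falls on the next calendar day in UTC. The sketch below illustrates that reading with timezone-aware datetimes. It is an assumption-laden illustration rather than the speaker's recipe: a fixed UTC-5 offset stands in for a real zone with daylight saving rules, and `dates_disagree` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset with no daylight saving rules, for illustration only.
UTC_MINUS_5 = timezone(timedelta(hours=-5), "UTC-05:00")

# 7:30pm local time on 15 January...
local_evening = datetime(2022, 1, 15, 19, 30, tzinfo=UTC_MINUS_5)
# ...is the same instant as 00:30 on 16 January in UTC.
same_instant_utc = local_evening.astimezone(timezone.utc)


def dates_disagree(moment: datetime) -> bool:
    """True when the local calendar date differs from the UTC calendar date."""
    return moment.date() != moment.astimezone(timezone.utc).date()
```

Code that stores UTC timestamps but compares them against a local "today" can therefore start misbehaving in the evening, even though the two aware datetimes compare equal as instants.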
Watch
Talk - Greg Compestine: How to Succeed with Python Across the Enterprise
As a large, well-established company, Bloomberg focused on C++ as its primary language several decades ago. Python began as a scripting language for writing small utilities. An intern project several years ago showed that it was possible to integrate some C++ libraries with Python, making it possible to build domain-specific applications. An engineer with an affinity for Python got approval to form a small team to provide better support for the language. Engineers also formed small committees (or Guilds) to help promote Python across the organization by advocating for users, organizing meetups, actively monitoring messaging channels to help those with questions and problems, and writing lots and lots of documentation. Today, Python is used by more than 3,000 of the company's engineers. We actively support the Python Software Foundation and open source Python projects. Python is used to train new hires on the architectural paradigms used within the company. In less than a decade, we’ve gone from taking our first steps with the language to being one of the leading contributors to its evolution. Sometimes success can "just happen." However, most often changing a cultural dynamic takes a lot of hard work. And it is work that can be very rewarding. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/79/2022-04-28T22%3A02%3A15.039467/How_to_Succeed_with_Python.pdf
Watch
Talk - Amit Saha: Implementing shared functionality using Middleware
In this talk, I will provide an introduction to the topic of writing middleware for your web applications. Middleware is often simply brought into an application's code base, perhaps without a thorough understanding of how it works. This talk will shed light on how middleware components work in popular Python web frameworks - Flask, Django and FastAPI. Armed with that understanding, you will learn how to write your own middleware as well as use standard community contributed middleware to implement vital functionality in your applications. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/6/2022-04-29T03%3A09%3A30.428341/slides.pdf
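To make the idea concrete, here is a minimal sketch of a middleware at the WSGI level, using only the standard library. It is not the speaker's code and does not use the Flask, Django, or FastAPI middleware APIs; `hello_app`, `AddHeaderMiddleware`, and `call_app` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A tiny WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


class AddHeaderMiddleware:
    """Wraps any WSGI app and appends one response header to every response."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Shared functionality lives here, outside the wrapped application.
            return start_response(status, list(headers) + [self.header], exc_info)

        return self.app(environ, patched_start_response)


def call_app(app):
    """Invoke a WSGI app once with a fake request and capture the response."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Because the middleware is itself a WSGI callable, several of them can be stacked around one application without the application knowing about any of them.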
Watch
Talk - Nandita Viswanath and Sagar Aryal: In house to open source Stitching the past to the futu...
Ever had to deal with old code that is filled with thousands of repetitive code blocks and too many if statements? It gets harder when the original authors aren't around to explain what they were thinking. These pain points related to legacy software are often the motivation for many organizations to adopt robust open source solutions. Open source software is becoming more and more the standard in any tech stack. Knowing how to navigate the world of open source software and how to best implement it is a skill that is becoming ever more important for any software engineer. Python is one of the most popular languages when it comes to open source. In this talk, we hope to outline why this is and how you can take advantage of it in your software migrations.
Watch
Talk - John Reese: Open Source on Easy Mode
Open source is the lifeblood of the community, and we all stand on the shoulders of giants. But the responsibility, time commitment, and processes that come with maintaining projects on PyPI can be overwhelming, even for the best of us. With this talk, we'll see how the right tools and automation can cut out the overhead from running open source projects, and let you focus on the fun parts! We'll cover a wide range of topics, from packaging, metadata, and dependencies, to code quality, testing, and CI/CD, and finish with documentation, helping new developers, and reviewing contributions from the community. We'll look at high level concepts, modern best practices, and free tools available and how they make it easier than ever for new contributors to get started, while giving you confidence that their changes are safe and ready for production. Rather than just pointing to cookie cutter templates, we'll talk about the "why" behind these best practices and how they fit into common developer workflows. We'll also include links to references and popular developer tools, as well as a companion site with slides and a list of everything mentioned in the talk. Developers of all experience levels are welcome. Whether you're new to packaging and need guidance for your first release, or a seasoned package maintainer looking to simplify your workflow, this talk is for you! Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/104/2022-04-28T21%3A41%3A15.777027/open-source-easy-mode.pdf
Watch
Talk - Henry Fredrick Schreiner III: Building a binary extension
Support for binary extensions is an exceptional advantage of Python that is too often avoided for smaller packages with low developer resources. Binary extensions are used to achieve high performance for libraries like PyTorch, MyPy, and many thousands more. Binary extensions also allow access to a wealth of existing compiled libraries. Building your own binary extension is plagued by historically poor documentation, bad common practices, and many misconceptions. But it is actually easy to write extensions today that work seamlessly on all common developer platforms using modern libraries and continuous integration. We will take a look at packaging a binary extension from start to finish. This starts with pybind11 for C++ bindings, providing simple, header only builds and avoiding the need for a new language or pre-processor step. We will look at scikit-build for building, providing powerful CMake based builds with library search, multithreaded builds, and more. We will use PyPA's build to produce SDists. And we will use PyPA's cibuildwheel to produce binaries for all common platforms with minimal setup and simple CI code in GitHub Actions (but trivially movable to any other CI system). We will talk about how to automate common tasks, like using GitHub's Dependabot to keep cibuildwheel up-to-date while also ensuring reproducible builds. After this talk, it is our hope that you will no longer shy away from using compiled code in libraries, but will feel comfortable writing extensions to accelerate or advance your library's functionality. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/57/2022-04-29T06%3A17%3A37.414348/CppCon2022_Building_Python_Extensions_AP1.pdf
Watch
Talk - Sebastiaan Zeeff: Demystifying Python’s Internals: Diving into CPython by implementing...
Diving into the CPython source code can feel daunting. Whether you want to start contributing or just want to get a better understanding of Python by exploring its source code, it’s often difficult to know where to start or what you’re missing. In my talk, I will show you around the CPython source code by implementing a new operator, a pipe operator. While doing so, I will discuss core parts of the internals, such as Python’s grammar, its syntax trees, and the underlying logic that will perform the operation. By the end, you will have a good idea of the moving parts involved in core language features. I will also take you through the steps necessary to make it all work. I’ll show you how I obtained a copy of the source code, regenerated the parser and token files, and how I compiled my modified version of CPython. I will also write and run tests to help me implement my changes. This should give you a mental framework that helps you while diving into more comprehensive resources, like the excellent Python Developer’s Guide. My talk is aimed at everyone who wants to explore CPython’s internals. You don’t have to be an expert in Python, although some affinity with Python helps with understanding the internals. I will also use C to implement some of the operator logic, but knowledge of C is by no means required. In short, if you’re interested in diving into the CPython source code, this talk is for you. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/71/2022-04-28T15%3A55%3A04.181398/demystifying_cpython_internals_HANDOUT.pdf
Watch
Talk - Josh Weissbock: Distributed Web Scraping in Python
Web scraping is easy to do in Python, but it quickly becomes tedious when routinely running large batch scraping jobs. This talk looks at how to build a distributed web scraper to reduce batch scraping job times and improve durability of your code as well as lessons learned & stories along the way. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/48/2022-04-29T04%3A28%3A21.308613/PyCon_2022_-_Distributed_Web_Scraping.pdf
Watch
Talk - Meredydd Luff: Building a Python Code Completer
Code completion is almost magic, and it makes writing code feel so good. But how does it actually work? I built a code completion engine from scratch - and in this talk, I'll tell you its secrets. We'll learn how Python parses and compiles code, what an AST is, and how we can use this knowledge to work out what a programmer might type next. And to prove it's not that complicated, I'll build a little code completer, live on stage, in about five minutes. I'll also talk about how code completion is like games programming, how we should broaden our thinking about "types" in Python, and how we can use information that isn't in your code to make coding even more satisfying.
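The core idea of using an AST to work out what a programmer might type next can be sketched in a few lines. This is a deliberately naive illustration, not the engine described in the talk: `defined_names` and `complete` are hypothetical names, and scoping is ignored entirely.

```python
import ast


def defined_names(source: str) -> set:
    """Collect every name that the given source code defines or assigns."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)  # assignment targets, loop variables, ...
        elif isinstance(node, ast.arg):
            names.add(node.arg)  # function parameters
    return names


def complete(source: str, prefix: str) -> list:
    """Suggest defined names that start with what has been typed so far."""
    return sorted(name for name in defined_names(source) if name.startswith(prefix))
```

A real completer also has to cope with code that does not parse yet, since the line being typed is usually incomplete; `ast.parse` raises `SyntaxError` in that case.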
Watch
Talk - Joseph Lucas: Serialization More than pickling
Have you ever needed to persist an object or instance? You probably researched serialization (converting an object to a byte-stream). The default for Python is pickle, but there are other serialization options. In this talk, we'll explore some of those other options as well as their efficiency and security considerations.
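As a small illustration of how serialization options differ, the sketch below round-trips the same record through `pickle` and `json` from the standard library. The `record` value is made up for the example, and which other formats the talk covers is not stated here.

```python
import json
import pickle

record = {"name": "ada", "scores": (1, 2)}

# pickle preserves Python-specific types such as tuples...
via_pickle = pickle.loads(pickle.dumps(record))

# ...while JSON only knows arrays, so the tuple comes back as a list.
via_json = json.loads(json.dumps(record))
```

The trade-off runs the other way for safety: unpickling can execute arbitrary code, so `pickle.loads` should only be used on data from a trusted source, whereas parsing JSON only builds plain data.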
Watch
Talk - Paul Kehrer/Alex Gaynor: Shipping Python Extensions in Rust Two Million Times a Day
For as long as Python has been around, a strength has been the ecosystem of packages written not in Python, but in C -- whether that's PIL, or numpy, or simplejson, or one of the thousands of others. But why C? Why not some other language? In the last several years, Rust has emerged as a serious competitor to C. This talk will explore how we went about the process of using Rust in the pyca/cryptography package, the challenges we faced, the successes we found, and what this means for your projects.
Watch
Talk - Ajinkya Rajput/Ashish Bijlani: Bad actors vs our community: detecting software supply chain...
Rapid prototyping and development is one of the most popular features of the Python software ecosystem. This is possible due to efficient reuse of software libraries enabled by package repositories such as PyPI. While PyPI maintainers have streamlined the process of publishing and distributing a package for developers, bad actors evidently exploit this infrastructure to propagate malware. For example, simply by publishing a malicious package with a name similar to a popular package, bad actors can exploit carelessness or inexperience of developers and elevate a simple installation typo to a remote code execution attack. In this talk, we will present technical details of our large-scale vetting system that analyzes millions of published software package versions for malware and other “risky” attributes, such as sudo access, source inconsistencies, abandonware, and unsafe installation hooks. We will share our experience while building this system, and present examples of new malware we have detected as case studies. Finally, we will introduce our free tool OSSIE, a Python package on PyPI, for developers to audit project dependencies and notify them when dependencies turn malicious. The presented tool is extremely user friendly and is an attempt towards furthering usable security. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/115/2022-04-30T23%3A12%3A58.090937/pycon22.pdf
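The name-similarity attack mentioned in this abstract can be illustrated with a toy check built on `difflib` from the standard library. This is a sketch only and has nothing to do with how the authors' vetting system or OSSIE works: the `POPULAR_PACKAGES` list, the `possible_typosquat` function, and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical, hand-picked list; a real check would need a much larger one.
POPULAR_PACKAGES = ["requests", "numpy", "django"]


def possible_typosquat(name, popular=POPULAR_PACKAGES, cutoff=0.8):
    """Return the popular package that `name` closely resembles, or None."""
    if name in popular:
        return None  # the genuine package is not a squat of itself
    matches = difflib.get_close_matches(name, popular, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `possible_typosquat("reqeusts")` flags the resemblance to `requests`. String similarity alone produces both false positives and false negatives, which is presumably why the abstract lists several other risk attributes.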
Watch
Talk - Liran Haimovitch: Effective Protobuf: Everything You Wanted To Know, But Never Dared To Ask
A talk of 40 minutes covering the following topics:
1. Introduction to serialization and its place in software engineering
2. Static typed vs dynamic typed serialization
3. Textual vs binary serialization: pros and cons
4. Popular serialization frameworks
5. Why Protobuf
6. Quick intro to Protobuf (just enough to get by)
7. Protobuf performance challenges and tradeoffs
8. Async synchronization: pros and cons
9. Field encoding: under the hood and what we learn
10. Managing the cost of abstractions
11. Data deduplication and compression
12. Field reuse: the whys and hows
13. gRPC: pros and cons
14. Protobuf over websocket or HTTP
15. Thank you

Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/51/2022-04-29T22%3A04%3A43.392216/Effective_Protobuf.pdf
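As background for the field encoding topic in the outline above, Protobuf's wire format stores unsigned integers as base-128 varints: seven payload bits per byte, least significant group first, with the high bit marking that more bytes follow. The sketch below implements just that piece in plain Python; the function names are hypothetical and this is not the Protobuf library's API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)  # final byte has the high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode the varint at the start of `data`."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")
```

Small values fit in a single byte (`encode_varint(1)` is `b"\x01"`), while 150 needs two (`b"\x96\x01"`), which is one reason small field numbers and small values are cheap on the wire.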
Watch
Talk - Nir Barazida: Dock Your Jupyter Notebook
To perfect your Jupyter Notebook craft, you'd want to make your work reproducible and shareable outside your local machine. In this talk, we will learn how to use Docker to build an isolated and pre-defined environment suited to an ML project that runs smoothly on a remote machine.
Watch
Talk - Antoine Toubhans: Flexible ML Experiment Tracking System for Python Coders with DVC and St...
There are so many tools to do data science today that it can be difficult to navigate. Many of them are AI platforms that “do everything by clicking on a UI” and do not leverage pre-existing tools, e.g. Git for versioning, or a good old Python IDE instead of Jupyter Notebooks. On the other hand, ML engineering is not classical software engineering: in addition to the code, the data should also be versioned; in its essence, ML engineering is exploratory work: one cannot know if the model is going to work before testing it; there is no clear way to guarantee the quality of the trained model: the data scientist has to play with it to make it “talk”. In this talk, we will build a fully customizable and complete system in Python to track Machine Learning experiments. For the purpose of this talk, we will train a neural network (TensorFlow) to classify images as cat or dog, though the main focus is on the tooling and not the ML algorithm. We will use: DVC (Data Version Control) to 1) version the data alongside the code with Git, 2) build training pipelines to orchestrate the Python scripts, and 3) version experiments; and Streamlit to build data exploration apps to play with the trained models. Both DVC and Streamlit are open-source libraries with Python APIs. In the second part of the talk, we will focus on various ways of combining DVC and Streamlit. For instance, we will see how to build a Streamlit app that allows selecting any trained model tracked with DVC (provided its Git commit), loading it, and testing it on given input images. I will provide code samples and live demos throughout the talk.
Watch
Talk - Trey Hunner: Python Oddities Explained
A number of Python features often seem counter-intuitive at first glance, especially when moving from another programming language to Python. Often what at first seems like a bug will later reveal itself to be a misunderstood feature. During this talk we'll look at a number of Python's unique features and quirks and attempt to re-shape our mental models of Python to better match reality. By the end of this talk you'll have a deeper understanding of Python's rules behind objects, scope, and variables. Warning: this talk will include many Python head-scratchers, so show up prepared to think on your feet!
Watch
Talk - Christopher Ariza: Employing NumPy's NPY Format for Faster Than Parquet DataFrame...
Over 14 years ago the first NumPy Enhancement Proposal (NEP) defined the NPY format (a binary encoding of array data and metadata) and the NPZ format (zipped bundles of NPY files). Those same formats, extended in a custom NPZ packaged with JSON metadata, can be used in Python to create a stable DataFrame storage format that can materially out-perform Parquet read / write times in a wide range of contexts. Unlike Parquet, all characteristics of a DataFrame can be encoded and all NumPy dtypes are supported. Implemented in StaticFrame, this format can take advantage of an immutable data model to memory-map full DataFrames from un-zipped directories of NPY. Given wide-spread use of Parquet files in data science workflows, a faster-than-Parquet file format can significantly reduce compute costs. I will begin this talk by introducing the challenge of serializing DataFrames, illustrating how nearly all stable encoding formats lack full support for all DataFrame characteristics. While the broadly-used Parquet format has been called a "gold standard" binary file format, its columnar representation will be shown to have limitations when used for encoding DataFrames. I will show how the NPY format, combined with JSON metadata, can be used to create a custom NPZ file with significant performance and compatibility advantages compared to Parquet. The details of this encoding scheme will be explained. I will close the talk by evaluating numerous read / write performance comparisons between Parquet (via Pandas) and NPZ (via StaticFrame), measured with a wide variety of DataFrame shapes and dtype compositions. I will share techniques used in implementing optimized Python routines for reading and writing NPY files, and demonstrate applications for memory-mapping complete DataFrames via the same NPY representation. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/29/2022-04-29T13%3A59%3A23.738312/pycon-ariza.pdf
Watch
Talk - Fred Phillips: Hooking into the import system
Import hooks, and the import system in general, are an under-used and under-documented resource within Python. This talk will introduce the audience to the import system, how it works, and how it can be adapted for their needs. We will build a simple import hook that can inspect what is being imported, and go on to demonstrate how we can use the import system to load Python modules from a database and how to reload files on disk immediately as they are changed.
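A hook that only inspects what is being imported can be sketched as a meta path finder that records each requested module name and then steps aside. This is an illustration under assumptions, not the hook built in the talk; `ImportLogger` and `logged_imports` are hypothetical names.

```python
import importlib
import importlib.abc
import sys
from contextlib import contextmanager


class ImportLogger(importlib.abc.MetaPathFinder):
    """Records module names as the import system asks for them."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # "not mine": let the remaining finders do the real work


@contextmanager
def logged_imports():
    """Install the logger at the front of sys.meta_path for the with-block."""
    logger = ImportLogger()
    sys.meta_path.insert(0, logger)
    try:
        yield logger
    finally:
        sys.meta_path.remove(logger)
```

Finders are only consulted for modules that are not already cached in `sys.modules`, so a module imported earlier in the process will not appear in `logger.seen`.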
Watch
Talk - Kelly Schuster-Paredes/Sean Tibor: Learn Python Like a 12 Year Old
Along the way to adulthood, we often lose that sense of wonder, enjoyment, and playfulness that we had as kids in our favorite school subjects. As adults, we can become better learners ourselves when we examine how kids learn coding with Python. In this session, we’ll talk about everything from making thinking and coding visible, to the brain science behind how we learn new things, to the importance of playfulness in learning. We will share a variety of helpful tips to improve your learning whether you are new to Python or an experienced coder.
Watch
Talk - Pablo Galindo Salgado: Making Python better one error message at a time
Python 3.10 has been recently released and among many exciting new features, one of the biggest improvements is the inclusion of a whole new set of changes focused on improving the error messages across the interpreter and the general user experience when dealing with error messages. The new error messages have been one of the most welcomed features from very different sets of users ranging from Python teachers and educators, first-time learners, industry professionals and data scientists. In this talk, we will cover:
- What are the new improvements featured in Python 3.10.
- Exciting new changes and improvements that will feature in Python 3.11.
- How these improvements are useful to different sets of users from people learning Python to experienced programmers.
- How the new PEG parser has unlocked adding new custom syntax errors.
- How these improvements were implemented and what challenges the CPython core team faced to get them working reliably.
- How users can contribute to adding new error messages: what is the workflow, how the errors are reviewed by the core team and where to find resources and help.
No matter who you are and what you do with Python, there is an improvement that will probably make you smile.
Watch
Talk - Olivier Breuleux: How to change Python (while it's running)
Wouldn't it be nice to be able to do live development in a running Python instance, using your favorite editor and structuring your code however you would like, seeing your changes immediately reflected in the middle of the program's execution? Thanks to Python's incredible runtime flexibility, this can be done. In this talk, we will explore how changes to source code files can be integrated into running programs, covering as many edge cases as possible and explaining the intrinsic limitations of the approach. We will also demonstrate Jurigged, a flexible and extensible working implementation of this system. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/26/2022-04-30T21%3A55%3A53.200731/How_to_change_Python.pdf
Watch
Talk - Jeremiah Paige: Intro to Introspection
Python has immensely powerful capabilities to find information about objects and running code; even code you did not directly create. Through examples I will show you where that information is kept, how to retrieve it, and how to make sense of it. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/5/2022-04-27T04%3A21%3A44.122463/Intro_to_Introspection.pdf
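To give a flavour of where such information is kept, the sketch below inspects a small function with the standard `inspect` module and a few dunder attributes. The function `greet` is made up for illustration and is not from the talk.

```python
import inspect


def greet(name: str, punctuation: str = "!") -> str:
    """Return a short greeting."""
    return f"Hello, {name}{punctuation}"


# The signature object exposes parameters, defaults, and annotations.
signature = inspect.signature(greet)
parameter_names = list(signature.parameters)
default_punctuation = signature.parameters["punctuation"].default

# The docstring and raw annotations are stored on the function object itself.
docstring = inspect.getdoc(greet)
annotations = greet.__annotations__

# The compiled code object records local variable names, among other details.
local_names = greet.__code__.co_varnames

print(parameter_names, default_punctuation, docstring, annotations, local_names)
```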
Watch
Talk - Pablo Alcain: Software Development for Machine Learning in Python
Machine Learning and Data Science code has its own set of challenges and peculiarities. When we write code to be used by Data Scientists or Machine Learning Developers we have to keep in mind constantly that every abstraction we use has to a) be compatible with a fast and easy exploration playground; b) allow for sensible checkpoints and optimizations; c) implement repeated queries and functions in a declarative fashion; and d) provide an abstraction level over all of the production code so it can be tracked and monitored seamlessly. In this talk we will provide general guidelines to approach this problem from a software engineering perspective, defining what our entities should be, how deep our abstraction should go, and how to avoid some common design pitfalls. We will apply all these guidelines to a specific and small end-to-end problem. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/2/2022-05-05T19%3A50%3A59.469490/Software_Development_for_Machine_Learni_tbMu5bk.pdf
Watch
Talk - Sam Scott: Why Authorization is Hard
Every team implements authorization in their app to control which users can do what. You'd think by now we'd have a standard set of best practices for how to build it. And yet, we don't! Especially in the Python ecosystem. Why is authorization so hard? Authorization is made up of three building blocks, each of which presents its own challenges: 1. Enforcing authorization is hard because it needs to happen in so many places. Controllers, database mappers, routers, and user interfaces all need to enforce authorization. As a result, there are limited off-the-shelf approaches that work in all cases. 2. Decision architecture is hard because you want to separate authorization from the application, but a lot of authorization data is application data too. A monolith can check its own database when it needs to make a decision, but what happens if you want to consolidate authorization into a separate service? Many off-the-shelf solutions focus on the separation – coordinating it and keeping everything in sync is challenging too. 3. Modeling authorization is hard too. It's easy to whip up the first use case — adding a roles table to your database works for a while. But it's hard to start simple and grow into your complexity as you need it. And it's hard to make something powerful that's simple to get started with. The options available typically err on one end of the spectrum or the other. In this talk, you'll learn the approaches for how to solve each of these areas and the associated tradeoffs. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/11/2022-04-29T22%3A32%3A41.095872/deck-fa3c23.pdf
Watch
Talk - Tadeh Hakopian: Programming Your Way up a Skyscraper Python in the Built World
Learn how Architects leverage Python in building projects to enable more design possibilities than ever before. Python is one of the fastest growing scripting languages in the building design and construction field, and it is increasingly being used by professionals in the industry. This talk will lead you through how Architects design, plan, edit and execute scripts with Python using different editing tools. Learn about how designers tackle the challenge of putting a building together with the aid of code, including using Python scripts to edit geometry, create algorithmic designs for buildings, sort data lists, write content to software and much more. With Python you can unleash the potential in your projects, so come and see what’s possible.
Watch
Talk - Vic Kumar: Writing Functional Code in Python
In this talk, we'll define exactly what functional programming is and how it helps us. We'll explore the main concepts from functional programming and see how we can apply them to our Python code going over some concrete examples.
Watch
Talk - Kevin Modzelewski: Writing performant code for modern Python interpreters
This talk will go into the latest efforts to speed up the Python language, and in particular how some things will be sped up much more than others. You may have heard best practices for Python performance before, but there are some new guidelines now, some old ones are no longer as important, and some are no longer true at all. Come to hear how the Python language is being optimized, and what you can do to best take advantage of these optimizations.
Watch
Talk - Aaron Stephens: Python for Threat Intelligence
For many of us, writing code isn't our job - but we do it anyways. We're not software engineers, and balancing the two isn't easy, but we make do. Because with just a few lines of Python, we can automate the boring, tedious work and enable ourselves to tackle the really hard problems. This is especially true in threat intelligence, where analysts help defenders make informed decisions to protect themselves and their businesses against the security incidents happening every single day. How do major hacks happen, who's responsible, and why? Come and learn about the world of threat intelligence, why we ask these questions, how we answer them, and - most importantly - the Python tools we've built along the way. See how we approach development on a team without any developers, balance process with productivity and enable success at scale. This one is for all the scripts out there helping us do our jobs, and for all the part-time developers who write them. Enjoy!
Watch
Talk - Christopher Neugebauer: Fast and reproducible tests, packaging, and deploys with...
“Works on my machine”: The cry of developers who can’t reproduce a bug because their development environment is incompatible with their deployment environment. It’s common because setting up clean environments is slow, tedious, and error-prone. Meanwhile, debugging errors introduced by incorrect environments is slow, tedious, and error-prone. Each step in your CI workflow theoretically only has inputs or outputs, but in reality, files can be left along the way by running tests or compiling extensions. These are side-effects, not inputs for subsequent steps in your workflow, let alone deployment, but if included they can affect correctness. You can solve this using “hermetic environments”: running every step of your workflow inside a fresh environment, so steps run truly independently of one another. You can do this manually with Docker, but it’s difficult: you have to understand which inputs are necessary for a step, which newly generated files are meaningful outputs, and what should be discarded. Pantsbuild uses hermetic builds automatically: it understands the inputs each step needs, what outputs it produces, and stores inputs and outputs inside a content-addressable database so it can rapidly build sandboxed environments for subsequent steps of your workflow. The result is a build process where every step is run in isolation, with only the inputs each process truly needs, and only true outputs made available to each subsequent step. Pants’ workflows are fast but verifiably correct — running against incorrect inputs is not a possible failure case. In this talk, we’ll explore how Pantsbuild enables truly hermetic builds. We’ll look at other approaches to sandboxing and how they compare to Pants’ approach, and how you can benefit from adding hermetic builds to your project. You’ll walk away being confident that “works on my machine” means “works everywhere”.
Watch
Talk - Cillian Kieran: Open Source, Python Based Tools For Data Privacy
In this talk, I make the case that the developer community has an opportunity to profoundly improve data privacy by shifting privacy upstream into the SDLC, where it belongs. I will share resources and lessons learned from my team's development of open-source, Python-based devtools for data privacy. Analogous to physical infrastructure, our digital infrastructure needs to be designed with trustworthiness at the forefront. As developers, we have often been left out of important design decisions about how technical systems actually process personal data. Typically, privacy risk is addressed reactively, and developers have to manually fulfill users' privacy requests across disparate data infrastructure. This reactive, burdensome approach to privacy pits trustworthiness against innovation. To build trustworthy systems at scale, we need devtools for proactive privacy, and the tools must fit within existing developer workflows. I will walk through the existing points of friction for developers today, the power of privacy embedded into the SDLC, and the tight bond between open-source and privacy. My team and I have learned that we can improve privacy at scale when the tools for privacy fit into developers' existing workflows and the infrastructure they use every day, including Snowflake warehouses, mongoDB databases, Redis session stores, and more. I will demonstrate what proactive privacy can look like for developers and data engineers: automatic flags for privacy risk in the CI pipeline, and streamlined privacy request fulfillment by traversing distributed data systems for custom data operations—such as deleting personal data while upholding referential integrity across databases. Open-source and privacy go hand-in-hand in offering developers and end-users digital infrastructure that they can trust. 
To tackle a problem as complex as modern privacy, the solution requires all of us to build shared, transparent, and community-informed privacy standards for technology worldwide. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/59/2022-05-01T18%3A53%3A23.697439/Fides-PyCon2022.pdf
Watch
Talk - Maria Jose Molina Contreras: Better Air, Better Health Creating an indoor air quality...
In the last couple of years, most people have moved to working from home full time, which has brought benefits we were not previously aware of but also some inconveniences, such as health-related issues. In this talk, we will explore how to build a functional system to track air quality, collect our own data using different sensors, and implement a predictive approach to avoid future health problems. We are going to dive into the different setups for interacting with air quality sensors using Python on microcontrollers and embedded systems, collecting your own data to evaluate factors like humidity, temperature, CO2, and particles. We will also go into the implementation of a predictive machine learning (ML) model to predict indoor CO2 levels and alert us, based on its predictions, before critical levels are reached. The main idea of this talk is to show with a practical example how Python is a great option for building an indoor air quality monitoring system complemented with a predictive ML model for indoor CO2, while having fun building it and monitoring your home. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/70/2022-04-29T13%3A37%3A41.013632/pyconusv0-770998_3.pdf
Watch
Talk - Miranda Auhl: Animating NFL play by play data using matplotlib's FuncAnimation()
Most of us have heard the saying, "A picture is worth a thousand words," but a movie builds context and a story, especially when conveying data! Data animations allow us to share more information and are far more engaging than static plots. In this talk, I will discuss the importance of animation in analysis and show how to create data animations using play-by-play RFID data from the 2018 NFL season. Within data science, we often use graphical representations of data to convey our analysis engagingly and succinctly. However, a static image does not always do justice to our findings and sometimes can miss important concepts entirely. When we introduce animation, we can show how location, statistics, etc., can change over time. Using this NFL play-by-play data, I will show how to take a static data plot and transform it into an animation using the matplotlib module. By the end of this talk, you will know what data animation is, how it works for matplotlib using FuncAnimation(), how to animate plots successfully using defined functions in conjunction with your iterative function, and how animation can improve your analysis. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/25/2022-04-29T02%3A51%3A33.350370/PyCon_3.pdf
Watch
Talk - Reuven M. Lerner: Understanding attributes (Or: They're not nearly as boring as you think!)
Attributes in Python, which we use dozens of times each day, seem boring, obvious, and not worthy of attention. But it turns out that they're key to the Python language: Every time you say a.b in Python, that little dot is hiding a lot of work, from searching across multiple objects to silently rewriting things. And it turns out that what happens with attributes, while not always obvious to developers, determines a great deal of behavior in the Python language. In this talk, I'll discuss what attributes are (and aren't), what Python does when you use a dot (.) in your code, and how you can take advantage of it. We'll talk about attribute lookup, about inheritance, and about methods vs. functions. We'll also look into properties, and how they allow us to have attributes that look like data but behave like setters and getters. Finally, we'll look at the descriptor protocol, which makes so much of Python's functionality possible, including the automatic insertion of "self" as the first argument in method calls. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/22/2022-05-01T04%3A08%3A16.996945/Presentation__Python_attributes.key.pdf
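The claim that the descriptor protocol is what inserts "self" can be checked directly. The sketch below first binds a method by hand via the function's `__get__`, then defines a small data descriptor. The classes `Greeter`, `Positive`, and `Order` are illustrative names, not examples from the talk.

```python
class Greeter:
    def hello(self):
        return f"hello from {type(self).__name__}"


g = Greeter()

# Reading from the class __dict__ skips the descriptor step and yields the
# plain function object.
plain_function = Greeter.__dict__["hello"]

# Functions define __get__, so calling it by hand reproduces what the
# attribute lookup g.hello does: it returns a method bound to g.
bound_by_hand = plain_function.__get__(g, Greeter)


class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private name.
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)


class Order:
    quantity = Positive()

    def __init__(self, quantity):
        self.quantity = quantity  # routed through Positive.__set__


print(bound_by_hand(), Order(3).quantity)
```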
Watch
Talk - Paul Ganssle: What to Do When the Bug Is in Someone Else's Code
It's generally better to use libraries than to write your own code, but what happens when you run into an issue that is correctly solved by modifying the library code rather than your own code? What if you need to deploy a fix today, but you can't count on the upstream library applying the required fixes and getting a new release through your deployment system before your deadline? This presentation will cover various stop-gap strategies (of varying desirability) for dealing with this situation, including:
- Working around the bug with wrapper functions
- Monkey patching the offending methods or functions
- Vendoring a patched version of the library into your application
- Maintaining a forked version in your local package manager
Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/86/2022-04-27T18%3A09%3A54.168666/What_to_Do_When_the_Bug_is_in_Someone__vLd1Knp.pdf
Watch
Talk - Jan-Hein Bührman: When to refactor your code into generators and how
Have you ever found yourself coding variations of a loop construct where fragments of the loop code were exactly the same between the variations? Or, in an attempt to factor out these common parts, you ended up with a loop construct containing a lot of conditional code for varying start, stop, or selection criteria? You might have felt that the end result just didn't look right. Because of the duplicated parts in your code, you noticed that the code didn't conform to the DRY (Don't Repeat Yourself) principle. Or, after an attempt to combine the variations into a single loop, with consequently a lot of conditional code, your inner voice told you that the resulting code had become too complex and difficult to maintain. This talk will show you a way out of this situation. It demonstrates how you can create a generator function that implements only the common parts of your loop construct. Subsequently you will learn how you can combine this generator function with distinct hand-crafted functions or building blocks from the standard library itertools module or the more-itertools package. As an example, imagine you need to implement some varying functionality based on the Fibonacci sequence. This talk shows you what it would look like before and after you've refactored it into a pipeline of generators. After having seen this pattern, you will recognize more quickly when this kind of refactoring helps you to create more maintainable and more Pythonic code. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/121/2022-04-29T17%3A23%3A39.884992/How_to_Refactor_into_Generator_Functi_eM5I4Ei.pdf
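A minimal sketch of the refactored shape, assuming the common part of the loop is "produce the next Fibonacci number": the generator holds only that logic, and the varying stop and selection criteria are supplied by `itertools` building blocks. The variable names are illustrative, not taken from the slides.

```python
from itertools import islice, takewhile


def fibonacci():
    """Yield Fibonacci numbers forever; callers decide when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Variation 1: stop after a fixed number of items.
first_ten = list(islice(fibonacci(), 10))

# Variation 2: stop once a value limit is reached.
below_100 = list(takewhile(lambda n: n < 100, fibonacci()))

# Variation 3: add a selection criterion on top of the same stop criterion.
even_below_100 = [n for n in takewhile(lambda n: n < 100, fibonacci()) if n % 2 == 0]

print(first_ten, below_100, even_below_100)
```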
Watch
Talk - Daniel Pope: Why I reimplemented Trio in a game engine
Trio is an asynchronous I/O framework. Unlike other async frameworks in Python, Trio offers structured concurrency: the structure of a program's concurrency tasks is reflected in its code. The advantage of structured concurrency is that concurrent programs become easier to reason about, particularly if operations are cancelled. I reimplemented Trio-like structured concurrency in a game engine, Wasabi2D, and wrote some games with it. I found it to be an excellent fit that simplifies many game logic tasks. In this talk I'll talk about concurrency in video games, present structured concurrency with examples found in game logic, and draw parallels between I/O based concurrency tasks and those found in video games. The examples will also serve as a tutorial for writing games in Wasabi2D. Finally I will explore the differences between Wasabi2D and Trio's implementation of the structured concurrency concepts. By comparing the solutions we will see which elements of Trio are foundational to structured concurrency and which are specific choices for Trio's problem space. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/45/2022-04-27T13%3A04%3A31.199690/Why_I_reimplemented_Trio_in_a_game_engine.pdf
Watch
Talk - Sangarshanan: Build-a-Database with Python
This talk will help unlock the internal workings of a database by breaking down the abstractions that make it up. We will use Python as our weapon of choice to discuss, step by step, how you would go about building the different components of a database. 1) Talking to your database: We start by building an interface and a language that help us communicate with our database. We will use Prompt Toolkit to build a REPL and a simple SQL-based language, parsed with basic regular expressions into instructions to execute. 2) Working with data: Now that we can communicate with our database using instructions, we start the actual work of building the datastore. We initially store all the data in a simple in-memory dictionary and then move to persisting this data to disk. At this point we read the data from disk into memory every time we query it and write it back to disk afterwards, which makes things very slow :( This problem is our entry into the beautiful world of indexes: by building a very basic Btree index that stores references in memory, we can quickly access only what we require from the data on disk and speed up our access times for basic row access queries from O(N) to O(1), where N is the number of rows in a table. 3) Future: We can now proudly demo our new and polished database that can store data, persist it, and run queries that are quite fast thanks to our Btree indexes. We also discuss how this database can be improved in the future by supporting full ACID transactions, allowing concurrency, and handling locks. The best way to understand something is to build it yourself :)
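The idea of keeping in-memory references to rows on disk can be sketched in a few lines. This is not the talk's implementation: a plain dict stands in for the Btree index, and the class `TinyTable` and its on-disk format (one JSON object per line) are assumptions made for illustration. The index maps each key to the byte offset of its row, so a lookup seeks straight to the row instead of scanning the file.

```python
import json
import os
import tempfile


class TinyTable:
    """Hypothetical append-only table with an in-memory offset index."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}  # key -> byte offset of the row on disk
        open(path, "ab").close()  # make sure the file exists

    def insert(self, key, row):
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            self.offsets[key] = f.tell()  # remember where this row starts
            f.write(json.dumps({"key": key, "row": row}).encode("utf-8") + b"\n")

    def get_by_scan(self, key):
        # Without an index: read rows one by one until the key matches.
        with open(self.path, "rb") as f:
            for line in f:
                record = json.loads(line)
                if record["key"] == key:
                    return record["row"]
        return None

    def get_by_index(self, key):
        # With the index: jump directly to the row's offset.
        offset = self.offsets.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["row"]


with tempfile.TemporaryDirectory() as tmp:
    table = TinyTable(os.path.join(tmp, "users.db"))
    table.insert(1, {"name": "ada"})
    table.insert(2, {"name": "grace"})
    scanned = table.get_by_scan(2)
    indexed = table.get_by_index(2)
    missing = table.get_by_index(3)

print(scanned, indexed, missing)
```

A real index would also need to be rebuilt or persisted across restarts, and a tree-based index additionally supports range queries that a dict cannot.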
Watch
Talk - Kevin Kho/Han Wang: Comparing the Different Ways to Scale Python and Pandas Code
Fugue is an open-source unified interface for Pandas, Spark, and Dask that aims to let data practitioners define their compute workflows in a scale-agnostic manner. By decoupling logic and execution, users can code in a language that they are familiar with (Python, Pandas or SQL), and then choose an execution engine to run it on (Pandas, Spark or Dask). In this talk, we cover the transform() function, which lets a user execute a single function in a distributed setting. This simple interface can be incrementally adopted and allows data practitioners to be productive with distributed computing very quickly.
Watch
Talk - Mohammad Athar: D&D and G a daring tale of Dungeons and Dragons and also Graph
This talk will take the form of a story of adventurers who meet in a tavern and use graph algorithms to chase down a McGuffin. The goal is to develop an intuition-first understanding of common graph algorithms. The target audience is primarily programmers who want to review or better understand graph algorithms. I will show how to convert mazes, social networks, and maps into graphs. I will also cover eight algorithms: BFS, DFS, Dijkstra's, Hierholzer's, articulation points, centrality, Kruskal's algorithm, and the Louvain method. I will also provide practical (as practical as D&D can get) applications for these algorithms.
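To make "converting a maze into a graph" concrete, here is a small sketch of BFS, the first algorithm on the list. It treats each open cell as a node and each pair of adjacent open cells as an edge. The maze layout and the function name `shortest_path_length` are invented for this example.

```python
from collections import deque

# "S" is the start, "E" the exit, "#" a wall, "." an open cell.
MAZE = [
    "S.#",
    ".##",
    "..E",
]


def shortest_path_length(maze):
    """Return the fewest steps from S to E, or None if E is unreachable."""
    rows, cols = len(maze), len(maze[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S"
    )
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if maze[r][c] == "E":
            return steps
        # Neighbouring cells are the graph edges: down, up, right, left.
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and maze[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None


print(shortest_path_length(MAZE))
```

Because BFS explores cells in order of distance from the start, the first time it reaches the exit is guaranteed to be along a shortest path.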
Watch
Talk - Jessica Temporal: Let's talk about JWT
JSON Web Tokens, or JWTs for short, are all over the web. They can be used to track bits of information about a user in a very compact way and can be used in APIs for authorization purposes. Join me and learn what JWTs are, what problems they solve, how you can use JWTs, and how to be safer when using JWTs in your applications. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/33/2022-04-24T16%3A05%3A50.711928/pycon-jwt.pdf
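To show what is inside a JWT, the sketch below builds and checks an HS256-signed token using only the standard library. The helper names `sign_hs256` and `verify_hs256` are hypothetical, and this is a teaching sketch rather than the talk's code. A production application would normally rely on a maintained JWT library and also validate claims such as expiry.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without the trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that was stripped when encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("token must have three dot-separated parts") from None
    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm instead of trusting whatever the token announces.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))


token = sign_hs256({"sub": "user-1"}, b"demo-secret")
claims = verify_hs256(token, b"demo-secret")
print(token, claims)
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read the claims, and the signature only proves they were not altered.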
Watch
Talk - Richard Taggart: Leveraging a custom CPython data model for high performance...
Developing insights for problems encountered while building a high-performance microprocessor requires analyzing large amounts of data generated from a complicated design process. This is a Big Data problem. The four main challenges include: managing design components, reporting, linking data across time, and providing a reliable and scalable platform. The Design Data (DD) data model is a technological breakthrough to address the above challenges for integrated circuit design, analysis, and debug. The data model efficiently stores read-only design graph topology (e.g., inter-connected logic gates and wires), and derived analysis data (e.g., estimated signal delay and power usage) in a compressed binary file. DD is a custom domain-specific read-only binary data model with an extensive query API, which is implemented using C++, CPython, and Python method bindings. The C++ data model implementation enables efficient graph traversal, custom interactive "offline" analysis, and design graph visualization. Every data file may be tagged and stored using the compressible binary format, which facilitates comparing different versions across a multi-year project. Engineers can compare past and current data sets to identify and assess specific design trends and failure modes. This data analysis workflow powered by Python improves the quality of a chip design by allowing engineers to focus on the hard problems. This talk will share our experience of incorporating modern Free Open-Source Software technologies into a complicated ecosystem of commercial toolchains and workflows for Electronic Design Automation (EDA). After this talk, I hope you will be inspired to experiment with integrating C or C++ and CPython bindings into your application workflow. I also hope this may help you think about different ways you might be able to integrate various methods of Data Science into your application domain.
Watch
Talk - Pandy Knight: Managing the Test Data Nightmare
Good data for testing can be a nightmare to manage. Sometimes, teams don’t have much control over the data in their systems under test: it’s just dropped in, and it can change arbitrarily. Other times, teams need to build their own data sets, either before testing or during testing. Inaccurate data can leave test gaps. Incorrect or stale data can break tests. Large data can consume too much time. Ugh! In this talk, we’ll cover strategies for defeating many types of test data nightmares: recognizing the difference between product data and test case data; deciding when to prepare data statically beforehand or dynamically during testing; using data to control how tests run or reflect product state; hard-coding values versus discovering data in the system; and avoiding collisions on shared data. The strategies we cover can be applied to any project in any language. After this talk, you will wake up from the nightmare and handle test data cleanly and efficiently like a pro!
Watch
Talk - Paul Vincent Craven: Harvest the power of the GPU for awesome special effects
This talk will show the impressive graphics you can create with OpenGL shaders. The Arcade library makes it easy to take many of the examples shown on the popular www.shadertoy.com website and run them under Python. We'll explain how shaders work, why they are so fast, and how to get started integrating shaders into your own Python programs.
Watch
Talk - Jason Fried: If an asyncio Task fails in the woods and nobody is around to see it, does i....
It's 3am and you just got called about some asyncio production code that is failing either cryptically or silently. You discover that it's using some terrible pattern and say to yourself "how did this ever work?". Come find out about some bad asyncio usage patterns and how to combat them in your projects. We will talk about useful patterns for bootstrapping and tear down, and give you the tools you need to improve the code you maintain.
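One common way a task fails silently is fire-and-forget: nothing ever awaits the task, so its exception is never surfaced where anyone is looking. The sketch below shows one possible countermeasure, not necessarily the one from the talk. It keeps a strong reference to each background task, attaches a done-callback that reports failures, and drains outstanding tasks at teardown. The names `flaky`, `report_failure`, and `failures` are illustrative.

```python
import asyncio

failures = []


async def flaky():
    raise RuntimeError("boom")


def report_failure(task):
    """Done-callback: surface exceptions from tasks nobody awaits."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        # A real service would log this; here it is just collected.
        failures.append(exc)


async def main():
    background = set()

    task = asyncio.create_task(flaky())
    # Hold a strong reference so the task cannot be garbage collected
    # before it finishes, and drop it automatically when it completes.
    background.add(task)
    task.add_done_callback(background.discard)
    task.add_done_callback(report_failure)

    # Teardown: wait for everything still outstanding before exiting.
    await asyncio.gather(*list(background), return_exceptions=True)


asyncio.run(main())
print(failures)
```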
Watch
Talk - Juan Gonzalez: Improving App Performance with Snapshots
Improving your app's performance is a complicated but essential task to handle growth. Often, the bottleneck for your system is your database. While there are many strategies on scaling your database infrastructure, there isn't a clear strategy for solving your performance issues by improving your Python code. Come to this talk to learn how a recently open-sourced library named "snapshot-queries" can help you write Python code that helps you scale without spending more!
Watch
Talk: Jigyasa Grover/Rishabh Misra: Sculpting Data for Machine Learning V02
In the contemporary world of machine learning algorithms - “data is the new oil”. For the state-of-the-art ML algorithms to work their magic it’s important to lay a strong foundation with access to relevant data. Volumes of crude data are available on the web nowadays, and all we need are the skills to identify and extract meaningful datasets. This talk aims to present the power of the most fundamental aspect of Machine Learning - Dataset Curation, which often does not get its due limelight. It will also walk the audience through the process of constructing good quality datasets as done in formal settings with a simple hands-on Pythonic example. The goal is to institute the importance of data, especially in its worthy format, and the spell it casts on fabricating smart learning algorithms.
Watch
Talk - Benjamin Bariteau: How We Migrated 3.8 Million Lines of Python 2 Without Interrupting Deve...
Migrating from Python 2 to Python 3 is not easy, and it is made harder by needing to port a large codebase that is modified frequently by many different developers. Our codebase was nearly 4 million lines of code modified dozens of times a day by hundreds of total developers. It is also business critical, containing a large portion of our most important code and data. We used several tools, techniques, and patterns to achieve the migration without disrupting day-to-day development while keeping regressions to a minimum. In this talk, we’ll detail our migration steps, our usage of pre-commit hooks to reduce regressions to fixes, our usage of a reverse proxy to allow granular, low-risk rollout for a webapp, and our migration of pickle to roll-forward-safe JSON for caching.
Watch
Talk - Reka/Ben: Actionable insights vs ranking How to use and how NOT to use code quality metrics
In this talk, we want to make two major points:
- Metrics can facilitate better conversation about code quality. They help you focus more on technical problems and improvements instead of personal preferences and organizational issues.
- Metrics can be misused very easily. Knowing their limitations is crucial.
METRICS
For each metric, we'll discuss: code examples in Python; how to calculate; interpretation (incl. some comparison across open source Python projects); actions; limitations.
METHOD LENGTH
The simple. You can calculate it without specific tools. First step: Extract functions. It shows well some general limitations of code quality metrics.
CYCLOMATIC COMPLEXITY
The old. Show the formula, but don't explain it in detail. :-) Extract functions. Remove redundant if conditions. It doesn't account for nested coding constructs. It ignores some modern language patterns.
COGNITIVE COMPLEXITY
The human. Calculation and interpretation: see https://www.sonarsource.com/docs/CognitiveComplexity.pdf Actions: Extract functions. Use shorthand structures. More Pythonic code is also more readable. Limitations: It ignores both the length of a linear block and the complexity of the expressions used in it.
WORKING MEMORY
Another aspect of human understanding. Calculation: see https://sourcery.ai/blog/working-memory/ Interpretation: The 7 +/- 2 rule of the human working memory. Actions: Extract functions, some more specific refactorings this metric rewards. Limitations: It ignores the structure.
LIMITATIONS AND PITFALLS
GENERAL: They can be gamed. They easily encourage one-sided thinking and behaviour.
SPECIFIC FOR CODE QUALITY METRICS: Great as warning signs, not good as "proof of excellence".
COMPOUND METRICS: Giving a more versatile picture than a single metric.
WHAT METRICS DON'T CAPTURE: naming, consistent terminology, ubiquitous language (DDD); project structure; correctness.
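As a rough sketch of how the first two metrics can be computed without specific tools, the example below uses the standard `ast` module. The complexity count is a simplified approximation (one plus the number of decision points), not the exact formula used by any particular tool. The sample function and the helper names are invented for illustration.

```python
import ast

# Node types treated as decision points in this simplified count.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approximate_complexity(func_node):
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, one per extra operand.
            complexity += len(node.values) - 1
    return complexity


def report(source):
    """Map each function name to (length in lines, approximate complexity)."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            length = node.end_lineno - node.lineno + 1
            results[node.name] = (length, approximate_complexity(node))
    return results


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

metrics = report(SAMPLE)
print(metrics)
```

The count treats a nested `if` the same as a top-level one, which illustrates the limitation named above: cyclomatic complexity does not account for nesting.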
Watch
Talk - Carlos Kidman: Testing Machine Learning Models
As ML/AI systems are becoming more prevalent, the need for setting quality standards and testing practices has become crucial. Testing these models goes beyond validation metrics like accuracy, precision, and recall. Instead, quality attributes like model Behaviors, Usability, and Fairness need to be tested and measured using exploratory and automated strategies. In this talk, we'll cover some of the risks and biases that can happen throughout the MLOps pipeline, demonstrate a few techniques to test a model's behaviors and fairness, and apply them against some real-world scenarios and state-of-the-art models. By the end, you will have new ideas and techniques that you can use to test your own ML/AI systems and approach testing these quality attributes from a user's perspective.
Watch
Talk - Bruce Eckel: Making Data Classes Work for You
This will be an example-driven presentation. The first set of examples looks at an int which should be restricted to a value from one through ten. First I'll look at the problems in the traditional approach, passing an int to a function and checking to ensure it is within range. Next I'll encapsulate the int in a (regular) class OneToTen, which allows the movement of the test into the constructor. Although this guarantees that objects will be created correctly, such objects are mutable so they can be modified to be invalid after creation. The solution is to use @dataclass together with the frozen=True option, and add a __post_init__ function to check the validity of the object once it's been initialized. Because such an object is immutable, it cannot be later modified into an invalid state. This ensures that the new type can only ever exist as a legitimate value. Next I'll use this technique to create a Person type that is composed of FullName, BirthDate and EmailAddress fields, each of which validates itself. Finally, I'll compose BirthDate using Day, Month and Year fields.
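A minimal sketch of the frozen-dataclass idea described above, not the speaker's actual code. The class name OneToTen comes from the abstract; the field name `value` and the helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class OneToTen:
    """An int restricted to the range 1..10 (field name is an assumption)."""
    value: int

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so an out-of-range
        # object can never finish construction.
        if not 1 <= self.value <= 10:
            raise ValueError(f"OneToTen out of range: {self.value}")


def is_rejected(n: int) -> bool:
    """Return True if OneToTen refuses to be created with n."""
    try:
        OneToTen(n)
    except ValueError:
        return True
    return False


def is_frozen(obj: OneToTen) -> bool:
    """Return True if assigning to the field is blocked by frozen=True."""
    try:
        obj.value = 99  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


ok = OneToTen(7)
```

Because validation lives in the type itself, any function that accepts a OneToTen can rely on the value being in range without checking again.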
Watch
Talk - Bianca Rosa: Observability driven development
You're happily developing your application, but when the time comes to send it to production and have the first customers test it, you might realize that whenever a bug is found it is just too hard to understand what is going on in a production environment: you often don't have access to the user data or the user account, and you struggle to reproduce the error and support your customer properly. The story is all too familiar and has probably happened to a lot of us. In this talk, we will walk through techniques and things to consider when writing an application that is going to be supported in a production environment with an eye for observability, by having searchable, consistent, and rich log messages. First, the problem will be presented through a use-case scenario in which a developer has no way of knowing what happened to a particular customer in a production environment when an API request fails. Opening the code for this request, we will add together the log messages that would have made it possible for the developer to debug this problem properly, and then talk through strategies to keep in mind during the early development of the code. We will also walk through log levels and how to use them properly, how to make sure log messages are clean and understandable, how to take advantage of the logger's extra fields to attach metadata to the messages we write, and the best way to make log messages easily searchable.
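As a rough illustration of log levels and extra fields, here is a sketch using only the standard library's logging module. The logger name, the handler function, and the field names request_id and customer_id are hypothetical and do not come from the talk.

```python
import io
import logging

# Capture output in memory so the example is self-contained;
# a real service would log to stdout or a log shipper instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(name)s request_id=%(request_id)s "
    "customer_id=%(customer_id)s %(message)s"
))

logger = logging.getLogger("orders_api")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(request_id: str, customer_id: str, payload: dict) -> int:
    """Hypothetical API handler that logs with consistent, searchable fields."""
    # The same context dict is attached to every message for this request,
    # so all of its log lines can be found by searching for request_id.
    context = {"request_id": request_id, "customer_id": customer_id}
    logger.info("request received", extra=context)
    if "item" not in payload:
        logger.error("request rejected: missing field 'item'", extra=context)
        return 400
    logger.info("request completed", extra=context)
    return 200


status = handle_request("req-123", "cust-42", {})
log_output = stream.getvalue()
```

Note that a format string referencing %(request_id)s requires every record on that handler to carry the field, which is one reason to build the context once and pass it everywhere.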
Watch
Talk - Bernát Gábor: How we standardized editable installs PEP 660 vs PEP 662
A Python Enhancement Proposal (PEP) is the method through which the Python community debates and adopts new features to the language. The same mechanism is used to standardize interfaces and methods used by the Python Packaging Ecosystem. The main difference is that while language PEPs are written mostly by core developers, packaging PEPs are written by members of the Python Packaging Authority (PyPA). How we build Python packages was standardized in 2016 through PEP-517 and PEP-518. Editable installs, while widely used and well known through the -e flag in pip, proved to be controversial, so it got left out of those proposals. It has taken another five years to reach a consensus, and I'm happy to say that -- through PEP-660 -- we now have a way for all build back-ends to support editable installs. Join me in this talk, where I'll tell a tale explaining how having competing PEPs and exhausting discussions -- while tiresome -- led to a better solution. Plus, you'll also understand the different options we considered and the solution we developed in the end.
Watch
Talk - A. Jesse Jiryu Davis: Why Should Async Get All The Love Advanced Control Flow With Threads
asyncio introduced many of us to futures, chaining, fan-out and fan-in, cancellation tokens, and other advanced control flow concepts. But Python threads were doing this stuff before it was cool! Come see Python threading techniques inspired by asyncio, Go, and Node.
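To make two of the concepts named above concrete, here is a small sketch of fan-out/fan-in with thread-backed futures and of a cancellation token built from threading.Event. It is an illustration under assumed names (square, fan_out_fan_in, worker), not code from the talk.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n: int) -> int:
    return n * n


def fan_out_fan_in(numbers) -> int:
    """Fan out one task per number, then fan in by summing the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, n) for n in numbers]
        # as_completed yields each future as soon as its thread finishes.
        return sum(f.result() for f in as_completed(futures))


def worker(stop: threading.Event, ticks: list) -> None:
    """Loop until the shared Event (the cancellation token) is set."""
    while not stop.is_set():
        ticks.append(1)
        stop.wait(0.01)  # sleep, but wake immediately on cancellation


total = fan_out_fan_in(range(5))

stop = threading.Event()
ticks: list = []
thread = threading.Thread(target=worker, args=(stop, ticks))
thread.start()
stop.set()              # request cancellation
thread.join(timeout=1)  # the worker notices the token and exits
```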
Watch
Talk - Anthony Shaw: Write faster Python! Common performance anti patterns
This talk will show small, specific examples of Python code that can be refactored to be faster without compromising on readability. At the start of the talk, I'll explain how to set up a profiler to measure application performance and how to track improvements and regressions.
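The abstract does not say which profiler the talk uses, so the following sketch assumes the standard library's cProfile and pstats. The two string-building functions are made-up candidates to compare, not examples taken from the talk.

```python
import cProfile
import io
import pstats


def concat_loop(n: int) -> str:
    """Build a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def concat_join(n: int) -> str:
    """Build the same string with a single join."""
    return "".join(str(i) for i in range(n))


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, buffer.getvalue()


result, report = profile(concat_join, 1000)
```

Profiling both candidates with the same input and comparing the reports is one simple way to confirm that a refactoring is an improvement rather than assuming it.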
Watch
Talk - Peacock: Getting Started with Statically Typed Programming in Python 3.10
Since 2015, it has been possible to write Python like a statically typed language with the typing module and other features introduced in Python 3.5. This can significantly improve the development experience and review process. I have been using type hints in my work for several years and have been studying Haskell and TypeScript. I believe this session will be a stepping stone for "type hints newbies." What I will talk about in this talk: - Advantages of using Typing - Getting help from editors - Facilitating code reviews - How to get started with Typing - Argument and return types for functions - Using the standard Collections types - The difference between tuple and other types - Abstract and concrete types - Generics, user-defined types - Type Hinting Updates in Python 3.9 and 3.10 - (3.9) Type Hinting Generics In Standard Collections - (3.10) Allow writing union types as X | Y - (3.10) Parameter Specification Variables - (3.10) Explicit Type Aliases - (3.10) User-Defined Type Guards What is not covered in this talk - Basic Python 3 syntax - (Not required): Experience developing in statically typed languages Related contents: - A talk at PyCon JP 2020 (JA): https://pycon.jp/2020/en/timetable/?id=203955 - https://docs.python.org/3/library/typing.html
Watch
Talk - Francesco Murdaca/Maya Costantini: How to Make Your Python Jupyter Notebook Standalone an...
Even though many developers (including data scientists) focus on their core problems when working on their experiments, one basic aspect can make these projects not reusable. We are not considering anything machine learning-related yet. One of the first steps during the development of a project is the selection of libraries or dependencies. When someone runs pip install package-name, they might not be aware that along with the library being installed, the so-called direct dependency, many other dependencies, the so-called transitive dependencies, will be installed on their machine. Any change in one of those dependencies can break the experiment. It’s fundamental to have a way to state all the dependencies used, including the operating system, Python interpreter, and hardware used to run a certain experiment. In this session, the speakers will present an open source JupyterLab extension for Python dependency management developed by the Thoth team. Attendees will learn which resolution engines can be used (e.g. Pipenv, Thoth) and the differences between them. They will also learn what to do in different scenarios that emulate typical Jupyter notebook experiences, in order to learn how to use the new extension. By the end of this session, attendees will understand the importance of reproducibility, how to use the Thoth JupyterLab extension for Python projects, and the benefits of a cloud resolution engine compared with other existing ones. They will be able to run a tutorial using only a GitHub account and a browser, as it will run in a completely open cloud environment.
Watch
Talk - Deepak K Gupta: Speed Up Data Access with PyArrow Apache Arrow Data is the new API
Until now, we have been used to accessing data through APIs, which make sure we get the data in the desired format. Unfortunately, this requires the data to go through a serialization/deserialization cycle before the API returns it. What if we could arrange the data in such a way that it needs neither an API nor any serialization/deserialization to be accessed and understood, and from multiple programming languages at that? If that sounds interesting, welcome to the world of Apache Arrow, which defines a language-independent columnar memory format that supports zero-copy reads for lightning-fast data access without serialization overhead. Its Python library is called PyArrow; it can be integrated with Python libraries such as pandas and NumPy and pass its benefits on to them. In this talk you’ll learn about the architecture, use cases, and reasons for using Apache Arrow through PyArrow. I’ll share how to use it, as well as some interesting statistics on the difference it makes in our day-to-day access and analytics. I’ll also talk about Apache Arrow Flight, a high-performance wire protocol focused on bulk transfer for analytics. This session is NOT a tutorial about PyArrow but a set of interesting improvements, facts, and statistics that can help you decide whether it makes sense to explore for the work you’re doing.
Watch
Talk - Ryan Kuhl: GraphQL The Devil's API
While there are advantages to using GraphQL vs. traditional REST APIs such as descriptive queries, there are also a plethora of potential pitfalls, such as the n+1 query problem and idiosyncratic fickleness. We leverage data-loaders, async/await, dynamic query generation, and other performance optimizations in GraphQL to create a flexible, performant interface for our front-end services. Let’s do GraphQL the right way!
Watch
Talk - Roman Yurchak/Hood Chatham: Pyodide: A Python distribution for the browser
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly. It includes a port of CPython 3.9 to WebAssembly/Emscripten, and makes it possible to install and run Python packages in the browser. Pyodide comes with a robust JavaScript ⟺ Python foreign function interface so that you can mix these two languages in your code with minimal friction. We will walk through simple examples of how to run Python applications in the browser with Pyodide. We will also discuss the process of porting existing Python packages, including what makes a package suitable to port and what challenges are likely to arise. Some Criteria that Determine Suitability of a Project for Porting: Purely computational projects are simple to port to run in the browser. We are missing threading and multiprocessing, so you will need to be able to run single threaded. File system code mostly works unchanged. However, much of the UI and network access are very different inside the browser. Packages with a clean divide between doing computation and doing UI will be simpler to port: the UI parts may need to be rewritten or shimmed, but the pure computation need not be.
Watch
Talk - Graham Bleaney/Pradeep Kumar Srinivasan: Securing Code with the Python Type System
Preventing security vulnerabilities often brings to mind heavyweight security tools. But what if it doesn’t have to be that way? What if you could use the concepts already built into Python to make your code incrementally more secure? In this talk, we'll see how Python types allow you to improve your project's security incrementally. First, we’ll show how simple type annotations by themselves can prevent security-impacting logic errors. Second, we'll see how you can prevent injection vulnerabilities such as SQL injection using a special type in your APIs (PEP 675). Next, we demonstrate how to leverage runtime type validation to securely deal with user-controlled data (such as HTTP requests). Finally, we show how types naturally enable powerful typing-based tools like Pysa and CodeQL to perform static taint flow analysis and catch complex vulnerabilities that span multiple functions. No security tool is a panacea, however, so we’ll also show you where typing and the tools that rely on it can fail. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/18/2022-04-28T19%3A35%3A09.209346/PyCon_2022_-_Typing_for_Security.pdf
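A sketch of the PEP 675 idea mentioned above: annotating the SQL parameter as LiteralString so that a type checker can reject query strings built from user input. typing.LiteralString exists from Python 3.11; the fallback to str, the run_query helper, and the table are assumptions added so the example runs on its own.

```python
import sqlite3

try:
    from typing import LiteralString  # available from Python 3.11
except ImportError:  # older interpreters: fall back so the sketch still runs
    LiteralString = str  # type: ignore[misc, assignment]


def run_query(conn: sqlite3.Connection, sql: LiteralString,
              params: tuple = ()) -> list:
    """Hypothetical helper: sql must be a literal, data goes in params."""
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Accepted by a type checker: the query text is a literal and the
# untrusted value travels separately as a bound parameter.
rows = run_query(conn, "SELECT name FROM users WHERE name = ?", (user_input,))

# A checker that supports PEP 675 would flag the line below, because an
# f-string containing user_input is not a LiteralString:
# run_query(conn, f"SELECT name FROM users WHERE name = '{user_input}'")
```

The annotation is enforced only by static analysis; at runtime Python does not check it, which is why the parameterized query is still what actually blocks the injection here.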
Watch
Talk - Dustin Ingram: Securing the Open Source Software Supply Chain
Supply Chain Security: so hot right now. With the recently increased focus on securing software systems, there has been an incredible explosion of tools, methodologies, standards, best practices, and more. Given the sheer quantity, it's hard to keep track and stay informed: how can you know what's right for you? The same attributes that make open source software desirable to use also make it challenging to secure. When anyone can publish an open-source library, how can you decide what's safe to use? If anyone can contribute, how can you trust the maintainers? If source code and development are in public, how can we identify and respond to vulnerabilities when attackers will know about them as soon as we do? In this talk, we'll explore new tools and best practices that you can use today as an open-source software user to improve the security of your software supply chain and trust in the ecosystem. We'll show how each of these serves a different purpose, and protects you from a unique way in which your software supply chain could be vulnerable. Finally, we'll discuss upcoming and potential improvements to the entire open-source ecosystem.
Watch
Talk - Jes Ford: The Model Review: improving transparency, reproducibility, & knowledge sharing...
Code Review is an integral part of software development, but many teams don’t have similar processes in place for the development and deployment of Machine Learning (ML) models. I will motivate the decision to create a Model Review process, starting from the principles of transparency, reproducibility, and knowledge sharing. MLflow is a useful Python package to help simplify and automate much of the tracking necessary to create detailed records of machine learning experiments. Much of this talk will be spent introducing this tool, and demonstrating the core MLflow Tracking functionality. I’ll discuss how my team is currently running a Model Review process for any ML models that we push to production, and how we use MLflow to streamline this work and learn from each other. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/68/2022-04-26T03%3A24%3A02.545732/model_review_slides_jesford.pdf
Watch
Talk - Mason Egger: Write Docs Devs Love: Ten Tips To Level Up Your Tech Writing
Think of that feeling you get when you follow an online tutorial or documentation and the code works on the first run. Now think of all the hours spent wasted following broken, outdated, or incomplete documentation. From our favorite tutorials to bad product docs we all consume technical writing. Tutorials, blog posts, and product docs help developers learn new things, build projects, and debug issues. But what makes one tutorial better than another? In this talk I'll discuss how you can write the documentation that developers love and I’ll share 10 tips and tricks to improve your technical writing. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/173/2022-04-28T16%3A12%3A34.131557/Write_Docs_Devs_Love.pdf
Watch
Talk - Tetsuya Jesse Hirata: Productionize Research Oriented Code By Python
The target audience is Python engineers who are involved in R&D, data science, AI/ML projects, or data-oriented projects. 【Introduction】 - Background - Definition of Research Oriented Code and Production Code - Differences between Research Oriented Code and Production Code 【Main Talk】 Four steps to productionize research oriented code 1. Understand the code through code reading and code documentation 2. Modularize the code into preparation code, pre/post-process code, calculation code based on the code documents 3. Refactor the code with test code 4. Make them products 【Summary】 - Summarize the four steps to productionize research oriented code - After making the code products, improve performance and monitor the behaviors of production code Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/77/2022-04-27T13%3A38%3A31.337676/pycon_us_2022_with_note.pdf
Watch
Talk - Maryanne Wachter: Will it Blend? Writing A Custom Constraint Solver for Blender with Cython
Have you ever wanted to prove Python naysayers wrong in that you can have both fast and approachable code with Python? With a few adjustments to standard Python code, you can harness the power of Cython to vastly improve Python performance, while maintaining the look and feel of a traditional Python package. In this talk, common pitfalls in developing with Cython will be discussed in the context of how Cython was used to bring powerful and fast optimization algorithms to a custom geometric form finding add-on for Blender. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/109/2022-05-01T04%3A40%3A53.983171/PyCon_2022_-_Will_It_Blend_.pdf
Watch
Talk - Brendan Collins: Who Said Wrangling Geospatial Data at Scale was Easy?
If you have ever worked with Census Data, you may be recalling nightmares of hours spent staring at data and finding it impossible to download, store, or convert to a sensible format to begin your analysis. And Census data is not even unstructured data! Geospatial Data comes in various formats - GeoJSON, Parquet, Shapefiles, GeoTIFF, etc. But what are the most efficient ways to convert the data into formats that are easy to understand, work with, transfer, and ultimately analyze? Then throw in petabytes worth of data and you hit the challenge of wrangling geospatial data at scale. This talk will walk through some of the best ways to handle geospatial data at scale, with a focus on: -The xarray-spatial library (https://xarray-spatial.org/) for raster-based spatial analysis. -The RTXpy (https://github.com/makepath/rtxpy) for GPU-powered spatial analysis. -Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/) examples of geospatial data processing.
Watch
Talk - Padmaja Bhagwat/Manisha R: VigNET: An intelligent camera app that assists you...
What if you could understand your surroundings with just a click of a picture? The speakers have built an intelligent camera app that assists people with visual impairment in understanding their surroundings. This application takes camera input in the form of an image and attempts to answer questions related to the image. Simply put, it's a Visual Question Answering (VQA) app. The deep learning based application is built using a transformer based model called Vision Language Transformer (ViLT) which is both computationally fast and efficient, thus providing answers to users’ questions within a fraction of a second. The application is integrated with speech-to-text and text-to-speech capabilities to enhance accessibility. This talk will mainly cover the following: * What is the Vision Language Transformer (ViLT) model? * Advantages of ViLT over traditional vision language pre-trained models * Best practices around modularizing the application into different services * Steps to deploy this deep learning based application on cloud (GCP) * How in-built Python libraries helped in implementing and deploying such complex models (viz. ViLT) easily The entire code is open-sourced and the talk will provide a walkthrough of the steps to build your own visual question answering application. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/83/2022-04-29T03%3A09%3A38.147066/PyCon2022.pdf Demo: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/83/2022-04-29T03%3A05%3A47.986467/vignet_demo.mp4
Watch
Charlas - Cristián Maureira-Fredes/Denny Perez: Uniendo a las comunidades hispanohablantes de Py...
Python en Español aims to be the central meeting place for all Spanish-speaking Python communities, helping to overcome the language barrier that is usually the first problem many people run into when looking for documentation, experiences, or advice. The coordination team is made up of people from different countries and communities, and seeks representation from every Spanish-speaking country in order to understand the different realities the community faces. One of the central goals is to share what the community at large can learn from the collective experience of all the communities, so that each new chapter can form with fewer difficulties. Finally, centralizing resources and events makes access easier for the whole community. After this talk, you will know about the efforts, projects, and ideas behind this initiative and, above all, how to join.
Watch
Charlas - Marcelo Elizeche Landó: AyudaPy org De proyecto de fin de semana a movimiento ciudadan...
The talk is about the AyudaPy.org project, which began as an experiment/weekend project and became a social force in Paraguay that in many cases made up for the state's response. It covers the project's origins and future, the challenges of becoming, overnight, the project manager, maintainer, and spokesperson of an open source project replicated in several countries, and above all surviving the burnout that comes with that situation. Technologies used in this project: Python, Django, PostGIS, OpenStreetMap. Topics to be covered: - How the project was born and the climate of social crisis in which it took shape - Weekend project: a simple idea executed at the right time - Public acceptance - From 1 developer to ~30 in record time - Exponential growth and open source - The volunteer community - Support from the Red Cross - The emotional load of a social project and burnout - Impact of the project and forks in other countries - ~6k families helped - Lessons learned
Watch
Charlas - Alison Orellana Rios: Reconocimiento de figuras con Visión Artificial
This talk covers using the OpenCV library for image processing, following the steps needed to identify shapes and other elements by means of image recognition algorithms applied to shape detection. You will see why OpenCV is used: the library lets you write code quickly, since it has official documentation with many examples that help you understand each image processing algorithm, its steps, and the code showing its use or application. All it requires is a (quick) installation and then importing it to start exploring its functions. Computer vision can help improve the processing of graphical data in different areas, such as industry, quality control, counting large quantities of materials, rapid separation of items, quality assurance, health, determining areas of interest, and more. This can improve work and production times, and it makes processing possible without human interaction, which can improve results.
Watch
Charlas - Débora Azevedo: Software educativo: ¿que es? ¿como se hace?
If you want to get into technology in education, this talk is for you. We will discuss definitions of technology and its uses in education as a means of generating knowledge, as well as the development of educational software and its stages. When developing these tools it is essential to think about the pedagogical conceptions that will guide the whole development process. We will then present more detail on each stage of developing this kind of software: conception, elaboration, finalization, and viability. For each one we will also discuss processes and patterns of good practice. I will present the educational software I developed during my master's degree, which aims to support bilingual literacy (in Brazilian Sign Language and written Brazilian Portuguese) for deaf children, following the patterns discussed earlier. We will close by talking about how innovation should be present in this kind of tool, considering the motivation for developing these applications and the impact they can have on education. Finally, we will also discuss accessibility and how problems can arise from our own biases as software developers, among other difficulties related to developing software for educational purposes. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/128/2022-04-26T16%3A10%3A54.189378/PyCon_US_-_Charlas_2022.pdf
Watch
Charlas - Ana Cecília Vieira: Análisis exploratorio de datos abiertos para el fortalecimiento de...
Come learn about exploratory analysis of open data with the most popular Python libraries for data science, and find out about the level of data transparency in Latin America. The talk is aimed at people who are interested in data science and have started studying it but do not yet understand how to put it to use. The analysis uses the Pandas and Matplotlib libraries and is done in Google Colab.
Watch
Charlas - Andres Pineda: Programacion Reactiva Navegando en el mundo de la asincronia con RxPy
As our applications grow and become more complex, performance can be affected by the various tasks being executed, some of which even block the thread they are running on. Reactive programming (ReactiveX) helps with this, letting us easily write instructions that run asynchronously thanks to operators that can be used to create, filter, and combine the different data streams in our system, all while keeping our code flexible, readable, and easy to maintain. This presentation shows what reactive programming is, what it consists of, and what it lets us do, so that our Python programs can adopt it and enjoy its benefits.
Watch
Charlas - Kin Gutierrez Olivares/Federico Garza Ramirez: Nixtla: Deep Learning para pronóstico...
Time series forecasting has a wide range of applications: finance, retail, healthcare, IoT, etc. Recently, deep learning models such as ESRNN and N-BEATS have shown state-of-the-art performance on these tasks. Nixtlats is a Python library we developed to make these state-of-the-art models easier for data scientists and developers to use, so they can run them in production environments. Written in PyTorch, its design focuses on usability and reproducibility of experiments. To that end, nixtlats has several modules: Data: contains datasets from various time series competitions. Models: includes state-of-the-art models. Evaluation: provides various loss functions and evaluation metrics. Goals: -Introduce attendees to the challenges of time series forecasting with deep learning. -Commercial applications of time series forecasting. -Describe nixtlats, its components, and best practices for training and deploying state-of-the-art models in production. -Reproduce state-of-the-art results using nixtlats for the winning model of the M4 time series competition (ESRNN). Project repository: https://github.com/Nixtla/neuralforecast.
Watch
Charlas - Ariel Ortiz: Match case para principiantes
Python has long lacked a conditional control-flow mechanism found in many other programming languages: something that takes a value and compares it directly and simply against several options. C and its derivatives have the switch/case statement. Other languages have more sophisticated support for pattern matching. The traditional ways of getting equivalent behavior in Python were not entirely elegant. One option was to write a chain of if/elif/else statements. A second option was to use a dictionary whose keys map to functions. This generally works well, but it can be complicated to build, understand, and maintain. After several failed proposals to add a switch/case-style syntax to Python, a recent proposal was finally accepted for Python 3.10: structural pattern matching. This pattern matching scheme not only makes simple switch/case-style matches possible, but also supports a broader range of use cases. This talk will show how to take advantage of this new feature in our programs.
Watch
Charlas - Marco Carranza: Estrategias para trabajar con datos a medida que estos crecen
Note: Captions start at minute 1:55. Nowadays data keeps getting bigger, so it is almost impossible to process it on desktop machines. To solve this problem, many technologies have emerged for processing all that data using multiple computer clusters. The challenge is building solutions on top of these technologies, which requires designing complex data pipelines that combine multiple technologies. In some cases, however, we do not have enough time or resources to learn how to use and configure a complete infrastructure just to run a couple of experiments. Perhaps you are a researcher with very limited resources or a startup on a tight schedule to bring a product to market. The goal of this talk is to present various strategies for processing data as it grows, whether it can be processed with the limited resources of a single machine or with clusters, using technologies such as Pandas, PySpark, Vaex, and Modin. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/72/2022-04-30T18%3A48%3A44.034748/Pycon_2022_-_Estrategias_para_trabajar_ZInYxdm.pdf
Watch
Charlas - Sofía Martin/Ariel Ramos/Liliana Hurtado/Enzo Juárez: Python + VPS Jupyter HUB/Notebook...
-Desde 2017 en el Norte Argentino, realizamos actividades con experiencias en tecnologia de impacto positivo para la sociedad, lo hacemos con las Comunidades Python Norte y Python Argentina. Enseñamos a los asistentes buenas practicas de uso en tecnologia y Software Open Source. -Combinamos 3 componentes para lograr un ambiente de tecnologia seguro y practico en estas experiencias piloto educativas, de manera remota, durante la pandemia COVID19. -Los 3 componentes: -Lenguaje de Programacion + Entorno de Trabajo + Infraestructura == Python + Project Jupyter + VPS (Jupyter HUB/Notebook). -VPS == Servidor Privado Virtual (Virtual Private Server) -Tuvimos en cuenta los conocimientos tecnicos basicos de los interesados, entonces decidimos implementar/instalar en un VPS todos los componentes necesarios (Python + Librerias + Plugins de Jupyter + Widgets), asi ellos aprenden directamente. -Iniciamos con programacion, luego con experiencias piloto programando Jupyters Notebooks para enseñar Matematicas, Fisica, Robotica, armamos los notebooks con lo justo y necesario de programacion, ayudandonos de Widgets y Graficas. -A medida que avanzamos, armamos Jupyter Notebooks en materias de No Calculo. -Logramos una buena practica y dinamica en la asistencia de aprendizaje en el uso del VPS y la enseñanza de conceptos de materias en las que logramos armar/programar Jupyter Notebooks. -Los interesados fueron Docentes, Alumnos, Particulares. -Se hizo de manera remota, tambien tuvimos experiencias en forma presencial. -Generamos nuestros notebook como recursos. -Tambien se pudo formar Jovenes Investigadores de la Universidad de Salta en disciplinas No relacionadas a Tecnologia. 
-Project repository: https://github.com/entrerrianas/pyconus2022 Materials: -https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/10/2022-04-30T20%3A41%3A37.101675/01-PyconUS-Ariel-Iniciando-Jupyter-Widgets.pdf -https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/10/2022-04-30T19%3A20%3A02.141276/01-LILIANA-HURTADO.pdf -https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/10/2022-04-30T20%3A46%3A33.139948/QR-GITHUB-MEDIANO.png -https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/10/2022-04-30T20%3A41%3A54.521757/02-PyconUS-Sofia-Ejemplos.pdf
Watch
Charlas - María Andrea Vignau: Bailo con tu sombra: Patch, stub, mock
I aim to encourage people to write tests: 1) Their importance, helping to identify the reasons for using simulated objects. 2) How to inject these objects into the code, using patch and dependency inversion. 3) The advantages and power of the mock library and MagicMock: identifying the use cases for patch, how to make assertions about the calls made to the mock, and the option of using it as a wrapper. 4) I close by describing two very popular libraries for testing web pages, vcr-pytest and moto, and how they use mocks and stubs. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/19/2022-04-25T16%3A47%3A54.046603/Mocks_es.pdf
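As a rough illustration of the three techniques the abstract lists (injecting a mock, using patch, and wrapping a real object), here is a minimal sketch using only the standard library's unittest.mock. The functions fetch_greeting and stamp are hypothetical and are not taken from the talk.

```python
import time
from unittest.mock import MagicMock, patch


def fetch_greeting(client):
    """Depends on an injected client, so a test can pass in a stand-in."""
    return client.get("/greeting").upper()


def stamp():
    """Calls time.time() directly, so a test has to patch it."""
    return f"t={time.time():.0f}"


# 1) Dependency injection: hand the code a MagicMock instead of a real client.
client = MagicMock()
client.get.return_value = "hola"
greeting = fetch_greeting(client)
client.get.assert_called_once_with("/greeting")  # assert on the recorded call

# 2) patch: temporarily replace time.time while the block runs.
with patch("time.time", return_value=42.0):
    stamped = stamp()

# 3) wraps: keep the real behaviour of len() but still record the calls.
spy = MagicMock(wraps=len)
length = spy("mock")
spy.assert_called_once_with("mock")
```

Injection needs no patching at all, which is why it is often the first thing to try; patch is the fallback when the dependency is looked up inside the code under test.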
Watch
Charlas - Luis Conejo: De cero a 200 OK en 30 minutos Desarrollo Web con Django, Heroku...
Recently, I had the opportunity to work on a freelance project with a publishing company that wants to migrate its book-generation tool from a Python script launched from a terminal to a web-based graphical tool. In the process, I was able to apply the tools for building a complete website that I normally use in my work as a Python instructor at a university. OBJECTIVE In this talk, I want to share a simplified version of that experience, showing how to create a new Django project, implement access control and a SQL database model, and finally deploy the project to Heroku and enable Continuous Integration using GitHub and TravisCI. The whole process takes place within the free tier of each service, showing that a complete prototype can be built this way. TARGET AUDIENCE -Developers interested in freelancing in Python without incurring high costs to build an initial prototype for potential clients. -Instructors interested in teaching web development 100% in Python. -Developers who want to learn web development in Python. TALK STRUCTURE -How did I end up as a freelance developer? (3 minutes). -The tools we will use (3 minutes). -Creating our project in Django (22 minutes). - Initial configuration. - Database model. - The Django admin interface. - HTML templates. - Publishing our page on Heroku. - Continuous Integration with GitHub and TravisCI. -What else can we do? (2 minutes)
Watch
Charlas - Laura Gutiérrez Funderburk: Reduciendo prejuicio en la inteligencia artificial...
In this talk, the speaker will provide: 1. An introduction to a machine learning problem in the health domain: we will study a problem with patient data, and a program that seeks to recommend higher-risk patients based on their number of emergency and non-emergency visits. 2. Examples of how to evaluate the data to identify bias using plots and the pandas and matplotlib packages. Training the model with scikit-learn and evaluating the effectiveness of the results. 3. An introduction to Fairlearn (https://fairlearn.org/). Building on the previous example, we will see how Fairlearn can improve the way the algorithm decides which patients to recommend, using two submodules: MetricFrame and ThresholdOptimizer. We will see how we can improve the quality of the predictions. 4. How to learn more about Fairlearn, the community, and opportunities to contribute.
Watch
Charlas - Renne Rocha: Querido Diario: Cómo Liberar Datos Oficiales de Ciudades Brasileñas con Py...
Official Gazettes are the main channel of communication between citizens and a city's executive branch. In Brazil, by law, all official government acts must be published in the Gazettes. However, there is no standard for how these publications should be made available. As a result, each of the 5570 Brazilian cities publishes in its own way, generally using closed formats such as PDF that make it hard to query and analyze the data automatically. The Querido Diario project aims to make these Gazettes more accessible, making it easier to search and consult their content through a search page, an open API, and, in the future, content analysis tools. This talk presents the whole process, from extracting data from the municipalities' websites (through data scraping with the Scrapy framework), through storing and processing the PDF files so that their content can be searched (using OCR), to the API and the search page, where anyone has centralized access to the Gazettes of all municipalities.
Watch
Charlas - Nicole Franco Leon: Álgebra de Mapas en Python
Map algebra is a language of arithmetic expressions that use relations (operators and functions) and variables representing spatial data and values to perform geographic analysis through the ArcPy module. Map algebra basically means doing math with maps. Using existing geographic data to generate new data, or simply to extract quantitative results from it, has been common practice since the very beginning of modern cartography. In this talk we will get an introduction to the ArcPy module and its configuration in ArcGIS, followed by a holistic look at the whole module, covering its operators, operations, and algebraic functions, the creation of complex expressions for processing geospatial data over a given time period, and the preparation of layers, and we will conclude by generating maps with Python. If you like geography, mathematics, and Python, this talk is for you.
Watch
Talk - Zachary Sarah Corleissen: Localize your open source documentation: a Kubernetes case study
NOTE: video begins at ~3:45 This talk covers how Kubernetes docs were able to scale from zero to eleven localizations within six months in 2018. It covers what docs maintainers learned, mistakes to avoid, and how you can start localizing your own open source project. Great documentation drives developer adoption...but documentation is only great if it's accessible. One piece of accessibility is localization: the ability for developers to access information in their native or primary language. This talk covers the specifics of scalable localization that other projects can adopt, based on the Kubernetes documentation model: tooling, workflows, standards for minimum viable documentation, and community conduct. This talk also covers some avoidable mistakes to save your maintainers time and stress, as well as the ongoing greater-than-additive benefits that localization can bring. This talk concludes with specific recommendations for other projects to start their own localizations.
Watch
Tutorial - Mario Munoz: Goodbye, "Hello, World." Hello, Functional FastAPI Web App!
Building a web application with Python is super easy. With just a few lines of code, you can get a simple, working app running directly on your computer's browser. Awesome! But then what? This tutorial focuses on that awkward transition from beginner to intermediate—when you want a project to be less of a sketchpad and more of an actual, useful tool. We will learn tactics on how to find and use resources when devising a plan for your web application, as well as hands-on learning for tackling common (and necessary) aspects of building your app, such as configuration, app structure, and database modeling. For the training, you will be following along as we build the foundation of a fully-functional web application, and will leave with the ability to further refine it for real-world scenarios.
Watch
Tutorial - Eric Ma: Network Analysis Made Simple
Have you ever wondered about how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, visualizing, and using complex networks to solve problems.
Watch
Tutorial - Ariel Ortiz: A Pythonista's Introductory Guide to Web Assembly
Wasm is a binary code format specification first released in 2017. This technology can be implemented in web browsers or standalone applications in a secure, open, portable, and efficient fashion. More precisely, Wasm is an intermediate language for a stack-based virtual machine that uses a just-in-time (JIT) compiler to produce native machine code. Although Wasm was primarily designed as a compilation target for languages such as C/C++ or Rust, it can be integrated with Python in interesting ways. And that’s what we’ll be focusing on during this tutorial. Some experience with JavaScript and web development might come in handy but is not strictly required. At the end, we’ll show how to develop a tiny compiler that has Wasm as its compilation target. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/53/2022-05-05T23%3A36%3A12.746488/webassembly_tutorial.pdf
Watch
Tutorial - Pradeep Kumar Srinivasan, Jia Chen, Shannon Zhu: Python Types for Fun and Profit
Many Python developers now use type annotations to catch and fix bugs early in the coding process. This tutorial will introduce you to type annotations in Python. We’ll cover basic ideas about how types work in a dynamic language like Python, and where explicit annotations can provide value. We’ll then explore features of the type system in more depth, and demonstrate how they can be used to precisely yet flexibly express a huge range of programming patterns. Throughout the tutorial, you will have the chance to get your hands dirty by learning how to add types to small code snippets as well as to an example GitHub project, and run a type checker to see errors as you code. You’ll get to practice and play around with each concept as we discuss it, and walk away with concrete experience adding types to and catching bugs in real code. A laptop with Python installed is required along with internet access.
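To give a flavour of the kind of bug annotations can surface, here is a small sketch that is not taken from the tutorial materials. The functions find_user and greet are hypothetical; the point is that a type checker can use the Optional[str] return type to warn when the None case is not handled.

```python
from typing import Optional


def find_user(users: dict[int, str], user_id: int) -> Optional[str]:
    """Return the user's name, or None when the id is unknown."""
    return users.get(user_id)


def greet(users: dict[int, str], user_id: int) -> str:
    name = find_user(users, user_id)
    # Without this check, a type checker would report that `name` may be
    # None on the `.title()` call below.
    if name is None:
        return "Hello, stranger!"
    return f"Hello, {name.title()}!"


users = {1: "ada"}
known = greet(users, 1)
unknown = greet(users, 2)
```

The annotations do not change what the code does at runtime; they only give a checker enough information to flag the unhandled case before the code runs.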
Watch
Tutorial - Jacob Deppen: Documenting your code from docstrings to automated builds
IF IT ISN'T DOCUMENTED, IT DOESN'T EXIST. Documentation can make or break a project. Getting it right takes effort, but that effort doesn't have to be painful. In this tutorial, we will take a multi-stage approach to documentation, starting with the fundamentals, adding complexity and style, then finishing with automated publishing to the web. We will practice a maintainer-friendly workflow that smooths out some of the rough edges of creating documentation. It is never too early or too late to pick up good documentation techniques and tools. As such, this tutorial will have elements that are relevant to brand new Pythonistas (What does a good docstring look like? What is a type hint?) as well as long-time practitioners (How can I make my docs easier to maintain? Where can I host docs? How can I test examples in my docstrings?). We will cover code comments, docstrings, and type annotations as ways to add documentation within your code. Next, we will add a user interface and documentation prose layer with JupyterBook, Jupyter Notebooks, and Markdown. After that, we will use Sphinx to build API documentation. Finally, we will automate the build and publish steps with GitHub Actions and GitHub Pages.
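One of the questions above, how to test examples in docstrings, can be sketched with the standard library's doctest module. This is an illustrative assumption about the approach, not the tutorial's own material; the function mean is hypothetical.

```python
import doctest


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of ``values``.

    >>> mean([1.0, 2.0, 3.0])
    2.0
    >>> mean([])
    Traceback (most recent call last):
        ...
    ValueError: mean() needs at least one value
    """
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Collect the examples embedded in the docstring and run them explicitly.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(mean, "mean", module=False, globs={"mean": mean}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

In a real project the same examples are usually run through a test runner or the documentation build, so a stale example fails the build instead of misleading readers.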
Watch
Tutorial - Cheuk Ting Ho: Knowledge graph data modelling with TerminusDB
FOR WHOM IS YOUR WORKSHOP Data scientists, engineers and researchers who have no prior experience in knowledge graph data modelling. In this workshop, we will start from the fundamentals - learning how to think in terms of triples to describe relations between different data objects. If your work involves data analysis, data management, data collaboration or anything data-related, this workshop will give you brand new insight into how data should be represented and stored. SHORT FORMAT OF YOUR WORKSHOP Overview - 10 mins, Lecture - 60 mins, Breaks - 20 mins, Hands-on training - 80 mins, Closing - 10 mins WHAT ATTENDEES WILL LEARN By the end of the workshop, you will be able to think like a knowledge graph expert and construct a proper schema to store your data in a knowledge graph format. You will acquire the skills that you need to build knowledge graphs in TerminusDB - an open-source graph database that enables revision control and collaboration. COURSE BENEFITS You will have learnt a new skill set that may assist you in your data science or research projects. You will have a new tool with which you can better model your data and collaborate with others. You will also gain all the prerequisites for using WOQL - a query language for knowledge graphs - and the TerminusDB Python client to manage, manipulate and visualize data in your knowledge graph. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/50/2022-04-26T08%3A51%3A50.741435/Data_Modelling_Workshop.pdf
Watch
Tutorial - Francesco Bruni: Getting started with Object-Oriented Programming through Signal...
In this tutorial, OOP foundations will be explored by fitting signals and waves into objects. We will follow a top-down methodology, modelling signals from scratch, creating fat objects, and then tweaking their representation by introducing inheritance and delegation. We will talk about Python magic methods to implement processing operations. We will eventually see how to implement the Iterator Design Pattern. Throughout the session, we will keep a special eye on code explicitness and simplicity, highlighting the pros and cons of every implementation. A laptop with Python installed is the sole requirement. Nevertheless, it could be handy to have a Jupyter notebook instance running to visualize and listen to signals easily. In this case only numpy and matplotlib should be already installed. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/28/2022-04-26T16%3A18%3A44.431273/pyconus22_complete.pdf
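The ideas named in the abstract (magic methods for processing operations, inheritance, and the Iterator Design Pattern) might look roughly like the sketch below. The classes Signal, SignalIterator and Sine are hypothetical illustrations in plain Python, not the tutorial's actual code.

```python
import math


class Signal:
    """A finite sequence of samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __add__(self, other):
        # Mix two signals sample by sample.
        if len(self) != len(other):
            raise ValueError("signals must have the same length")
        return Signal(a + b for a, b in zip(self.samples, other.samples))

    def __mul__(self, gain):
        # Scale the amplitude by a constant gain.
        return Signal(gain * s for s in self.samples)

    def __iter__(self):
        # Iterator pattern: hand out a fresh iterator for every loop.
        return SignalIterator(self)


class SignalIterator:
    """Walks over a Signal one sample at a time."""

    def __init__(self, signal):
        self._signal = signal
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._signal):
            raise StopIteration
        value = self._signal.samples[self._index]
        self._index += 1
        return value


class Sine(Signal):
    """Inheritance: a sine wave is a Signal that builds its own samples."""

    def __init__(self, frequency, sample_rate, n_samples):
        super().__init__(
            math.sin(2 * math.pi * frequency * n / sample_rate)
            for n in range(n_samples)
        )


tone = Sine(frequency=1, sample_rate=4, n_samples=4)
mixed = tone + tone * 0.5  # uses __mul__ first, then __add__
peak = max(mixed)          # max() consumes the signal through __iter__
```

Keeping the iterator in its own class means two loops over the same signal do not interfere with each other, at the cost of a little more code than a generator-based `__iter__`.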
Watch
Tutorial - Jules S. Damji: Distributed Python with Ray Hands on with the Ray Core APIs
Please note: Audio and speaker video do not start until 01:28:26. Our apologies for this. This is an introductory and hands-on guided tutorial of Ray Core. Ray provides powerful yet easy-to-use design patterns for implementing distributed systems in Python. This tutorial includes a brief talk to provide an overview of concepts, why one might use Ray for distributing Python and Machine Learning workloads, and a brief discussion on Ray’s Ecosystem. Primarily, the tutorial will focus on Ray Core APIs to write remote functions, actors, and understand Ray’s basic design patterns for writing distributed Python applications. Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/164/2022-04-25T16%3A51%3A23.026629/pyconsus_slides.pdf
Watch
Talk - William Morrell: (Professionally) Coding with Others
A mix of tools and practices to incorporate for facilitating collaboration between developers. As a nice side-effect, these also let past-you help future-you work on entirely solo projects. Topics include: - Documentation, specifically calling out a README and contributor guidelines, and site generators à la Sphinx or MkDocs; - Version control / git, collecting changes in logical commits, writing good commit and pull request messages; - Auto-lint and formatting: pre-commit, black, isort, flake8; - Dependency management: pyenv, pipenv/poetry, Docker; Slides: https://pycon-assets.s3.amazonaws.com/2022/media/presentation_slides/103/2022-04-11T02%3A51%3A35.659719/pycon_export.pdf
Watch
Tutorial - Pandy Knight: Awesome Modern Web Testing with Playwright
Everybody gets frustrated when web apps are broken, but testing them thoroughly doesn't need to be a chore. Playwright, a new open-source browser automation tool from Microsoft, makes testing web apps fun! Playwright outperforms other tools like Selenium WebDriver with a slew of nifty features like automatic waiting, mobile emulation, and network interception. Plus, with isolated browser contexts, Playwright tests can set up much faster than traditional Web UI tests. In this tutorial, we will build a Python test automation project from the ground up. We will automate web search engine tests together step-by-step using Playwright for interactions and pytest for execution. Specifically, we will cover: 1. How to install and configure Playwright 2. How to integrate Playwright with pytest, Python’s leading test framework 3. How to perform interactions through page objects 4. How to conveniently run different browsers, capture videos, and run tests in parallel By the end of this tutorial, you'll be empowered to test modern web apps with modern web test tools. You'll also have an example project to be the foundation for your future tests. You can use Playwright to test Django apps, Flask apps, or any other kinds of apps!
Watch