List of videos

Talks - Glyph: How To Keep A Secret
API keys, passwords, auth tokens, cryptographic secrets… in the era of cloud-based development, we've all got a bunch of them. But where do you put them? How do you keep them safe? And how can you access them conveniently from your Python code, both in development and production, without putting them at risk? In this talk, I'll review information security best practices for managing secrets as well as Python-specific tips and tricks.
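The following is a rough illustration of one common practice, not necessarily what the talk recommends. It reads a secret from an environment variable at runtime instead of hard-coding it, then wraps the value so that an accidental `print()` or log call shows a placeholder rather than the secret itself. `Secret`, `get_secret`, `MissingSecretError`, and the `EXAMPLE_API_KEY` variable are hypothetical names used only for this sketch.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured (hypothetical helper)."""


class Secret:
    """Hypothetical wrapper that keeps a secret out of reprs, prints, and logs."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the raw value; call this only where it is actually needed."""
        return self._value

    def __repr__(self) -> str:
        return "Secret('********')"

    __str__ = __repr__


def get_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> Secret:
    """Look up a secret in the environment (or an explicit mapping, for tests)."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail loudly at startup; the message names the variable, never the value.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return Secret(value)


if __name__ == "__main__":
    # Demo with an explicit mapping; in real use, omit it to read os.environ.
    key = get_secret("EXAMPLE_API_KEY", {"EXAMPLE_API_KEY": "s3cr3t"})
    print(key)  # Secret('********')
```

Passing an explicit mapping keeps the function easy to test without touching the real process environment.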
Talks - Jodie Burchell: Vectorize using linear algebra and NumPy to make your Python code fast
Have you found that your code works beautifully on a few dozen examples, but leaves you wondering how to spend the next couple of hours once you start looping through all of your data? Are you only familiar with Python, and wish there were a way to speed things up without subjecting yourself to learning C? In this talk, you'll see some simple tricks, borrowed from linear algebra, that can give you significant performance gains in your Python code, and how you can implement them in NumPy. We'll start by exploring an inefficient implementation of an algorithm that relies heavily on loops and lists. Throughout the talk, we'll iteratively replace bottlenecks with NumPy vectorized operations. At each stage, you'll learn the linear algebra behind why these operations are more efficient, so that you'll be able to apply these concepts in your own code. You'll see how straightforward it can be to make your code many times faster, all without losing readability or needing to understand complex coding concepts.
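The abstract doesn't name the algorithm used in the talk, so this is only an illustrative example of the kind of trick it describes. It computes all pairwise squared Euclidean distances, first with nested Python loops and then with NumPy, using the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y. That identity turns the inner loops into a single matrix product. Both function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists_loop(a, b):
    """Reference version: nested Python loops over every pair of rows."""
    out = [[0.0] * len(b) for _ in range(len(a))]
    for i, row_a in enumerate(a):
        for j, row_b in enumerate(b):
            out[i][j] = sum((x - y) ** 2 for x, y in zip(row_a, row_b))
    return out


def pairwise_sq_dists_vectorized(a, b):
    """Same result via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_a = np.sum(a * a, axis=1)[:, np.newaxis]  # shape (n, 1): squared row norms of a
    sq_b = np.sum(b * b, axis=1)[np.newaxis, :]  # shape (1, m): squared row norms of b
    cross = a @ b.T                              # shape (n, m): all dot products at once
    # Broadcasting combines the three terms; clip tiny negative rounding errors to 0.
    return np.maximum(sq_a + sq_b - 2.0 * cross, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.random((200, 3)), rng.random((150, 3))
    print(np.allclose(pairwise_sq_dists_loop(a, b), pairwise_sq_dists_vectorized(a, b)))  # True
```

The vectorized form trades a little floating-point precision (hence the clipping) for moving all the per-pair work out of the interpreter and into NumPy's compiled routines.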
Talks - Calvin Hendryx-Parker: Too Big for DAG Factories?
You’re working on a project that needs to aggregate petabytes of data, and it doesn’t make sense to manually hard-code thousands of tables, DAGs (Directed Acyclic Graphs) and pipelines. How can you transform, optimize and scale your data workflow? Developers around the world (especially those who love Python) are using Apache Airflow, a platform created by the community to programmatically author, schedule and monitor workflows without limiting the scope of your pipelines. In this talk, we’ll review use cases, and you’ll learn best practices for how to:
- use Airflow to transfer data, manage your infrastructure and more;
- implement Airflow in practical use cases, including as a:
  - workflow controller for ETL pipelines loading big data;
  - scheduler for a manufacturing process; and/or
  - batch process coordinator for any type of enterprise;
- scale and dynamically generate thousands of DAGs that come from JSON configuration files;
- automate the release of both the DAGs and infrastructure updates via a CI/CD pipeline;
- run all tasks simultaneously using Airflow.
Both beginner and intermediate developers will benefit from this talk, and it is ideal for developers who want to learn how to use Airflow to manage big data. Beginners will learn about dynamic DAG factories, and intermediate developers will learn how to scale DAG factories to thousands of DAGs, which is something Airflow can’t do out of the box. After this talk and live demo, attendees will know best practices (with access to a code repo) that let them scale to thousands of DAGs and spend more time having fun with big data.
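This sketch does not reproduce Airflow's own `DAG` API. It is a library-free illustration of the core idea behind a DAG factory: each entry in a JSON configuration file becomes one pipeline, and the standard library's `graphlib.TopologicalSorter` rejects cycles and derives a valid task order. The config layout, `PipelineSpec`, and `build_pipelines` are hypothetical. In Airflow itself, the factory would build real `DAG` objects instead.

```python
import json
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical stand-in for a generated DAG (not Airflow's DAG class)."""
    dag_id: str
    run_order: tuple


def build_pipelines(config_text):
    """Generate one PipelineSpec per entry in a JSON config (hypothetical layout)."""
    config = json.loads(config_text)
    pipelines = {}
    for entry in config["dags"]:
        # "tasks" maps each task id to the list of task ids it depends on.
        graph = {task: set(deps) for task, deps in entry["tasks"].items()}
        try:
            order = tuple(TopologicalSorter(graph).static_order())
        except CycleError as exc:
            # args[1] holds the detected cycle, per the graphlib docs.
            raise ValueError(f"DAG {entry['dag_id']!r} has a cycle: {exc.args[1]}") from exc
        pipelines[entry["dag_id"]] = PipelineSpec(entry["dag_id"], order)
    return pipelines


CONFIG = """
{"dags": [
  {"dag_id": "load_sales",
   "tasks": {"extract": [], "transform": ["extract"], "load": ["transform"]}},
  {"dag_id": "load_inventory",
   "tasks": {"extract": [], "load": ["extract"]}}
]}
"""

if __name__ == "__main__":
    pipelines = build_pipelines(CONFIG)
    print(pipelines["load_sales"].run_order)  # ('extract', 'transform', 'load')
```

Keeping pipeline definitions in data rather than code means that adding a new DAG is a config change. Cycle detection then acts as a cheap validation step before anything is deployed.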
Talks - Ludovico Bianchi: Using Python's import machinery to handle API deprecations
For any software project with an established user base, introducing breaking changes in its API can be daunting. To minimize disruptions for users, projects are incentivized to plan these transitions carefully, which may include API deprecations, where messages warning users of upcoming changes are added to the affected APIs while they’re still functional. However, this imposes an extra workload on the project’s maintainers, as both the old and new versions of the API must be kept functional throughout the transition period. As a maintainer of a software project preparing for a major version release, I recently found myself in this situation: our goal was to provide backward compatibility with the previous version for as long as possible, without impacting the development of new features. In practice, this meant dealing with a radical restructuring of the Python codebase, with hundreds of modules being relocated, split, or removed. Was there any way to ensure that the deprecated import paths could still be used without errors, without having to maintain two separate versions of the package? Fortunately, the answer to “can you do that in Python?” is more often than not “yes!”; for this particular case, the path to success turned out to be the importlib package of the standard library. For something so close to Python’s internals, importlib is both accessible and extensible, allowing ordinary code to customize almost completely how and which modules can be imported, including modules that are not there anymore! This intermediate-level talk will present a complete solution based on Python’s importlib machinery that lets you redirect modules or module attributes with deprecations in a simple, robust, and scalable way. While the context of the solution is especially relevant for project maintainers, the focus is on importlib techniques that are generally applicable.
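To give a flavor of the importlib technique described above, here is a minimal sketch, not the speaker's solution. It installs a meta path finder so that a deprecated module name still imports, and it emits a `DeprecationWarning` whenever an attribute is accessed through the old name. `DeprecatedModuleFinder`, `_RedirectLoader`, and the `legacy_json` alias are hypothetical. The standard-library `json` module stands in for a relocated module so that the example runs anywhere.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import warnings


class _RedirectLoader(importlib.abc.Loader):
    """Hypothetical loader: turns a deprecated module into a forwarding proxy."""

    def __init__(self, new_name):
        self.new_name = new_name

    def create_module(self, spec):
        return None  # use Python's default module creation

    def exec_module(self, module):
        target = importlib.import_module(self.new_name)
        old_name, new_name = module.__name__, self.new_name

        def __getattr__(attr):
            # PEP 562: only called for names not defined on the proxy itself.
            if attr.startswith("__") and attr.endswith("__"):
                # Don't forward dunders such as __path__ used by the import system.
                raise AttributeError(f"module {old_name!r} has no attribute {attr!r}")
            warnings.warn(
                f"{old_name}.{attr} is deprecated; use {new_name}.{attr} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(target, attr)

        module.__getattr__ = __getattr__


class DeprecatedModuleFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder mapping deprecated module names to their new homes."""

    def __init__(self, redirects):
        self.redirects = dict(redirects)

    def find_spec(self, fullname, path, target=None):
        new_name = self.redirects.get(fullname)
        if new_name is None:
            return None  # not deprecated: let the normal finders handle it
        return importlib.util.spec_from_loader(fullname, _RedirectLoader(new_name))


# Install the finder ahead of the default ones.
sys.meta_path.insert(0, DeprecatedModuleFinder({"legacy_json": "json"}))

import legacy_json  # resolved by DeprecatedModuleFinder

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = legacy_json.dumps({"a": 1})

print(data)                # {"a": 1}
print(caught[0].category)  # <class 'DeprecationWarning'>
```

One design choice worth noting is that the deprecated name is a separate proxy module. It forwards lookups through a module-level `__getattr__`, so `legacy_json is json` is `False`.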
Talks - Sanskar Jethi: Robyn: An async Python web framework with a Rust runtime
With Rust bindings increasingly used across the Python ecosystem, it is clear that throughput efficiency is a top priority for many Python developers. Robyn was created, inspired by the extensibility and ease of use of the Python web ecosystem and by the performance gains of using Rust as a core. Robyn is one of the fastest web frameworks in the Python ecosystem today. With a runtime written in Rust, Robyn achieves near-native Rust performance while keeping the ease of writing Python code. This talk will focus on the growing involvement of Rust in the Python ecosystem. It will also demonstrate why Robyn was created, the technical decisions behind it, the performance gained by using the Rust runtime, how to use Robyn to develop web apps, and most importantly, how the community is helping Robyn grow! I will briefly describe my experience and the challenges of building a community around the project, and how that community has kept Robyn on course even in turbulent situations. I will also share my future plans for Robyn.
Talks - Alireza Farhidzadeh: Getting Around the GIL: Parallelizing Python for Better Performance
One of the ever-present banes of a data scientist’s life is the constant wait for data processing code to finish executing. Slow code affects almost every step of a typical data pipeline: data collection, data pre-processing/parsing, feature engineering, etc. Lengthy execution times often force data scientists to work with only a subset of the data, depriving them of the insights and performance improvements that a larger dataset could provide. One tool that can mitigate this problem and speed up data science pipelines (and CPU-bound programs in general) is parallelization. Parallelization is a useful way to work around the limitations of the Global Interpreter Lock (GIL), a CPython mechanism that lets only one thread execute Python bytecode at a time. The GIL prevents multithreaded, CPU-bound code from fully utilizing multiple processor cores and can hurt performance. In this session, we’ll walk through several ways to parallelize Python code, depending on the specific needs of your program and the type of parallelism you want to achieve.
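One widely used approach is process-based parallelism. The sketch below is an illustration, not material from the talk. It uses the standard library's `concurrent.futures.ProcessPoolExecutor`, where each worker is a separate process with its own interpreter and its own GIL, so CPU-bound work can run on several cores at once. `count_primes` is a deliberately slow, hypothetical workload.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=None):
    """Run one count_primes call per limit, spread across worker processes.

    Each process has its own GIL, so unlike threads in the default CPython
    build, the calls can execute Python bytecode truly in parallel.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(count_primes_parallel([10, 100, 1000]))  # [4, 25, 168]
```

Process pools pay for pickling arguments and results, so they help most when each task does substantial work. The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module.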
Talks - Josh Weissbock, Sheila Flood: Using Python to Help the Unhoused
How a group of volunteers from around the globe uses Python to help an NGO in Victoria, BC, Canada support the unhoused. By building a tool that finds social media activity about unhoused people in the Capital Region, the volunteers give the NGO a dashboard of results it can use to decide where to direct its limited resources.
Talks - E. Johnson: Skynet 101: How to Keep Your Machine Learning Code From Getting Away From You
Machine learning can feel pretty mysterious at times, but as Python developers you already have many of the tools you need to be a part of it! With basic Python experience, you can use libraries like pandas and tools like Jupyter Notebooks to analyze and manipulate data sets. By applying Test-Driven Development practices to your analysis, you can feel confident about what you're building. You can build well-developed, well-tested cleaning scripts and functions using pytest, and use those functions in your notebooks and scripts. You can even build simple recommendation engines using libraries such as scikit-learn! In this talk, we will walk through the process of data analysis, data cleaning, feature preparation, and building a simple movie recommendation engine. As we move through those steps, my main focus is to teach engineers how to incorporate Test-Driven Development into the data cleaning process and the building of our engine. I will also walk through strategies for data analysis and explain, at a high level, a couple of ML concepts that we can use. By seeing live examples of Test-Driven Development in data analysis and machine learning, participants can get a handle on some core concepts and learn how to ensure quality in the code they produce.
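Here is a tiny sketch of the test-first style described above. It uses a hypothetical `clean_titles` helper and plain `assert`-based test functions, which pytest collects automatically when they live in a file named `test_*.py`. The function and data are invented for illustration, and the sketch uses plain lists rather than pandas to stay dependency-free.

```python
def clean_titles(titles):
    """Hypothetical cleaning step for raw movie titles.

    Collapses internal whitespace, drops missing or blank titles, and removes
    case-insensitive duplicates while keeping the first spelling seen.
    """
    seen = set()
    cleaned = []
    for title in titles:
        if title is None:
            continue
        normalized = " ".join(title.split())  # strip and collapse whitespace
        key = normalized.casefold()
        if not normalized or key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


# In TDD, these tests are written first, fail, and then drive the implementation.
def test_collapses_whitespace():
    assert clean_titles(["  The   Matrix "]) == ["The Matrix"]


def test_drops_missing_and_blank_titles():
    assert clean_titles([None, "", "   ", "Heat"]) == ["Heat"]


def test_removes_case_insensitive_duplicates():
    assert clean_titles(["Alien", "alien", " ALIEN"]) == ["Alien"]
```

Once the tests pass, the same `clean_titles` function can be imported into a notebook, so the exploratory analysis and the tested code share one implementation.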
Talks - Paolo Melchiorre: A pythonic full-text search
Full-text search is the best way to make a website's content easily accessible to users, because it returns more relevant results, and it is the approach used by online search engines and social networks. Implementing full-text search can be complex, and many adopt the strategy of running a dedicated search engine alongside the database, but in most cases this strategy introduces significant architectural and performance problems. In this talk, we'll see a pythonic way to implement full-text search on a website using only Django and PostgreSQL, taking advantage of the features introduced in recent years. We'll also analyze the problems of using additional search engines, with examples drawn from my experience on djangoproject.com. Through this talk, you can learn how to add full-text search to your website if it's based on Django and PostgreSQL, or how to update your website's search function if it currently relies on a separate search engine.