List of videos

Talk: Maria Jose Molina-Contreras - How to build an intelligent “indoor garden”

Presented by: Maria Jose Molina-Contreras, PhD

Most people around the world are increasingly interested in living a long and healthy life. Many studies have shown that growing house plants, as well as being a trend, improves health. The truth is that we are so busy, at work and even after we get home, that we do not have enough time to water our plants when they need it. In this talk, you are going to learn how to build a functional, beginner-friendly system to keep your plants alive using different sensors, microcontrollers and CircuitPython. This system will water your plants based on their needs, and, step by step, you will add components and functionality to it; for instance, you will give your plants “a voice” to inform you about their deficiencies. In this project you will also learn how to create a web application with Flask, how to set up a Raspberry Pi as a local server, and how to use a cloud IoT service for data analysis. Finally, you will see how CircuitPython can play an amazing role in these kinds of situations by helping plants survive longer and by lightening our responsibilities. This will also help you understand why it is an excellent choice for anyone who wants to start programming hardware and connected devices!

Watch

Talk: Chris Seto - Big O No: Django ORM runtime complexity and how to avoid it using LATERAL JOINS

Presented by: Chris Seto

N+1 queries are as common as blog posts about how to solve them. What happens when you want to get a list of blog posts, the comments on them, and the comments’ respective authors? A query count that multiplies with every level of nesting. The rise of GraphQL and REST APIs that provide “include” semantics makes these situations increasingly common and painful. Clever use of prefetch_related and select_related may help in a pinch but never fully solve the problem. Learn how to identify these inefficient queries and optimize them using SQL aggregations and LATERAL JOINs.

Talk slides: https://docs.google.com/presentation/d/1JtlzTGAwcTiltbo_yvxLmm8fTydvhfK5i-wQ9YmS77I/edit?usp=sharing

Talk resources:
- Companion Repo: https://github.com/chrisseto/pycon2020-big-o-no
- Mentioned Library: https://github.com/chrisseto/django-include
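
To make the N+1 pattern concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than the Django ORM. The schema, the sample rows and the counting helper run() are all assumptions made for this illustration; none of them come from the talk. SQLite has no LATERAL joins, so the second version uses a plain JOIN purely to show the query count collapsing to one. It is not the technique the talk presents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post    (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER,
                          author_id INTEGER, body TEXT);
    INSERT INTO author  VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post    VALUES (1, 'first'), (2, 'second');
    INSERT INTO comment VALUES (1, 1, 1, 'hi'), (2, 1, 2, 'hello'),
                               (3, 2, 1, 'nice');
""")

# Hypothetical bookkeeping: counts every SELECT issued through run().
stats = {"queries": 0}


def run(sql, params=()):
    """Execute one query, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def naive():
    """One query for posts, one per post for comments, one per comment for its author."""
    rows = []
    for post_id, title in run("SELECT id, title FROM post ORDER BY id"):
        comments = run(
            "SELECT author_id, body FROM comment WHERE post_id = ? ORDER BY id",
            (post_id,),
        )
        for author_id, body in comments:
            name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
            rows.append((title, body, name))
    return rows


def joined():
    """Fetch the same rows with a single JOIN query."""
    return run("""
        SELECT post.title, comment.body, author.name
        FROM post
        JOIN comment ON comment.post_id = post.id
        JOIN author  ON author.id = comment.author_id
        ORDER BY post.id, comment.id
    """)


stats["queries"] = 0
naive_rows = naive()
naive_queries = stats["queries"]   # 1 (posts) + 2 (per post) + 3 (per comment)

stats["queries"] = 0
joined_rows = joined()
joined_queries = stats["queries"]  # a single query

print(naive_queries, joined_queries)
```

With two posts and three comments, the naive version already issues six queries for rows that one query can return, and the gap widens as either table grows.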

Watch

Talk: Catherine Nelson - Practical privacy-preserving machine learning in Python

Presented by: Catherine Nelson

Machine learning is hungry for data, usually collected from users of a product and often including a lot of personal or sensitive information. What if we could build accurate machine learning models while still preserving user privacy? There’s a growing number of tools in Python to help us achieve this, ranging from federated learning, where a user’s data remains on their own device, to algorithms for training models on encrypted data. In this talk, I’ll tour the landscape of these tools and review what works, what doesn’t work, and where they fit in a machine learning pipeline.

Data privacy is a huge concern for everyone in tech these days, thanks to both legislation such as the GDPR, and user opinions driven by scandals in the media. Machine learning is at the forefront of this because it’s hungry for large amounts of training data, but it’s also an area where there’s lots of research on developing solutions that protect user privacy. When I started learning about privacy-preserving machine learning, I found a bewildering number of research papers, introducing some really cool solutions, but very little practical advice on how to apply them in a real-world situation. This is the talk I wish I could have attended at the start of my learning journey!

I’ll review the landscape of Python solutions for privacy-preserving ML and show how they fit into a machine learning pipeline. I’ll explain the tradeoffs of each method and also talk a little about the ethics of using personal data for training ML models. Tools and packages covered will include TensorFlow Privacy, TensorFlow Encrypted and PySyft.

Talk slides: https://github.com/drcat101/pycon-2020-privacy

Watch

Talk: Pratyush Das - Python in High Energy Physics

Presented by: Pratyush Das

High Energy Physics is the study of the most fundamental constituents of matter and of how these elementary particles interact. Often used synonymously with Particle Physics, High Energy Physics seeks to uncover the secrets of the Universe. One of its major recent discoveries was the Higgs boson, which confirmed a key prediction of the Standard Model, the theory that describes elementary particles and the forces through which they interact. High Energy Physics is probably the physics sub-field that has adopted Python most rapidly, second only to Astrophysics.

The talk starts with a look at what computing has looked like in High Energy Physics in the past and at how many physicists played major roles in the development of Computer Science. It then explores the emergence of Python as the language of choice for several physicists and two of the major libraries that have been vital to the adoption of Python in the High Energy Physics community: cppyy and uproot. These are especially important because they demonstrate different ways of successfully shifting the High Energy Physics community from C++ to Python. The talk will focus on a review of where and how Python is used in the High Energy Physics community and what that use is expected to look like in the future.

High Energy Physics has its own Python toolkit, Scikit-HEP, which comes with a set of Python libraries for use by physicists. The Scikit-HEP project is a community-driven and community-oriented project that aims to provide Particle Physics at large with an ecosystem for data analysis in Python. It is also about improving the interoperability between High Energy Physics tools and the scientific Python ecosystem. This year is ideal for this talk: according to some available data, it is the year in which Python usage overtakes C++ usage in several High Energy Physics experiments at CERN. As some physicists have dubbed it, this is the year of Python in High Energy Physics.

Watch

Talk: Aly Sivji - If Statements are a Code Smell

Presented by: Aly Sivji

If statements are elements of a programming language that allow us to control which statements are executed. By chaining together a series of if statements, we can solve any problem we can think of. But code with too many if statements is hard to read and even harder to change. Workarounds that once allowed us to move fast now get in the way when we go in to make modifications. It doesn’t have to be this way!

This talk demonstrates how to handle complex conditional logic with simple Python classes. The material will be presented in the context of a code refactor for an open-source project: we examine the initial solution featuring duplicate if statements, show how hard it is to make a change, and walk through the process of refactoring if blocks into polymorphic classes. The case study has been simplified to illustrate concepts you can apply to your own code. After this talk, you will be able to identify situations where an object-oriented solution can be used to improve software design. You will also be exposed to the tradeoffs we need to think about before refactoring to higher-level abstractions.

Talk slides: http://bit.ly/code-smell-if-statements

Talk resources: https://github.com/alysivji/talks/
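
The refactoring idea in this abstract can be sketched roughly as follows. The example is hypothetical (a message formatter with made-up channel names) and is not taken from the open-source project examined in the talk; it only shows the general shape of replacing an if/elif chain with polymorphic classes.

```python
from abc import ABC, abstractmethod


# Before: every function that depends on the channel repeats this chain.
def format_message_with_ifs(channel: str, text: str) -> str:
    if channel == "email":
        return f"Subject: Notice\n\n{text}"
    elif channel == "sms":
        return text[:160]
    elif channel == "chat":
        return f"*{text}*"
    raise ValueError(f"unknown channel: {channel}")


# After: each branch becomes a class with the same interface.
class Channel(ABC):
    @abstractmethod
    def format_message(self, text: str) -> str:
        """Return the text formatted for this channel."""


class Email(Channel):
    def format_message(self, text: str) -> str:
        return f"Subject: Notice\n\n{text}"


class Sms(Channel):
    def format_message(self, text: str) -> str:
        return text[:160]  # truncate to a single SMS segment


class Chat(Channel):
    def format_message(self, text: str) -> str:
        return f"*{text}*"


# The only remaining lookup lives in one place (hypothetical registry).
CHANNELS = {"email": Email, "sms": Sms, "chat": Chat}


def make_channel(name: str) -> Channel:
    try:
        return CHANNELS[name]()
    except KeyError:
        raise ValueError(f"unknown channel: {name}") from None


def format_message(channel: str, text: str) -> str:
    return make_channel(channel).format_message(text)


print(format_message("chat", "hello"))
```

In this sketch, adding a channel means writing one new class and one registry entry instead of editing every chain. The cost is extra indirection, which may not pay off when there are only a few branches used in a single place.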

Watch

Talk: Shreya Khurana - How multilingual is your NLP model?

Presented by: Shreya Khurana

Natural language is constantly evolving. With social media having its own language and interactions becoming more global, NLP models need more than just monolingual corpora to understand and make sense of all this data. Roughly 50% of the world’s population speaks two or more languages. This is a challenge for NLP systems because conventional models are trained to understand one language or only to translate from one language to another.

In this talk, we’ll focus on Natural Language Understanding (NLU) for small multilingual texts. A key step in building NLU systems is language identification. First, we’ll give an introduction to existing Python frameworks for this task, such as cld3, langid and langdetect, and briefly discuss their shortcomings. Another area of concern is transliterated and code-switched text, which consists of a combination of two or more structurally different grammars and vocabularies. This type of data can be clearly seen in Tweets and comments on Facebook as well as in product reviews. What makes this problem very challenging is the lack of annotated datasets and the added noise of having no “correct” grammar and spelling. We’ll discuss approaches to solving this using web crawlers and self-generated datasets.

The next section of this talk will be on using the multilingual BERT model released by Google, which is trained on 104 languages. We’ll see some examples of how this model performs when given text pieces in different languages. In the final section, we’ll discuss how to evaluate the model for different tasks.

Talk resources: https://github.com/ShreyaKhurana/pycon/

Watch

Talk: Manojit Nandi - The Limitations and Danger of Facial Recognition

Presented by: Manojit Nandi

Biometric scanners, such as face recognition technology, have seen widespread adoption in applications such as identifying suspected criminals, analyzing candidates’ facial expressions during job interviews, and monitoring attendance at schools. As these technologies have become more pervasive, many organizations have raised concerns about the way these technologies schematize faces. Studies have shown that commercial face recognition software has noticeably lower accuracy on darker-skinned individuals, and automatic gender recognition systems regularly misgender trans and non-binary individuals. In addition, many scholars have written about the rise of techno-surveillance and the looming threat of constant government tracking of citizens. In this talk, I will discuss these issues and what we as technologists can do to avoid building software that enables harm to vulnerable populations.

Talk slides: https://speakerdeck.com/lejit/the-limitations-and-dangers-of-facial-recognition

Watch

Talk: Emmanuelle Gouillart - Building interactive applications for image data

Presented by: Emmanuelle Gouillart

Images are an important class of data in science and business. Tasks such as quantification of organ geometry in medical imaging, or construction of training sets and pipelines for machine learning models, typically rely on a combination of interactive user annotations and image processing algorithms. In this talk I will present several open-source Python packages for interactive image processing, and how to combine them for advanced applications.

Dash is an open-source framework for building interactive analytical web applications in pure Python (or R). It comes with a set of interactive components that serve as the bricks for easily building custom analytical applications, such as figures using the plotly visualization library, interactive data tables, dropdowns, sliders, etc. These components interact with one another through callbacks that are fired when a component is modified. After a demo of how to build an application with Dash, I will show how to interact with image data within Dash for exploring image characteristics or annotating images with various kinds of shapes (from rectangular bounding-box selection to freehand-brush painting of objects).

In addition, Dash applications can make use of Python data-science packages in order to apply advanced algorithms to user-provided annotations. I will focus mostly on scikit-image, and briefly mention machine learning / deep learning tools as well. scikit-image is a popular library for processing 2D and 3D images as NumPy numerical arrays, with a focus on scientific imaging and pedagogical example-based documentation. I will show how to use scikit-image for various image processing tasks, from basic preprocessing (e.g. normalizing image geometry or exposure) to advanced object segmentation tasks. Finally, I will show how combining scikit-image and Dash can result in advanced image processing applications, which can be written quickly thanks to simple APIs and thorough documentation.

Watch

Talk: Igor T. Ghisi - Write Less and Test More with Data Regression Testing

Presented by: Igor T. Ghisi

As the data structures of a project increase in size and complexity, it becomes harder and harder to preserve test completeness. Testing objects with dozens of attributes and arrays with hundreds of values can turn into a laborious task. Often, programmers leave this kind of data only partially tested, especially if the required code coverage has already been achieved. In this talk we’ll show how to increase test completeness for data structures by applying data regression testing. We’ll be presenting pytest-regressions, a pytest plugin that helps to test datasets and objects by automatically serializing expected data to disk and later checking test results against it. We’ll also show how pytest-regressions makes it easier to inspect test data and debug failing tests. The talk will demonstrate examples of data regression being applied to numerical algorithms, web APIs, Flask views and SQLAlchemy models.

Talk slides and resources: https://github.com/igortg/pycon2020-pytest-regressions
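
The core idea of data regression testing can be sketched with the standard library alone. The helper below, check_data_regression, is a hypothetical stand-in written for this illustration and is not the pytest-regressions API. On the first run it serializes the data to a JSON baseline file, and on later runs it compares fresh results against that file. The function under test, summarize, is likewise made up.

```python
import json
import tempfile
from pathlib import Path


def check_data_regression(data, expected_file: Path) -> bool:
    """Compare data with a stored baseline (hypothetical helper).

    Returns True if the baseline was just created and False if the data
    matched an existing baseline. Raises AssertionError on a mismatch.
    """
    serialized = json.dumps(data, indent=2, sort_keys=True)
    if not expected_file.exists():
        expected_file.write_text(serialized)  # first run: record the baseline
        return True
    if serialized != expected_file.read_text():
        raise AssertionError(f"data differs from baseline in {expected_file}")
    return False


def summarize(values):
    """Made-up function under test: many attributes, one check."""
    return {"count": len(values), "total": sum(values), "max": max(values)}


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "summarize.json"

    # First run records the expected data.
    created = check_data_regression(summarize([1, 2, 3]), baseline)

    # An unchanged result passes against the stored baseline.
    matched = not check_data_regression(summarize([1, 2, 3]), baseline)

    # A changed result is reported as a regression.
    try:
        check_data_regression(summarize([1, 2, 4]), baseline)
        regression_caught = False
    except AssertionError:
        regression_caught = True

print(created, matched, regression_caught)
```

The test body never spells out the expected values field by field, which is what keeps the amount of test code small. The plugin itself exposes this workflow through pytest fixtures; see its documentation for the actual interface.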

Watch