List of videos

Sponsor Presentation—The ChatGPT Privacy Tango: Dancing with Data Security and Large Language Models

By Jason Mancuso and Mike Gardner. Sponsor: Cape Privacy.

In the world of AI and natural language processing, privacy and utility often find themselves engaged in a delicate dance. As users attempt to leverage the power of Large Language Models (LLMs) like ChatGPT for sensitive or confidential data, they face the challenge of maintaining privacy without compromising the value these models bring. Enter the "ChatGPT Privacy Tango" – a metaphor for the intricate steps needed to balance these competing interests. During this talk, we'll delve into a system we've designed to help users navigate the Privacy Tango, striking a balance between preserving privacy and maximizing utility with ChatGPT and LLMs. We'll discuss the significance of protecting personally identifiable information (PII) and maintaining data security while still enjoying the advantages of AI-powered language models. Additionally, we'll cover the pros and cons of this approach and touch on alternative systems that we've considered. We invite you to join us as we explore the fascinating interplay of data privacy, secure enclaves, and PII removal, shedding light on the path towards more privacy-aware applications with Large Language Models. By the end of our session, you'll be better prepared to dance the ChatGPT Privacy Tango, armed with the knowledge and tools needed to safeguard sensitive information while harnessing the power of ChatGPT and LLMs.
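As a purely illustrative sketch of the PII-removal idea mentioned above (this is not Cape Privacy's system; the patterns and the redact() helper are hypothetical), one simple first step is to scrub obvious identifiers before a prompt ever leaves your machine:

```python
import re

# Hypothetical, minimal PII-redaction pass; real systems use far more robust
# detection (NER models, secure enclaves, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII with a labeled placeholder before sending a prompt to an LLM."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Summarize this email from jane.doe@example.com, phone 555-123-4567."
print(redact(prompt))
# Summarize this email from [EMAIL REDACTED], phone [PHONE REDACTED].
```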

Watch
Talks - Nicholas H. Tollervey, Paul Everitt: Build Yourself a PyScript

PyScript and Pyodide have gained a lot of attention, as Python in the browser presents interesting opportunities, and architectural questions as well. What does it mean to write an extensible, friendly web platform targeting Python? In this talk, learn how PyScript works and watch a treatment of key technical issues for writing web apps with the WebAssembly version of Python. What does “file” mean? How do you install something? What are web workers and how do they impact your architecture? PyScript itself is constantly evolving on these topics. Come for a spirited discussion with a fast-paced format.

Watch
Talks - Bert Wagner: Cross-Server Data Joins on Slow Networks with Python

While working from home has its perks, you've found one thing missing in your remote work life: the speed of network data transfer. It doesn't matter if you can write the most efficient Python data transformation code when your jobs are bottlenecked by slow data movement between your local laptop and remote servers. In this talk we will address techniques for querying and joining data across distant machines efficiently with Python. We will also discuss how to handle scenarios where you need to join datasets that won't fit in your laptop's memory, including several techniques and packages for making cross-server joins. This session won't stop you from getting angry when your ISP throttles your home internet connection, but it will teach you ways to work with local and remote datasets as efficiently as possible.
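As a hedged sketch of one such technique (not necessarily the approach from the talk; the connection string, table, and column names are made up), you can select only the columns you need on the server and stream the remote rows in chunks, so the remote table never has to fit in your laptop's memory before the join:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical setup: a remote database reachable over a slow link and a
# small local CSV we want to enrich.
engine = create_engine("postgresql://user:pass@remote-host/warehouse")
local_orders = pd.read_csv("local_orders.csv")  # small enough to keep in memory

# Pull only the columns needed for the join, streamed in chunks so the
# remote table is never materialized in memory all at once.
remote_chunks = pd.read_sql_query(
    "SELECT order_id, status, shipped_at FROM orders",
    engine,
    chunksize=50_000,
)

parts = [local_orders.merge(chunk, on="order_id", how="inner") for chunk in remote_chunks]
result = pd.concat(parts, ignore_index=True)
```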

Watch
Talks - Brandt Bucher: Inside CPython 3.11's new specializing, adaptive interpreter

Python 3.11 was released on October 24th, 2022, bringing with it a new "specializing, adaptive interpreter." As one of the engineers who works on this ambitious project, my goal is to introduce you to the fascinating way that your code now optimizes itself as it's running, and to explore the different techniques employed under the hood to make your programs 25% faster on average. Along the way, we'll also cover many of the challenges faced when optimizing dynamic programming languages, some of the tools you can use to observe the new interpreter in action, and what we're already doing to further improve performance in Python 3.12 and beyond.
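One such tool (this particular example is an assumption, not named in the abstract) is the standard-library dis module, which on Python 3.11+ can display the specialized instructions the adaptive interpreter has substituted at run time:

```python
import dis

def add_floats(x, y):
    return x + y

# Warm the function up so the adaptive interpreter has a chance to specialize it.
for _ in range(1_000):
    add_floats(1.5, 2.5)

# On Python 3.11+, adaptive=True shows the specialized instructions
# (e.g. BINARY_OP_ADD_FLOAT in place of the generic BINARY_OP).
dis.dis(add_floats, adaptive=True)
```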

Watch
Talks - Victor Stinner: Introducing incompatible changes in Python

In the Python 2 era, the plan was to migrate on a D-Day: convert your entire code base to Python 3 at once. It didn't go as well as expected, and we learned lessons from that mistake. Incompatible changes are now introduced differently in Python. Today, changes start with a deprecation warning for at least two Python releases before old functions are removed. We think about how to write a single code base that works on both the old and new Python versions. More and more often, instructions for migrating existing code are provided, or even automated tools. Changes that break too many projects are reverted when there is not enough time to update enough of them. Code search helps detect affected projects, notify them, and perhaps even propose changes to prepare their code. Looking ahead, Python is working on a stable ABI so that C extensions can be built once and used on many Python versions; the HPy project is an interesting candidate for this goal. More and more projects are being tested on the Python version currently under development (Python 3.12).
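As a minimal sketch of the deprecation-first pattern described above (the function names and version numbers are hypothetical), a library keeps the old entry point working while warning callers for at least two releases before removal:

```python
import warnings

def new_function(x):
    return x * 2

def old_function(x):
    """Deprecated since version 2.0; scheduled for removal in version 2.2."""
    warnings.warn(
        "old_function() is deprecated and will be removed in version 2.2; "
        "use new_function() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return new_function(x)

# DeprecationWarning is shown in __main__ but ignored by default in imported
# code; this filter makes it visible everywhere for the demo.
warnings.simplefilter("always", DeprecationWarning)
old_function(21)
```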

Watch
Talks - Ron Nathaniel: How To Monitor and Troubleshoot Applications using OpenTelemetry

OpenTelemetry is a free, open-source observability framework and protocol. OpenTelemetry sits at the application layer and exports traces, metrics, and logs to a backend for observation. It is extremely helpful to developers in reducing the mean "time-to-detection" and "time-to-resolution" of bugs and issues that occur at the application layer; this ranges from detecting and alerting on raised errors (such as a TypeError), to finding that a specific microservice (such as an AWS Lambda function) ran for twice as long as usual, all the way to comparing the output of a service with its expected output to find a bug in the service's logic. This talk is meant as an eye-opening introduction to basic monitoring and troubleshooting of code that may be running in a galaxy far, far away on a cloud provider's computer. It is geared towards complete beginners to the monitoring and observability world, and shows just how easy it is to get set up and running. No OpenTelemetry or other observability experience is needed, just a basic understanding of Python syntax to read and understand the minimal code changes required for OpenTelemetry.
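As a hedged, minimal sketch of what those code changes can look like (assuming the opentelemetry-sdk package is installed; the span and attribute names are made up), the following traces two nested operations and exports the spans to the console rather than a real observability backend:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo.service")

with tracer.start_as_current_span("handle_request") as span:
    span.set_attribute("http.route", "/items/42")
    with tracer.start_as_current_span("fetch_from_db"):
        pass  # the real work would happen here
```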

Watch
Talks - Erik Tollerud: How Python is Behind the Science of the James Webb Space Telescope

The James Webb Space Telescope (JWST) is one of the largest science projects in history. Its aim is to blow the door open on infrared astronomy: it has already found the earliest galaxies, will reveal the birth of stars and planets, and will look for planets that could harbor life outside our solar system. Not to mention that it has produced, and will continue to produce, spectacular pictures that help us all understand our place in the cosmos in a way never before possible. And while many programming languages were used in the development and operation of JWST, the language used for most of the science is Python. In this talk I will walk through some of the early science of JWST and how it has been made possible by Python and the broad and deep open source Python scientific ecosystem.

Watch
Talks - Uzoma Nicholas Muoh: Improving Efficiency in Transportation Networks using Python

When we think about what Python is for, we often think of things like analytics, machine learning, and web apps, but Python is a workhorse that plays a tremendous and often invisible role in our day-to-day lives, from medicine to finance, and even the transportation of goods from manufacturers to the shelves of our neighborhood stores. Transportation networks are highly dynamic: goods are always moving from point A to point B, and money is being gained or lost every minute. Improving efficiency in a transportation network is critical to the survival of a business that provides transportation and distribution services, as well as to ensuring timely delivery of goods to customers. This talk examines three real-world examples of how Python is used to improve the efficiency of transportation networks. In particular, we will explore:

* Finding the optimal match between a driver and a load at the lowest possible cost using Google's ortools (see the sketch below);
* Generating recommendations for macro-level optimizations to a transportation network using networkX; and
* Helping the decision-making process by answering the question "Should I accept this work?" using skfuzzy.

Key takeaways include:

* Graph analytics and data science concepts that facilitate getting goods from manufacturers to stores more efficiently and at a lower cost to businesses; and
* An appreciation of the complexity of the logistics industry and the role Python plays in making the life of drivers better.
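The first bullet describes a classic assignment problem. As an illustration only (the talk uses Google's ortools; this sketch substitutes networkX's bipartite matching, which also requires scipy, and the drivers, loads, and costs are made up), pairing each driver with a load at minimum total cost can look like this:

```python
import networkx as nx
from networkx.algorithms.bipartite import minimum_weight_full_matching

# Made-up cost of assigning each driver to each load (deadhead miles,
# hours of service, fuel, etc.).
costs = {
    ("driver_a", "load_1"): 120, ("driver_a", "load_2"): 300,
    ("driver_b", "load_1"): 250, ("driver_b", "load_2"): 90,
}

drivers = {d for d, _ in costs}
loads = {load for _, load in costs}

G = nx.Graph()
G.add_nodes_from(drivers, bipartite=0)
G.add_nodes_from(loads, bipartite=1)
G.add_weighted_edges_from((d, load, c) for (d, load), c in costs.items())

# Minimum-weight full matching pairs every driver with a load at lowest total cost.
matching = minimum_weight_full_matching(G, top_nodes=drivers)
print({d: matching[d] for d in drivers})
# {'driver_a': 'load_1', 'driver_b': 'load_2'}
```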

Watch
Talks - Christopher Ariza: Building NumPy Arrays from CSV Files, Faster than Pandas

Twenty years ago, in 2003, Python 2.3 was released with csv.reader(), a function that provided support for parsing CSV files. The C implementation, proposed in PEP 305, defines a core tokenizer that has been a reference for many subsequent projects. Two commonly needed features, however, were not addressed in csv.reader(): determining type per column, and converting strings to those types (or columns to arrays). Pandas read_csv() implements automatic type conversion and realization of columns as NumPy arrays (delivered in a DataFrame), with performance good enough to be widely regarded as a benchmark. The Pandas implementation, however, does not support all NumPy dtypes. While NumPy offers loadtxt() and genfromtxt() for similar purposes, the former (recently re-implemented in C) does not implement automatic type discovery, while the latter (implemented in Python) suffers poor performance at scale. To support reading delimited files in StaticFrame (a DataFrame library built on an immutable data model), I needed something different: the full configuration options of Python's csv.reader(); optional type discovery for one or more columns; support for all NumPy dtypes; and performance competitive with Pandas read_csv(). Following the twenty-year tradition of extending csv.reader(), I implemented delimited_to_arrays() as a C extension to meet these needs. Using a family of C functions and structs, Unicode code points are collected per column (with optional type discovery), converted to C-types, and written into NumPy arrays, all with minimal PyObject creation or reference counting. Incorporated in StaticFrame, performance tests across a range of DataFrame shapes and type heterogeneity show significant performance advantages over Pandas. Independent of usage in StaticFrame, delimited_to_arrays() provides a powerful new resource for converting CSV files to NumPy arrays. This presentation will review the background, architecture, and performance characteristics of this new implementation.
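As a baseline sketch of the problem space described above (the sample data is made up): csv.reader() yields strings only, so the per-column type discovery and conversion must be written by hand, which is the step delimited_to_arrays() automates in C:

```python
import csv
import io
import numpy as np

# What csv.reader() gives you: rows of strings, with no notion of column types.
data = io.StringIO("id,price,flag\n1,3.5,True\n2,4.25,False\n")
rows = list(csv.reader(data))
header, body = rows[0], rows[1:]
columns = list(zip(*body))  # transpose rows into per-column tuples of strings

# Hand-written conversion, one dtype per column; per the abstract,
# delimited_to_arrays() performs this step (with optional type discovery) in C.
arrays = {
    "id": np.array(columns[0], dtype=np.int64),
    "price": np.array(columns[1], dtype=np.float64),
    "flag": np.array([v == "True" for v in columns[2]], dtype=bool),
}
print({name: arr.dtype for name, arr in arrays.items()})
# {'id': dtype('int64'), 'price': dtype('float64'), 'flag': dtype('bool')}
```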

Watch