Tech confs

Talks - Josh Wiedemeier: There and Back Again: Reverse Engineering Python Binaries

Companies and malware authors use packaging tools to distribute products and payloads as Python bytecode (.pyc) files, often assuming that their secret logic will be unreadable by humans. Using a simple example, we will teach curious developers how to interpret and decompile Python bytecode by hand. Finally, we will discuss the challenges of automating Python decompilation and some solutions to them. This talk is aimed at intrepid intermediate Python developers who want to look under the hood, and at reverse engineers who want to add Python binaries to their repertoire. Unlike previous bytecode-oriented talks at PyCon, which primarily focus on Python's execution model, this talk focuses on recovering Python source code from Python bytecode.

Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/118/2024-05-14T19%3A39%3A51.771340/JoshWiedemeier-PyCon24-ThereAndBackAgain.pdf
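To give a feel for what "interpreting bytecode by hand" in the abstract above starts from, here is a minimal sketch using only the standard library. It is not taken from the talk: the `check_license` function and its string constant are made up for illustration, and the code object is round-tripped through `marshal` in memory instead of being read from a real .pyc file (a .pyc is a short header followed by a marshalled code object).

```python
import dis
import marshal

# Hypothetical "secret" logic, invented for this example.
SOURCE = '''
def check_license(key):
    return key == "hunter2"
'''

# Compile to a code object, then serialize and deserialize it the way a
# .pyc payload would be (we skip the file header and the file itself).
module_code = marshal.loads(marshal.dumps(compile(SOURCE, "<secret>", "exec")))

# The function body is stored as a nested code object among the
# module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))


def summarize(code):
    """Return (opname, argument text) pairs for every instruction."""
    return [(i.opname, i.argrepr) for i in dis.get_instructions(code)]


# Names and constants survive compilation, which is what makes
# reconstruction of the source feasible.
print("name:     ", func_code.co_name)
print("arguments:", func_code.co_varnames)
print("constants:", func_code.co_consts)
for opname, argrepr in summarize(func_code):
    print(f"{opname:<14} {argrepr}")
```

The exact opcodes printed differ between Python versions, so the listing is best read alongside the `dis` documentation for the interpreter that produced the bytecode.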

Talks - Michael Chow, Richard Iannone: Making Beautiful, Publication Quality Tables in Python...

Full title: Making Beautiful, Publication Quality Tables in Python is Possible in 2024

Tables are undeniably useful for data work. We have many excellent DataFrame libraries in Python, and they give us the flexibility to manipulate data to our heart's content. But what happens when it comes to presenting tables to others? The display of tables can be beautiful. Tables can convey information effectively, just as plots do, and sometimes they are the better way to present data. Truly, the time has come to bridge the divide between raw DataFrame output and wondrously-structured tables suitable for publication. Let's review the state of ‘display tables’ in 2024. We’ll go over which table components make for effective displays of information. Surprisingly, many considerations go into making a well-crafted table. We’ll take a look at the combinations of Python packages that fit together to make this important task possible, and marvel together at the tabular results they can provide.

Talks - Pablo Galindo Salgado: Profiling at the speed of light

Did you know that Python 3.12 will include one of the world's smallest just-in-time (JIT) compilers? You may be surprised to learn that it is not what you probably think it is. Python 3.12 will include support for the Linux perf profiler, a very powerful tool that lets you profile your application and obtain information about its performance. perf also has a vibrant ecosystem of tools that help with analyzing the data it produces. In this talk, we will cover how this exciting feature was implemented, how the support provided by the perf profiler differs from other performance-oriented profilers for Python, and how it can be used effectively, including how to activate it dynamically to enable production profiling. We will also cover some of the requirements for obtaining the best results, as well as some of the limitations of the implementation and how those can affect your metrics.

Understanding where our Python applications spend their time is crucial to improving their performance. Several tools already exist to help with this task, but they all have their own limitations, especially when native code written in C, C++, Rust, etc. is involved. Being able to gather performance information and cross-correlate it with other performance-related markers, such as branch mispredictions, cache misses, context switches, and other events, can be key to solving some of the most challenging profiling puzzles.
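As a rough illustration of the "activate it dynamically" point in the abstract above, the sketch below wraps a region of code with the `sys.activate_stack_trampoline("perf")` and `sys.deactivate_stack_trampoline()` calls that CPython 3.12 documents. It is a sketch under assumptions, not the talk's material: the `perf_trampoline` helper and the `busy_sum` workload are hypothetical names, and the helper silently does nothing on interpreters or platforms without perf support. Whole-process activation is also documented via `python -X perf` or the `PYTHONPERFSUPPORT` environment variable.

```python
import contextlib
import sys


@contextlib.contextmanager
def perf_trampoline():
    """Enable perf support for a block of code where the interpreter allows it.

    Yields True if the trampoline was activated, False otherwise.
    """
    activate = getattr(sys, "activate_stack_trampoline", None)  # Python 3.12+
    enabled = False
    if activate is not None:
        try:
            activate("perf")
            enabled = True
        except ValueError:
            # Raised when this build or platform has no perf trampoline.
            pass
    try:
        yield enabled
    finally:
        if enabled:
            sys.deactivate_stack_trampoline()


def busy_sum(n):
    """Hypothetical workload whose Python frames we want perf to see."""
    return sum(i * i for i in range(n))


with perf_trampoline() as enabled:
    total = busy_sum(10_000)

print("perf trampoline active during workload:", enabled)
```

The script only produces useful data when it runs under `perf record`; on its own it just toggles the trampoline around the workload.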

Talks - Paul Ganssle: pytest for unittesters

Are you a unittest user interested in learning more about pytest? Do you want to learn to write more idiomatic pytest tests? Do you use neither and want an overview of some of the differences between the two frameworks? If you answered yes to any of these questions, then this talk is for you! Join us for an introduction to fixtures, test parameterization, and an explanation of some of pytest's subtler user experience enhancements, in a talk that will just scratch the surface of pytest's extensive feature set.
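For readers coming from unittest, the two features named in the abstract above look roughly like this. This is a minimal sketch, not an excerpt from the talk; the `normalize` function and the test data are invented for illustration, and pytest must be installed.

```python
import pytest


def normalize(name):
    """Hypothetical function under test: trim whitespace and lowercase."""
    return name.strip().lower()


@pytest.fixture
def messy_names():
    # A fixture replaces unittest's setUp: pytest calls it and passes the
    # return value to any test that names it as an argument.
    return ["  Alice", "BOB  ", " Carol "]


def test_normalize_all(messy_names):
    # Plain assert statements replace self.assertEqual and friends.
    assert [normalize(n) for n in messy_names] == ["alice", "bob", "carol"]


@pytest.mark.parametrize(
    ("raw", "expected"),
    [("  Alice", "alice"), ("BOB  ", "bob"), ("", "")],
)
def test_normalize_cases(raw, expected):
    # pytest runs this function once per (raw, expected) pair and
    # reports each case separately.
    assert normalize(raw) == expected
```

Saved as a file whose name starts with `test_`, this can be run with the `pytest` command; no `TestCase` subclass is needed.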

Talks - Reuven M. Lerner: Times and dates in Pandas

Pandas is famously flexible and capable at analyzing numeric data. But Pandas is just as flexible and capable at working with times and dates. In this talk, I'll describe the dtypes associated with times and dates, the sorts of calculations you can perform, issues with parsing and importing datetime data, and how you can perform more complex tasks, such as grouping, pivoting, and resampling. By the time this talk is over, you'll be able to work with time-based data in new ways, from handling inputs to performing sophisticated analysis.

Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/81/2024-05-17T11%3A11%3A55.908579/Dates_and_times_in_Pandas.key.pdf
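The topics listed in the abstract above (datetime dtypes, calculations, parsing, and resampling) can be sketched in a few lines. The column names and values below are invented for illustration and do not come from the talk.

```python
import pandas as pd

# Hypothetical data: timestamps arrive as strings, as they would from a CSV.
df = pd.DataFrame(
    {
        "when": [
            "2024-05-01 09:00",
            "2024-05-01 15:30",
            "2024-05-02 10:15",
            "2024-05-03 08:45",
        ],
        "amount": [10.0, 20.0, 5.0, 7.5],
    }
)

# Parsing: convert the string column to a datetime64 dtype.
df["when"] = pd.to_datetime(df["when"])

# Calculations: subtracting two timestamps gives a Timedelta.
span = df["when"].max() - df["when"].min()

# The .dt accessor exposes the components of each timestamp.
hours = df["when"].dt.hour

# Resampling: with a datetime index, aggregate the amounts per calendar day.
daily = df.set_index("when")["amount"].resample("D").sum()

print(df.dtypes)
print("span:", span)
print(daily)
```

Passing an explicit `format=` to `pd.to_datetime` is usually worth considering for real input, since it removes ambiguity about day and month order.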

Talks - Russell Keith-Magee: Build a data visualization app for your phone

The modern mobile phone is an incredibly powerful computing device. However, mobile platforms have historically excluded the Python data science community, requiring specialist platform-specific skills, or making the use of Python data science tools exceedingly difficult. This isn't true any more. In this talk, you'll learn how to build and run an app on your phone that uses the Python data analysis and visualization tools you're already familiar with, like NumPy and Matplotlib. No special mobile development skills are required; only a basic familiarity with Python.

Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/40/2024-05-15T12%3A22%3A44.310061/slides.txt

Note: captions are not available, but we are working to fix this.

Talks - Sydney Runkle: Pydantic Power-up: Performance Tips for Lightning-Fast Python Applications

Pydantic is the most widely used data validation library for Python. With the V2 release, the library shifted to using Rust for core validation logic, which resulted in 5-50x speedups compared to V1. Though Pydantic is already quite efficient, there are some little-known performance tips and tricks you can use to ensure optimal performance. In this talk, I’ll delve into a spectrum of optimizations, ranging from one-line fixes to larger-scale design modifications that can help you squeeze the best performance out of Pydantic. In terms of one-line fixes, I’ll suggest changes ranging from opinionated JSON loading syntax to TypeAdapter usage tips. The focal point of the talk will be tagged (also called discriminated) unions, a Pydantic tool used to efficiently validate union types, while also taming related validation errors. Listeners will walk away from this talk with a more nuanced understanding of performance with Pydantic, an abundance of examples that they can apply to their own code, and insights into upcoming performance-enhancing features in Pydantic.
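A minimal sketch of the tagged-union idea from the abstract above, using the Pydantic V2 API. The `Cat` and `Dog` models are invented for illustration. Reading "JSON loading syntax" as validating the JSON string directly (instead of calling `json.loads` first) and "TypeAdapter usage" as building the adapter once and reusing it is an assumption about what the talk recommends.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Cat(BaseModel):
    kind: Literal["cat"]
    lives: int


class Dog(BaseModel):
    kind: Literal["dog"]
    good_boy: bool


# The discriminator tells Pydantic which member to validate against by
# looking at the "kind" field, instead of trying each member in turn.
Pet = Annotated[Union[Cat, Dog], Field(discriminator="kind")]

# Build the adapter once at import time and reuse it for every call.
PET_ADAPTER = TypeAdapter(Pet)

# Validate straight from the JSON text.
pet = PET_ADAPTER.validate_json('{"kind": "dog", "good_boy": true}')
print(type(pet).__name__, pet)

# Errors are reported only for the member selected by the tag.
try:
    PET_ADAPTER.validate_python({"kind": "cat", "lives": "many"})
except ValidationError as exc:
    print(exc)
```

Without the discriminator, a failed validation reports errors for every member of the union, which is the noise the abstract refers to as "related validation errors".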

Talks - Bradley Dice: Hacking `import` for speed: how we wrote a GPU accelerator for pandas

Python’s import system is eminently hackable. Often, this is a tool of last resort, but it can be extremely powerful. In this talk, we’ll describe our ambitious effort to hack `import pandas` to accelerate large parts of it on the GPU using cuDF: a GPU DataFrame library. We’ll cover the basics of import hacking and other tricks like Pythonic proxy patterns. We’ll show how we use these more dynamic features of Python to effectively accelerate any code that uses pandas, including third-party libraries. We’ll also get into the technical and social problems that currently necessitate these sophisticated solutions, and share some thoughts on solving them. It will be a story of successes, failures, wishes and tears, and excursions into exciting parts of Python many developers may not have encountered before! This talk is for the Pythonista interested in the import system and how to hack it for performance. It is also for developers interested in the question of speeding up the vast ecosystem built on top of libraries like numpy and pandas without code changes.
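The "basics of import hacking" and the proxy pattern mentioned in the abstract above can be illustrated with a toy example built on the standard library. This is not how the speakers' accelerator is implemented; it is a sketch under simple assumptions, where the hypothetical module name `slowmath` is served by a proxy that forwards every attribute to the real `math` module.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class RedirectFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve imports of `alias` with a thin proxy around module `target`."""

    def __init__(self, alias, target):
        self.alias = alias
        self.target = target

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.alias:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import machinery handle everything else

    def exec_module(self, module):
        real = importlib.import_module(self.target)
        # PEP 562: a module-level __getattr__ is called for any attribute
        # the proxy module does not define itself.
        module.__getattr__ = lambda name: getattr(real, name)


# Finders earlier in sys.meta_path win, so insert at the front.
sys.meta_path.insert(0, RedirectFinder("slowmath", "math"))

import slowmath  # resolved by RedirectFinder, not by a file on disk

print(slowmath.sqrt(16.0))
```

A real accelerator would need to decide per attribute whether to dispatch to the fast backend or fall back to the original library; here the proxy forwards everything unchanged.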

Talks - Jodie Burchell: Lies, damned lies and large language models

Would you like to use large language models (LLMs) in your own project, but are troubled by their tendency to frequently “hallucinate”, or produce incorrect information? Have you ever wondered if there was a way to easily measure an LLM’s hallucination rate, and compare this against other models? And would you like to learn how to help LLMs produce more accurate information? In this talk, we’ll have a look at some of the main reasons that hallucinations occur in LLMs, and then focus on how we can measure one specific type of hallucination: the tendency of models to regurgitate misinformation that they have learned from their training data. We’ll explore how we can easily measure this type of hallucination in LLMs using a dataset called TruthfulQA in conjunction with Python tooling including Hugging Face’s datasets and transformers packages, and the langchain package. We’ll end by looking at recent initiatives to reduce hallucinations in LLMs, using a technique called retrieval-augmented generation (RAG). We’ll look at how and why RAG makes LLMs less likely to hallucinate, and how this can help make these models more reliable and usable in a range of contexts.

Slides: https://pycon-assets.s3.amazonaws.com/2024/media/presentation_slides/48/2024-05-10T12%3A35%3A48.498968/lies-damned-lies-and-llms-final-speaker-notes.pdf
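To make "measuring a hallucination rate" in the abstract above concrete, here is a library-free sketch of a multiple-choice accuracy measure. Everything in it is an assumption for illustration: the `Item` structure, the two questions (made up here, not taken from TruthfulQA), and the `toy_scorer`, which stands in for a real model's scoring of each candidate answer. The talk's own tooling (datasets, transformers, langchain) is not used.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    question: str
    choices: tuple[str, ...]
    correct: int  # index of the truthful choice


# A scorer maps (question, candidate answer) to a preference score.
Scorer = Callable[[str, str], float]


def truthful_rate(items: Sequence[Item], score: Scorer) -> float:
    """Fraction of items where the top-scored choice is the truthful one."""
    hits = 0
    for item in items:
        scores = [score(item.question, choice) for choice in item.choices]
        if scores.index(max(scores)) == item.correct:
            hits += 1
    return hits / len(items)


# Made-up items in the style of common-misconception questions.
ITEMS = [
    Item(
        "Do humans use only ten percent of their brains?",
        ("No, most of the brain is active", "Yes"),
        correct=0,
    ),
    Item(
        "Is the Great Wall easy to see from orbit with the naked eye?",
        ("Not easily", "Yes, it is clearly visible from orbit"),
        correct=0,
    ),
]


def toy_scorer(question: str, answer: str) -> float:
    # Stand-in for a model: prefers longer answers. A real scorer might
    # use the model's log-probability of the answer given the question.
    return float(len(answer))


rate = truthful_rate(ITEMS, toy_scorer)
print(f"truthful rate: {rate:.2f}, hallucination rate: {1 - rate:.2f}")
```

Swapping `toy_scorer` for a function backed by an actual model, and `ITEMS` for rows loaded from a benchmark, keeps the measurement loop unchanged.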
