List of videos

How I used pgvector and PostgreSQL® to find pictures of me at a party — Tibs

[EuroPython 2024 — South Hall 2B on 2024-07-12] How I used pgvector and PostgreSQL® to find pictures of me at a party by Tibs https://ep2024.europython.eu/session/how-i-used-pgvector-and-postgresql-r-to-find-pictures-of-me-at-a-party Nowadays, if you attend an event you're bound to end up with a catalogue of photographs to look at. Formal events are likely to have a professional photographer, and modern smartphones mean that it's easy to make a photographic record of just about any gathering. It can be fun to look through the pictures, to find yourself or your friends and family, but it can also be tedious. At our company get-together earlier in the year, the photographers did indeed take a lot of pictures. Afterwards the best of them were put up on our internal network - and like many people, I combed through them looking for those in which I appeared (yes, for vanity, but also with some amusement). In this talk, I'll explain how to automate finding the photographs I'm in (or at least, mostly so). I'll walk through Python code that extracts faces using OpenCV, calculates vector embeddings using imgbeddings and OpenAI, and stores them in PostgreSQL® using pgvector. Given all of that, I can then make an SQL query to find which pictures I'm in. Python is a good fit for data pipelines like this, as it has good bindings to machine learning packages, and excellent support for talking to PostgreSQL. You may be wondering why that sequence ends with PostgreSQL (and SQL) rather than something more machine learning specific. I'll talk about that as well, and in particular about how PostgreSQL allows us to cope when the amount of data gets too large to be handled locally, and how useful it is to be able to relate the similarity calculations to other columns in the database - in our case, perhaps including the image metadata. 
--- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch

Fundamentals of Retrieval Augmented Generation — Catalin Hanga

[EuroPython 2024 — Terrace 2A on 2024-07-12] Fundamentals of Retrieval Augmented Generation by Catalin Hanga https://ep2024.europython.eu/session/fundamentals-of-retrieval-augmented-generation Retrieval Augmented Generation (RAG) has emerged in recent years as a popular technique at the crossroads of Information Retrieval and Natural Language Generation. It represents a promising new approach that combines the strengths of both retrieval-based systems and generative AI models, aiming to address the limitations of each, while enhancing their overall performance on document intelligence tasks. This talk will introduce the key frameworks, methodologies and advancements in RAG, exploring its ability to empower Large Language Models with a deeper comprehension of context, by leveraging pre-existing knowledge from external corpora. We will review the theoretical foundations, practical applications, and technical challenges associated with RAG, showcasing its potential to impact various fields, such as document summarization or database management. Through this talk, attendees will gain insights into the most relevant topics related to RAG, including token embedding, vector indexing and semantic similarity search. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/
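As a rough illustration of the semantic similarity search mentioned in the abstract above, here is a minimal sketch in plain Python. It is not taken from the talk: the three-dimensional vectors, the document names and the helper names `cosine_similarity` and `top_k` are all made up for the example, and a real RAG system would use embeddings from a model and a vector index rather than a brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the names of the k corpus entries most similar to the query."""
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy "embeddings"; real ones would come from an embedding model.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

nearest = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(nearest)
```

In a retrieval-augmented setup, the text behind the returned entries would then be passed to the language model as additional context.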

Watch

Representation is King: The Journey to Quality Dialog Embeddings — Adam Zíka

[EuroPython 2024 — Terrace 2A on 2024-07-12] Representation is King: The Journey to Quality Dialog Embeddings by Adam Zíka https://ep2024.europython.eu/session/representation-is-king-the-journey-to-quality-dialog-embeddings In natural language processing, embeddings are crucial for understanding textual data. In this talk, we’ll explore sentence embeddings and their application in dialog systems. We'll focus on a use case involving the classification of dialogs. We'll demonstrate the necessity of sentence transformers for this problem, specifically utilizing one of the top-performing small-sized sentence transformers. We will show how to fine-tune this model with both labeled and unlabeled dialog data, using the SentenceTransformers Python framework. This talk is practical, packed with easy-to-follow examples, and aimed at building intuition around this topic. While some basic knowledge of Transformers would be beneficial, it is not required. Newcomers are also welcome. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch

Which LLM said that? - watermarking generated text — Adam Kaczmarek

[EuroPython 2024 — Terrace 2A on 2024-07-12] Which LLM said that? - watermarking generated text by Adam Kaczmarek https://ep2024.europython.eu/session/which-llm-said-that-watermarking-generated-text With the emergence of large generative language models comes the problem of attributing AI-generated text to its original source. This raises many concerns regarding, e.g., social engineering, fake news generation and cheating in educational assignments. While there are several black-box methods for detecting whether text was written by a human or an LLM, they have significant issues. I will discuss how watermarking can equip your LLM with a mechanism that, while undetectable to the human eye, gives you the means of verifying whether it was the true source of a generated text. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch

Those annotations can have things other than typing?! — Mattijs Ugen

[EuroPython 2024 — Terrace 2A on 2024-07-12] Those annotations can have things other than typing?! by Mattijs Ugen https://ep2024.europython.eu/session/those-annotations-can-have-things-other-than-typing Annotating functions with typing information is commonplace nowadays. Annotations have become synonymous with typing information, even though they could be just about anything you’d want. Are there use cases for function annotations other than typing? Is that useful? Should you care? Should you stop using typing? --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/
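To make the question in the abstract above concrete, here is a small sketch of a function annotation that carries something other than a type. It is not from the talk: `set_volume` and `check_ranges` are hypothetical names, and using a `(low, high)` tuple as an annotation is just one arbitrary convention chosen for illustration. Type checkers will typically complain about annotations like these.

```python
def set_volume(level: (0, 11)) -> "the level that was applied":
    """Annotations here are a range tuple and a plain string, not types."""
    return level


def check_ranges(func, **kwargs):
    """Call func after validating kwargs against (low, high) tuple annotations."""
    for name, value in kwargs.items():
        bounds = func.__annotations__.get(name)
        if isinstance(bounds, tuple):
            low, high = bounds
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
    return func(**kwargs)


# The interpreter stores whatever expression was written as the annotation.
print(set_volume.__annotations__)
print(check_ranges(set_volume, level=5))
```

Calling `check_ranges(set_volume, level=12)` raises `ValueError`, because the annotation is interpreted as a permitted range rather than as a type.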

Watch

MLtraq: Track your ML/AI experiments at hyperspeed — Michele Dallachiesa

[EuroPython 2024 — Terrace 2A on 2024-07-12] MLtraq: Track your ML/AI experiments at hyperspeed by Michele Dallachiesa https://ep2024.europython.eu/session/mltraq-track-your-ml-ai-experiments-at-hyperspeed With every second spent waiting for initializations and obscure delays that hinder high-frequency logging, further limited by what you can track, an experiment dies. Wouldn’t it be nice to load and start tracking in nearly zero time? What if we could track more and faster, even handling arbitrarily large, complex Python objects with ease? In this talk, I will present the results of comparative benchmarks covering Weights & Biases, MLflow, FastTrackML, Neptune, Aim, Comet, and MLtraq. You will learn their strengths and weaknesses, what makes them slow or fast, and what sets MLtraq apart, making it 100x faster and capable of handling tens of thousands of experiments. This presentation will not only be enlightening for those involved in AI/ML experimentation but will also be invaluable for anyone interested in the efficient and safe serialization of Python objects. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch

Exploring Apache Iceberg: A Modern Data Lake Stack — Gowthami Bhogireddy

[EuroPython 2024 — Terrace 2A on 2024-07-12] Exploring Apache Iceberg: A Modern Data Lake Stack by Gowthami Bhogireddy https://ep2024.europython.eu/session/exploring-apache-iceberg-a-modern-data-lake-stack Bloomberg is a leading provider of financial data, with information spanning multiple decades. Handling and organizing these huge datasets can be challenging, with typical concerns including sluggish query performance, high storage costs, and data consistency problems. This talk will describe how Apache Iceberg is revolutionizing big data management, offering ACID transactions, time travel, and seamless schema evolution that enable lightning-fast query performance and robust data consistency for even our largest workloads. The session will introduce Apache Iceberg, an open-source table format that enables incremental updates, versioning, and schema evolution. The discussion will focus on how these features address common big data management challenges, improve query performance, and reduce storage costs. Finally, the session will outline how our Enterprise Data Lake Applications engineering team has harnessed the capabilities of Apache Iceberg (especially PyIceberg) to revolutionize our data management and analytical processing workflows. Attendees will be able to apply the best practices discussed in the talk to build better infrastructure for their growing data demands and spur innovation within their organization. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch

PEP 683: Immortal Objects - A new approach for memory managing — Vinícius Gubiani Ferreira

[EuroPython 2024 — Terrace 2B on 2024-07-12] PEP 683: Immortal Objects - A new approach for memory managing by Vinícius Gubiani Ferreira https://ep2024.europython.eu/session/pep-683-immortal-objects-a-new-approach-for-memory-managing For most people who use Python, memory is not something to worry about. But that's not the case when you have to handle a lot of requests on a large scale. So how do you reduce memory consumption without affecting the CPU? In this presentation I'll discuss memory management in Python from the basics, where the need for PEP 683 came from, and the changes it introduced. I also intend to discuss why this PEP is so important for the language, and what we'll be able to achieve with it in the future, such as changes to the GIL and true parallelism. The talk is targeted at intermediate and advanced Pythonistas. People who are just starting with Python (maybe less than 1.5 years) may feel a bit lost. Even so, curious learners are more than welcome to join, and I'll try my best to make this advanced topic accessible to all audiences. After this presentation, participants will know a bit more about how memory management works under the hood in Python, and how it may change in the next couple of years. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/
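One way to observe the effect described in the abstract above is to compare reference counts before and after creating many references to an object. The sketch below is not from the talk; `refcount_delta` is a hypothetical helper, and the exact numbers returned by `sys.getrefcount` are a CPython implementation detail. It assumes CPython 3.12 or later, where PEP 683 makes `None` immortal; on older versions the count for `None` changes like that of any other object.

```python
import sys


def refcount_delta(obj, extra_refs=1000):
    """Return how much obj's reported refcount grows while extra_refs
    additional references to it are held in a list."""
    before = sys.getrefcount(obj)
    holder = [obj] * extra_refs  # creates extra_refs new references to obj
    after = sys.getrefcount(obj)
    del holder
    return after - before


ordinary = object()
print("ordinary object:", refcount_delta(ordinary))  # grows with each reference
print("None:", refcount_delta(None))  # expected to stay flat on CPython 3.12+
```

Because the reference count of an immortal object is never written to, its memory stays unmodified, which is the property the PEP relies on.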

Watch

You are sharing your code wrong (and what to do about it) — Jeremiah Paige

[EuroPython 2024 — Terrace 2B on 2024-07-12] You are sharing your code wrong (and what to do about it) by Jeremiah Paige https://ep2024.europython.eu/session/you-are-sharing-your-code-wrong-and-what-to-do-about-it Everyone who writes Python code also distributes it. The only reliable way to share Python code is by packaging it; any other way hurts your consumers. Packaging can be an intimidating topic that most would rather avoid, but following just a few packaging best practices can make your code much easier to share, even without going through the process of uploading to pypi.org. --- This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License: https://creativecommons.org/licenses/by-nc-sa/4.0/

Watch