List of videos

Norberto Leite - Identity at Scale; How Okta uses Postgres (PGConf.EU 2024)

At Okta we have a wide range of different state and persistence dedicated systems. These range from regular key-value stores like Redis, a large fleet of NoSQL databases of the likes of DynamoDB and MongoDB, and ever larger and more predominant usage of PostgreSQL for a wide majority of new and old use cases. This talk is all about the operational challenges of managing a fleet of 650+ Postgres database clusters, spread across two cloud providers, and in a variety of different use cases, with different operational profiles to each of them. We will uncover a range of operational challenges such as: * Database operational procedures: backups/restore, version management, data retention and observability for 650+ clusters * Sizing: when and how to scale up individual clusters * Same database, different use cases: full text search vs basic CRUD vs spike traffic management are handled from a database perspective and what issues/challenges we encounter dealing with non-predictable problems. * Same database engine, different deployments: tactics and deployment practices on how to handle multi and single tenant deployments, the performance and operational implications of such scenarios. * Database Topology: how our clusters are designed per client and resiliency needs in mind. What we would like the attendee to get from this session: * A good set of live production examples and lessons learned on managing a large postgres fleet on a variety of different constraints. * How different use cases, combined with single vs multi-tenant deployments can drive your database optimisation efforts * What and how to handle performance issues at scale due to unexpected load or cascading effects of "bad" releases. * How to build for failure and where databases take an important role on such an architecture. This is an operational focussed talk were we will focus on a lot lessons learned, where a natural comparison between different database engines will be used for highlighting the how and why we are using PostgreSQL and where we are heading in terms of broader adoption of the technology @okta-oauth-dep.

Watch

Julian Markwort - Comparing Connection Poolers for PostgreSQL (PGConf.EU 2024)

There is a growing number of connection poolers for PostgreSQL available, and there are few comparisons of their features and performance. In addition to the tried and true options of pgpool-II (released 2006) and pgbouncer (2007), there are new contenders: odyssey (2019), pgagroal (2019), pgcat (2022), and supavisor (2023). The talk will compare at least four of the mentioned connection poolers. Join me for an exploration of connection poolers for PostgreSQL: - basics of connection pooling - challenges of operating connection poolers - introduction and feature comparison of different poolers - performance evaluation of different poolers - general guidance on when and how to implement connection poolers

Watch

Naisila Puka - Supporting a New PostgreSQL Version in Your Extension... (PGConf.EU 2024)

Naisila Puka - Supporting a New PostgreSQL Version in Your Extension; A Citus Case Study (PGConf.EU 2024) In this talk, we will explore strategies for supporting a new PostgreSQL major version in PG extensions, using Citus as a case study. Citus is a distributed database extension for PostgreSQL. Generally, PG extensions allow users to add custom functionality, enhance database capabilities, and optimize performance. Because of this, extensions often have complex hooks into the PostgreSQL query planner and executor, requiring significant effort to remain updated rather than becoming a fork. We'll cover everything from reviewing release notes and identifying compatibility issues to making code adjustments for major query planner/executor changes and minor crashes due to changes in catalog tables. This talk provides a behind-the-scenes look at the challenges of keeping Citus up-to-date with Postgres 15 and 16 (and the progress on 17beta2). The lessons learned and strategies used will merge into practical tips for managing similar issues in other PostgreSQL extensions.

Watch

Jonathan S. Katz - Dissimilarity search... (PGConf.EU 2024)

Jonathan S. Katz - Dissimilarity search; implementing in-memory vector search algorithms to PostgreSQL (PGConf.EU 2024) Searching over vector spaces is a well-studied concept in mathematics and computer science, but the problem space has new meaning with the rise of AI/ML systems, particularly generative AI, that output very large vectors. Prior to generative AI, we predominately saw vector search in databases through geospatial, full-text, and bioinformatic/cheminformatic searches, all of which are supported in PostgreSQL either natively or through extensions. However, each of these domains have specific requirements that avoided some of the challenge we've seen searching over vectors from AI/ML systems, including using small (3/4-dim) vectors, using vectors as a filter instead of finding order (where "K-nearest neighbor" (K=NN) requires ordering), or allowing for high latency searches. Modern approximate nearest neighbor search (ANN) has evolved over the past 25 years, and has produced a variety of algorithms that propose efficient search methods over vector spaces that can return related results with a high degree of relevancy (measured as "recall"). Some of these algorithms include IVF ("inverted file"), HNSW ("hierarchical navigable small worlds"), LSH ("locality-sensitive hashing"), and others. However, most ANN algorithmic design focuses on the efficiency of how the index behaves in memory -- some of these performance and search relevancy benefits may not carry over to databases due to considerations like on-disk layout, ongoing updates, available system memory, and resource constraints due to other workloads on the same system. In this talk, we'll take a deep look at a variety of ANN algorithms that can be implemented in PostgreSQL, and what considerations we need to make to ensure they're implemented efficiently. We'll explore the pgvector implementations of the IVFFlat and HNSW algorithms and design decisions that allow for efficient indexing and search relevancy while achieving a higher level of recall, including index data structures, organization of data pages, effective update management, and query costing. We'll also explore PostgreSQL-specific details that impact these implementations, including lock management, background workers for index build parallelism, leveraging expressions to extend index implementations, and more. We'll also explain the tradeoffs between in-memory and on-disk representations of a variety of ANN algorithms. At the end of this talk, you'll understand the considerations you need to make when implementing an ANN search algorithm for PostgreSQL, which can also be extended to other databases systems.

Watch

Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installation (PGConf.EU 2024)

Even an experienced PostgreSQL DBA can not always say that upgrading between major versions of Postgres is an easy task, especially if there are some special requirements, such as downtime limitations or if something goes wrong. For less experienced DBAs anything more complex than dump/restore can be frustrating. In this talk I will describe why we need a special procedure to upgrade between major versions, how that can be achieved and what sort of problems can occur. I will review all possible ways to upgrade your cluster from classical pg_upgrade to old-school slony or modern methods like logical replication. For all approaches, I will give a brief explanation how it works (limited by the scope of this talk of course), examples how to perform upgrade and some advice on potentially problematic steps. Besides I will touch upon such topics as integration of upgrade tools and procedures with other software — connection brokers, operating system package managers, automation tools, etc. This talk would not be complete if I do not cover cases when something goes wrong and how to deal with such cases.

Watch

Grant Fritchey - Leveraging AI as a PostgreSQL DBA (PGConf.EU 2024)

Artificial Intelligence (AI) is all the rage these days. You can find all sorts of information about how to build your own AI, consume your PostgreSQL data and more. However, what if you just want to get a little better at your job? Can AI do anything for those of us who are just trying to muddle through? The short answer is, yes. The longer answer is what this session is all about. You can take advantage of what AI can do for you in simple tasks such as generating meaningful test data, simple query tuning and more. Come learn how you can put AI to work in your day-to-day tasks, making things just a little bit easier. You can use AI as a way to learn PostgreSQL better and as a way to get some tasks done faster and more accurately. AI is out there, so you may as well make use of it. Come to this session to learn how.

Watch

Yugo Nagata - pg_ivm - Extensions for Rapid Materialized View Update (PGConf.EU 2024)

Materialized views are database objects that store the results of a query. It enables rapid query responses, but to keep the state up to date after an underlying table is modified, materialized views require to be refreshed. Currently, the only way PostgreSQL provides is REFRESH MATERIALIZED VIEW command, but it could take a long time since the whole view is rebuilt by executing the query again pg_ivm is an extension module that provides the way to update the materialized view rapidly using a technique called Incremental View Maintenance (IVM), which computes and applies only the incremental changes to the materialized view without rebuilding the whole view. This feature has been also proposed for PostgreSQL core, and pg_ivm is the extension module version of this. In this talk, I will provide a short introduction of pg_ivm, the idea and design behind it, how it works, and performance. I will also show the current status of our project including recent and ongoing works, for example, outer-join support..

Watch

Thomas Munro, Nazir Bilal Yavuz - Streaming I/O and vectored I/O (PGConf.EU 2024)

Traditionally, PostgreSQL did all relation I/O as sequential 8KB pread() or pwrite() system calls. This talk is about a new "streaming" programming model that allows multiple blocks to be read or written at the same time, and eventually in the background ahead of time. We will talk about several things that cover new features in PostgreSQL 17 and ongoing work in PostgreSQL 18: * overview of system file I/O interfaces * storage manager changes for multi-buffer operations * buffer manager changes for multi-buffer operations * the stream abstraction * "synchronous" streaming I/O * partial steps to asynchronous I/O * use cases (heap/index scans, vacuum, recovery, ...) We will also talk more briefly about some aspects of the next stages of the larger AIO project: * overview of kernel buffering, pros, cons and motivations for escaping from it * why asynchronous I/O and direct I/O go together * a peek at WIP asynchronous I/O system * asynchronous streaming I/O

Watch

Rafael Thofehrn Castro - Debugging active queries... (PGConf.EU 2024)

Rafael Thofehrn Castro - Debugging active queries with mid-flight instrumented explain plans (PGConf.EU 2024) Have you ever wondered what an active query is doing under the hood? Instrumented explain plans obtained with EXPLAIN ANALYZE are very powerful when investigating bottlenecks in a query, with one small limitation: a query needs to finish in order for the plan with statistics to become visible. I recently wrote a patch for the already existing in-progress patch "Logging plan of the running query" to also log instrumentation details collected up until the time the plan was logged. This includes rows collected and time spent so far in each plan node. In this talk I will present an extended/experimental version of that patch where active queries with an enabled flag print the instrumented execution plan to a catalog table in a regular interval and demonstrate how this can help troubleshoot queries that never finish.

Watch