Monitorama PDX 2024 - Disintegrated telemetry: The pains of monitoring asynchronous workflows

Conference: Monitorama PDX 2024

Year: 2024

Johannes Tax's session from Monitorama PDX 2024. Many tools and best practices for instrumentation and observability are tailored to synchronous request/response workflows, with HTTP and RESTful APIs being the most prominent examples. However, if you have to instrument and monitor a system that relies on asynchronous communication based on events or messages, you soon find that established concepts and practices don't work as well. Observing loosely coupled processing steps often leads to disintegrated telemetry, which makes it hard to derive actionable insights. In this talk, I focus on the challenge of correlating the disintegrated pieces of telemetry (metrics and traces) that are emitted during the lifetime of a message or an event. I describe the problem and present possible solution approaches. I show how each approach is broken in its own way, and provide insights that help you choose the least broken solution for your scenario. Finally, to show some light at the end of the tunnel, I give an overview of standardization efforts in this space, including W3C context propagation drafts for messaging protocols and the messaging semantic conventions created by the OpenTelemetry messaging workgroup, which I lead.