List of videos

Monitorama PDX 2023 - If you look like a problem, developers will solve you

Ivan Merrill's session from Monitorama PDX 2023. Adding observability and monitoring to services is often unfortunately seen as a 'necessary' task before going live and mandated by organisations. As a result a culture is created where this important task is seen as something that needs to be done to get a tick in a box, a problem to be solved, as opposed to a really valuable tool for developers. If the path to implementing your chosen toolset and meeting the mandated requirements is too much of a problem it's likely teams will look to solve this problem another way without you. This talk explores how we can change this mindset, and work to create a culture where the value of implementing and using monitoring is understood by developers and product management. I will share what I've found has worked in previous roles in large financial organisations and try to provide actionable advice. The path to better monitoring and greater levels of observability doesn't involve changing tools but is in fact one of human interactions.

Watch
Monitorama PDX 2023 - Meet Zeek, the extensible, scriptable network monitor

Christian Kreibich's session from Monitorama PDX 2023. Network monitoring is key for understanding your infrastructure, whether that's your home network or a thousand-seat corporate environment. Using its domain-specific scripting language, the Zeek network monitor helps you turn the packets in your network into streams of actionable logs, organized around protocols and themes that matter to you. Zeek is a mature, battle-hardened platform and ecosystem that runs on anything from Raspberry Pis to industrial-scale deployments, such as Microsoft Defender. I am the technical lead for the Zeek project, and in this talk I'll give an overview of Zeek, its architecture and capabilities, and the goals of the project.

Watch
Monitorama PDX 2023 - Know your data: The stats behind your alerts

Dave McAllister's session from Monitorama PDX 2023. Quick, what's the difference between the mean, the mode and the median? And which mean do you mean? Do you need a Gaussian or a normal distribution? And does your choice impact the alerts and observations you get from your observability tools? Come get refreshed on the impact some basic choices in statistical behavior can have on what gets triggered. Learn why a median might be the choice for historical anomaly or sudden change. Jump into Gaussian distributions, data alignment challenges and the trouble with sampling. Walk out with a deeper understanding of your metrics and what they might be telling you.

Watch
Monitorama PDX 2024 - Logs Are Good, Actually

Alex Hidalgo's session from Monitorama PDX 2024. The monitoring and observability space has moved at an extremely rapid pace over the last few years. Part of this is due to legitimate technological improvements in terms of standards, tooling, and advanced vendor solutions. But a large part of why it feels like the space is moving at such a frenetic pace is due to marketing departments and talking heads just like me. While it might feel like there is intense pressure to adopt all of the newest and most advanced concepts that fall into the category of "monitoring" or "observability", I'd like to make an argument for the continued importance of our oldest, and perhaps most important, source of telemetry: the humble log line.

Watch
Monitorama PDX 2024 - The Ticking Timebomb of Observability Expectations

David Caudill's session from Monitorama PDX 2024. It’s very easy to convince Engineers and Managers to “monitor everything”: who doesn’t want as much information as they can possibly have about what’s happening in their system? At surface level, this sounds like a great plan. This has become the dominant approach by engineering teams: simply install an agent, sidecar, or SDK, and everything will be monitored for you. Want to know how your Kubernetes cluster is doing? Here’s 10k “turnkey” metrics! The numbers become gigantic as architectures continue to fragment from monoliths to rocks (SOA) to pebbles (microservices) to…a gaseous cloud of lambdas? Doesn’t matter, add this line and ship dozens of metrics from every single lambda execution. Ship it all, monitor everything, and sort it out later. After all, we can’t possibly know what the cause of an incident in the future might be! Every second of outage costs us money! For every single interaction we capture scientific levels of data, constantly vigilant, expecting at any moment we might need to comb through it to understand a complex outage. The trouble is, this is extraordinarily expensive computationally, cognitively, and financially. The financial and computational cost of this has been subsidized by VC investment in the past, which was in turn subsidized by the historically low interest rates of the 2010s. As you’ve probably noticed, that party is over. The cognitive costs are still subsidized by simply putting on the confident “Serious Senior Engineer” face and pretending we know what all this stuff means. In this talk, we'll cover a little bit of how we got here, the cognitive biases that keep us here, and some specific guidance on better ways to approach these problems in a cost-effective way.

Watch
Monitorama PDX 2024 - The Hater's Guide To OpenTelemetry

Austin Parker's session from Monitorama PDX 2024. "If I have to watch one more talk that references xkcd 927, I'll drop our logging db myself." "The only thing I need to trace is my path out of this building at five o'clock each day." Hacker News comments or your personal thoughts on everyone's favorite observability project, OpenTelemetry? The answer may surprise you! In this talk, levity will be enforced and the takes will be piping hot, as you learn about the many ways that OpenTelemetry has been completely abused by the commercial 'observability community', why it didn't have to be this way, and how in spite of the best efforts of many millions of dollars in marketing pablum it's still a pretty good project.

Watch
Monitorama PDX 2024 - Things I wish I knew before we decided to migrate our metrics infrastructure..

Suman Karumuri's session from Monitorama PDX 2024. In this talk, I will delve into the key considerations and valuable lessons learned from transitioning some use cases from a proprietary metrics system to an in-house metrics platform utilizing open-source components. The allure of open-source systems, characterized by their transparency, adaptability, community support, and cost-efficiency, prompted this significant shift. Although theoretically straightforward, the practical implementation of this migration proved to be immensely complex. We initially underestimated the multifaceted nature of the transition, which entailed several simultaneous changes: migrating from StatsD to OTel, mastering the operation of our own metrics store, transitioning from a vendor-specific UI to Grafana, and adopting a new query language, among other significant changes. Beyond the technical hurdles, our team faced cultural and operational challenges. Running a large-scale metrics store was not within our initial expertise, so we had to quickly acquire the necessary operational knowledge. Open source solutions, though powerful, often required extensive tuning to ensure reliability. As our metrics workload grew, we adopted a multi-cluster strategy, which, while scaling our operations, introduced complexities for our developers. To address this, we implemented an additional layer of abstraction, providing multiple clusters as a single cluster to our customers, enhancing usability. Moreover, we encountered challenges related to aggregation, deploying newer open source components, resolving circular dependencies in our in-house Kubernetes and service mesh infrastructure, migrating dashboards and alerts, and ensuring the correctness of tens of thousands of dashboards and hundreds of thousands of alerts. Additionally, it was a significant effort to retrain our engineers on a new query language, adapt to a new UI, and integrate with a new alerting infrastructure, all of which added complexity to our migration journey. Attendees of this talk will gain a deep understanding of the intricacies involved in such a migration, enabling them to better navigate their own journeys when faced with similar challenges.

Watch
Monitorama PDX 2024 - The Observability Data Lake, 1 year on

David Gildeh's session from Monitorama PDX 2024. Last year, we spoke about our vision for the Observability Data Lake in my talk "How to Scale Observability without Bankrupting the Company". This year we have the system running in production with all of our trace data. This update will discuss some of our learnings, challenges and results from putting all of our trace data into the system for Netflix. By June we'll have a lot more to share, so I'm keeping the abstract high level for now. We're still considering open-sourcing this and may even use Monitorama to announce it, but no promises: the cost of running an open-source project is quite high, so it may come later in the year if June is too soon.

Watch
Monitorama PDX 2024 - The subtle art of misleading with Statistics

Dave McAllister's session from Monitorama PDX 2024. "Lies, damned lies and statistics." While true, only statistics allows you to lie to yourself. Let's explore how statistics can sometimes trick us into believing something that's not true. This isn't always done on purpose; often, we mislead ourselves without realizing it. We'll look at how focusing too much on recent events, choosing specific data to look at, and making assumptions about the size of a group can lead us to the wrong conclusions. We'll dissect common practices like the misuse of graphical representations, the confusion between correlation and causation, and the manipulation of scale and averages. These practices, often overlooked or misunderstood, can result in false indicators, misleading correlations, and distracting information. Through real-world examples, we demonstrate how these statistical pitfalls can shape narratives, influence decisions, and impact public opinion. This presentation aims to teach you how to look at statistics more critically, understand their limits, and avoid fooling yourself with numbers.

Watch