List of videos

Terraform apply secured by Open Policy Agent | Peter ONeill | Conf42 SRE 2022

Terraform has unprecedented control over the mission-critical infrastructure for our businesses and organizations. Think about the last time a misconfiguration went unnoticed for long enough to impact customers or cause an outage. Everyone should have a second set of eyes when deploying code that has the potential to create a negative impact. Let Open Policy Agent (OPA) be that second set of eyes. OPA is an open source general-purpose policy engine that is especially adept at working with configuration data like Terraform manifest files. Using OPA, we can write policies that will ensure that resources created by any team and any engineer are compliant with the organization’s rules and requirements. Implementing policy can be challenging, but it doesn’t have to be. OPA comes paired with a purpose-built dedicated policy language called Rego. This talk will show how to get started by deploying OPA into your CI/CD pipeline and writing your first Rego policies to secure some of the primary AWS resources we use every day. Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022 — 0:00 Intro 2:13 Talk

Hodor: Detecting and addressing overload | Bryan Barkley & Vivek Deshpande | Conf42 SRE 2022

When pushed hard enough, any system will eventually suffer, and ultimately fail unless relief is provided in some form. At LinkedIn, we have developed a framework for our microservices to help with these issues: Hodor (Holistic Overload Detection & Overload Remediation). As the name suggests, it is designed to detect service overloads from multiple potential root causes, and to automatically improve the situation by dropping just enough traffic to allow the service to recover. Hodor then maintains an optimal traffic level to prevent the service from reentering overload. All of this is done without manual tuning or specifying thresholds. In this talk, we will introduce Hodor, provide an overview of the framework, and describe how it detects overloads and how requests are dropped to provide relief.

Building Openmetrics Exporter | Piyush Verma | Conf42 SRE 2022

Openmetrics-exporter (OME, https://last9.io/openmetrics-exporter) is an Observability-as-Code framework that reduces the toil of finding and combining useful metrics from the layers and hundreds of components involved in modern cloud-native systems. Every source, component, or metric is just a simple configuration file because the only “code” you should focus on is for your customers. It leverages a plugin architecture to support data sources. It relies heavily on data frame processing to combine metrics from various sources before they are all converted into Openmetrics format, ready to be piped out by Prometheus. Traditionally, such correlation and post-processing have been the responsibility of additional data pipelines, but with OME, it’s as simple as writing a configuration file. At its core, OME uses HashiCorp Configuration Language (HCL) to build a DSL that allows declarative input for building metric pipelines. The talk is mainly about what you can solve using OME. But it also takes a concise journey “behind the scenes”: the need to build Openmetrics-exporter, picking a configuration language that is easily editable by humans, creating a DSL around it, and, more importantly, leveraging Golang for data science needs.

Exposing Log-Metrics To Prometheus With Best Practice | Samuel Arogbonlo | Conf42 SRE 2022

In this age of fast-growing advancement in cloud implementations, there is a great need to manage logs effectively. In some cases, you have to study the metrics to know what the system is about; this helps you understand your system in order to make decisions, run post-mortem analyses, and perform several other useful functions. First off, you should understand that Vector is a high-performance, end-to-end (agent & aggregator) observability data pipeline that puts you in control of your observability data. It directly orchestrates collecting, transforming, and routing all your logs, metrics, and traces to any vendors you want today or tomorrow. Vector enables dramatic cost reduction, novel data enrichment, and data security where you need it, not where it is most convenient for your vendors. Additionally, it is open source and up to 10x faster than every alternative in the space.

The State of DevOps and Observability in 2022 | Dotan Horovits | Conf42 SRE 2022

There are many opinions on DevOps, open source, and observability, but what is actually being practiced? What can we learn from the collective experience of the community? We went and surveyed over 1000 engineers across the globe about their DevOps practices, challenges, and more, with a special focus on enterprise observability. This session will share data and insights from the survey, with key trends (compared to previous years’ DevOps Pulse surveys), points of interest, and challenges that developers experience on a daily basis. This session will help you learn from the collective experience and emerging best practices in the community, to help guide decisions on processes, tooling and architecture choices. The survey analyzes topics such as: - What are your challenges with running Kubernetes in production? - How long does it take to troubleshoot production issues? - Which tools do you use for ticketing, event correlation and notifications? - Who is responsible for ensuring observability? - How do enterprises handle shared services? And much more.

Smoke detectors in large scale production systems | Abhijeet Mishra | Conf42 SRE 2022

Static alerting thresholds no longer cut it for modern distributed systems. With production systems scaling rapidly, using static alerting to observe critical systems is a recipe for disaster. Observability tools have recognized this need and provide a way to "magically" catch deviations from normal system behavior instead. - What is that magic? What goes into deciding whether a spike or a drop is violating a known "good" condition or not? - How do we avoid alert fatigue? - How do you factor in seasonality, such as low off-peak hours and high holiday traffic? The rabbit hole goes deeper than I imagined. As a part of the core data science team at Last9, I ran into scenarios where my assumptions about building anomaly detection engines were shattered and rebuilt with every interaction with production traffic. In this talk, I will talk about: - What I learnt when trying to find answers to the above questions. - How known theoretical models map to real-world workloads, e.g. streaming services and high-frequency trading applications. - The science that goes into choosing and calculating the right SLOs for different SLIs and sending out early warnings, and how to measure and improve leading and lagging indicators pertaining to system health.

Kubernetes monitoring - how to improve it | Aliaksandr Valialkin | Conf42 SRE 2022

The popularity of Kubernetes changed the way people deploy and run software. It also brought additional complexity: Kubernetes itself, microservice architecture, and short release cycles all became a challenge for monitoring systems. The truth is, the adoption and popularity of Kubernetes had a severe impact on the monitoring ecosystem, on its design and tradeoffs. The talk will cover the monitoring challenges of operating Kubernetes, such as increased metrics volume, service ephemerality, pod churn, and distributed tracing, and how modern monitoring solutions are designed specifically to address these challenges, and at what cost.

Adding OpenTelemetry to Production Apps: Lessons Learned | Dave McAllister | Conf42 SRE 2022

Observability is increasingly important in our modern apps/cloud-native world. However, when adding observability to existing production apps, there are a number of tradeoffs in approaches and in tools. Often, these tradeoffs are an exercise in confusion, leading to decision paralysis. We took on the challenge of adding observability to NGINX MARA, investigating choices, discovering and addressing challenges while keeping to open source solutions whenever possible. You'll come away with an understanding of how the three classes of data (Metrics, Traces, Logs) work together, why we chose the solutions we used and how we extended past the normal space into health checks, introspection and core dumps. Come learn from our experience in dealing with OpenTelemetry and related tools, from traces, metrics and logs, in working with production-class apps and discover what approach finally worked for us.

One Woman Show of Migrating an Entire R&D SCM From Bitbucket to GitLab | Hila Fish | Conf42 SRE 2022

Writing code is something that we learned. Managing a project end to end (E2E)? Probably not that much. In this talk, I’ll share my journey of migrating the entire R&D codebase from Bitbucket to GitLab on my own, but with the great help of people along the way, covering planning, implementation, and handoffs. I’ll share best practices for managing a technical project, with plenty of takeaways you can adopt so your project is handled smoothly and successfully.
