List of videos

Games We Play to Improve on Incident Response | Austin King | Conf42 SRE 2021

Austin King - Founder @ OpsDrill.com
Incident response is a core competency for SRE teams, but how can teams practice and improve? Inefficient incident response can be costly to a company. It causes lost revenue and destroys customer trust. We will discuss mainstream games, a conceptual framework for creating team-specific drills, and finally introduce an innovative research topic: outage simulation. An outage simulator gives SRE teams a tool for practicing incident response. The core incident response skills are severity triage, communication, delegation, and system familiarity. Drilling on these improves skills, knowledge, efficiency, team cohesion, and resilience. SRE managers and ICs will learn why games such as “Keep Talking and Nobody Explodes” are played by many SRE teams. They will learn in detail how to create their own fire drills from existing runbook entries. An overview of chaos testing and gamedays will provide the broader context. We will cover the pros and cons of each of these methods. Lastly, we will present the concept of an incident response simulator as an open research topic. The hypothesis is that an incident simulator is a good trade-off between non-domain-specific games and full-blown gamedays. This is a talk illustrated with slides.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Fixing Broken Windows | Dmitry Vinnik | Conf42 SRE 2021

Dmitry Vinnik - Developer Advocate @ Facebook
We have all encountered the “Broken Windows” theory in practice. The original idea was that if someone breaks a window in a neighbourhood and the window is not repaired right away, the entire area will start getting messier at an accelerated rate. The same theory holds for software development. How many times have you looked at a legacy system with no code coverage and decided not to write any tests because “this is how we do things here”? These bad practices behave just like those “Broken Windows.” They cause our code to degrade and become unusable. In this talk, we discuss how to break away from bad development practices and how to address major gaps in your legacy and current systems. We look at ways to lead by example and to introduce a refactoring culture into your team and organization. We cover tips and tricks that help improve the development culture and emphasize the general health of the codebase.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Improve Your Automation to Reduce Toil | Mandi Walls | Conf42 SRE 2021

Mandi Walls - DevOps Advocate @ PagerDuty
In the course of your day as an SRE, your knowledge and expertise are in high demand. You can’t do every task every person in your org needs from you without the help of comprehensive automation. Automation can be tricky. Some systems aren’t built with automation in mind but assume that a human being will be there to keep an eye on things and fix errors on the fly, and we can’t be everywhere when there’s too much to do. Plus, you want to provide access to automation for the right folks and keep a record of when the tools were used. In this talk, we’ll cover some things to keep in mind when you’re building out your automation library, describe the characteristics of good automation, and give you a look at PagerDuty Rundeck, a platform that will help you share your expertise with other folks in your organization. Build automation that works for you and gives you your time back!
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Building Near Real-time Analytics Solution on AWS | Shubhankar Sumar | Conf42 SRE 2021

Shubhankar Sumar - Senior Solutions Architect @ AWS
To create value, companies must derive real-time insights from a variety of data sources that produce data at high velocity and volume, so that they can react faster, in real time, to events affecting the business. The need to analyse heterogeneous data from multiple sources (internal and external) is greater than ever. As a result, the analytics landscape keeps evolving, with numerous technologies and tools, and the platform becomes more and more complex. Building a futuristic analytics solution is therefore not only time-consuming but also costly: it involves selecting the right stack, acquiring talent, and ongoing platform management and monitoring. In this session, we’ll discuss and demo how you can leverage the AWS stack to create a near real-time analytics solution for an e-commerce website with minimal to no coding, with an option to integrate with pre-existing data sources. The solution offers the following advantages:
- easy to build
- elastic and fully managed
- highly available and durable
- seamless integration with AWS services
- pay for what you use
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Chasing the Grail | Dmitry Chuyko | Conf42 SRE 2021

Dmitry Chuyko - Senior Performance Engineer @ BellSoft
JDK 16 features full musl support but doesn’t include AOT and Graal JIT. Is it all gone? Can you still write your own JVMCI compiler? The GraalVM licensing model is changing, and alternatives are appearing fast. In his talk, Dmitry Chuyko shows what you can achieve with native image. He’ll look into the practices of building tiny and performant microservice containers using Graal and the associated tools. The future is now. Don’t miss it!
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Engineering Reliable Mobile Applications | Pranjal Deo | Conf42 SRE 2021

Pranjal Deo - Engineering Program Manager @ Google
Why Mobile and SRE (Site Reliability Engineering)?
- Mobile is nonuniform and uncontrollable
- SRE responsibilities differ in critical ways from infrastructure or server-side application reliability engineering
- Focus on where your end users are (i.e. today many users access services through mobile applications)
- Specific challenges of mobile
  * Monitoring
  * Release management
  * Incident management
Case studies
- Doodle outage, etc.
Future of SRE for Mobile
- Visibility into mobile application performance
- Find and react to issues before the user does
- Measure SLIs throughout the product stack (from client to service)
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Let the machines optimize the machines | Stefano Doni | Conf42: SRE 2021

Stefano Doni - CTO @ Akamas
SREs’ main goal is to achieve optimal application performance, efficiency, and availability. Configurations (e.g. JVM and DBMS settings, container CPU and memory) play a crucial role: wrong settings can cause poor performance and incidents. But tuning configurations is a manual and lengthy task, as there are hundreds of settings in the stack, all interacting in counterintuitive ways. In this talk, we present a new approach that leverages machine learning to find optimal configurations of the tech stack. The optimization process is automated and driven by performance goals and constraints that SREs can define (e.g. minimize resource footprint while meeting latency and throughput SLOs). We show examples of optimizing Kubernetes microservices for cost efficiency and latency by tuning container sizing and JVM options. With the help of ML, SREs can achieve higher application performance in days instead of months, and have a lot of fun in the process!
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Pitfalls of Infrastructure as Code (And how to avoid them!) | Tim Davis | Conf42 SRE 2021

Tim Davis - DevOps Advocate @ env0
Are you looking to start your journey into Infrastructure as Code? Or have you already jumped in head-first? Either way, this session is for you! We’ll talk about many of the common pitfalls of IaC, and how you can avoid them. We’ll go over:
* What IaC is
* Types of pitfalls you may have
* Infrastructure pitfalls
* Coding pitfalls
* Basic mitigation strategy for each pitfall
We’ll go over all kinds of things that you may or may not have even thought of yet. Get your questions ready, because I’m here to help you be successful in your IaC journey!
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk

Enterprise SRE adoption framework | Vishnu Vardhan Chikoti | Conf42 SRE 2021

Vishnu Vardhan Chikoti - Senior SRE Manager @ Fanatics
This talk is about a new enterprise SRE adoption framework, named Arctic. Given the growing focus on infrastructure and service/application reliability, more and more enterprises are adopting Site Reliability Engineering (SRE). Enterprises would benefit from a framework for SRE adoption, much as Scrum, XP, and Kanban exist for Agile adoption. Without such frameworks, adoption is challenging: enterprises need to spend a lot of effort upfront to understand how to go about SRE adoption and to plan before they begin the actual journey. This talk covers the following:
- The two pillars of the framework
- Other frameworks/concepts that can go hand-in-hand with this
- What to look for when hiring SREs, in terms of both personality types and skill sets
- A way to do the goal setting for the transformation
Note that as of the date of submission for this talk, this framework has not been used in any enterprise and was conceptualised very recently. The hope is to seed the thought around frameworks for SRE adoption, present the current version of this framework to the larger SRE community, gather feedback, and start the usage of this framework by enterprises. This talk suits a varied audience: those who have already started their SRE journey, those who are looking to start on it, and even those who are still exploring to understand more about SRE. What is the problem that I am trying to address? Currently, there is no standard framework for SRE adoption similar to frameworks like Scrum, XP, and Kanban that exist for Agile adoption. Having a standardised framework will eliminate quite a bit of the upfront effort spent thinking about “how to adopt SRE” at enterprises.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2021
0:00 Intro
1:16 Talk
