List of videos

Using incidents to level-up your teams | Lisa Karlin Curtis | Conf42 SRE 2022
Incidents are a great opportunity to gather both context and skill. They take people out of their day-to-day roles, and force ephemeral teams together to solve unexpected and challenging problems.
The first part of the talk will walk through the different things you can learn from incidents, including:
- Taking you to the edges of the systems your team owns, and beyond: incidents help broaden your understanding of the context in which you're building
- Showing you how systems fail, so you can learn to identify and build software with good observability, and considerations of failure modes
- Expanding your network inside your organisation, making connections with different people, who you can learn from and collaborate with
We'll then talk about how to get the best value from the incidents which you do have as an individual, thinking about when is an appropriate time to ask questions, and how to get your own learnings without 'getting in the way'.
Finally, we'll discuss how to make this part of the culture of an organisation: as part of the leadership team, what can you do to encourage this across your teams?
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
How Static Code Analysis Prevents You From Waking Up at 3AM | Xe Iaso | Conf42 SRE 2022
Computer programming is a powerful field. You can tell the computer to do just about anything you want as long as you can describe it. The real problem comes when your intentions and what the computer understands from them differ. This talk covers ways that static analysis tooling can prevent bad code from being sent into production, with a particular focus on Go because that is the language the speaker is most experienced with. Waking up at 3 AM because an obviously wrong bit of code is hitting a weird failure case and causing downstream issues is uniquely frustrating, enough that it deserves to be eliminated as a category wherever possible. Static code analysis is an important part of reliability: it makes reliable systems easier to build, because code that can't be put into production can't fail at 3 AM while you are trying to sleep.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Postmortem Culture at Google | Ramon Medrano Llamas | Conf42 SRE 2022
Writing postmortems after incidents and outages is an essential part of Google's SRE culture. They are blameless, widely shared internally, and allow us as an organization to maximize the insights from failures. We touch on how postmortems are written and used at Google, as well as how they can help in making decisions and driving improved reliability. We also show how you can get started with your own lightweight postmortem process.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
On Call Like a King- utilize Chaos Engineering to be a better engineer | Eran Levy | Conf42 SRE 2022
The evolution of cloud native technologies and the need to scale engineering are leading organizations to restructure their teams and embrace new architectural approaches. Being a cloud native engineer is fun! But also challenging. These days engineers aren’t just writing code and building packages but are expected to know how to write the relevant Kubernetes resource YAMLs, use Helm, containerize their app and ship it to a variety of environments. It isn't enough to know it at a high level. Being a cloud native engineer means that it’s not enough to just know the programming language you are working in well; you should also keep adapting your knowledge and understanding of the cloud native technologies you depend on. Engineers are now required to write services that are just one of many services that together usually solve a certain customer problem.
To enhance engineers' cloud native knowledge and best practices for dealing with production incidents, we started a series of workshops called “On-Call like a king”, which aims to enhance engineers' knowledge while responding to production incidents. Every workshop is a set of chaos engineering experiments that simulate real production incidents, and the engineers practice investigating, resolving and finding the root cause. In this talk I will share how we got there, what we are doing and how it improves our engineering teams' expertise.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Freedom of K8s requires Chaos Engineering to shine in production | Henrik Rexed | Conf42 SRE 2022
Like any other technology transformation, k8s adoption typically starts with small “pet projects”. One k8s cluster here, another one over there. If you don’t pay attention, you may end up, like many organizations these days, with something that spreads like wildfire: hundreds or thousands of k8s clusters, owned by different teams, spread across on-premises and the cloud, some shared, some very isolated.
When we start building applications for k8s, we often lose sight of the larger picture of where they will be deployed and, moreover, what the technical constraints of our target environment are. Sometimes, we even think that k8s is the magician that will make all our hardware constraints disappear. In reality, Kubernetes requires you to define quotas on nodes and namespaces, and resource limits on your pods, to make sure that your workload will be reliable. Under heavy pressure, k8s will evict pods to relieve pressure on your nodes, but eviction could have a significant impact on your end users. How can we proactively test our settings and measure the impact of k8s events on our users? The simple answer to this question is chaos engineering.
During this presentation we will use real production stories to explain:
- The various Kubernetes settings that we could implement to avoid major production outages
- How to define the chaos experiments that will help us validate our settings
- The importance of combining load testing and chaos engineering
- The observability pillars that will help us validate our experiments
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Unleashing Deploy Velocity with Feature Flags | Travis Gosselin | Conf42 SRE 2022
A lot of development teams have built out fully automated CI/CD pipelines to deliver code to production fast! Then you quickly discover that the new bottleneck in delivering features is that they live in long-lived feature branches, so no true CI is actually happening. This problem compounds as you start spinning up microservices, building features across your multi-repo architecture, and coordinating some ultra-fancy release schedule so it all deploys together. Feature flags give you the mechanism to reclaim control of the release of your features and get back to short-lived branches with true CI. However, what you're not told about feature flags in those simple "if/else" getting-started demos is that there is an upfront cost to your development time, along with additional complexities and some pitfalls to be careful of as you begin expanding feature flag usage across the organization. If you know how to navigate these complexities, you will start to unleash true velocity across your teams.
In this talk, we'll get started with some of the feature flagging basics before quickly moving into some practical feature flagging examples that demonstrate its usage beyond the basic scenarios as we talk about UI, API, operations, migrations, and experimentation. We will explore some of the hard questions around "architecting feature flags" for your organization.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Open Testing: What if we open our tests like we open our source? | Andrew Knight | Conf42 SRE 2022
Testing is a vibrant discipline with well-established practices, but many times, nobody but the testers who write the tests ever sees them. Tests could offer so much value if they were openly shared with developers, product owners, and perhaps even end users. So, why don’t we open our tests like we open our source? There are so many parallel benefits: helping others learn, helping teams develop higher quality software, and helping users gain confidence in the products they use. Opening tests includes sharing the tools, frameworks, and even test cases themselves.
In this talk, we will look at several ways a team could be more open about testing:
- Breaking down barriers between folks of different roles
- Embracing living documentation with specification by example
- Publicly releasing test reports
- Sharing test tools and frameworks as open source projects
- Building and sharing fully-generic test suites based on AI/ML to run against any app
Not every team may be able to open up in all these ways, but any team could still benefit from the openings that shift-left practices can bring. Open Testing could be revolutionary. Let’s make it a reality!
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Combat sports principles that apply to SRE | Paul Marsicovetere | Conf42 SRE 2022
As a Senior Cloud Infrastructure Engineer, I find that my life is vastly different from that of professional fighters and athletes competing in combat sports, for many obvious reasons. However, there are some principles from the combat sports world that have an interesting application to professional life in Site Reliability Engineering (SRE). This talk will demonstrate how these principles have helped me navigate difficult situations in SRE, and will help new engineers who are starting out in SRE. While there are not a lot of obvious overlaps on paper between being in combat sports and being in SRE, there is advice and guidance from those throwing punches that can help us knock out certain SRE challenges.
Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022
0:00 Intro
2:13 Talk
Premiere - Conf42 Site Reliability Engineering (SRE) 2023
Schedule, Lineup & RSVP ➤ https://www.conf42.com/sre2023
Join Discord ➤ https://discord.gg/DnyHgrC7jC
Upcoming CFPs ➤ https://www.papercall.io/events?cfps-scope=&keywords=conf42
0:00 Intro ➤ Sponsors & Partners
keynote
0:50 Travis Rodgers
ai
1:31 Clay Langston
deep dive
1:54 Craig Risi
2:43 Jhonnatan Gil Chaves
3:16 Chandra Dixit
tools
3:45 Prathamesh Sonpatki (wrong speaker card)
4:41 Safeer CM & Garima Bajpai
culture
4:59 Yury Nino
5:28 Cam Beaudoin
5:49 Florian Hoeppner, Marco Torre, Alexander Schaper
6:18 Mandeep Ubhi
6:59 Eicardo Castro
7:52 Saurabh Bangad
lessons learned
8:33 Emily Arnott
9:08 Trista Pan
9:51 Fabio Alves
10:50 Gonzalo Maldonado
11:26 Jayaganesh Kalyanasundaram
11:57 Ramon Medrano Llamas
12:30 Muhammad Jihad
13:03 Thank you, join Discord ➤ https://discord.gg/DnyHgrC7jC