List of videos

Measuring Reliability in Production | Ramon Medrano Llamas | Conf42 SRE 2023

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_2023_Ramon_Medrano_Llamas_reliability_production Other sessions at this event ➤ https://www.conf42.com/sre2023 Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 Intro 0:29 Talk

Starting the SLOs Implementation | Muhammad Jihad | Conf42 SRE 2023

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_2023_Muhammad_Jihad_starting_slos_implementation Other sessions at this event ➤ https://www.conf42.com/sre2023 Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 Intro 0:29 Talk

Premiere - Conf42 Site Reliability Engineering (SRE) 2024

Support our mission ➤ https://www.conf42.com/support
Schedule, Lineup & RSVP ➤ https://www.conf42.com/sre2024
Join Discord ➤ https://discord.gg/DnyHgrC7jC
Upcoming CFPs ➤ https://www.papercall.io/events?cfps-scope=&keywords=conf42
0:00 Intro
ai
0:53 Michele Dodic & Anastasia Archangelskaya - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Michele_Dodic_Anastasia_Archangelskaya_journey_nextgen_aio
1:30 Asutosh Mourya - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Asutosh_Mourya_futureproofing_integrating_resilience
1:53 Indika Wimalasuriya - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Indika_Wimalasuriya_amplifying_genai_reliability
chaos
2:30 Peter De Tender - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Peter_De_Tender_stresstesting_azure_chaos
3:15 Hareesh Iyer - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Hareesh_Iyer_chaos_engineering_practical
cloud
3:44 Ederson Brilhante - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Ederson_Brilhante_building_secure_flexible
4:31 Joshua Fox - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Joshua_Fox_wafs_web_application
5:04 Nikolay Sivko - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Nikolay_Sivko_zeroinstrumentation_observability_ebpf
5:45 Alex Dejanu - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Alex_Dejanu_k8s_strategies_ask
culture
6:08 Jorge Luis Castro Toribio - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Jorge_Luis_Castro_Toribio_building_reliable_community
6:46 Evgenii Korneev - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Evgenii_Korneev_resilient_teams_blueprint
7:17 David Argent - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_David_Argent_avoid_agile_victim
deep dive
7:53 Dan Slimmon - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Dan_Slimmon_clinical_troubleshooting_diagnose
8:34 Pravar Agrawal - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Pravar_Agrawal_debugging_cluster_oncall
9:02 Aleksei Popov - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Aleksei_Popov_maze_complexity_distributed_systems
9:38 Harel Safra - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Harel_Safra_infrastructure_ends_practical
10:02 Serter Kazim Solak - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Serter_Kazim_Solak_visualization_techniques_datasets
reliability
10:38 Marco Pierobon - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Marco_Pierobon_challenges_platform_tips
11:17 Jaiprakash Pherwani - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Jaiprakash_Pherwani_manage_service_risk
11:55 Dmitrii Pakhomov - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Dmitrii_Pakhomov_resilience_fintech_strategies
scaling
12:32 German Urikh - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_German_Urikh_code_review_tips
13:13 Pranay Prateek - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Pranay_Prateek_scaling_opentelemetry_kafka
(no intro) Adam Gardner - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Adam_Gardner_reality_platform_engineering
transformation
13:56 Pini Reznik - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Pini_Reznik_error_carbon_empowering
(no intro) Ricardo Castro - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Ricardo_Castro_slos_eventbased_navigating
14:42 Chinmay Naik - https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Chinmay_Naik_devops_mlops_scaling
15:20 thank you!

Journey to Next-Gen AIOps: eBPF & GenAI | Michele Dodic & Anastasia Archangelskaya | Conf42 SRE 2024

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Michele_Dodic_Anastasia_Archangelskaya_journey_nextgen_aio Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 0:34 speakers 1:12 agenda motivation 1:43 key sre challenges 3:11 why do sres need aiops? aiops journey 4:29 transformation from monitoring to zero-touch introduction to ebpf 9:58 what is ebpf? 10:27 how does ebpf work? next-gen aiops 11:38 aiops: current challenges on the market 14:08 use case *1: risk assessment in containerized applications 15:47 use case *2: securing devops pipelines 17:13 use case *3: sre copilot powered by genai 18:36 demo architecture 23:54 conclusion takeaways 24:40 thank you!

Future-Proofing SRE: Integrating AI for Resilience and Efficiency | Asutosh Mourya | Conf42 SRE 2024

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_sre_2024_Asutosh_Mourya_futureproofing_integrating_resilience Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 0:36 how ai fits into sre workflow 1:38 intelligent filtering and prioritization 4:56 anomaly detection 7:29 analysis and summarisation 9:41 forecasting and predictive analysis 12:04 reducing toil 14:17 challenges 16:36 thank you

SRE 2.0: Amplifying Reliability with GenAI | Indika Wimalasuriya | Conf42 SRE 2024

Read the abstract ➤ [abstract link] Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 2:01 sre 2.0: amplifying reliability with genai 2:31 agenda 2:52 quick intro about myself 3:26 gartner sre hype cycle 4:24 sre 9:10 navigating digital transformation: managing ever-growing complexity 10:36 operations is a software problem 13:10 genai emerges: unveiling the power of next-gen artificial intelligence 13:40 unveiling the potential: the capabilities of llm 15:15 navigating challenges: risks associated with llms 16:15 addressing model challenges: finding effective solutions 16:38 retrieval-augmented generation (rag) / knowledge bases 18:50 llm agents 20:57 prompt engineering best practices 21:43 prompt engineering properties 21:59 sre 2.0 23:33 genai in observability 26:17 use case - analyze log data to automatically identify root causes of performance issues 27:37 genai in sli, slo, and error budgets 29:43 use case - recommend optimal error budget allocations based on business priorities and user expectations 30:46 genai in system architecture and recovery objectives 32:33 use case - predict the impact of different failure scenarios on system availability and performance 33:23 genai in release & incident engineering 35:45 use case - provide real-time incident response recommendations based on the current situation and historical data 36:52 genai in automation 39:23 use case - analyze the effectiveness of automation workflows and recommend improvements based on performance metrics 40:22 genai in resilience engineering 41:43 use case - automate the execution of chaos experiments based on identified risk factors and failure scenarios 42:32 genai in blameless postmortems 44:07 use case - analyze historical post-mortem data to identify recurring patterns and trends in incidents 45:02 measure progress with business outcomes 46:15 best practices 47:20 pitfalls to avoid 49:13 thank you.

Stress-testing Azure Resources using Chaos Studio | Peter De Tender | Conf42 SRE 2024

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Peter_De_Tender_stresstesting_azure_chaos Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 0:34 peter de tender 1:29 what is sre 2:40 the role of an sre? 3:19 what is chaos engineering 5:07 the curious case of cpu pressure 9:29 is chaos engineering - devops 3.0? 10:38 welcome to azure chaos studio 12:23 chaos experiments 14:28 azure chaos studio demo 32:26 resources 32:59 thank you!

Chaos Engineering in Action: Building Fault-Tolerant Systems | Hareesh Iyer | Conf42 SRE 2024

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Hareesh_Iyer_chaos_engineering_practical Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 0:53 agenda 1:15 why chaos engineering? 3:19 update about the october 4 outage 5:00 business impact of resilience is bigger than ever 5:19 why are these issues not surfaced during testing? 5:42 testing 7:19 what is chaos engineering? 10:03 testing vs experiments 11:37 chaos engineering: how to 13:09 #1: observe steady state 14:02 #2: plan hypothesis around the steady state 14:54 #3: run experiments 17:15 #4: verify and act 17:51 chaos engineering tools 18:59 thank you!

Building Secure Multi-Cloud Images with Multi-Boot Mode | Ederson Brilhante | Conf42 SRE 2024

Read the abstract ➤ https://www.conf42.com/Site_Reliability_Engineering_SRE_2024_Ederson_Brilhante_building_secure_flexible Other sessions at this event ➤ https://www.conf42.com/sre2024 Support our mission ➤ https://www.conf42.com/support Join Discord ➤ https://discord.gg/DnyHgrC7jC Chapters 0:00 intro 0:26 preamble 0:38 whoami 0:54 what will we see in this session? 1:14 general cicd overview 3:06 our context 7:49 our approach overview 9:56 project structure 10:39 builder suite structure 13:08 tester suite structure why this stack? 14:52 github actions 16:11 terraform 19:36 packer 21:30 ansible workflows 22:46 build image pipeline 23:30 test image pipeline 23:40 full pipeline workflow for regression tests: examples 24:14 regression tests: use case workflow for promotion image: example 27:24 promotion image: use case dev builds, vm debug and manual tests: examples 29:46 dev builds: use case 30:53 manual tests: use case code samples 32:47 github action - repos structure samples 35:48 composite action - aws build sample 37:43 reusable workflow - build sample 39:12 calling reusable workflow - build sample 40:59 regression tests in base image - pipeline sample 41:56 debug vm - pipeline sample 42:33 crucial takeaways 44:34 more about the stack 45:00 where to find me? 45:32 thank you
