SRE 2.0: Amplifying Reliability with GenAI | Indika Wimalasuriya | Conf42 SRE 2024
Read the abstract ➤ [abstract link]
Other sessions at this event ➤ https://www.conf42.com/sre2024
Support our mission ➤ https://www.conf42.com/support
Join Discord ➤ https://discord.gg/DnyHgrC7jC

Chapters
0:00 intro
0:26 preamble
2:01 sre 2.0: amplifying reliability with genai
2:31 agenda
2:52 quick intro about myself
3:26 gartner sre hype cycle
4:24 sre
9:10 navigating digital transformation: managing ever-growing complexity
10:36 operations is a software problem
13:10 genai emerges: unveiling the power of next-gen artificial intelligence
13:40 unveiling the potential: the capabilities of llm
15:15 navigating challenges: risks associated with llms
16:15 addressing model challenges: finding effective solutions
16:38 retrieval-augmented generation (rag) / knowledge bases
18:50 llm agents
20:57 prompt engineering best practices
21:43 prompt engineering properties
21:59 sre 2.0
23:33 genai in observability
26:17 use case - analyze log data to automatically identify root causes of performance issues
27:37 genai in sli, slo, and error budgets
29:43 use case - recommend optimal error budget allocations based on business priorities and user expectations
30:46 genai in system architecture and recovery objectives
32:33 use case - predict the impact of different failure scenarios on system availability and performance
33:23 genai in release & incident engineering
35:45 use case - provide real-time incident response recommendations based on the current situation and historical data
36:52 genai in automation
39:23 use case - analyze the effectiveness of automation workflows and recommend improvements based on performance metrics
40:22 genai in resilience engineering
41:43 use case - automate the execution of chaos experiments based on identified risk factors and failure scenarios
42:32 genai in blameless postmortems
44:07 use case - analyze historical post-mortem data to identify recurring patterns and trends in incidents
45:02 measure progress with business outcomes
46:15 best practices
47:20 pitfalls to avoid
49:13 thank you