What is Data Reliability Engineering? Why it is Crucial? | Miriah Peterson | Conf42 SRE 2022

Software practitioners work to make their systems reliable, and we hear teams boasting of four or five 9s of uptime. This is not the case for data services. Data is rarely 99.999% reliable. Systems are often out of date or out of sync. Pipelines and automated jobs fail to run. And sometimes the data sources are simply not accurate. All of these situations are examples of Data Downtime, and they lead to misleading results and false reporting.

Data Reliability Engineering is the practice of building resilient data systems. By treating data systems as an engineering problem, we can borrow tools and practices from SRE to build better systems. Together, let’s explore how to take this natural extension of data engineering to make our data systems stronger and more reliable.

We will explore three major topics to strengthen any pipeline:

- Data Downtime: What is Data Downtime? How does it affect your bottom line? And how do you minimize it?
- Data Service Level Metrics: Metadata for your data pipeline, and how to report on pipeline transactions in ways that can lead to preventative data engineering practices.
- Data Monitoring: What to look out for, and how to be aware of system failures versus data failures.

Other talks at this conference 🚀🪐 https://www.conf42.com/sre2022

0:00 Intro
2:13 Talk