Monitorama PDX 2023 - Have you tried not storing all those metrics
Tony Rippy's session from Monitorama PDX 2023.

When running services in production, the temptation is to keep every metric you can. It is difficult to decide in advance which metrics will be needed to debug problems, or which system performance metrics will be important in the future. As the saying goes: “It’s better to have it and not need it, than need it and not have it.”

There are several problems with this. First, it is wasteful; most monitoring data is written once and never read. Second, it can be wildly expensive! I have heard stories of big tech companies that end up spending millions of dollars a year on their monitoring bills, enough to impact the bottom line of the business. Third, it causes scalability problems, as high-cardinality data sets are notoriously difficult to store and query efficiently.

But… what if you didn’t need to store this data? What if there were a way to reduce the amount of data while still meeting your day-to-day requirements? That is the idea behind this talk. It covers experiments that use empirical distributions and samples in place of time series. We will discuss the pros and cons of this approach, and lessons learned from trying to apply it in practice.
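
The abstract does not say which sampling technique the experiments use, so the following is only an illustrative sketch of the general idea of keeping a sample instead of a full time series. It assumes reservoir sampling (Algorithm R) as the sampling method, and the `ReservoirSample` class, its `observe` and `quantile` methods, and the simulated latency values are all hypothetical names and data chosen for this example, not anything taken from the talk.

```python
import random


class ReservoirSample:
    """Fixed-size uniform sample of a stream of observations (Algorithm R).

    Hypothetical helper for illustration; not taken from the talk.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.count = 0          # total observations seen so far
        self.values = []        # the retained sample, at most `capacity` items
        self._rng = random.Random(seed)

    def observe(self, value):
        """Record one observation, keeping memory bounded by `capacity`."""
        self.count += 1
        if len(self.values) < self.capacity:
            self.values.append(value)
            return
        # Keep the new value with probability capacity / count by drawing
        # a slot in [0, count) and replacing only if it falls in the sample.
        slot = self._rng.randrange(self.count)
        if slot < self.capacity:
            self.values[slot] = value

    def quantile(self, q):
        """Estimate the q-quantile (0 <= q <= 1) from the retained sample."""
        if not self.values:
            raise ValueError("no observations recorded")
        ordered = sorted(self.values)
        index = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[index]


# Simulated request latencies in milliseconds (made-up data, mean about 120).
latency_source = random.Random(7)
sample = ReservoirSample(capacity=1000, seed=42)
for _ in range(100_000):
    sample.observe(latency_source.expovariate(1 / 120.0))

p50 = sample.quantile(0.50)
p99 = sample.quantile(0.99)
```

In this sketch, 100,000 observations are reduced to 1,000 stored values, and quantiles are estimated from that empirical distribution. The trade-off is that the estimates are approximate, especially in the tails, and the time at which each observation arrived is lost unless a separate sample is kept per time window.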