A Brief History Of Time (And Monitoring)
Do you feel like you know everything going on in the fast-moving field of monitoring, instrumentation and observability? Of course you don't, nobody does! We'll take a whirlwind tour of the major categories (APM, log aggregation and metrics) and discuss their use cases, their origins and who uses them, along with some true stories about ancient design choices whose ripple effects are still felt today.
On the technical side, we'll get into the nitty-gritty of storage engines, indexing and horizontal scaling. Ever wondered why there are so few tools for monitoring database query performance, and why the ones that exist lack critical features? (The answer may surprise you!) Ever wondered why Splunk is so gosh darn expensive? What's the difference between ticks and counters, interval rollups and time series databases? Can't any DB be a time series database? What's a column store and why should you care?
We will skim through modern best practices for service ownership, on-call, and instrumentation over the lifecycle of a service. For example: how do you choose what to use when starting out, and how do you keep up with requirements as the service evolves and hits crisis points? What are the different considerations for monitoring stateful vs stateless services? Should you have just one monitoring system, or multiple monitoring tools that are optimized for different products or workloads? Should you really monitor *everything*? (NO.)
And what about containers in a host-centric world? What about newer tools for distributed systems and microservices? (My specialty.) Come and bring all your questions!
Outline/Structure of the Talk