7 Deadly Sins of Monitoring
It's 2017, and everybody now knows how important it is to monitor their services. But less attention gets paid to whether we're monitoring them *well*. In this talk we'll cover several extremely common failures that can make your monitoring ineffective, or even worse than no monitoring at all.
- Psychic monitoring. Does your system only let you check for things you predicted might break, or can you ask new questions?
- Gatekeepers. Do you have to understand an entire massive infra stack to add a single metric?
- Fractured reality. Do your teams share tools, or does everyone have a different source of truth?
- Brittle, cobwebby checks. Is it too hard to make changes?
- Intuition. Do you have the information you need to make data-driven debugging decisions, or do you rely on intuition and smells?
- Meta-monitoring. Is anyone keeping an eye on how often people are getting paged or woken up, and noticing when it changes?
All of these problems can be overcome without a great deal of effort, but most of us don't learn to avoid them until we've suffered through many outages and earned the scars. Let's try to accelerate your experience substantially!
Outline/Structure of the Talk