Location: United States
Member since 3 years
CEO/cofounder of honeycomb.io. Previously ran operations at Parse/Facebook, managing a massive fleet of MongoDB replica sets as well as Redis, Cassandra, and MySQL. Worked closely with the RocksDB team at Facebook to develop and roll out the world's first Mongo+Rocks deployment using the pluggable storage engine API. Experienced engineering manager, DBA and operations engineer. Likes single malt scotch.
A Brief History Of Time (And Monitoring)
Do you feel like you know everything going on in the fast-moving field of monitoring, instrumentation and observability? Of course you don't, nobody does! We'll do a whirlwind tour of the major categories (APM, log aggregation and metrics) and discuss their use cases, origin history and who uses them, as well as some true stories about ancient design choices and their ripple effects to this day.
On the technical side, we'll get into the nitty gritty of storage engines, indexing and scaling horizontally. Ever wondered why there aren't very many database performance monitoring query tools, and why the ones that exist are lacking critical features? (The answer may surprise you!) Ever wondered why Splunk is so gosh darn expensive? What's the difference between ticks and counters, interval rollups and time series databases? Can't any database be a time series database? What's a column store and why should you care?
We will outline and skim through modern best practices when it comes to service ownership, on-call, and instrumentation over the lifecycle of a service. For example: how do you choose what to use when starting out, and how do you keep up with requirements as the service evolves and hits crisis points? What are some different considerations for monitoring stateful vs stateless services? Should you have just one monitoring system, or multiple monitoring products that are optimized for different products or workloads? Should you really monitor *everything*? (NO.)
And what about containers in a host-centric world, and what about newer tools for distributed systems and microservices? (My specialty.) Come and bring all your questions!
7 Deadly Sins of Monitoring
It's 2017, and everybody now knows how important it is to monitor their services. But less attention gets paid to whether we're monitoring them *well* or not. In this talk we'll cover several extremely common failures and problems that can make your monitoring ineffective, or even worse than no monitoring at all.
- Psychic monitoring. Does your system only let you check for things you predicted might break, or can you ask new questions?
- Gatekeepers. Do you have to understand an entire massive infra stack to add a single metric?
- Fractured reality. Do your teams share tools, or does everyone have a different source of truth?
- Brittle, cobwebby checks. Is it too hard to make changes?
- Intuition. Do you have the information you need to make data-driven debugging decisions, or do you rely on intuition and smells?
- Meta-monitoring. Is anyone keeping an eye on how often people are getting paged or woken up, and noticing when it changes?
All of these problems can be overcome without a terrible amount of effort, but most of us don't learn to avoid them until we've suffered many outages and scars. Let's try to accelerate your experience substantially!
"OpsDev": it's time to put the "ops" in "dev"
For years the cornerstone of the "DevOps" movement has been lots of people urging sysadmins and operations engineers to get better at writing code. "You can't be a real devops engineer unless you write software and tests!", they say. Well, they were right, and the message has been received. Writing software is now a part of every engineering interview.
But ... is that where it stops? What about all the developers who barely know how to log in and debug their own software, or who have no idea what happens to their code after they deploy it? They need DevOps too ... or rather, they need "OpsDev". They need the critical skills that have long been associated with operations and systems engineering: reproducing complex bugs, optimizing queries, designing and building resilient infrastructure, deploying their own services, instrumenting them for debuggability, scaling and owning and monitoring high quality services.
This is not just a fuzzy culture talk; it's a very practical talk filled with real tips and stories from teams that have undergone the hard switch. Lots of software engineers dread this transition, but 1) you aren't really going to have a choice before long, since dedicated ops teams to clean up messes are fading away, and 2) this is a better world and you should want it! Having solid ops skills makes you a superhero. Empathy and shared cross-functional skill sets create a better world for all of us.