Automating Operations with Machine Learning

location_city Brisbane schedule Dec 10th 11:30 AM - 12:20 PM place Green Room people 212 Interested

How much money would you save if AI could detect and fix your outages as soon as they happen? In a multi-billion dollar business, outages are very expensive. MTTR has a direct effect on the bottom-line, so every second count in resolving issues. But with millions of metrics being generated by thousands of microservices, how do you choose which metrics to pay attention to? How do you make your alerts meaningful to avoid alert fatigue and desensitisation? How do you respond to those alerts in a timely manner?

In this talk, Matt covers how Expedia is using Machine Learning to "close the loop" involved in detecting, diagnosing and remediating outages post-release. You will learn about how to use ML to build models for anomaly detection in metrics. You will also learn about "ML-Ops" and how to build a platform for training and deploying ML models.


Target Audience


schedule Submitted 10 months ago