SLA is for lawyers, SLO is where the money hides
We have thousands of frontend servers in 7 data centers serving over 500k HTTP requests per second. Each of them is expected to respond as quickly as possible to meet our SLA.
However, not breaking the SLA is one thing; defining the SLO is another. Let's say our SLA requires a p99 response time below 1000 ms. That leaves us a wide range in which to set the SLO.
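To make the SLA condition concrete, here is a minimal sketch of checking "p99 < 1000 ms" against a window of latency samples. It assumes the nearest-rank percentile definition and a hypothetical `SlaCheck` class; real monitoring systems often use histograms or other percentile estimators instead.

```java
import java.util.Arrays;

public class SlaCheck {
    // Threshold from the example in the text: p99 must stay below 1000 ms.
    static final double SLA_P99_MS = 1000.0;

    // Nearest-rank percentile: sort the samples and take the value at rank ceil(p * n / 100).
    static double percentile(double[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // True if the 99th percentile of the window is strictly below the SLA threshold.
    static boolean meetsSla(double[] latenciesMs) {
        return percentile(latenciesMs, 99.0) < SLA_P99_MS;
    }

    public static void main(String[] args) {
        // Hypothetical window of 100 samples: 1 ms, 2 ms, ..., 100 ms.
        double[] window = new double[100];
        for (int i = 0; i < window.length; i++) {
            window[i] = i + 1;
        }
        System.out.println("p99 = " + percentile(window, 99.0) + " ms, meets SLA: " + meetsSla(window));
    }
}
```

Note that anything below the threshold counts as "meeting the SLA", which is exactly why there is room to choose a target anywhere in that range.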
It may seem logical to set the SLO as low as possible: that way, we are less likely to break our SLA. But what if I could tell our customer that I can return a response in 400 ms, or a response in 800 ms that will boost their revenue?
Should we then define a different SLO? Maybe we should accept the risk of occasionally breaking the SLA in exchange for higher revenue most of the time?
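One way to frame this trade-off is as an expected-value comparison. This is a simplified model of our own, not a formula from the talk: let $R(t)$ be the expected revenue over a measurement window when the SLO target is $t$, $p(t)$ the probability that the window breaks the SLA at that target, and $C$ the (assumed fixed) cost of a breach.

$$
V(t) = R(t) - p(t)\,C, \qquad t^{*} = \arg\max_{t \le 1000\,\text{ms}} V(t)
$$

Under this model, the 800 ms target beats the 400 ms target exactly when

$$
R(800) - R(400) > \bigl(p(800) - p(400)\bigr)\,C,
$$

that is, when the extra revenue outweighs the extra expected cost of breaking the SLA.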
In my talk I'll describe three systems we developed to use our resources dynamically and achieve an RPM-oriented SLO. While processing a request, we evaluate the value of each feature and decide whether we have the time and resources to use it to generate revenue.
These are Java infrastructures we use internally to provide the most valuable responses to our customers within the limits of our Service Level Agreement.
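The per-request decision described above can be sketched as a time-budgeted selection. This is an illustration under stated assumptions, not the systems from the talk: the `Feature` class, its latency and revenue estimates, the assumption that features run sequentially (so their latencies add up), and the greedy revenue-per-millisecond ordering are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class FeatureBudgeter {
    // Hypothetical optional feature with an estimated cost and estimated revenue contribution.
    static final class Feature {
        final String name;
        final long estimatedLatencyMs;
        final double expectedRevenue;

        Feature(String name, long estimatedLatencyMs, double expectedRevenue) {
            this.name = name;
            this.estimatedLatencyMs = estimatedLatencyMs;
            this.expectedRevenue = expectedRevenue;
        }
    }

    // Time left before the SLO target, given when the request started (System.nanoTime()).
    static long remainingBudgetMs(long requestStartNanos, long sloTargetMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - requestStartNanos);
        return Math.max(0, sloTargetMs - elapsedMs);
    }

    // Greedy selection: highest expected revenue per millisecond first,
    // keeping each feature only if it still fits in the remaining budget.
    // Assumes features run one after another, so their latencies add up.
    static List<Feature> select(List<Feature> candidates, long remainingBudgetMs) {
        List<Feature> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Feature f) -> f.expectedRevenue / Math.max(f.estimatedLatencyMs, 1)).reversed());
        List<Feature> chosen = new ArrayList<>();
        long budget = remainingBudgetMs;
        for (Feature f : sorted) {
            if (f.estimatedLatencyMs <= budget) {
                chosen.add(f);
                budget -= f.estimatedLatencyMs;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<Feature> candidates = List.of(
                new Feature("A", 100, 5.0),
                new Feature("B", 300, 9.0),
                new Feature("C", 200, 8.0));
        // Hypothetical 400 ms SLO target for this request.
        long budget = remainingBudgetMs(start, 400);
        for (Feature f : select(candidates, budget)) {
            System.out.println("run feature " + f.name);
        }
    }
}
```

Raising the SLO target (for example from 400 ms to 800 ms) simply enlarges the budget passed to `select`, which is where the revenue versus SLA-risk trade-off shows up in code. A greedy ordering is only an approximation of the underlying knapsack-style problem; it is used here for brevity.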
Outline/Structure of the Talk
- SLA vs. SLO
- It is important to measure what our services do in terms of revenue
- Two Java infrastructures developed to increase the revenue we generate from our resources
- Changing the point of view from SLA to SLO
- We can sometimes increase revenue by taking the risk of breaking the SLA
- Keeping the SLA shouldn't mean staying as far from it as possible
- How can we earn more revenue using dedicated infrastructures, even though we become more likely to break the SLA?
Target Audience
Developers, infrastructure developers, and anyone who wants to write code that makes their company more profitable
Prerequisites for Attendees