Designing for Failure: Scaling Uber’s Backend by Breaking Everything

As Uber scales its business to new products in new cities, the requirements for high availability and scalability increase. As the engineering team scales, doubling every 6 months, the challenges of building a reliable system grow with it. At our current scale, even brief outages in the service are very costly, both in dollars to the company and in real-world impact on people’s lives.

To get better at handling failure and designing for it, we’ve had to make failures more common. Every new system that we build is subjected to regular failure testing, even databases. This requires some new technology choices, different from the more comfortable ones that worked when we were smaller.

The shift from a smaller service with a few hardened components to a global operation with hundreds of services is as much cultural as it is technical. This talk will cover the Uber architecture and how it handles every failure we can think of. It’ll also cover some real outages and how they’ve influenced our new design.


Target Audience

Developers, technical leads, architects, programmers, testers, business analysts, and product owners

Submitted 2 months ago
