Achieving High Uptime with Erlang's OTP

Few technical metrics are more closely watched in the world of online service delivery than a system's “uptime”. Response speed, number of concurrent users and data throughput are all forgotten about very quickly when users can't even get onto your system.

The Erlang language was designed with fault tolerance as a core principle. Fault tolerance describes the ability of a system to keep running in the event of some kind of unexpected problem, without external assistance (in other words, without waking up your sysadmins) and with minimal impact to users. Consider a single monolithic C++ program, handling 10,000 concurrent users all with persistent TCP connections. If one user does something unusual and hits a poorly tested code path, causing a segfault, then 10,000 people see their connection drop. That's (obviously) not fault tolerant. If just that one user's connection dies and everyone else's carries on, that's fault tolerance (and also far preferable!).

While Erlang's basic features give a degree of fault tolerance, they're not a silver bullet. The share-nothing memory model and the ability to separate your system into thousands of isolated lightweight processes are extremely useful when building robust systems, but they're a foundation rather than the whole solution.
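As a rough illustration of that isolation, the sketch below spawns two independent processes, lets one of them die, and checks that the other still answers. The module name `isolation_demo` and the `ping`/`pong` messages are made up for this example; only `spawn/1`, `spawn_monitor/1` and `exit/1` are standard Erlang.

```erlang
-module(isolation_demo).
-export([run/0]).

%% Hypothetical demo: one process crashes, a second one carries on.
run() ->
    %% A healthy worker that answers pings until told to stop.
    Healthy = spawn(fun loop/0),
    %% A second process that dies straight away; we monitor it so that
    %% we are told when (and why) it went down.
    {Crasher, Ref} = spawn_monitor(fun() -> exit(bad_input) end),
    receive
        {'DOWN', Ref, process, Crasher, bad_input} -> ok
    end,
    %% The crash did not touch the healthy worker: it still replies.
    Healthy ! {ping, self()},
    Result = receive
                 {pong, Healthy} -> still_alive
             after 1000 ->
                 no_reply
             end,
    Healthy ! stop,
    Result.

loop() ->
    receive
        {ping, From} ->
            From ! {pong, self()},
            loop();
        stop ->
            ok
    end.
```

Calling `isolation_demo:run()` from the Erlang shell should return `still_alive`. Nothing here restarts the process that died, which is the gap OTP supervisors are meant to fill.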

The other killer for uptime is rolling out upgrades and fixes. Without some clever infrastructure, sooner or later your customers will see “We're down for scheduled maintenance – come back in 30 minutes”. One solution to this is “hot upgrades” whereby code can be upgraded on the fly with literally no downtime or interruption to ongoing services.

This tutorial covers Erlang's Open Telecom Platform (OTP), which has almost nothing to do with telephony per se, and how it delivers on Erlang's promise of robust, fault tolerant, hot-upgradable software.

 
Outline/structure of the Session

  • The basic OTP building blocks and concepts – applications, supervisors, gen_server, gen_fsm and gen_event.
  • What a supervision tree is, how it provides fault tolerance and how to design one.
  • A simple example of how to convert a basic Erlang program into a fault-tolerant OTP one.
  • A quick demonstration of hot code loading.
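To give a flavour of the supervision idea in the outline above, here is a minimal sketch of a supervisor that restarts a single worker whenever it dies. It is not the example used in the session: the module name `demo_sup`, the `crash` message and the helper `worker_pid/0` are all invented for illustration, and the worker is a bare `spawn_link`ed process so that everything fits in one file. A real OTP application would normally use a `gen_server` (or similar behaviour) as the worker.

```erlang
-module(demo_sup).
-behaviour(supervisor).

-export([start_link/0, start_worker/0, worker_pid/0]).
-export([init/1]).

%% Start the supervisor and register it locally under the module name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: one permanent child, restarted on its own
%% (one_for_one), with at most 5 restarts allowed in any 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Hypothetical worker: a plain linked process standing in for a
%% gen_server. The supervisor expects {ok, Pid} from the start function.
start_worker() ->
    Pid = spawn_link(fun worker_loop/0),
    {ok, Pid}.

worker_loop() ->
    receive
        crash  -> exit(simulated_fault);   % simulate an unexpected failure
        _Other -> worker_loop()
    end.

%% Helper for the demo: look up the current worker pid.
worker_pid() ->
    [{worker, Pid, _Type, _Modules}] = supervisor:which_children(?MODULE),
    Pid.
```

In the shell, after `demo_sup:start_link()`, sending `demo_sup:worker_pid() ! crash` kills the worker; a moment later `demo_sup:worker_pid()` should report a different pid, because the supervisor has started a replacement without any outside intervention.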

Learning Outcome

Participants will learn how to make use of the OTP framework building blocks to construct robust, fault tolerant Erlang applications.

Target Audience

Developers interested in learning about Erlang's application framework. Prior knowledge of Erlang is helpful but not required.

Submitted 2 years ago
