Digging to the Roots
Whether it's a minor typo on a page, a major failure causing a severe outage of a system or anything in between, the software industry is fertile ground for examining problems and their causes. From the problems that plagued HealthCare.gov to defects that allowed some lucky people to purchase airline tickets for almost nothing from United Airlines, we hear a constant stream of issues with software systems.
With our society becoming increasingly dependent on software, we need to "up our game" with respect to tracking down problems when they happen, catching defects before they enter the wild, and preventing them from occurring in the first place.
Root Cause Analysis is a process that enables this form of continuous improvement and uses techniques borrowed from other engineering disciplines. The aviation industry, for example, constantly seeks to improve due to the dire consequences of any failures in that domain.
This interactive workshop will explain when and how to use Root Cause Analysis (RCA) to investigate problems and determine actions that will ensure that those problems can never happen again. Using real-world examples, attendees will explore simple, lightweight RCA practices as well as a more involved example using fault tree analysis.
Outline/Structure of the Workshop
- 10 minutes - Introduction
- 15 minutes - Split into groups and perform RCA on a simple software defect
- 5 minutes - Debrief
- 20 minutes - Facilitated fault tree analysis of a larger example failure
- 10 minutes - Debrief and Questions
As an attendee of this workshop, you will learn the purpose and mechanics of conducting and leading sessions to determine the causes of significant issues that have affected your work.
The workshop will focus on using simple root cause analysis for smaller problems, and the fault tree analysis technique for a larger, more complex example. You will explore as many aspects of a failure as practical, and will identify possible corrective actions that need to be taken in order to prevent a similar failure in the future.