Taming the Beast: Automated Testing for Complex Data Pipelines

Massive datasets. Complex data pipelines. Machine learning. When faced with such a beast, how do you test it effectively? When your test results are less "pass" and "fail" and more "sort of" and "not really", how do you automate testing?

Trish Khoo draws on her experience testing complex data systems to demonstrate proven strategies for testing in this field. Her work on ultra-large-scale systems at Google in Mountain View, California, shaped her technical approach to testing, which she applies in her work as a consultant today.


Outline/Structure of the Talk

Trish will first describe examples of the types of systems she has worked on and how the solution applied to them. She will then walk through a detailed technical example of the solution in action, applied to a fictional system (all the real examples are restricted by NDA). Finally, she will summarise and propose future solutions.

Learning Outcome

The audience will learn a proven approach to automated testing for systems with complex data pipelines, large datasets and machine learning.

Target Audience

Technical folks working on complex data systems

