Lake, Swamp or Puddle: Data Quality at Scale
Data is a powerful tool. Data-driven systems that leverage modern analytical and predictive techniques can offer significant improvements over static or heuristic-driven systems. The question is: how much can you trust your data?
Collecting, processing and aggregating data is challenging. How do we build confidence in our data? Where did the data come from? How was it generated? What checks have been, or should be, applied? What is affected when it all goes wrong?
This talk looks at the mechanics of maintaining data quality at scale. It starts with bad data: what it is and where it comes from. It then dives into the techniques required to detect, avoid and ultimately deal with bad data. By the end of the talk, the audience should come away with an idea of how to design quality data-driven systems that build confidence and trust rather than inflate expectations.