Unit Testing Data
“Can I trust this data?” is a difficult question to measure and answer objectively. Just as unit tests provide metrics for code coverage and bug regressions, this talk presents techniques and recipes developed to quantify data sanitisation and coverage. It also demonstrates an extensible design pattern that makes it straightforward to add further tests.
If you can write a query, you can write data unit tests. These strategies have been used in Invoice2go's ETL pipeline for the past two years to detect data regressions in its Amazon Redshift data warehouse.
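Here is a minimal sketch of the idea that any query can act as a data unit test. It assumes a common convention: each test query selects the rows that violate an expectation, so zero rows means the test passes. The `DataTest` class, the `run_data_tests` function, and the `invoices` table with its columns are hypothetical. An in-memory SQLite database stands in for Redshift. This illustrates the general technique and is not the implementation described in the talk.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DataTest:
    """Hypothetical data unit test: a query returning rows that violate an expectation."""
    name: str
    sql: str


# Hypothetical registry of tests; adding a check means adding one more query here.
TESTS = [
    DataTest("invoice_id is never NULL",
             "SELECT * FROM invoices WHERE invoice_id IS NULL"),
    DataTest("invoice_id is unique",
             "SELECT invoice_id FROM invoices GROUP BY invoice_id HAVING COUNT(*) > 1"),
    DataTest("total is non-negative",
             "SELECT * FROM invoices WHERE total < 0"),
]


def run_data_tests(conn, tests):
    """Run each test query and return {test name: number of failing rows}.

    A test passes when its query returns zero rows.
    """
    results = {}
    for test in tests:
        failing_rows = conn.execute(test.sql).fetchall()
        results[test.name] = len(failing_rows)
    return results


if __name__ == "__main__":
    # In-memory SQLite stands in for the real warehouse (an assumption of this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoice_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                     [(1, 10.0), (2, 25.5), (2, 5.0), (None, -3.0)])
    for name, failures in run_data_tests(conn, TESTS).items():
        status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
        print(f"{name}: {status}")
```

Keeping each test as plain SQL in a list is one way to get an extensible pattern. A new check needs no new code, only a new query. The number of failing rows also gives a simple measure of how much data violates each rule.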