A Spurious Outlier Detection System For High Frequency Time Series Data
We live in the age of IoT, and more and more business processes rely on information gathered from well-placed sensors to draw better inferences and make better predictions. Sensor data are typically continuous and arrive in enormous volumes. Like any other data source, they are contaminated by noise (outliers), which may or may not be preventable, and the presence of these outliers degrades the performance of any analytical model built on them. Note that we distinguish between contextual anomalies and noisy outliers: the former carry information we need for building predictive models, so only the latter should be treated as spurious. Here we propose an integrated and scalable approach to detecting spurious outliers. The main modules of the proposed system are taken from the literature, but to our knowledge no existing work combines them into an end-to-end robust system like the one proposed here. Although the method was developed using manufacturing IoT data, it applies equally to any domain that deals with time series data, such as CPG, retail, healthcare and agrotech.
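To make the EWMA-based approach listed in the outline a little more concrete, here is a minimal Python sketch. It is an illustrative assumption, not the system presented in this talk: each point is compared against an exponentially weighted moving average (EWMA) of the earlier points and flagged when its deviation exceeds `k` EWMA standard deviations. The function name `ewma_outliers`, its parameters `alpha`, `k` and `warmup`, and the sample readings are all hypothetical.

```python
import math
from typing import List, Sequence


def ewma_outliers(series: Sequence[float], alpha: float = 0.3,
                  k: float = 3.0, warmup: int = 5) -> List[int]:
    """Flag indices whose deviation from the running EWMA exceeds k EWMA std devs.

    Hypothetical helper for illustration; alpha, k and warmup are assumed defaults.
    """
    if not series:
        return []
    mean = float(series[0])  # EWMA of the level, seeded with the first point
    var = 0.0                # EWMA of the squared deviation from the level
    flagged: List[int] = []
    for i in range(1, len(series)):
        x = float(series[i])
        resid = x - mean
        std = math.sqrt(var)
        # Only flag once the statistics have had `warmup` points to settle.
        if i > warmup and std > 0 and abs(resid) > k * std:
            flagged.append(i)
            continue  # keep the suspected outlier out of the running statistics
        # Incremental exponentially weighted update of mean and variance.
        incr = alpha * resid
        mean += incr
        var = (1 - alpha) * (var + resid * incr)
    return flagged


readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0, 25.0, 10.1, 9.9, 10.0]
print(ewma_outliers(readings))  # [8]
```

One caveat of this simple design is worth noting: because flagged points are excluded from the update, a genuine and persistent level shift would keep being flagged. Telling such contextual changes apart from isolated spurious spikes needs extra logic beyond this sketch.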
Outline/Structure of the Talk
- Introduction - 1 min
- Problem Statement and Objectives - 2 min
- Basic Thresholding and Transformations - 3 min
- EWMA-based Approach - 2 min
- Basic Framework and Its Components, with Results - 8 min
- Updated Framework (work in progress) - 1 min
- Summary and Scope for Improvement - 1 min
- Q&A with Suggestions - 2 min
Key Takeaways
- Appreciate the challenges of dealing with high-volume time series data and how we at Noodle are tackling them every day.
- Hopefully, spark curiosity about time series analysis among some members of the audience.
Target Audience
Data Scientists, Business Analysts, Product Managers