In today's fast-moving world, rare events are becoming increasingly common. From incidents of safety hazards to transaction fraud, many critical problems qualify as rare events. Identifying and studying rare events becomes crucially important, particularly when the underlying event corresponds to a sensitive or adverse issue. Notably, despite the probability of occurrence being very close to zero, the potential specification of the rare event could be quite extensive. For example, within the parent rare event of Product Safety, there could be multiple types of potential hazard (Fire, Electrical, Pharmaceutical, etc.), rendering the sub-classes rarer still. In this talk, we discuss a novel algorithm designed to study a rare event and its sub-classes over time, with a primary focus on forecasting and detecting anomalies.

The anomalies studied here are relative anomalies, i.e., they may not contribute to the long-term trend of the rare time series, but they represent deviations from the base state seen in the immediate past.

Keywords:

Product Safety

Fire Hazards

Transaction Fraud

Malware Detection

Relevant Sister Classes

Count Time Series

Non Homogeneous Poisson Process

Discrete Space Optimization

Text Mining

GloVe

Sparsity Treatment in Text

Dynamic Time Warping

Discords

Anomaly Detection

Deviation Score

Relative Local Density

Density Based Clustering

Outline/Structure of the Talk

Proper forecasts of expected cases of the rare event and its sub-classes help us prepare for upcoming challenges and allocate resources accordingly. For example, if historic data lets us predict that customer complaints regarding skin irritation will be higher than usual for a certain product in a given month, we can enforce stricter quality control on that product or even recall it altogether. Likewise, we need to continuously study the evolving patterns of the time series. Depending on the rarity, the complexity of identifying anomalies will vary. The rare events here will be studied over shorter periods of time. Hence, the anomalies will be relative anomalies that may not eventually contribute to the longer-term trend. But identifying any deviation from recent states can be of real significance in industry. For instance, in seller fraud detection, a fraudulent seller will not remain part of the marketplace over long periods of time. In such cases, the overall longevity of the rare event, i.e., fraud, is very low, giving rise to shorter time series.

In this exercise, we will discuss a Non-Homogeneous Count Process to forecast cases of the rare event. For the deviation study, we introduce a novel technique that computes a deviation score for each unit of time. This score is a composite of two component scores, the Intra and Inter DScores, each capturing a different aspect of anomaly detection in time series. The threshold for the deviation score, beyond which a unit is flagged as an anomaly, is computed via a unique clustering technique.
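As a rough illustration of the composite idea only: the exact Intra and Inter DScore definitions belong to the talk, so the distance, window length, blending weight, and function names below are all hypothetical placeholders, not the actual method.

```python
def euclid(a, b):
    """Plain Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def deviation_score(series, sisters, w, alpha=0.5):
    """Toy composite deviation score for the latest length-w window.

    intra: distance of the latest window from its nearest window in the
           series' own recent past (within-series deviation)
    inter: distance of the latest window from the latest windows of the
           sister-class series (cross-series deviation)
    alpha blends the two components; all choices here are illustrative.
    """
    cur = series[-w:]
    # earlier windows that do not overlap the current one
    past = [series[i:i + w] for i in range(len(series) - 2 * w + 1)]
    intra = min(euclid(cur, p) for p in past)
    inter = min(euclid(cur, s[-w:]) for s in sisters) if sisters else 0.0
    return alpha * intra + (1 - alpha) * inter
```

A unit would then be flagged when its score exceeds a threshold; rather than a fixed cut-off, the talk derives that threshold from a relative-density-based clustering of the scores.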

Traditional time series forecasting divides the time series into deterministic and random components. The deterministic components are removed, and the random component is modeled using stochastic regressors; in most traditional models, the random component is assumed to follow a Normal distribution. More recently, computationally expensive methods have given rise to techniques such as XGBoost and LSTMs. These techniques, although fairly accurate, require larger training sets. Our research, however, concerns forecasting rare events with models trained on shorter periods of time. Due to the low probability of the underlying event, the magnitude of the values of the time series will be very low, and any continuous distribution will be a very poor fit to these low-order count values. Ideally, the count values are expected to follow a Poisson or a Negative Binomial distribution; hence, studying them as a count process is more accurate. Additionally, due to the high sparsity in the data, we allow greater freedom in the choice of stochastic regressors.

The deviation detection method used here is built to suit the underlying use case. Most anomaly detection techniques in time series focus on finding anomalies within a single series. In our case, we try to utilize all the information available to us, even outside the time series. Since anomalies are identified over shorter periods of time, critical values will be dynamic and need to be recomputed periodically. All time series comparisons will be between warped sets of points (using Dynamic Time Warping).
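To make the count-process point concrete, here is a minimal sketch of fitting a non-homogeneous Poisson rate with a log-linear trend by gradient ascent on the Poisson log-likelihood. The data, learning rate, and iteration count are illustrative assumptions; the talk's actual model adds seasonality and richer stochastic regressors.

```python
import math

# Hypothetical weekly counts of a rare event: low-magnitude, sparse values
# for which a Normal error model would fit poorly.
y = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2, 4, 1]
T = len(y)

# Non-homogeneous Poisson rate lambda_t = exp(b0 + b1 * t); the log link keeps
# the rate positive. Seasonal harmonics could be appended as extra regressors.
b0, b1 = 0.0, 0.0
lr = 0.001
for _ in range(5000):                 # gradient ascent on the log-likelihood
    g0 = g1 = 0.0
    for t in range(T):
        lam = math.exp(b0 + b1 * t)
        g0 += y[t] - lam              # d/db0 of sum(y_t * log(lam_t) - lam_t)
        g1 += (y[t] - lam) * t        # d/db1 of the same log-likelihood
    b0 += lr * g0
    b1 += lr * g1

forecast = math.exp(b0 + b1 * T)      # one-step-ahead expected count
```

Note how the estimated trend `b1` comes out positive for this upward-drifting toy series, so the forecast rate exceeds the historical average, something a symmetric Normal model on such low counts would capture poorly.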

The session will include a brief introduction to the concepts of Rare Events, Count Processes, the origins of the Poisson and Negative Binomial Distributions, and Dynamic Time Warping. Following this, we will discuss one area of application for this paper in depth - from problem formulation up to the final business impact and caveats. The entire algorithm will then be explained in light of the problem statement (generalized alternatives will also be given at every point). Since the algorithm is math heavy, we will take small detours to explain certain key concepts used in the solution (such as Concords and Discords in time series, Relative Local Density Clustering, Outlierness, Correlation Mutation, Sister Classes, etc.); the audience will not be expected to know advanced concepts beforehand - a basic idea of time series forecasting and distance measures will suffice. Finally, once the solution is laid down in a manner that can easily be replicated for other business problems, we will discuss limitations and future work and, if time permits, a few more example business problems where the solution can be applied. This is a very generic algorithm applicable to cases in multiple disciplines - Retail, IT, Finance - basically anywhere a Rare Event is feasible in the form of a Time Series!

Here's an overview of the flow.

1. What are rare events + Caveats associated with forecasting a rare time series (2 mins)

2. Count Process (Poisson): Motivation, Origin + Time Series Prediction Equation + Seasonality Induction (5 mins)

3. Breaking a time series into subsequences + choice of subsequence length (2 mins)

4. Discords in Time Series + Dynamic Time Warping + Relevant Sister Classes (5 mins)

5. DScore: Intra DScore + Inter DScore + Thresholding (Relative Density Based Clustering) (5 mins)

6. Application of the proposed algorithm to at least one real dataset (3 mins)
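As a preview of the subsequence, Dynamic Time Warping, and discord steps above, here is a minimal brute-force sketch. The series, window length, and helper names are illustrative; a production version would add the talk's sister-class comparisons and a faster, lower-bounded DTW.

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two sequences, O(len(a)*len(b))."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def top_discord(series, w):
    """Return (index, distance) of the length-w subsequence whose DTW distance
    to its nearest non-overlapping neighbour is largest -- the top discord."""
    subs = [series[i:i + w] for i in range(len(series) - w + 1)]
    best_i, best_d = -1, -1.0
    for i, s in enumerate(subs):
        # nearest neighbour restricted to subsequences that do not overlap s
        nn = min(dtw(s, t) for j, t in enumerate(subs) if abs(i - j) >= w)
        if nn > best_d:
            best_i, best_d = i, nn
    return best_i, best_d
```

On a mostly repetitive series with one burst, the discord lands on the burst: the repeating windows all have near-identical twins elsewhere in the series, while the burst window does not.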

Learning Outcome

At the end of the talk, the audience is expected to have learnt a new algorithm for anomaly detection in time series, one with widespread application across industries. They will also gain exposure to the advanced concepts of Relative Density, Concords-Discords, and Dynamic Time Warping used in the algorithm. The Poisson Process is another very cool time series technique that not many people seem to have explored! It is particularly important for rare events.

The key idea is to remind the audience how much fun the core math behind an algorithm can be. In an era where everyone just wants to use pre-built packages, the beauty of AI research goes unexplored by many of the applied scientists out there - application is where the need for research is born, and the talk aims to break the myth that mutating the math behind pre-built packages takes a lot of work!

Target Audience

Anyone with a strong interest in core data science and a mathematical/statistical background. Enthusiasm for the underlying math of data science is crucial.

Prerequisites for Attendees

Most of the concepts used in the talk will be base concepts that a data science professional will be familiar with. That said, there will be certain aspects - such as Poisson Processes, Dynamic Time Warping, Time Series Discords, Relative Density Based Clustering, concepts of Direct Reachability/Rank (from graph theory), Matrix Factorization, and Word Embeddings - that the audience may not be readily familiar with. We will give overviews of these concepts, but since time will be limited, the audience will need to get on board with core statistical concepts fairly fast. As long as they have sound base knowledge of the Poisson Distribution, forecasting, clustering, and how density-based distances work, they should be good to go.
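Of the concepts listed, relative local density is perhaps the least familiar; a deliberately simplified, LOF-style sketch conveys the intuition (the talk's actual clustering variant differs, and the helper names here are our own):

```python
def kdist(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (1-D, brute force)."""
    d = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return d[k - 1]

def rld_score(points, i, k=2):
    """Simplified LOF-style relative local density score: the ratio of a point's
    own k-NN distance to the average k-NN distance of its k nearest neighbours.
    Scores near 1 indicate a typical local density; much larger values flag
    points that are sparse relative to their own neighbourhood."""
    d = sorted((abs(points[i] - p), j) for j, p in enumerate(points) if j != i)
    neighbours = [j for _, j in d[:k]]
    own = d[k - 1][0]
    nbr = sum(kdist(points, j, k) for j in neighbours) / k
    return own / nbr if nbr > 0 else float("inf")
```

The key property, and the reason "relative" matters, is that a point is judged against the density of its own neighbourhood rather than against a global density cut-off.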

Submitted 3 years ago