In today's fast-moving world, rare events are an increasingly common concern. From incidents of safety hazards to transaction fraud, a wide range of problems falls under the umbrella of rare events. Identifying and studying rare events becomes crucially important, particularly when the underlying event corresponds to a sensitive or adverse issue. Notably, despite the probability of occurrence being very close to zero, the potential specification of a rare event can be quite extensive. For example, within the parent rare event of Product Safety, there can be multiple types of potential hazard (Fire, Electrical, Pharmaceutical, etc.), rendering the sub-classes rarer still. In this talk, we discuss a novel algorithm designed to study a rare event and its sub-classes over time, with a primary focus on forecasting and detecting anomalies.

The anomalies studied here are relative anomalies, i.e., they may not contribute to the long-term trend of the rare time series, but they represent deviations from the base state seen in the immediate past.

Keywords:

Product Safety

Fire Hazards

Transaction Fraud

Malware Detection

Relevant Sister Classes

Count Time Series

Non-Homogeneous Poisson Process

Discrete Space Optimization

Text Mining

GloVe

Sparsity Treatment in Text

Dynamic Time Warping

Discords

Anomaly Detection

Deviation Score

Relative Local Density

Density Based Clustering


Outline/Structure of the Talk


A proper forecast of the expected cases of the rare event and its sub-classes helps us prepare for upcoming challenges and allocate resources accordingly. For example, if we can predict from historic data that customer complaints regarding skin irritation will be higher than usual for a certain product in a given month, we can enforce stricter quality control on that product or even recall it altogether. Likewise, we need to continuously study the evolving patterns of the time series. Depending on the rarity, the complexity of identifying anomalies will vary. The rare events here are studied over shorter periods of time; hence, the anomalies are relative anomalies that may not eventually contribute to the longer-term trend. But identifying any deviation from recent states can be of significance in industry. For instance, in seller fraud detection, a fraudulent seller will not remain part of the marketplace over long periods of time. In such cases, the overall longevity of the rare event, i.e., fraud, is very low, giving rise to shorter time series.

In this exercise, we discuss a Non-Homogeneous Count Process to forecast cases of the rare event. For the deviation study, we introduce a novel technique that computes a deviation score for each unit of time. This score is a composite of two component scores, the Intra and Inter DScores, each capturing a different aspect of anomaly detection in time series. The threshold for the deviation score, beyond which a unit is flagged as an anomaly, is computed via a unique clustering technique.
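To make the shape of such a composite score concrete, here is a minimal, purely illustrative sketch in Python. The convex-combination form, the weight `w`, and the fixed threshold are assumptions for illustration only; the actual component definitions and the clustering-based threshold in the talk are more involved.

```python
def deviation_score(intra, inter, w=0.5):
    """Illustrative composite: a convex combination of an 'intra'
    deviation (distance of the current subsequence to the series'
    own recent past) and an 'inter' deviation (distance to sister
    classes). The weighting scheme here is an assumption, not the
    talk's actual formulation."""
    return w * intra + (1 - w) * inter

def flag_anomalies(scores, threshold):
    """Flag every time unit whose composite score exceeds the
    threshold. (In the talk, the threshold itself comes from a
    clustering step; here it is just a given number.)"""
    return [t for t, s in enumerate(scores) if s > threshold]

# Three time units with hypothetical (intra, inter) deviations.
scores = [deviation_score(i, j) for i, j in [(0.1, 0.2), (0.9, 1.1), (0.2, 0.1)]]
flagged = flag_anomalies(scores, threshold=0.5)
```

The point of the composite is that a unit can look normal relative to its own history yet deviate sharply from its sister classes (or vice versa), and either signal alone would miss it.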

Traditional time series forecasting divides the series into deterministic and random components. The deterministic components are removed and the random component is modeled using stochastic regressors. In most traditional models, the random component is assumed to follow a Normal distribution. In recent times, more computationally expensive methods such as XGBoost and LSTM have emerged. These techniques, although fairly accurate, require larger sets of training data. Our research, however, concerns forecasting rare events with models trained on shorter periods of time. Due to the low probability of the underlying event, the magnitude of the values of the time series will be very low, and any continuous distribution will be a very poor fit to these low-order count values. Ideally, the count values are expected to follow a Poisson or a Negative Binomial distribution; hence, studying them as a Count Process is more accurate. Additionally, due to the high sparsity in the data, we allow greater freedom in the choice of stochastic regressors. The deviation detection method used here is built to suit the underlying use case. Most anomaly detection techniques for time series focus on finding anomalies within a given series; in our case, we try to utilize all the information available to us, even outside of the series. Since anomalies are identified over shorter periods of time, critical values will be dynamic and need to be recomputed periodically. All time series comparisons are made between warped sets of points (using Dynamic Time Warping).
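As a toy illustration of treating low-magnitude counts as a count process with a time-varying intensity, here is a piecewise-constant (seasonal) Poisson sketch in pure Python: the maximum-likelihood rate within each seasonal slot is simply the mean count observed in that slot. This is not the talk's model (which admits stochastic regressors); the period, data, and function names are illustrative assumptions.

```python
from collections import defaultdict

def fit_seasonal_poisson(counts, period=7):
    """Piecewise-constant intensity estimate for a non-homogeneous
    Poisson process: the MLE of the rate in each seasonal slot is
    the mean count observed in that slot."""
    slot_totals = defaultdict(float)
    slot_sizes = defaultdict(int)
    for t, y in enumerate(counts):
        slot = t % period
        slot_totals[slot] += y
        slot_sizes[slot] += 1
    return {s: slot_totals[s] / slot_sizes[s] for s in slot_totals}

def forecast(rates, start, horizon, period=7):
    """Expected count (the Poisson mean) for the next `horizon` steps."""
    return [rates[(start + h) % period] for h in range(horizon)]

# Two weeks of sparse counts with a weekly pattern (slot 0 is busier).
counts = [3, 0, 1, 0, 0, 1, 0,
          5, 0, 0, 1, 0, 0, 1]
rates = fit_seasonal_poisson(counts, period=7)
pred = forecast(rates, start=len(counts), horizon=7)
```

Note how a Normal-error model would happily predict negative values on a series like this, whereas the Poisson mean is non-negative by construction and directly interpretable as an expected count.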

The session will include a brief introduction to the concepts of Rare Events, Count Processes, the origins of the Poisson and Negative Binomial distributions, and Dynamic Time Warping. Following this, we will discuss one area of application for this paper in great depth, starting from the problem formulation up to the final business impact + caveats. The entire algorithm will then be explained in the light of that problem statement (generalized alternatives will also be given at every point). Since the algorithm is math-heavy, we will take small detours to explain certain key concepts used in the solution (such as Concords & Discords in time series, Relative Local Density Clustering, Outlierness, Correlation Mutation, Sister Classes, etc.); the audience will not be expected to know advanced concepts beforehand - a basic idea of time series forecasting and distance measures will suffice. Finally, once the solution is laid out in a manner that can be easily replicated for other business problems, we will discuss limitations and future work + (if time permits) a few more example business problems where the solution can be applied. This is a very generic algorithm applicable to cases in multiple disciplines: Retail, IT, Finance - basically anywhere a Rare Event is feasible in the form of a Time Series!

Here's an overview of the flow.

1. What are rare events + Caveats associated with forecast of a rare time series (2 mins)

2. Count Process (Poisson): Motivation, Origin + Time Series Prediction Equation + Seasonality Induction (5 mins)

3. Breaking a time series into subsequences + choice of Subsequence length (2 mins)

4. Discords in Time Series + Dynamic Time Warping + Relevant Sister Classes (5 mins)

5. DScore: Intra DScore + Inter DScore + Thresholding (Relative Density Based Clustering) (5 mins)

6. Application of the proposed algorithm for at least one real dataset (3 mins)
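Since Dynamic Time Warping underpins the subsequence comparisons in the flow above, a minimal textbook implementation may help orient the audience: the classic dynamic-programming recurrence with absolute difference as the local cost. This is the standard algorithm only, purely illustrative; the talk pairs DTW with discords and sister-class comparisons in ways not shown here.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with
    absolute difference as the local cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# b is a time-shifted copy of a: pointwise (Euclidean) comparison
# penalizes the shift, but DTW aligns the peaks.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
d = dtw_distance(a, b)
```

This shift-invariance is exactly why warped comparisons are preferable when the same hazard pattern can surface a few time units earlier or later in different sub-classes.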

Learning Outcome

At the end of the talk, the audience is expected to have learnt a new algorithm for anomaly detection in time series that has widespread application across industries. They will also gain exposure to the advanced concepts of Relative Density, Concords-Discords, and Dynamic Time Warping used in the algorithm. The Poisson Process is another very cool time series technique that not many people seem to have explored! It is particularly relevant to rare events.

The key idea is to remind the audience how much fun the core math behind an algorithm can be. In an era where everyone just wants to use pre-built packages, the beauty of AI research goes unexplored by many of the applied scientists out there - application is where the need for research is born, and the talk aims to break the myth that mutating the math behind pre-built packages takes a lot of work!

Target Audience

Anyone with a strong interest in core Data Science and a mathematical/statistical background. Enthusiasm for the underlying math of data science is crucial.

Prerequisites for Attendees

Most of the concepts used in the talk are base concepts that a data science professional will be familiar with. Having said that, there are certain aspects, such as Poisson Processes, Dynamic Time Warping, Time Series Discords, Relative Density Based Clustering, concepts of Direct Reachability/Rank (from graph theory), Matrix Factorization, and Word Embeddings, that the audience may not be readily familiar with. We will give overviews of these concepts, but since time will be limited, the audience will need to get on board with core statistical concepts fairly fast. As long as they have sound base knowledge of the Poisson distribution, forecasting, clustering, and how density-based distances work, they should be good to go.

Submitted 8 months ago

Public Feedback

Suggest improvements to the Author
  • By Ashay Tamhane  ~  7 months ago

    Thanks Debanjana for an interesting proposal. Could you clarify if you are going to discuss practical challenges+results from your own experience (your own industry use case) for rare events?

    • By Debanjana Banerjee  ~  6 months ago

      Hi Ashay, Apologies for the delayed response on this.

      CRESST was used at Walmart Labs to identify changing patterns in Product Safety-related customer reviews and also to forecast the expected count of true Product Safety cases. I would like to show the audience my iterations using XGBoost and similar algorithms and why such techniques did not work for a count time series as rare as that of Product Safety. On the time series anomaly detection side too, I plan to go into why the Inter Deviation Score was important (why comparing the time series with its own past trend was not enough and why we needed to go the extra mile and compare it with its sister classes in the current/test state). For the results, I plan to show the performance metrics and give an overview of the emergent changing patterns in customer reviews for Product Safety and how CRESST was able to capture them.

  • By Sujoy Roychowdhury  ~  8 months ago

    I like these aspects of the submission, and they should be retained:

    • Very interesting and relevant topic. Also less explored

    I think the submission could be improved by:

    • This is a topic where most people would be unfamiliar with the maths/solution approaches. Can you detail how you will break up the 20-min talk?
    • You have mentioned a host of techniques. Given the short talk, can we have a breakup of what you will cover and for how long? The talk should not turn out to be just a high-level overview. On the other hand, getting into so many techniques (Relative Density, Concords-Discords, Dynamic Time Warping, the Poisson Process, etc. - which most are unaware of in a DL-crazy world) is not feasible. So the structure needs to be defined a bit more properly. Please scope out the inclusions and exclusions of the talk carefully.
    • By Debanjana Banerjee  ~  8 months ago

      Hi Sujoy, 

      Thank you so much for your feedback. I understand that, with the talk being math-heavy, the time allotted to each segment is crucial.

      Here's an overview of the flow as you requested.

      1. What are rare events + Caveats associated with forecast of a rare time series (2 min)

      2. Count Process (Poisson): Motivation, Origin + Time Series Prediction Equation + Seasonality Induction (5 mins)

      3. Breaking a time series into subsequences + choice of Subsequence length (2 mins) 

      4. Discords in Time Series + Dynamic Time Warping + Relevant Sister Classes (5 mins)

      5. DScore: Intra DScore + Inter DScore + Thresholding (Relative Density Based Clustering) (5 mins)

      6. Application of the proposed algorithm for at least one real dataset  (3 mins)

      Certain segments (such as Seasonality Induction and Choice of Subsequence Length) can be pushed to an appendix for the interested few. The estimated time is based on some e-board work that I may be doing. The audio presentation for this (linked above) takes about 12-15 mins without Q&A, but in a face-to-face session we may want to go a little deeper into the math - hence 20 mins. Let me know your thoughts. If you feel it is getting too cluttered, we can decide to drop the Forecast part altogether and concentrate on just the Pattern Deviation bit. The Forecast bit uses standard Poisson Process equations, and the audience might be able to replicate it themselves if we provide links to the open source packages and some papers explaining the underlying math (in which case, I will only explain the motivation behind Count Processes and leave links for the math).

      Looking forward to your reply.

      Thanks,

      Debanjana 

      • By Sujoy Roychowdhury  ~  8 months ago

        Thanks. I just went through your talk video - it is good but a bit difficult to follow for a newbie without examples of the metrics, etc. Please remember that very few, if any, would have experience in predicting rare events. I would also suggest including some examples of the order of magnitude of rarity, and at what levels of rarity your methods apply better.

        • By Debanjana Banerjee  ~  8 months ago

          Hi Sujoy, apologies for the delay in response. I appreciate your feedback on the flow and, based on that, I will add more detailed slides with examples of the features used as well as the level of rarity. Given it is a 20-minute talk, would you suggest I drop the prediction bit and fashion the talk around anomaly detection in rare time series (because I believe there is more novelty there)? I understand how explaining the Poisson Process would be time-consuming if the audience is not so familiar with statistical approaches.

          • By Sujoy Roychowdhury  ~  8 months ago

            My suggestion is to focus on why rare event prediction is not a traditional time series modelling task, and from there talk about the approaches and the scoring approaches. The Poisson Process may be a bit difficult to follow in a 20-minute talk.

            • By Debanjana Banerjee  ~  8 months ago

              Sure. So, just to clarify: I will explain in detail how rare event time series modelling differs from standard time series approaches (with use-case-specific examples to demonstrate that). For the forecast, maybe I can drop the term Poisson Process and not go into the theory. From there I will directly talk about the Deviation Score and the metrics behind it. Please let me know if I understood correctly. I'll try to share a first-cut PPT with the suggested updates. Does that work?