iCASSTLE: Imbalanced Classification Algorithm for Semi Supervised Text Learning
Customer reviews flow in from multiple sources at a rate of thousands per day. These reviews could be in the form of a transcribed telephone conversation, an online review, or a survey. We want to extract cases of Product Safety from this huge volume of data. Product Safety issues account for less than 1% of the total reviews, which may number as high as 70K per day from a single data source. Manual review of these cases to identify potential product-safety threats risks missing such rare events, which may later backfire on the business, jeopardizing its reputation and exposing it to liability. Preventive measures, under such circumstances, are extremely crucial and time-sensitive.
Unlike simple cases of imbalanced binary classification, the challenge here is gathering training data. The available training data consists of very few Reportable cases, pertaining to Product Safety, and no Non-Reportable cases. This is an extreme case of Positive Unlabeled (PU) classification in which the class for which instances are available is itself highly rare. However, every PU algorithm requires many instances of the positive class, in our case, Reportables. In real life, instances of such rare events are hard to come by, and obtaining enough training data to run a PU classification is a luxury. In iCASSTLE, we propose a two-stage classification where Stage I leverages three unique components of text mining to capture representative training data containing instances of both classes in the right proportions, and Stage II feeds the results of Stage I into a semi-supervised set-up. The final classification is given by Stage II together with a top percentage of the Stage I results. Furthermore, Reportables are ranked by a metric indicating the degree of Reportability. We applied this to multiple datasets differing in the nature of Reportables as well as the nature of imbalance. iCASSTLE maximised recall on all such diverse datasets, and therein lies its applicability.
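To make the two-stage flow concrete, below is a minimal, illustrative Python sketch. It is not the published implementation: stage_one_score is a hypothetical stand-in for the paper's three-component text-mining metric, and plain logistic regression stands in for the entropy-regularized classifier used in Stage II.

```python
# Illustrative two-stage pipeline in the spirit of iCASSTLE (toy example).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def stage_one_score(texts, seed_terms):
    """Hypothetical Stage I metric: overlap with product-safety seed terms.
    (The real Stage I combines three text-mining components.)"""
    seed = set(seed_terms)
    return np.array([len(seed & set(t.lower().split())) / (len(t.split()) or 1)
                     for t in texts])

reviews = ["blade detached and cut my hand",
           "package arrived two days late",
           "device overheated and burned the table",
           "love the colour works fine",
           "mild smell but otherwise okay"]
scores = stage_one_score(reviews, ["cut", "burned", "overheated", "fire"])

# Stage I labels: high scorers -> Reportable (1), low scorers -> Non-Reportable (0);
# the two thresholds control the class proportions induced in the training set.
hi, lo = np.quantile(scores, 0.8), np.quantile(scores, 0.4)
labels = np.where(scores >= hi, 1, np.where(scores <= lo, 0, -1))  # -1 = unlabeled

# Stage II: train on the Stage I pseudo-labels and score everything.
X = TfidfVectorizer().fit_transform(reviews)
mask = labels != -1
clf = LogisticRegression().fit(X[mask], labels[mask])
reportability = clf.predict_proba(X)[:, 1]  # ranking metric: degree of Reportability
for review, p in sorted(zip(reviews, reportability), key=lambda t: -t[1]):
    print(f"{p:.2f}  {review}")
```

The point to notice is that Stage I manufactures the training set that Stage II would otherwise lack, and the final ranking by predicted probability gives the degree-of-Reportability ordering mentioned above.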
Outline/Structure of the Talk
1. Definition of Rare Events (Probability Space) (1 min)
2. Difference between rare events and anomalies (why you shouldn't use anomaly detection for rare-event classification) (1-2 mins)
3. Introduction to Positive Unlabeled (PU) Learning (2 mins)
4. Problem Formulation - Detecting Product Safety Issues (a rare event) from Customer Reviews (2 mins)
5. iCASSTLE Overview (algorithm overview) (2 mins)
6. Metric Formulation for Stage I + Choice of Stage I parameters (5 mins)
7. Stage I Classification and forming the training data (1 min)
8. Stage II Classification: why semi-supervised + Entropy Regularized Logistic Regression (5 mins; a toy sketch follows this outline)
9. Comparison of performance with alternative techniques (1 min)
10. Closure (1 min)
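For item 8, the sketch below shows one way entropy-regularized logistic regression can be written down: the usual supervised log-loss on the few labeled points, plus a lam-weighted penalty on the prediction entropy over the unlabeled pool, which pushes the decision boundary into low-density regions. This is a toy gradient-descent version with invented synthetic data, not the talk's production code.

```python
# Toy entropy-regularized logistic regression (entropy-minimization idea).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def fit_entropy_reg_lr(X_lab, y_lab, X_unl, lam=0.5, lr=0.1, steps=2000):
    """Minimize: NLL(labeled) + lam * mean prediction entropy (unlabeled)."""
    w = np.zeros(X_lab.shape[1])
    for _ in range(steps):
        p_l = sigmoid(X_lab @ w)
        nll_grad = X_lab.T @ (p_l - y_lab) / len(y_lab)   # supervised part
        p_u = np.clip(sigmoid(X_unl @ w), 1e-6, 1 - 1e-6)
        # For H(p) = -p*log(p) - (1-p)*log(1-p):
        # dH/dp = log((1-p)/p) and dp/dw = p*(1-p)*x
        ent_grad = X_unl.T @ (np.log((1 - p_u) / p_u) * p_u * (1 - p_u)) / len(p_u)
        w -= lr * (nll_grad + lam * ent_grad)
    return w

rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(2.0, 1.0, (5, 2)),     # 5 labeled Reportables
                   rng.normal(-2.0, 1.0, (5, 2))])   # 5 labeled Non-Reportables
y_lab = np.array([1] * 5 + [0] * 5)
X_unl = rng.normal(0.0, 2.0, (200, 2))               # large unlabeled pool
w = fit_entropy_reg_lr(X_lab, y_lab, X_unl)
print(sigmoid(X_unl[:5] @ w))                        # degree-of-Reportability scores
```

The entropy term is what lets a handful of labels be stretched across a large unlabeled pool, which is exactly the regime Stage II operates in.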
Learning Outcome
In rare-event detection, PU Learning has a natural application since one class (the rare class) is usually more important than the other class/classes (the everything-else class). Owing to the positive-label-hungry nature of PU classification, we tend to reach for One Class Classification or Anomaly Detection techniques instead. While One Class classifiers tend to give good results, using anomaly detection for rare events is flawed at its source, because an anomaly and a rare event are fundamentally different things; this difference is one of the key learnings of the talk.
The market is flooded with rare-event use cases, but supervised algorithms demand a sizeable training set with the right degree of imbalance to perform well. We will also stress why recall or precision (depending on the use case) is a better validation metric than accuracy for rare cases, as the toy example below shows. We will try to move beyond the world of standard deep learning techniques, which are forever data hungry, and learn how to crack the code when training data is minimal. Can our math look beyond the data provided? Well, it can if we train it to do so.
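As a toy illustration of that point (numbers invented): with 1% Reportables, a classifier that never flags anything already scores 99% accuracy while achieving 0% recall.

```python
# Why accuracy misleads on rare events: the do-nothing classifier.
from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 10 + [0] * 990   # 1% of cases are Reportable
y_pred = [0] * 1000             # classifier that never flags a review
print(accuracy_score(y_true, y_pred))  # 0.99 -- looks great
print(recall_score(y_true, y_pred))    # 0.0  -- misses every rare event
```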
The talk equips attendees to build their own PU classifier capable of constructing its own training data from the unlabeled set using generic attributes available across most use cases. It aims to inspire professionals to step out of the chains of pre-built packages and learn to experiment with established algorithms. Beyond tuning, can you combine flavours of existing algorithms and create a new one that fits your use case? You can, if you put in a little effort to understand the math behind them, and trust me, math is fun!
Target Audience
Industry data scientists who work with Rare Event Identification/NLP, and professionals interested in how to identify a rare event in the text space in the absence of adequate training data.
Prerequisites for Attendees
This is a somewhat advanced talk, and given it is a 20-minute session, the audience will need to be familiar with the concepts of Sentiment Analysis, Text Embedding, Supervised Regression, etc. I will discuss Entropy Regularized Logistic Regression in detail (but attendees need to understand the concept of entropy beforehand). Other than that, the audience may want to read up a little on standard Positive Unlabeled (PU) classification techniques (this is not a prerequisite, but those familiar with PU classification will be better able to appreciate why such techniques fall short for a rare event).
Links
https://ieeexplore.ieee.org/document/8614190 (Link to IEEE Xplore site for the paper published in ICMLA'18)
https://drive.google.com/file/d/11HKO3CrAevYcrlUMwNgs9C9vltBmnijJ/view?usp=sharing (Link to the published paper)
https://drive.google.com/file/d/1ijsSMrUbLUVBmSMT_wIQlHk2jCwlrAG_/view?usp=sharing (Link to posted slides)