Pre-trained language models such as BERT have performed very well on various NLP tasks such as text classification and question answering. Given BERT's success, industry practitioners are actively experimenting with fine-tuning BERT to build NLP applications for their use cases, such as search recommendation, sentiment analysis, and opinion mining. Compared to benchmark datasets, the datasets used to build industrial NLP applications are often much noisier. While BERT has performed exceedingly well at transferring what it learns from one task to another, it remains unclear how BERT performs when fine-tuned on non-canonical text.
In this talk, we dig deeper into the BERT architecture and examine the effect of noisy text data on the final outcome. We systematically show that when the text data is noisy (spelling mistakes, typos), there is a significant degradation in BERT's performance. We further analyze the reasons for this drop and the shortcomings in the existing BERT pipeline that are responsible for it.
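As a rough illustration of the kind of degradation we discuss (not the exact experimental setup of this work), the sketch below injects random character-level typos into a sentence and compares the predictions of an off-the-shelf BERT-style sentiment classifier on the clean and corrupted versions. The noise function and the model checkpoint are assumptions chosen for demonstration only; real user typos are more structured than uniform random noise.

```python
import random

from transformers import pipeline  # assumes the Hugging Face `transformers` library is installed


def add_typos(text: str, noise_level: float = 0.1, seed: int = 0) -> str:
    """Corrupt a sentence by randomly swapping or dropping characters.

    This is only an illustrative noise model, not the noise observed in
    production chat data.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if chars[i].isalpha() and rng.random() < noise_level:
            # Either swap this character with the next one, or drop it.
            if rng.random() < 0.5 and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])
                i += 2
            else:
                i += 1  # drop the character
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)


# Hypothetical evaluation: compare predictions of a BERT-based sentiment
# classifier on clean vs. noisy versions of the same input.
clf = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

clean = "The interview process was smooth and the recruiter was very helpful."
noisy = add_typos(clean, noise_level=0.15)

print(clean, "->", clf(clean))
print(noisy, "->", clf(noisy))
```

In practice, the comparison would be run over an entire labeled test set at several noise levels, tracking how accuracy falls as the fraction of corrupted characters grows.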
This work is motivated by our business use case: a dialogue system over WhatsApp that screens candidates for blue-collar jobs. Our candidate user base often comes from underprivileged backgrounds, and many have not completed a college education. This, coupled with the fat-finger problem on mobile keypads, leads to a lot of typos and spelling mistakes in the responses our dialogue system receives.
While this work is motivated by our business use case, our findings apply to a wide range of industrial use cases that deal with non-canonical text, from sentiment classification on Twitter data to entity extraction over text collected from discussion forums.