Short Abstract

It is well known that the more data we have, the better ML models tend to perform. However, getting a large amount of training data annotated is a luxury most practitioners cannot afford. Computer vision has circumvented this problem with data augmentation techniques and has reaped rich benefits. Can NLP not do the same? In this talk we will look at the techniques available to practitioners for augmenting data in their NLP applications, along with the bells and whistles around these techniques.

Long Abstract

In AI, it is well established that data beats algorithms: large amounts of data with a simple algorithm often yield far superior results compared to the best algorithm with little data. This is especially true for deep learning algorithms, which are notorious data guzzlers. Yet getting data labeled at scale is a luxury most practitioners cannot afford. What does one do in such a scenario?

This is where data augmentation comes into play. Data augmentation is a set of techniques for increasing the size of a dataset and introducing more variability into it, which helps train better and more robust models. It is very popular in computer vision: from simple techniques like rotation, translation, and adding noise (e.g., salt-and-pepper) to GANs, there is a whole range of ways to augment images. Augmentation is widely regarded as one of the key anchors behind the success of computer vision models in industrial applications.
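
As a rough illustration of what this looks like on the vision side, below is a minimal sketch of an augmentation pipeline, assuming torchvision is available; the specific transforms and parameters are illustrative choices, not the ones covered in the talk.

```python
from torchvision import transforms

# Illustrative augmentation pipeline: each training image passes through a
# chain of random transforms, so the model rarely sees the exact same input twice.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                     # small random rotations
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # random translations
    transforms.ColorJitter(brightness=0.2, contrast=0.2),      # photometric jitter
    transforms.RandomHorizontalFlip(p=0.5),                    # mirror half the images
])

# augmented_image = augment(pil_image)  # apply to a PIL image during training
```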

Most natural language processing (NLP) projects in industry still suffer from data scarcity. This is where recent advances in data augmentation for NLP can be very helpful. In NLP, data augmentation is not as straightforward: you want to augment the data while preserving the syntactic and semantic properties of the text. In this talk we will take a deep dive into the techniques available to practitioners for augmenting NLP data. The talk is meant for data scientists, NLP engineers, ML engineers, and industry leaders working on NLP problems.
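
To make the idea concrete, here is a minimal sketch of one simple technique of this kind, synonym replacement, assuming NLTK with the WordNet corpus downloaded (nltk.download("wordnet")); the helper below is illustrative only, and real pipelines need more care to preserve syntax and meaning.

```python
import random

from nltk.corpus import wordnet


def synonym_replace(sentence: str, n: int = 1) -> str:
    """Replace up to n words in the sentence with a WordNet synonym."""
    words = sentence.split()
    # Indices of words that have at least one WordNet synset.
    candidates = [i for i, w in enumerate(words) if wordnet.synsets(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        synonyms = {lemma.name().replace("_", " ")
                    for syn in wordnet.synsets(words[i])
                    for lemma in syn.lemmas()}
        synonyms.discard(words[i])
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
    return " ".join(words)


# Example: synonym_replace("the movie was great", n=1)
# might return "the movie was expectant" -- naive synonym swaps can distort meaning,
# which is exactly why augmentation in NLP needs care.
```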

Outline/Structure of the Talk

  • What is data augmentation
  • Why data augmentation is tricky in NLP
  • Recent advances in data augmentation in NLP
    • Deep Dive: various techniques for data augmentation in NLP
    • Pros and cons of various techniques
    • Practical tips
  • Concrete case study of applying these techniques to our work

Learning Outcome

  • Understand what data augmentation is and why it is trickier in NLP than in computer vision
  • Get an overview of recent advances and the main techniques for data augmentation in NLP
  • Know the pros and cons of these techniques, along with practical tips for applying them
  • See how these techniques play out in a concrete case study from our work

Target Audience

The talk is meant for data scientists, NLP engineers, ML engineers, and industry leaders working on NLP problems.

Prerequisites for Attendees

None
