Natural Language Processing Bootcamp - Zero to Hero
Data is the new oil, and unstructured data (especially text, images and videos) contains a wealth of information. However, due to the inherent complexity of processing and analyzing this data, people often avoid spending the extra time and effort needed to venture beyond structured datasets and analyze these unstructured sources, which can be a potential gold mine. Natural Language Processing (NLP) is all about leveraging tools, techniques and algorithms to process and understand unstructured natural language data such as text and speech.
Specializing in domains like computer vision and natural language processing is no longer a luxury but a necessity expected of any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will cover essential concepts in NLP along with extensive case studies and hands-on examples to master state-of-the-art tools, techniques and frameworks for applying NLP to real-world problems. We use Python 3 and the latest state-of-the-art frameworks, including NLTK, Gensim, spaCy, scikit-learn, TextBlob, Keras and TensorFlow, to showcase our examples. You will also learn a fair bit of machine learning and deep learning in the context of NLP during this bootcamp.
Over our journey in this field, we have struggled with various problems, faced many challenges and learned many lessons. This workshop is our way of giving back a major chunk of the knowledge we’ve gained in text analytics and natural language processing, a world where building a fancy word cloud from a bunch of text documents is no longer enough. You might have had questions like ‘What is the right technique to solve a problem?’, ‘How does text summarization really work?’ and ‘Which are the best frameworks for multi-class text categorization?’, among many others! Drawing on our prior knowledge and the lessons learned from publishing a couple of books in this domain, this workshop should help attendees avoid some common pitfalls in NLP and learn effective strategies to master it.
The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive, structured approach to help you learn all the essentials of NLP. We will cover the following topics in this workshop, with hands-on examples and projects:
- Basics of Natural Language and Python for NLP tasks
- Text Processing and Wrangling
- Text Understanding - POS, NER, Parsing
- Text Representation - BOW, Embeddings, Contextual Embeddings
- Text Similarity and Content Recommenders
- Text Clustering
- Topic Modeling
- Text Summarization
- Sentiment Analysis - Unsupervised & Supervised
- Text Classification with Machine Learning and Deep Learning
- Multi-class & Multi-Label Text Classification
- Deep Transfer Learning and its promise
- Applying Deep Transfer Learning - Universal Sentence Encoders, ELMo and BERT for NLP tasks
- Generative Deep Learning for NLP
- Next Steps
With over 10 hands-on projects, the bootcamp will be packed with examples for you to go through, try out and practice. Considering the limited time we have and the amount of ground we want to cover, we will try to keep theory to a minimum. We hope that by the end of this workshop you will take away useful methodologies for solving NLP problems in the future.
Outline/Structure of the Demonstration
The following is the rough structure of the workshop, subject to minor changes.
- Introduction to Natural Language Processing
- Python for NLP
- Text Pre-processing and Wrangling
  - Removing HTML tags/noise
  - Removing accented characters
  - Removing special characters/symbols
  - Handling contractions
  - Stemming
  - Lemmatization
  - Stop word removal
  - Hands-on Project: Building a text pre-processor with multi-threading
- Text Understanding
  - POS (Part-of-Speech) Tagging
  - Text Parsing (Shallow, Dependency, Constituency)
  - NER (Named Entity Recognition) Tagging
  - Hands-on Project: Build your own NER Tagger - Statistical Models & Deep Learning Models
- Text Representation - Feature Engineering
  - Traditional Statistical Models - BOW, TF-IDF
  - Newer Deep Learning Models for word embeddings - Word2Vec, GloVe, FastText
  - Contextual word embeddings - ELMo, BERT
  - Hands-on Project: Interactive exploration of Word Embeddings
- Hands-on Project: Similarity and Movie Recommendations with different text representations
- Hands-on Project: Sentiment Analysis using unsupervised learning & supervised learning
- Hands-on Project: Text Clustering of Movies
- Hands-on Project: Text Summarization Methods - Statistical & Deep Learning
- Hands-on Project: Topic Modeling - explore current research trends in AI
- Hands-on Project: Text Classification Models
  - Traditional Machine Learning Models
  - Deep Neural Nets
  - Convolutional Neural Networks (CNNs)
  - Long Short-Term Memory Networks (LSTMs)
  - Bidirectional LSTMs/GRUs
- Deep Transfer Learning Models
  - Promise of Deep Transfer Learning for NLP
  - Hands-on Project: Deep Transfer Learning with ELMo, BERT and Universal Sentence Encoder
- Hands-on Project: Generative Deep Learning for NLP
- Conclusion and Next Steps
Learning Outcome
- Learn and understand popular NLP workflows with interactive examples
- Learn concepts and work through interactive projects on cleaning and handling noisy unstructured text data, including duplicate checks, spelling corrections and text wrangling
- Build parsers and NER taggers, and parse text data to understand it better
- Understand, build and explore text semantics and representations with traditional statistical models and newer word embedding and contextual embedding models based on deep learning
- Work on projects covering popular NLP tasks, including text classification, sentiment analysis, text clustering, summarization, topic modeling and recommendations
- Implement recent state-of-the-art research on deep transfer learning and generative deep learning for NLP
- Learn and implement the latest state-of-the-art models in NLP, including ELMo, BERT and so on
- Learn best practices and robust methodologies for NLP, with the entire codebase shared with participants to keep after the workshop
- Work through over 10 hands-on projects showcasing the best in NLP
Target Audience
Data Scientists, Engineers, Developers, AI Enthusiasts, Linguistic Experts, NLP teams
Prerequisites for Attendees
Basic knowledge of Python and machine learning/deep learning is helpful.
All the examples will be covered in Python.
You don't need a system with a GPU, as we will be using Google Colab.
Links
This is based on my workshop last year with Dipanjan.
We will also be adding new content on topic models and deep transfer learning for NLP.
I am co-authoring a book on NLP. You can check out the details here: http://www.practicalnlp.ai/
Speaker at
- ODSC India 2018: https://www.youtube.com/watch?v=FoR_-ELAcfE
- ODSC India 2019: https://www.youtube.com/watch?v=pI5l4YJn3EE
- ODSC East 2020: