A Hands-on Introduction to Natural Language Processing

Data is the new oil, and unstructured data, especially text, images and
videos, contains a wealth of information. However, because this data is
inherently complex to process and analyze, people often avoid
spending the extra time and effort to venture beyond structured
datasets into these unstructured sources, which can be a
potential gold mine. Natural Language Processing (NLP) is all about
leveraging tools, techniques and algorithms to process and understand
natural language data, which is usually unstructured, such as text and
speech. In this workshop, we will look at tried and tested
strategies, techniques and workflows that practitioners and data
scientists can use to extract useful insights from text data.


Specializing in domains like computer vision and natural language
processing is no longer a luxury but a necessity expected of
any data scientist in today’s fast-paced world! With a hands-on and
interactive approach, we will cover essential concepts in NLP along with
extensive case studies and hands-on examples to master state-of-the-art
tools, techniques and frameworks for applying NLP to solve real-world
problems. We use Python 3 and current frameworks including NLTK,
Gensim, spaCy, scikit-learn, TextBlob, Keras and TensorFlow to
showcase our examples.


In my journey in this field so far, I have struggled with various problems,
faced many challenges, and learned many lessons over time. This
workshop distills a major chunk of the knowledge I’ve gained in the world
of text analytics and natural language processing, where building a
fancy word cloud from a bunch of text documents is not enough
anymore. Perhaps the biggest problem with learning text
analytics is not a lack of information but too much of it, often
called information overload. There are so many resources,
documentation, papers, books, and journals containing so much content
that they often overwhelm someone new to the field. You might have
had questions like ‘What is the right technique to solve a problem?’,
‘How does text summarization really work?’ and ‘Which are the best
frameworks to solve multi-class text categorization?’, among many
others! Drawing on what I learned from publishing a couple of books in
this domain, this workshop should help attendees avoid the pressing
issues I’ve faced in my journey so far and learn strategies to master NLP.


This workshop follows a comprehensive and structured approach. First, it
tackles the basics of natural language understanding and Python for
handling text data in the initial sections. Once you’re familiar with the
basics, we cover text processing, parsing and understanding. Then we
address interesting problems in text analytics in each of the remaining
sections, including text classification, clustering and similarity analysis,
text summarization and topic models, semantic analysis and named
entity recognition, and sentiment analysis and model interpretation. The
last section covers recent advancements in NLP thanks to deep learning
and transfer learning, with an example of text classification using
universal sentence embeddings.

 
 

Outline/Structure of the Workshop

The following is the rough structure of the workshop:

  1. Introduction to Natural Language Processing
  2. Text pre-processing and Wrangling
    • Removing HTML tags/noise
    • Removing accented characters
    • Removing special characters/symbols
    • Handling contractions
    • Stemming
    • Lemmatization
    • Stop word removal
  3. Project: Build a duplicate character removal module
  4. Project: Build a spell-check and correction module
  5. Project: Build an end-to-end text pre-processor
  6. Text Understanding
    • POS (Parts of Speech) Tagging
    • Text Parsing
      • Shallow Parsing
      • Dependency Parsing
      • Constituency Parsing
    • NER (Named Entity Recognition) Tagging
  7. Project: Build your own POS Tagger
  8. Project: Build your own NER Tagger
  9. Text Representation – Feature Engineering
    • Traditional Statistical Models – BOW, TF-IDF
    • Newer Deep Learning Models for word embeddings – Word2Vec, GloVe, FastText
  10. Project: Similarity and Movie Recommendations
  11. Project: Interactive exploration of Word Embeddings
  12. Case Studies for other common NLP Tasks
    • Project: Sentiment Analysis using unsupervised learning and supervised learning (machine and deep learning)
    • Project: Text Clustering (grouping similar movies)
    • Project: Text Summarization and Topic Models
  13. Promise of Deep Learning for NLP, Transfer and Generative Learning
  14. Hands-on with universal sentence embeddings in deep learning
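
As a rough illustration of the pre-processing steps listed in item 2, the sketch below chains a few of them using only the Python standard library. It is not the workshop's actual material: the function names and the tiny `CONTRACTIONS` and `STOP_WORDS` tables are hypothetical placeholders, and stemming and lemmatization are left out because they would normally rely on a library such as NLTK or spaCy.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables (illustration only).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "of", "to", "in", "this"}

# One regex that matches any key of CONTRACTIONS.
_CONTRACTION_RE = re.compile("|".join(re.escape(key) for key in CONTRACTIONS))


def strip_html(text):
    """Replace anything that looks like a tag, then decode entities like &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def remove_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_contractions(text):
    """Expand contractions found in the lookup table (expects lowercase text)."""
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)


def remove_special_characters(text):
    """Replace everything except ASCII letters, digits and whitespace."""
    return re.sub(r"[^a-zA-Z0-9\s]", " ", text)


def preprocess(text):
    """Run the steps in order and return the remaining tokens."""
    text = strip_html(text)
    text = remove_accents(text)
    text = expand_contractions(text.lower())
    text = remove_special_characters(text)
    return [token for token in text.split() if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("<p>It's a caf\u00e9, and I can't wait!</p>"))
```

The order matters in this sketch: contractions are expanded before special characters are removed, since stripping the apostrophe first would leave fragments such as "can" and "t".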

Learning Outcome

  • Learn and understand popular NLP workflows with interactive examples
  • Work through concepts and interactive projects on cleaning and handling noisy unstructured text data, including duplicate checks, spelling corrections and text wrangling
  • Build your own POS and NER taggers and parse text data to understand it better
  • Understand, build and explore text semantics and representations with traditional statistical models and newer word embedding models
  • Complete projects on popular NLP tasks including text classification, sentiment analysis, text clustering, summarization, topic models and recommendations
  • Implement recent state-of-the-art research on deep transfer learning for NLP
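
To give a flavor of the traditional statistical representations mentioned above (BOW and TF-IDF) and of the similarity analysis they enable, here is a minimal standard-library sketch. It is an assumption-laden illustration, not the workshop's code: tokenization is a plain whitespace split, the IDF uses one common smoothed variant, ln((1 + N) / (1 + df)) + 1, and the function names are hypothetical. In practice a library such as scikit-learn would typically handle this.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (assumed variant, see lead-in).
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in df.items()
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term counts are the bag-of-words part
        vectors.append({term: counts[term] * idf[term] for term in counts})
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the sky is blue",
        "the blue sky is beautiful",
        "dogs chase cats",
    ]
    vectors = tf_idf_vectors(docs)
    print(cosine_similarity(vectors[0], vectors[1]))
    print(cosine_similarity(vectors[0], vectors[2]))
```

Documents that share weighted terms score closer to 1, while documents with no terms in common score 0, which is the basic idea behind similarity-based recommendations.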

Target Audience

Data Scientists, Engineers, Developers, AI Enthusiasts, Linguistic Experts

Prerequisite

Basic knowledge of Python and Machine Learning.

All the examples will be covered in Python.

Submitted 1 month ago
