Data is the new oil, and unstructured data, especially text, images, and videos, contains a wealth of information. However, because processing and analyzing this data is inherently complex, people often hesitate to spend the extra time and effort to venture beyond structured datasets, even though these unstructured sources can be a potential gold mine. Natural Language Processing (NLP) is all about leveraging tools, techniques, and algorithms to process and understand unstructured natural language data such as text and speech.
Specializing in domains like computer vision and natural language processing is no longer a luxury but a necessity expected of any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will cover essential concepts in NLP, along with extensive case studies and hands-on examples, to master state-of-the-art tools, techniques, and frameworks for applying NLP to real-world problems. We use Python 3 and state-of-the-art frameworks including NLTK, Gensim, spaCy, scikit-learn, TextBlob, Keras, and TensorFlow to showcase our examples. You will learn a fair bit of machine learning as well as deep learning in the context of NLP during this bootcamp.
In our journey in this field, we have struggled with various problems, faced many challenges, and learned various lessons over time. This workshop is our way of giving back a major chunk of the knowledge we’ve gained in the world of text analytics and natural language processing, where building a fancy word cloud from a bunch of text documents is not enough anymore. You might have had questions like ‘What is the right technique to solve a problem?’, ‘How does text summarization really work?’ and ‘Which are the best frameworks to solve multi-class text categorization?’, among many others! Drawing on our experience and on what we learned from publishing a couple of books in this domain, this workshop should help attendees avoid some of the pressing issues in NLP and learn effective strategies to master it.
The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive and structured approach to teach you all the essentials of NLP. We will cover the following aspects during the workshop, with hands-on examples and projects!
- Basics of Natural Language and Python for NLP tasks
- Text Processing and Wrangling
- Text Understanding - POS, NER, Parsing
- Text Representation - BOW, Embeddings, Contextual Embeddings
- Text Similarity and Content Recommenders
- Text Clustering
- Topic Modeling
- Text Summarization
- Sentiment Analysis - Unsupervised & Supervised
- Text Classification with Machine Learning and Deep Learning
- Multi-class & Multi-Label Text Classification
- Deep Transfer Learning and its promise
- Applying Deep Transfer Learning - Universal Sentence Encoders, ELMo and BERT for NLP tasks
- Generative Deep Learning for NLP
- Next Steps
With over 10 hands-on projects, the bootcamp is packed with practical examples for you to go through, try out, and practice. We will keep theory to a minimum, considering the limited time we have and the amount of ground we want to cover. We hope that by the end of this workshop you can take away some useful methodologies to apply to NLP problems in the future. We will be using Python to showcase all our examples.
Dipanjan (DJ) Sarkar is a Data Scientist at Red Hat, a published author, and a consultant and trainer. He has consulted and worked with several startups as well as Fortune 500 companies like Intel. He primarily works on leveraging data science, advanced analytics, machine learning and deep learning to build large-scale intelligent systems. He holds a master of technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses. He has recently ventured into the world of open-source products to improve the productivity of developers across the world.
Dipanjan has been an analytics practitioner for several years now, specializing in machine learning, natural language processing, statistical methods, and deep learning. Having a passion for data science and education, he also acts as an AI Consultant and Mentor at various organizations like Springboard, where he helps people build their skills in areas like Data Science and Machine Learning. He also acts as a key contributor and Editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing, and Deep Learning.
Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, artificial intelligence, and deep learning. In his spare time, he loves reading, gaming, watching popular sitcoms and football, and writing interesting articles on https://firstname.lastname@example.org and https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open source and publishes the code and analyses from his books and articles on GitHub at https://github.com/dipanjanS.
I am part of the Intuit AI team. Prior to this, I headed ML efforts at Huawei Technologies, at Freshworks, Chennai, and at Airwoot, Delhi. I did my master's in theoretical computer science at IIIT Hyderabad and dropped out of my Ph.D. at IIT Delhi to work with startups.
I am a regular speaker at ML conferences and events such as PyData, NVIDIA forums, Fifth Elephant, and Anthill. I have also conducted several workshops attended by machine learning practitioners, and I co-organize one of the early Deep Learning meetups in Bangalore.
Pre-requisites for Attendees
- Basic knowledge of Python and Machine Learning / Deep Learning helps.
- All the examples will be covered in Python.
- A system with a GPU, or access to one, helps, since you can then run all the examples during the workshop itself. We will walk through everything during the workshop either way.
The following is the rough structure of the workshop, subject to minor changes.
- Introduction to Natural Language Processing
- Python for NLP
- Text pre-processing and Wrangling
  - Removing HTML tags / noise
  - Removing accented characters
  - Removing special characters / symbols
  - Handling contractions
  - Stop word removal
  - Hands-on Project: Building a text pre-processor with multi-threading
- Text Understanding
  - POS (Parts of Speech) Tagging
  - Text Parsing (Shallow, Dependency, Constituency)
  - NER (Named Entity Recognition) Tagging
  - Hands-on Project: Build your own NER Tagger - Statistical Models & Deep Learning Models
- Text Representation – Feature Engineering
  - Traditional Statistical Models – BOW, TF-IDF
  - Newer Deep Learning Models for word embeddings – Word2Vec, GloVe, FastText
  - Contextual word embeddings - ELMo, BERT
  - Hands-on Project: Interactive exploration of Word Embeddings
- Hands-on Project: Similarity and Movie Recommendations with different text representations
- Hands-on Project: Sentiment Analysis using unsupervised learning & supervised learning
- Hands-on Project: Text Clustering of Movies
- Hands-on Project: Text Summarization Methods - Statistical & Deep Learning
- Hands-on Project: Topic Modeling - explore current research trends in AI
- Hands-on Project: Text Classification Models
  - Traditional Machine Learning Models
  - Deep Neural Nets
  - Convolutional Neural Networks (CNNs)
  - Long Short-Term Memory Networks (LSTMs)
  - Bi-directional LSTMs / GRUs
  - Deep Transfer Learning Models
- Promise of Deep Transfer Learning for NLP
- Hands-on Project: Deep Transfer Learning with ELMo, BERT, Universal Sentence Embeddings
- Hands-on Project: Generative Deep Learning for NLP
- Conclusion and Next Steps
- Learn and understand popular NLP workflows with interactive examples
- Work through concepts and interactive projects on cleaning and handling noisy unstructured text data, including duplicate checks, spelling corrections, and text wrangling
- Build Parsers and NER taggers and parse text data to understand it better
- Understand, build and explore text semantics and representations with traditional statistical models and newer word embedding and contextual embedding models based on deep learning
- Projects on popular NLP tasks including text classification, sentiment analysis, text clustering, summarization, topic models and recommendations
- Implement recent state-of-the-art research on deep transfer learning and generative deep learning for NLP
- Learn and implement state-of-the-art NLP models, including ELMo and BERT
- Learn best practices and robust methodologies for NLP with the entire codebase shared with the workshop participants to take home even after the workshop
- Over 10 Hands-on Projects showcasing the best in NLP
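To give a flavour of the text pre-processing steps listed in the workshop structure (removing HTML tags, accented characters, and special characters, handling contractions, removing stop words, and running the pre-processor with multiple threads), here is a minimal sketch using only the Python standard library. It is not the workshop's actual code: the function names, the tiny `CONTRACTIONS` and `STOP_WORDS` tables, and the `workers` parameter are hypothetical and chosen purely for illustration. A real pipeline would likely rely on fuller resources from libraries such as NLTK or spaCy.

```python
import html
import re
import unicodedata
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately tiny lookup tables (illustration only).
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "of", "to", "in"}

CONTRACTION_RE = re.compile("|".join(re.escape(c) for c in CONTRACTIONS))


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space, then decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def remove_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_contractions(text):
    """Expand the contractions listed in CONTRACTIONS (expects lower-cased text)."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)


def remove_special_chars(text):
    """Replace every character that is not a letter, digit, or whitespace with a space."""
    return re.sub(r"[^a-zA-Z0-9\s]", " ", text)


def preprocess(text):
    """Run all cleaning steps in order and return the remaining tokens."""
    text = strip_html(text)
    text = remove_accents(text).lower()
    text = expand_contractions(text)
    text = remove_special_chars(text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def preprocess_corpus(docs, workers=4):
    """Pre-process many documents with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, docs))


if __name__ == "__main__":
    sample = ["<p>It's a caf&eacute; review: can't complain!</p>"]
    print(preprocess_corpus(sample))
```

Note that in CPython, threads mainly help when some steps wait on I/O (for example, reading documents from disk or a network); for purely CPU-bound cleaning, a process pool may be the better fit.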