Dipanjan (DJ) Sarkar is a Data Scientist at Red Hat, a published author and a consultant
and trainer. He has consulted and worked with several startups as well as Fortune 500
companies like Intel. He primarily works on leveraging data science, advanced analytics,
machine learning and deep learning to build large-scale intelligent systems. He holds a
master of technology degree with specializations in Data Science and Software
Engineering. He is also an avid supporter of self-learning and massive open online
courses. He has recently ventured into the world of open-source products to improve
the productivity of developers across the world.
Dipanjan has been an analytics practitioner for several years now, specializing in
machine learning, natural language processing, statistical methods and deep learning.
Having a passion for data science and education, he also acts as an AI Consultant and
Mentor at various organizations like Springboard, where he helps people build their
skills on areas like Data Science and Machine Learning. He also acts as a key
contributor and Editor for Towards Data Science, a leading online journal focusing on
Artificial Intelligence and Data Science. Dipanjan has also authored several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing and
Dipanjan's interests include learning about new technology, financial markets, disruptive
start-ups, data science, artificial intelligence and deep learning. In his spare time he
loves reading, gaming, watching popular sitcoms and football and writing interesting
articles on https://email@example.com and https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open-source and publishes his code and analyses from his
books and articles on GitHub at https://github.com/dipanjanS.
A Hands-on Introduction to Natural Language Processing
Dipanjan Sarkar, Data Scientist, Red Hat
Data is the new oil and unstructured data, especially text, images and
videos contain a wealth of information. However, due to the inherent
complexity in processing and analyzing this data, people often refrain
from spending extra time and effort in venturing out from structured
datasets to analyze these unstructured sources of data, which can be a
potential gold mine. Natural Language Processing (NLP) is all about
leveraging tools, techniques and algorithms to process and understand
natural language-based data, which is usually unstructured like text,
speech and so on. In this workshop, we will be looking at tried and tested
strategies, techniques and workflows which can be leveraged by
practitioners and data scientists to extract useful insights from text data.
Being specialized in domains like computer vision and natural language
processing is no longer a luxury but a necessity which is expected of
any data scientist in today’s fast-paced world! With a hands-on and
interactive approach, we will understand essential concepts in NLP along
with extensive case studies and hands-on examples to master
state-of-the-art tools, techniques and frameworks for actually applying
NLP to solve real-world problems. We leverage Python 3 and the latest
and best state-of-the-art frameworks including NLTK, Gensim, SpaCy,
Scikit-Learn, TextBlob, Keras and TensorFlow to showcase our examples.
In my journey in this field so far, I have struggled with various problems,
faced many challenges, and learned various lessons over time. This
workshop will contain a major chunk of the knowledge I’ve gained in the world
of text analytics and natural language processing, where building a
fancy word cloud from a bunch of text documents is not enough
anymore. Perhaps the biggest problem with regard to learning text
analytics is not a lack of information but too much information, often
called information overload. There are so many resources,
documentation, papers, books, and journals containing so much content
that they often overwhelm someone new to the field. You might have
had questions like ‘What is the right technique to solve a problem?’,
‘How does text summarization really work?’ and ‘Which are the best
frameworks to solve multi-class text categorization?’ among many other
questions! Based on my prior knowledge and learnings from publishing a couple
of books in this domain, this workshop should help attendees avoid the pressing
issues I’ve faced in my journey so far and learn the strategies to master NLP.
This workshop follows a comprehensive and structured approach. First it
tackles the basics of natural language understanding and Python for
handling text data in the initial chapters. Once you’re familiar with the
basics, we cover text processing, parsing and understanding. Then, we
address interesting problems in text analytics in each of the remaining
chapters, including text classification, clustering and similarity analysis,
text summarization and topic models, semantic analysis and named
entity recognition, sentiment analysis and model interpretation. The last
chapter covers the recent advancements made in NLP thanks to deep
learning and transfer learning, with an example of text classification
using universal sentence embeddings.
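As a rough illustration of the text processing and similarity analysis steps mentioned above, here is a minimal sketch that uses only the Python standard library. The `tokenize` and `cosine_similarity` helpers and the sample sentences are hypothetical and not taken from the workshop material; a real workflow would more likely rely on frameworks such as NLTK, SpaCy or Scikit-Learn.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the raw term-frequency vectors of two documents."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
docs = [
    "The sky is blue and beautiful",
    "Love this blue and beautiful sky",
    "The quick brown fox jumps over the lazy dog",
]

sim_sky = cosine_similarity(docs[0], docs[1])
sim_fox = cosine_similarity(docs[0], docs[2])
print(f"sky vs sky: {sim_sky:.3f}")
print(f"sky vs fox: {sim_fox:.3f}")
```

The two sentences about the sky share most of their words and score higher than the unrelated pair. Raw term counts are the simplest possible representation; weighting schemes such as TF-IDF or learned embeddings usually give more useful similarities.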
Explainable Artificial Intelligence - Demystifying the Hype
Dipanjan Sarkar, Data Scientist, Red Hat
The field of Artificial Intelligence, powered by Machine Learning and Deep Learning, has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models like Capsule Networks do come into existence, but industry adoption of them usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In some domains, especially in finance areas like insurance or banking, data scientists often end up having to use more traditional machine learning models (linear or tree-based), because model interpretability is very important for the business to explain each and every decision taken by the model. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature), but we end up being unable to properly interpret their decisions.
To address these gaps, I will take a conceptual yet hands-on approach in which we explore some of these challenges in depth around explainable artificial intelligence (XAI) and human interpretable machine learning, and showcase examples using state-of-the-art model interpretation frameworks in Python!
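To make concrete why linear models are considered easy to explain, here is a minimal sketch that splits a single prediction into per-feature contributions (weight times feature value). The weights, bias, feature names and the `explain_linear_prediction` helper are all hypothetical and chosen only for illustration; they do not come from the talk or from any specific interpretation framework.

```python
def explain_linear_prediction(weights, bias, features):
    """Return a linear model's prediction and each feature's contribution to it."""
    # For a linear model, each feature adds exactly weight * value to the score.
    contributions = {name: weights[name] * value for name, value in features.items()}
    prediction = bias + sum(contributions.values())
    return prediction, contributions


# Hypothetical credit-scoring style model, invented for this example.
weights = {"income": 0.5, "debt": -0.8, "years_employed": 0.25}
bias = 1.0
applicant = {"income": 4.0, "debt": 2.0, "years_employed": 2.0}

prediction, contributions = explain_linear_prediction(weights, bias, applicant)

# List features by the size of their effect on this one decision.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>15}: {value:+.2f}")
print(f"{'prediction':>15}: {prediction:.2f}")
```

An ensemble or neural network has no such direct additive breakdown, which is the gap that model-agnostic interpretation techniques try to fill.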
Human Interpretable Machine Learning — The Need and Importance of Model Interpretation (with hands-on examples)
The field of Machine Learning has gone through some phenomenal changes over the last decade. In the industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In this talk, I will cover the need for and importance of human interpretable machine learning approaches, look at effective strategies for model interpretation and walk through several hands-on examples. Detailed coverage of open-source frameworks for machine learning model interpretation will also be one of the major focus areas. Examples will be showcased in Python.
Unleash the Power of Deep Learning with Transfer Learning
Transfer learning is a machine learning / deep learning technique where knowledge gained while training on one machine learning problem is reused to train models for other, similar problems. This is an extremely useful approach for leveraging pre-trained models to solve real-world problems where limited data is available.
This talk will cover the essentials of deep learning and transfer learning concepts, including the various methodologies of transfer learning. We will then look at diverse ways in which transfer learning can be applied in the real world to complex problems in the following areas:
- Computer Vision
- Natural Language Processing
- Audio Categorization
We will briefly look at a multitude of real-world case studies and problems around the preceding areas like text classification, image classification, image captioning, style transfer and audio classification.
The Art of Effective Visualization of Multi-dimensional Data - A hands-on Approach
Descriptive Analytics is one of the core components of any analysis life-cycle pertaining to a data science project or even specific research. Data aggregation, summarization and visualization are some of the main pillars supporting this area of data analysis. However, dealing with multi-dimensional datasets, typically with more than two attributes, starts causing problems, since our medium of data analysis and communication is typically restricted to two dimensions. We will explore some effective strategies for visualizing data in multiple dimensions (ranging from 1-D up to 6-D) using a hands-on approach with Python and popular open-source visualization libraries like matplotlib and seaborn. If we have time, we will also briefly cover excellent R visualization libraries like ggplot.
BONUS: We will also look at ways to visualize unstructured data with several dimensions including text, images and audio!