Data Science Lead
Member for 2 years
Dipanjan (DJ) Sarkar is a Data Science Lead at Applied Materials, leading advanced analytics efforts around computer vision, natural language processing and deep learning. He is also a Google Developer Expert in Machine Learning. He has consulted for and worked with several startups, Fortune 500 companies like Intel, and open-source organizations like Red Hat. He primarily works on leveraging data science, machine learning and deep learning to build large-scale intelligent systems. He holds a Master of Technology degree with specializations in Data Science and Software Engineering. He is also an avid supporter of self-learning and massive open online courses.
Dipanjan has been an analytics practitioner for several years, specializing in machine learning, natural language processing, computer vision and deep learning. With a passion for data science and education, he also acts as an AI Consultant and Mentor at organizations such as Springboard, where he helps people build their skills in areas like Data Science and Machine Learning. He is also a key contributor and Editor for Towards Data Science, a leading online journal focusing on Artificial Intelligence and Data Science. Dipanjan is also a published author of several books on R, Python, Machine Learning, Social Media Analytics, Natural Language Processing, and Deep Learning.
Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, artificial intelligence and deep learning. In his spare time, he loves reading, gaming, watching popular sitcoms and football, and writing interesting articles, which he shares online, including on https://www.linkedin.com/in/dipanzan. He is also a strong supporter of open source and publishes the code and analyses from his books and articles on GitHub at https://github.com/dipanjanS.
Natural Language Processing Bootcamp - Zero to Hero
Anuj Gupta, Head of Machine Learning & Data Science, Vahan
Dipanjan Sarkar, Data Science Lead, Applied Materials
1 year ago · Sold Out!
Data is the new oil, and unstructured data, especially text, images and videos, contains a wealth of information. However, due to the inherent complexity of processing and analyzing this data, people often avoid spending the extra time and effort needed to venture beyond structured datasets and analyze these unstructured sources, which can be a potential gold mine. Natural Language Processing (NLP) is all about leveraging tools, techniques and algorithms to process and understand natural language based unstructured data such as text and speech.
Being specialized in domains like computer vision and natural language processing is no longer a luxury but a necessity for any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will cover essential concepts in NLP along with extensive case studies and hands-on examples to master state-of-the-art tools, techniques and frameworks for applying NLP to solve real-world problems. We leverage Python 3 and state-of-the-art frameworks including NLTK, Gensim, spaCy, scikit-learn, TextBlob, Keras and TensorFlow to showcase our examples. You will also learn a fair bit of machine learning and deep learning in the context of NLP during this bootcamp.
In our journey in this field, we have struggled with various problems, faced many challenges, and learned many lessons over time. This workshop is our way of giving back a major chunk of the knowledge we’ve gained in text analytics and natural language processing, where building a fancy word cloud from a bunch of text documents is no longer enough. You might have had questions like ‘What is the right technique to solve a problem?’, ‘How does text summarization really work?’ and ‘Which are the best frameworks to solve multi-class text categorization?’, among many others! Drawing on our prior knowledge and the lessons learned from publishing a couple of books in this domain, we designed this workshop to help attendees avoid some of the pressing issues in NLP and learn effective strategies to master it.
The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive, structured approach to help you learn all the essentials of NLP. We will cover the following aspects during this workshop with hands-on examples and projects!
- Basics of Natural Language and Python for NLP tasks
- Text Processing and Wrangling
- Text Understanding - POS, NER, Parsing
- Text Representation - BOW, Embeddings, Contextual Embeddings
- Text Similarity and Content Recommenders
- Text Clustering
- Topic Modeling
- Text Summarization
- Sentiment Analysis - Unsupervised & Supervised
- Text Classification with Machine Learning and Deep Learning
- Multi-class & Multi-Label Text Classification
- Deep Transfer Learning and its promise
- Applying Deep Transfer Learning - Universal Sentence Encoders, ELMo and BERT for NLP tasks
- Generative Deep Learning for NLP
- Next Steps
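As a small taste of the Text Representation and Text Similarity topics listed above, here is a minimal, dependency-free sketch of a bag-of-words (BOW) representation compared with cosine similarity. It assumes a naive regex tokenizer rather than the NLTK or spaCy pipelines the bootcamp uses, and the function names and toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only (a deliberately naive normalizer)
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text: str) -> Counter:
    # Bag-of-words: token -> count, ignoring word order entirely
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens divided by the product of the vector norms
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "The sky is blue and beautiful.",
    "Love this blue and beautiful sky!",
    "The quick brown fox jumps over the lazy dog.",
]
vectors = [bag_of_words(d) for d in docs]
query = bag_of_words("beautiful blue sky")

# Rank documents by similarity to the query, most similar first
ranked = sorted(range(len(docs)), key=lambda i: cosine_similarity(query, vectors[i]), reverse=True)
for i in ranked:
    print(f"{cosine_similarity(query, vectors[i]):.3f}  {docs[i]}")
```

The same ranking loop is the core of a simple content recommender; embeddings and contextual embeddings replace the sparse count vectors with dense ones while keeping the cosine comparison.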
With over 10 hands-on projects, the bootcamp will be packed with examples for you to go through, try out and practice. We will try to keep theory to a minimum, considering the limited time we have and the amount of ground we want to cover. We hope that by the end of this workshop you can take away some useful methodologies for solving NLP problems in the future. We will be using Python to showcase all our examples.
Leveraging AI to Enhance Developer Productivity & Confidence
Avishkar Gupta, Data Scientist, Red Hat
Dipanjan Sarkar, Data Science Lead, Applied Materials
1 year ago · Sold Out!
A major application of AI is making the world around us safer and helping people make better choices. The open source revolution has taken the world by storm, and developers now rely on numerous upstream third-party dependencies (too many to choose from: http://www.modulecounts.com/) to build applications that move petabytes of sensitive data and run mission-critical code, where a bad choice can lead to disastrous failures. It is therefore more important than ever to build better developer tooling that helps developers make safer, better choices about their dependencies and gives them more insight into the code they are using. Thanks to deep learning, we are able to tackle these complex problems, and this talk will cover two diverse and interesting problems we have been trying to solve with deep learning models (recommenders and NLP).
Though we are data scientists, at heart we are also developers building intelligent systems powered by AI. Through our “Dependency Analytics” platform and extension, we, the Red Hat developer group, seek to do exactly that. We call this 'AI-based insights for developers by developers'!
In this session, we will go into the details of the deep learning models we have implemented and deployed to solve two major problems:
- Dependency Recommendations: Recommend dependencies for a user's specific application stack by inferring their intent with deep learning based recommender models.
- Pro-active Security and Vulnerability Analysis: We will also touch upon how our platform aims to make developer applications safer through CVE (Common Vulnerabilities and Exposures) analyses and the experimental deep learning models we have built to proactively identify potential vulnerabilities, including how we leveraged deep learning models for NLP to tackle this problem.
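The abstract does not describe the deep learning recommender in enough detail to reproduce it, so the sketch below swaps in a much simpler technique to illustrate the underlying idea of recommending dependencies from a stack: an item-to-item co-occurrence baseline. It is not the Dependency Analytics model; the package stacks and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical training data: each set is the dependency list of one application stack
stacks = [
    {"flask", "sqlalchemy", "jinja2"},
    {"flask", "jinja2", "werkzeug"},
    {"django", "psycopg2", "celery"},
    {"flask", "sqlalchemy", "alembic"},
    {"django", "celery", "redis"},
]


def build_cooccurrence(stacks):
    # Count how often each ordered pair of packages appears in the same stack
    counts = Counter()
    for stack in stacks:
        for a, b in combinations(sorted(stack), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user_stack, counts, k=3):
    # Score each missing package by how often it co-occurs with packages the user already has
    scores = Counter()
    for (a, b), n in counts.items():
        if a in user_stack and b not in user_stack:
            scores[b] += n
    # Break ties alphabetically so the output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pkg for pkg, _ in ranked[:k]]


counts = build_cooccurrence(stacks)
print(recommend({"flask"}, counts))
```

A learned model can capture intent beyond raw pair counts, which is the motivation for the deep learning approach; a baseline like this is still handy for sanity-checking such a model's suggestions.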
This will be followed by a short architectural overview of the entire platform.
If time permits, we intend to showcase some sample code as part of a tutorial on how we built these deep learning models, along with a walkthrough!
Explainable Artificial Intelligence - Demystifying the Hype
The field of Artificial Intelligence, powered by Machine Learning and Deep Learning, has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models such as Capsule Networks do come into existence, but industry adoption usually takes several years. Hence, in industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning or deep learning model by itself consists of an algorithm that tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In some industry domains, especially in finance sectors like insurance or banking, data scientists often end up having to use more traditional machine learning models (linear or tree-based), because model interpretability is very important for the business to explain each and every decision the model takes. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give better and more accurate performance (since true relationships are rarely linear in nature), but we then end up unable to properly interpret their decisions.
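To make the interpretability trade-off concrete: with a linear model, every individual decision decomposes exactly into per-feature contributions (weight times feature value, plus a bias), which is why such models are easy to explain to the business. The sketch below assumes a hypothetical credit-scoring model; the feature names, weights and applicant values are invented for illustration.

```python
# Hypothetical linear scoring model: weights, bias and applicant values are invented
weights = {"income_k": 0.04, "debt_ratio": -2.5, "years_employed": 0.3}
bias = -1.0


def explain_linear(x: dict) -> tuple[float, dict]:
    # score = bias + sum(w_i * x_i), so each term is that feature's exact contribution
    contributions = {name: weights[name] * x[name] for name in weights}
    score = bias + sum(contributions.values())
    return score, contributions


applicant = {"income_k": 50, "debt_ratio": 0.4, "years_employed": 2}
score, contribs = explain_linear(applicant)

# Print contributions from most to least influential
for name, c in sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>15}: {c:+.2f}")
print(f"{'bias':>15}: {bias:+.2f}")
print(f"{'score':>15}: {score:+.2f}")
```

Ensembles and neural networks offer no such direct, exact breakdown of a single prediction, which is the gap that model interpretation frameworks try to fill.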
To address these gaps, I will take a conceptual yet hands-on approach, exploring in depth some of the challenges around explainable artificial intelligence (XAI) and human interpretable machine learning, and showcasing examples using state-of-the-art model interpretation frameworks in Python!
Human Interpretable Machine Learning — The Need and Importance of Model Interpretation (with hands-on examples)
The field of Machine Learning has gone through some phenomenal changes over the last decade. In industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning model by itself consists of an algorithm that tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In this talk, I will cover the need for and importance of human interpretable machine learning approaches, look at effective strategies for model interpretation, and work through several hands-on examples. Detailed coverage of open-source frameworks for machine learning model interpretation will also be a major focus. Examples will be showcased in Python.
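One widely used model-agnostic interpretation strategy is permutation feature importance: shuffle one feature's values, breaking its link to the target, and measure how much the model's score drops. Below is a minimal, dependency-free sketch, assuming a hypothetical black-box `predict` function and toy data; scikit-learn ships a fuller version as `sklearn.inspection.permutation_importance`, and this sketch is not meant to mirror the specific frameworks covered in the talk.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Model-agnostic importance: mean drop in score when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "black-box" model that only looks at feature 0; feature 1 is pure noise
def black_box(row):
    return int(row[0] > 0.5)


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
print(permutation_importance(black_box, X, y, accuracy))
```

Because it only needs predictions, the same procedure works unchanged for an ensemble or a neural network; the noise feature scores zero here since the model never reads it.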
Unleash the Power of Deep Learning with Transfer Learning
Transfer learning is a machine learning / deep learning technique where knowledge gained while training a model on one problem is reused to train models for other, similar problems. It is an extremely useful way to leverage pre-trained models for solving real-world problems where only limited data is available.
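In practice, transfer learning usually means reusing the layers of a large pre-trained deep network (for example, in Keras), which is too heavy for a self-contained snippet. As a hedged, toy-scale sketch of the same idea (parameter transfer, i.e. fine-tuning), the code below pre-trains a tiny logistic-regression model on a data-rich source task and reuses its weights as the starting point for a related target task that has only 8 labeled examples. The tasks, thresholds, epochs and learning rates are invented for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, w=None, epochs=200, lr=1.0):
    """Full-batch gradient descent for logistic regression; w[-1] is the bias.

    Passing `w` warm-starts training from existing (e.g. pre-trained) weights.
    """
    w = list(w) if w is not None else [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip stops at len(xi), so the bias w[-1] is added separately
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def accuracy(w, X, y):
    preds = [int(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1] > 0) for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


rng = random.Random(0)


def make_task(n, threshold):
    # Toy binary task: label is 1 when x0 + x1 exceeds a task-specific threshold
    X = [[rng.random(), rng.random()] for _ in range(n)]
    return X, [int(a + b > threshold) for a, b in X]


Xs, ys = make_task(1000, 1.0)       # source task: plenty of labeled data
Xt, yt = make_task(8, 1.1)          # related target task: only 8 labeled examples
Xtest, ytest = make_task(500, 1.1)  # held-out target data for evaluation

source_w = train_logreg(Xs, ys)                                    # "pre-training"
scratch_w = train_logreg(Xt, yt, epochs=20, lr=0.1)               # no transfer
transfer_w = train_logreg(Xt, yt, w=source_w, epochs=20, lr=0.1)  # fine-tune source weights

print(f"from scratch: {accuracy(scratch_w, Xtest, ytest):.2f}")
print(f"transferred:  {accuracy(transfer_w, Xtest, ytest):.2f}")
```

Fine-tuning with a smaller learning rate than pre-training, as here, is a common choice because it keeps a handful of target examples from overwriting the transferred weights.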
This talk will cover the essentials of deep learning and transfer learning, including the various methodologies of transfer learning. We will then look at diverse ways transfer learning can be applied to complex real-world problems in the following areas:
- Computer Vision
- Natural Language Processing
- Audio Categorization
We will briefly look at a multitude of real-world case studies and problems in the preceding areas, such as text classification, image classification, image captioning, style transfer and audio classification.
The Art of Effective Visualization of Multi-dimensional Data - A Hands-on Approach
Descriptive Analytics is one of the core components of any analysis life-cycle pertaining to a data science project or even specific research. Data aggregation, summarization and visualization are some of the main pillars supporting this area of data analysis. However, multi-dimensional datasets, typically those with more than two attributes, start causing problems, since our medium of data analysis and communication is typically restricted to two dimensions. We will explore some effective strategies for visualizing data in multiple dimensions (ranging from 1-D up to 6-D) using a hands-on approach with Python and popular open-source visualization libraries like matplotlib and seaborn. If we have time, we will also briefly cover excellent R visualization libraries like ggplot.
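A minimal sketch of the core trick for going beyond two dimensions: map the extra attributes to visual channels other than position. Assuming synthetic data with hypothetical attributes `x`, `y`, `z` and `w`, position encodes two dimensions, color a third and marker size a fourth, using matplotlib's `scatter`.

```python
import random

import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical records with four numeric attributes: (x, y, z, w)
rng = random.Random(7)
records = [
    (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 1), rng.uniform(1, 5))
    for _ in range(100)
]
xs, ys, zs, ws = zip(*records)

fig, ax = plt.subplots(figsize=(6, 4))
# Dimensions 1-2 -> position, dimension 3 -> color (hue), dimension 4 -> marker size
points = ax.scatter(xs, ys, c=zs, s=[20 * w for w in ws], cmap="viridis",
                    alpha=0.7, edgecolors="k")
fig.colorbar(points, ax=ax, label="z (color)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("4-D data: position + color + size")
fig.savefig("four_d_scatter.png", dpi=100)
```

Adding facets (for example, seaborn's `FacetGrid`) or marker shape is a common way to push the same idea further toward 5-D and 6-D.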
BONUS: We will also look at ways to visualize unstructured data with several dimensions including text, images and audio!