The Art of Effective Visualization of Multi-dimensional Data - A hands-on Approach
Descriptive analytics is one of the core components of any analysis life-cycle in a data science project or in specific research. Data aggregation, summarization and visualization are some of the main pillars supporting this area of data analysis. However, dealing with multi-dimensional datasets, which typically have more than two attributes, starts causing problems, since our medium of data analysis and communication is typically restricted to two dimensions. We will explore some effective strategies for visualizing data in multiple dimensions (ranging from 1-D up to 6-D) using a hands-on approach with Python and popular open-source visualization libraries like matplotlib and seaborn. If time permits, we will also briefly cover excellent R visualization libraries like ggplot.
BONUS: We will also look at ways to visualize unstructured data with several dimensions including text, images and audio!
Outline/Structure of the Tutorial
The talk is usually a 90-minute session, but we will cover it in the scheduled 45-minute slot, focusing on the main aspects of effective data visualization with the grammar of graphics and leveraging popular open-source frameworks in Python. As a bonus, we will also cover visualization of unstructured data, including text, audio and images.
Note: All the code and resources will be shared and open-sourced for your benefit! So you don't need to take extensive notes and can focus on the presentation/talk.
Outline:
- Introduction
  - What is Data Visualization?
  - Why Data Visualization?
- Motivation
  - Why Effective Data Visualization
  - Effective Multi-dimensional Data Visualization
- Whirlwind tour of the grammar of graphics
- Visualization tools and frameworks
  - General tools & frameworks
  - Python visualization frameworks
  - R visualization frameworks
- Visualizing Structured Data
  - Univariate analysis and visualizations
  - Multivariate analysis and visualizations
  - Visualizing from 1-D up to 6-D
- BONUS: Visualizing Unstructured Data
  - Text
  - Images
  - Audio
- Final words
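As a rough illustration of the "Visualizing from 1-D up to 6-D" item, here is a minimal sketch of one common way to show six attributes in a two-dimensional medium: assign each attribute to its own visual channel (x position, y position, marker size, color, marker shape and facet panel). The data, the attribute names (`a`, `b`, `c`, `d`, `kind`, `group`) and the helper functions are all made up for illustration and are not taken from the talk's materials. The `plot` helper assumes matplotlib is installed.

```python
import random


def make_records(n, seed=0):
    """Generate made-up records: four numeric and two categorical attributes."""
    rng = random.Random(seed)
    return [
        {
            "a": rng.uniform(0, 10),
            "b": rng.uniform(0, 10),
            "c": rng.uniform(1, 100),
            "d": rng.uniform(-1, 1),
            "kind": rng.choice(["p", "q"]),
            "group": rng.choice(["g1", "g2", "g3"]),
        }
        for _ in range(n)
    ]


def rescale(values, lo, hi):
    """Min-max rescale values into [lo, hi]; constant input maps to lo."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        return [lo for _ in values]
    return [lo + (v - vmin) / span * (hi - lo) for v in values]


def encode(records):
    """Assign one attribute to each of six visual channels (illustrative choice)."""
    markers = {"p": "o", "q": "^"}
    return {
        "x": [r["a"] for r in records],                       # 1: horizontal position
        "y": [r["b"] for r in records],                       # 2: vertical position
        "size": rescale([r["c"] for r in records], 20, 200),  # 3: marker area
        "color": [r["d"] for r in records],                   # 4: colormap value
        "marker": [markers[r["kind"]] for r in records],      # 5: marker shape
        "facet": [r["group"] for r in records],               # 6: subplot panel
    }


def plot(enc, path="six_dimensions.png"):
    """Draw one panel per facet value and save to path (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is needed
    import matplotlib.pyplot as plt

    facets = sorted(set(enc["facet"]))
    fig, axes = plt.subplots(
        1, len(facets), sharex=True, sharey=True,
        figsize=(4 * len(facets), 4), squeeze=False,
    )
    vmin, vmax = min(enc["color"]), max(enc["color"])
    for ax, facet in zip(axes[0], facets):
        for marker in sorted(set(enc["marker"])):
            idx = [
                i for i in range(len(enc["x"]))
                if enc["facet"][i] == facet and enc["marker"][i] == marker
            ]
            if not idx:
                continue  # nothing to draw for this facet/marker pair
            ax.scatter(
                [enc["x"][i] for i in idx],
                [enc["y"][i] for i in idx],
                s=[enc["size"][i] for i in idx],
                c=[enc["color"][i] for i in idx],
                marker=marker, cmap="viridis", vmin=vmin, vmax=vmax, alpha=0.7,
            )
        ax.set_title(facet)
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot(encode(make_records(60)))` writes a PNG with one panel per `group` value. Position and size tend to be easier to read than color or shape, so in a sketch like this the most important attributes would usually go on the x and y axes.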
Learning Outcome
- Take a glance at the major data visualization frameworks
- Get a clear understanding of univariate and multivariate visualization
- Learn effective strategies for visualizing data using the grammar of graphics
- Get a clear perspective on which visualization techniques work best based on specific scenarios
- Learn strategies for visualizing structured and unstructured data with actual examples
Target Audience
Data Enthusiasts, BI Developers, Data Scientists, Data Analysts
Prerequisites for Attendees
Knowledge of Python basics and data visualization techniques is helpful but not essential, since we will cover them during this session.
Links
Please go to https://github.com/dipanjanS/art_of_data_visualization for all the resources for this talk
Submitted 4 years ago
People who liked this proposal, also liked:
Jared Lander - Machine Learning with R
480 Mins
Workshop
Beginner
Modern statistics has become almost synonymous with machine learning - a collection of techniques that utilize today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net and boosted trees.
- Building the design matrix
- Penalized regression with the lasso and ridge methods
- Fitting models with glmnet
- Interactive visualization of the coefficient path
- Use cross-validation to choose the optimal lambda
- Visualize coefficients with coefplot
- Perform binary classification with a single tree with xgboost
- Train a boosted tree
- Tune xgboost hyperparameters
- Use validation data to understand performance
- Visualize variable importance
- Train a boosted random forest with xgboost
Dipanjan Sarkar - Human Interpretable Machine Learning — The Need and Importance of Model Interpretation (with hands-on examples)
45 Mins
Talk
Beginner
The field of machine learning has gone through some phenomenal changes over the last decade. In industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In this talk, I will cover the need for and importance of human-interpretable machine learning approaches, look at effective strategies for model interpretation, and walk through several hands-on examples. Detailed coverage of open-source frameworks for machine learning model interpretation will also be one of the major focus areas. Examples will be showcased in Python.
Dipanjan Sarkar - Unleash the Power of Deep Learning with Transfer Learning
45 Mins
Talk
Intermediate
Transfer learning is a machine learning / deep learning technique where knowledge gained while training on one set of machine learning problems can be used to train models for other, similar types of problems. This is an extremely useful approach for leveraging pre-trained models to solve real-world problems constrained by limited data availability.
This talk will cover the essentials of deep learning and transfer learning concepts, along with the various methodologies of transfer learning. We will then look at diverse ways in which transfer learning can be applied in the real world to complex problems in the following areas:
- Computer Vision
- Natural Language Processing
- Audio Categorization
We will briefly look at a multitude of real-world case studies and problems around the preceding areas like text classification, image classification, image captioning, style transfer and audio classification.
Joydeep Bhattacharjee - Cutting edge NLP with fastText
45 Mins
Talk
Intermediate
FastText was open-sourced by Facebook in 2016, and with its release it became the fastest and most cutting-edge library for text classification and word representation. It includes implementations of two extremely important methodologies in NLP, i.e., the Continuous Bag of Words and Skip-gram models. FastText performs exceptionally well with supervised as well as unsupervised learning.