Guided Analytics - Building Applications for Automated Machine Learning
In recent years, a wealth of tools has appeared that automate the machine learning cycle inside a black box. We take a different stance. Automation should not result in black boxes, hiding the interesting pieces from everyone. Modern data science should allow automation and interaction to be combined flexibly into a more transparent solution.
In some specific cases, if the analysis scenario is well defined, then full automation might make sense. However, more often than not, these scenarios are not that well defined and not that easy to control. In these cases, a certain amount of interaction with the user is highly desirable.
By mixing and matching interaction with automation, we can use Guided Analytics to develop predictive models on the fly. More interestingly, by leveraging automated machine learning and interactive dashboard components, custom Guided Analytics Applications, tailored to your business needs, can be created in a few minutes.
We'll build an application for automated machine learning using KNIME Software. It will have an input user interface to control the settings for data preparation, model training (e.g. using deep learning, random forest, etc.), hyperparameter optimization, and feature engineering. We'll also create an interactive dashboard to visualize the results with model interpretability techniques. At the conclusion of the workshop, the application will be deployed and run from a web browser.
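KNIME workflows are built visually rather than in code, but as a rough illustration of the loop the application automates (model training plus hyperparameter optimization), here is a minimal scikit-learn sketch; the dataset, model, and parameter grid are our own illustrative assumptions, not the workshop's actual components.

```python
# Minimal sketch of the automated-ML loop the application wraps:
# data prep, model training, and hyperparameter optimization.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hyperparameter optimization over a small, illustrative grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, None]},
    cv=5,
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```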
Outline/Structure of the Tutorial
- Building Applications for Automated Machine Learning (20 min)
- Introduction to KNIME Analytics Platform (15 min)
- Let's build a Guided Analytics Workflow (50 min)
- Live Demo of the Solutions (5 min)
Learning Outcome
In this workshop you will learn how to:
- easily automate your own machine learning cycle with KNIME components;
- set up custom dashboards with interactive data visualizations to explore your data;
- understand machine learning interpretability visualizations (Shapley values, partial dependence, etc.; see the sketch after this list);
- create a KNIME workflow that is remotely accessible via a user-friendly, customized web-based application.
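As a taste of the interpretability techniques listed above, here is a minimal sketch of a partial dependence plot with scikit-learn; the model and feature choice are illustrative assumptions, not the workshop's material.

```python
# Partial dependence: how the average model prediction changes as one feature varies.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Plot partial dependence for feature 2 (BMI in this dataset).
PartialDependenceDisplay.from_estimator(model, X, features=[2])
plt.show()
```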
Target Audience
Data scientists and data analysts interested in automating parts of their machine learning cycle
Prerequisites for Attendees
This is a hands-on workshop. Please bring your laptop, ideally with KNIME Analytics Platform pre-installed.
A precise step-by-step installation guide and the required workshop materials are available here:
Links
knime.com/blog/how-to-automate-machine-learning
knime.com/blog/guided-automation-for-machine-learning-part-ii
knime.com/blog/intelligently-automating-machine-learning-artificial-intelligence-and-data-science
knime.com/blog/principles-of-guided-analytics
People who liked this proposal also liked:
Viral B. Shah - Models as Code Differentiable Programming with Julia
45 Mins
Keynote
Intermediate
Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML) – a view that is increasingly shared by many – there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms – often called differentiable programming – has caught on.
Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.
This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.
Naresh Jain - Ethical AI - Fishbowl
45 Mins
Keynote
Beginner
There have been a lot of concerns about the black-box nature of AI. People have been asking for a sensible AI guideline with the weight of law behind it. In April 2019, the EU released its Ethics Guidelines for Trustworthy AI. Before that, during the Obama administration, the National Science and Technology Council came up with its own set of broad guidelines called "Preparing for the Future of Artificial Intelligence."
Most of these cover an impressive amount of ground in several major categories:
- Transparency: Any time an AI system makes decisions on a user's behalf, that person should be aware of it. The reasoning behind decisions should be easily explainable.
- Safety: AI systems should be designed to withstand attempted hijacking and other attacks performed by hackers.
- Fairness: Decisions made by AI systems should not be influenced by gender, race or other personal identifiers. They should be as impartial as possible and not reflect human biases.
- Environmental stewardship: Not all the stakeholders in AI development are human. The development of these platforms and the implications of their decision-making and sustainability should take into account the needs of the larger environment and other forms of life.
- And so on...
At this conference, we would like to bring our experts together to hear their views/concerns on this topic.
Vivek Singhal / Shreyas Jagannath - Training Autonomous Driving Systems to Visualize the Road ahead for Decision Control
Vivek Singhal (Co-Founder & Chief Data Scientist, CellStrat) / Shreyas Jagannath (AI Researcher, CellStrat)
90 Mins
Workshop
Intermediate
We will teach the audience how to develop advanced image segmentation with FCN/DeepLab algorithms, which can help visualize driving scenarios accurately, allowing the autonomous driving system to take appropriate action given the obstacles in view.
Govind Chada - Using 3D Convolutional Neural Networks with Visual Insights for Classification of Lung Nodules and Early Detection of Lung Cancer
45 Mins
Case Study
Intermediate
Lung cancer is the leading cause of cancer death among both men and women in the U.S., with more than a hundred thousand deaths every year. The five-year survival rate is only 17%; however, early detection of malignant lung nodules significantly improves the chances of survival and prognosis.
This study aims to show that 3D Convolutional Neural Networks (CNNs), which use the full 3D nature of the input data, perform better in classifying lung nodules than the 2D CNNs used previously. It also demonstrates an approach to developing an optimized 3D CNN that achieves state-of-the-art classification accuracies. CNNs, like other deep neural networks, have been black boxes, giving users no understanding of why they predict what they predict. This study, for the first time, demonstrates that Gradient-weighted Class Activation Mapping (Grad-CAM) techniques can provide visual explanations for model decisions in lung nodule classification by highlighting discriminative regions. Several CNN architectures using Keras and TensorFlow were implemented as part of this study. The publicly available LUNA16 dataset, comprising 888 CT scans with candidate nodules manually annotated by radiologists, was used to train and test the models. The models were optimized by varying the hyperparameters to reach accuracies exceeding 90%. Grad-CAM techniques were applied to the optimized 3D CNN to generate images that provide quality visual insights into the model's decision making. The results demonstrate the promise of 3D CNNs as highly accurate and trustworthy classifiers for early lung cancer detection, leading to improved chances of survival and prognosis.
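For readers unfamiliar with 3D CNNs, the following is a minimal Keras sketch of the idea (convolutions over all three spatial dimensions of a CT patch); the input shape and layer sizes are illustrative assumptions, not the study's actual architecture.

```python
# A small 3D CNN for binary nodule classification on fixed-size CT patches.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32, 32, 32, 1)),    # depth x height x width x channels
    layers.Conv3D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling3D(pool_size=2),
    layers.Conv3D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling3D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # malignant vs. benign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```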
Kathrin Melcher / Paolo Tamagnini - The Magic of Many-To-Many LSTMs: Codeless Product Name Generation and Neural Machine Translation
45 Mins
Case Study
Intermediate
What do product name generation and neural machine translation have in common?
Both involve sequence analysis, which can be implemented via recurrent neural networks (RNNs) with LSTM layers.
LSTM neural networks are the state-of-the-art technique for sequence analysis. In this presentation, we find out what LSTM layers are, learn about the difference between many-to-one, many-to-many, and one-to-many structures, and train many-to-many LSTM networks for both use cases.
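As a taste, here is a minimal Keras sketch of the many-to-many structure (one output per input step, via return_sequences=True); the vocabulary and dimensions are illustrative assumptions, not the presenters' settings.

```python
# Many-to-many LSTM: emits a prediction at every step of the input sequence.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 1000, 20
model = keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 32),
    layers.LSTM(64, return_sequences=True),  # output at every time step
    layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```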
Aditya Singh Tomar - Building Your Own Data Visualization Platform
45 Mins
Demonstration
Beginner
Ever thought about having a mini interactive visualization tool that caters to your specific requirements? That is the product I created when I started independent consulting. Two years on, I have decided to make it public – even the source code.
This session will give you an overview about creating a custom, personalized version of a visualization platform built on R and Shiny. We will focus on a mix of structure and flexibility to address the varying requirements. We will look at the code itself and the various components involved while exploring the customization options available to ensure that the outcome is truly a personal product.
Anuj Gupta - Continuous Learning Systems: Building ML systems that keep learning from their mistakes
45 Mins
Talk
Beginner
Wouldn't it be great to have ML models that can update their "learning" as and when they make mistakes and corrections are provided in real time? In this talk we look at a concrete business use case which warrants such a system. We will take a deep dive to understand the use case and how we went about building a continuously learning system for text classification: the approaches we took and the results we got.
For most machine learning systems, the "train once, predict thereafter" paradigm works well. However, there are scenarios where this paradigm does not suffice and the model needs to be updated frequently. Two of the most common cases are:
- When the distribution is non-stationary, i.e. the distribution of the data changes. This implies that, with time, the test data will have a very different distribution from the training data.
- The model needs to learn from its mistakes.
While (1) is often addressed by retraining the model, (2) is often addressed using batch updates. Batch updating requires collecting a sizeable number of feedback points. What if you have far fewer feedback points? You need a model that can learn continuously, as and when it makes a mistake and feedback is provided. To the best of our knowledge, there is very limited literature on this.
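One possible mechanism for such single-point updates (our assumption, not necessarily the speaker's approach) is incremental learning via scikit-learn's partial_fit:

```python
# Update a text classifier from a single human-corrected feedback point.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)  # stateless, so no refit needed
clf = SGDClassifier(loss="log_loss")

# Initial training on whatever labeled data exists.
X = vectorizer.transform(["great product", "terrible support"])
clf.partial_fit(X, [1, 0], classes=[0, 1])

# Later: the model misclassifies one document, a human corrects it,
# and we update immediately on that single feedback point.
x_feedback = vectorizer.transform(["support was great actually"])
clf.partial_fit(x_feedback, [1])
```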
Deepak Mukunthu - Automated Machine Learning
45 Mins
Talk
Beginner
Intelligent experiences powered by AI can seem like magic to users. Developing them, however, is cumbersome, involving a time-consuming series of sequential and interconnected decisions along the way. What if there was an automated service that identifies the best machine learning pipelines for a given problem/data? Automated Machine Learning does exactly that!
With the goal of accelerating AI for data scientists by improving their productivity and democratizing AI for other data personas who want to get into machine learning, Automated ML comes in many different flavors and experiences. Automated ML is one of the top 5 AI trends this year. This session will cover concepts of Automated ML, how it works, different variations of it and how you can use it for your scenarios.
Arpit Agarwal - Practitioner's Perspective: How do you accelerate innovation and deliver faster time-to-value for your AI initiative?
20 Mins
Experience Report
Beginner
Machine Learning (ML) offers innovation for every business, and with advances in ML technology we are solving ambitious problems using machine learning. In this session we will learn how Amazon SageMaker helped Zoomcar in their ML journey by providing a scalable platform for exploratory analysis of the vast amount of data they have, and for running multiple ML models before finalizing the best-fit model for solving the business-critical problem of car damage assessment.
Madhan Rajasekkharan - Data Augmentation using GAN to improve Risk Models for New Credit Card customers
20 Mins
Experience Report
Beginner
In recent times, with the advent of advanced machine learning techniques, especially neural networks, decision trees, etc., the hunger for data has increased dramatically. Several thousand, if not millions, of observations are required to make a satisfactory model using these algorithms. But due to several reasons, like operational challenges, cost considerations, and time paucity, we may not have enough observations. In such cases, we are either forced to use other statistical models or forced to collect more data (which is usually time-infeasible and expensive). Coming to one's rescue, Generative Adversarial Networks (GANs – a class of neural networks) provide a method of creating synthetic data by learning the distribution of the smaller dataset you already have.
GANs have been very popular in creating synthetic images and other unstructured data, but little success has been seen in working with structured datasets. At American Express, we gather multiple data points about our customers in a structured format, used widely for assessing the credit risk of the customer. One of the key issues we face is lack of data – in quantity and quality – about our newly acquired customers: they have low tenure, so we cannot observe their historical behavior to assess their risk better. GANs offer an interesting solution to this problem: can we use a GAN to create synthetic customers who look like our newly acquired portfolio, and use this to augment our datasets and build superior and stable credit risk models?
We have seen interesting and promising results in this application, and in this talk we will share our story of how to work with GANs on structured data in the financial services domain – data pipelines, architectures, key changes needed, etc. – as well as delve into applying GANs for risk model advancement.
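As an illustration of the basic GAN setup for tabular data, here is a minimal Keras sketch; the architectures, preprocessing, and training details American Express actually used are not public, so everything below is an assumption.

```python
# Minimal GAN for tabular data: a generator maps noise to synthetic rows,
# a discriminator learns to separate real rows from synthetic ones.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, latent_dim = 10, 8

generator = keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(n_features),               # one synthetic customer record
])
discriminator = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # real vs. synthetic
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

discriminator.trainable = False             # freeze D inside the combined model
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.randn(256, n_features)     # stand-in for real customer rows
for _ in range(100):
    noise = np.random.randn(256, latent_dim)
    fake = generator.predict(noise, verbose=0)
    # Train the discriminator on real (label 1) vs. synthetic (label 0) rows.
    discriminator.train_on_batch(np.vstack([real, fake]),
                                 np.concatenate([np.ones(256), np.zeros(256)]))
    # Train the generator to fool the (frozen) discriminator.
    gan.train_on_batch(np.random.randn(256, latent_dim), np.ones(256))
```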
Dr. Om Deshmukh - Key Principles to Succeed in Data Science
90 Mins
Tutorial
Beginner
Building a successful career in the field of data science needs a lot more than just a thorough understanding of the various machine learning models. One has to also undergo a paradigm shift with regard to how one would typically approach any technical problem. In particular, patterns and insights unearthed from the data analysis have to be the guiding North Star for the next best action, rather than the path of action implied by the data scientist's or their superior's intuition alone. One of the things that makes this shift trickier in reality is confirmation bias: a cognitive bias to interpret information in such a way that it furthers our pre-existing notions.
In this session, we will discuss how the seemingly disjoint components of the digital ecosystem are working in tandem to make data-driven decisioning central to every functional aspect of every business vertical. This centrality accorded to the data makes it imperative that
- (a) the data integrity is maintained across the lifetime of the data,
- (b) the insights generated from the data are interpreted in the holistic context of the sources of the data and the data processing techniques, and
- (c) human experts are systematically given an opportunity to override any purely data-driven decisions, especially when such decisions may have far-reaching consequences.
We will discuss these aspects using three case studies from three different business verticals (the financial sector, the logistics sector, and a third selected by popular vote). For each of these three case studies, the "traditional" way of solving the problem will be contrasted with the data-driven approach. The participants will be split into three groups, and each group will be asked to present the best data-driven approach to solve one of the case studies. The other two groups can critique the presentation/approach. The winning group will be picked based on the presentation and the proposed approach.
At the end of the session, the attendees should be able to work through any new case study to
- (a) translate a business problem into an appropriate data-driven problem,
- (b) formulate strategies to capture and access relevant data,
- (c) shortlist relevant data modelling techniques to unearth the hidden patterns, and
- (d) tie back the value of the findings to the business problem.
Ashish Vikram / Kuldeep Yadav - Lighting up the Blackhole of the Internet using AI
20 Mins
Experience Report
Beginner
Videos account for about 75% of Internet traffic today. Enterprises are creating more and more videos and using them for various informational purposes, including marketing, training of customers, partners & employees, and internal communications. However, videos are considered the black holes of the Internet because it is very hard to see what's inside them. The opaque nature of videos equally impacts end users, who spend a lot of time navigating to their point of interest, leading to severe underutilization of videos as a powerful medium of information. In this talk, we will describe the visual processing pipeline of the VideoKen platform, which includes:
- Graph-based algorithm along with deep scene text detection to identify key visual frames in the video,
- FCN-based algorithm for semantic segmentation of screen content in visual frames,
- Transfer-learning based visual classifier to categorize screen content into different categories such as slides, code walkthrough, demo, handwritten, etc. and
- Algorithm to detect visual coherency and select indices from the video.
We will discuss challenges and experiences in implementing/iterating on these algorithms using our experience with processing 100K+ video hours of content.
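As an illustration of the transfer-learning step in the pipeline above, here is a minimal Keras sketch: a pretrained backbone with a new classification head. The categories and sizes are illustrative assumptions, not VideoKen's actual models.

```python
# Transfer learning: reuse ImageNet features, train only a new head
# to categorize video frames into screen-content classes.
from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                      include_top=False, weights="imagenet")
base.trainable = False                      # keep pretrained features frozen

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),  # e.g. slides / code / demo / handwritten
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```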
Bargava Subramanian - Anomaly Detection for Cyber Security using Federated Learning
20 Mins
Experience Report
Beginner
In a network of connected devices, two aspects of the system are critical to its success:
- Security – with a number of internet-connected devices, securing the network from cyber threats is very important.
- Privacy – the devices capture business-sensitive data that the organisation has to safeguard to maintain its differentiation.
I've used federated learning to build anomaly detection models that monitor data quality and cybersecurity while preserving data privacy.
Federated learning enables edge devices to collaboratively learn deep learning models while keeping all of the data on the device itself. Instead of moving data to the cloud, the models are trained on the device, and only the updates of the model are shared across the network.
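A minimal sketch of the federated averaging idea, in framework-agnostic NumPy; this is an illustration under our own assumptions, not the speaker's actual uTensor stack.

```python
# Federated averaging: devices train locally, the server averages weights.
import numpy as np

def local_update(global_weights, local_X, local_y, lr=0.01):
    """One round of local training (a single least-squares gradient step).
    The raw data never leaves the device; only the updated weights do."""
    grad = local_X.T @ (local_X @ global_weights - local_y) / len(local_y)
    return global_weights - lr * grad

def federated_average(weight_list):
    """Server step: average the weights returned by each device."""
    return np.mean(weight_list, axis=0)

# Three simulated edge devices, each holding private data.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
w = np.zeros(4)
for _ in range(20):  # communication rounds
    w = federated_average([local_update(w, X, y) for X, y in devices])
```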
Using federated learning gave me the following advantages:
- Ability to build more accurate models faster
- Low latency during inference
- Privacy preservation
- Improved energy efficiency of the devices
I built deep learning models using TensorFlow and deployed them using uTensor, a lightweight ML inference framework built on Mbed and TensorFlow.
In this talk, I will discuss in detail how I built federated learning models on the edge devices.
Rahul Agarwal - Continuous Data Integrity Tracking
20 Mins
Experience Report
Beginner
"In God we trust; all others must bring data." - W. E. Deming, Author & Professor
This philosophy is imbibed in the very core of American Express: as a data-driven company, it makes all strategic decisions based on numbers. But who ensures that the numbers are correct? That is the work of Data Quality and Governance. Given the dependency on data, ensuring data quality is one of our prime responsibilities.
At American Express, we have data getting generated and stored across multiple platforms. For example, in a market like the US, we process around 200 transactions every second and make an authorization decision for each. Given this speed and scale of data generation, ensuring data quality becomes imperative and a unique challenge in itself. There are hundreds of models running in production platforms within AMEX, with thousands of variables. Many variables are created/populated originally in legacy systems (or have components derived from there), which are then passed on to downstream systems for manipulation and creating new attributes. A tech glitch or a logic issue could impact any variable at any point of this process, resulting in disastrous consequences in model outputs, which can translate into real-world customer impact leading to financial and reputational risk for the bank. So how do we catch these anomalies before they adversely impact processes?
Traditional approaches to anomaly detection have relied on measuring the deviation from the mean of the variable. Fancier ones employ time-series-based forecasting. But both of these approaches are fraught with high levels of false positives. Since every alert generated has to be analyzed by the business, which has a cost, high levels of accuracy are desired. In this talk, we will discuss how AMEX has approached and solved this problem.
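For concreteness, here is the simple mean-deviation baseline the talk refers to, sketched in a few lines; AMEX's actual approach is, of course, the subject of the talk.

```python
# Mean-deviation baseline: flag values far from the historical mean.
import numpy as np

rng = np.random.default_rng(1)
history = rng.normal(loc=100.0, scale=5.0, size=1000)  # past values of a variable
today = 127.0

# z-score rule: flag if today deviates more than k standard deviations.
z = (today - history.mean()) / history.std()
if abs(z) > 3:
    print(f"anomaly flagged, z-score = {z:.1f}")  # candidate for analyst review
```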
Amit Doshi - Integrating Digital Twin and AI for Smarter Engineering Decisions
45 Mins
Talk
Intermediate
With the increasing popularity of AI, new frontiers are emerging in predictive maintenance and manufacturing decision science. However, there are many complexities associated with modeling plant assets, training predictive models for them, and deploying these models at scale for near real-time decision support. This talk will discuss these complexities in the context of building an example system.
First, you must have failure data to train a good model, but equipment failures can be expensive to introduce for the sake of building a data set! Instead, physical simulations can be used to create large, synthetic data sets to train a model with a variety of failure conditions.
These systems also involve high-frequency data from many sensors, reporting at different times. The data must be time-aligned to apply calculations, which makes it difficult to design a streaming architecture. These challenges can be addressed through a stream processing framework that incorporates time-windowing and manages out-of-order data with Apache Kafka. The sensor data must then be synchronized for further signal processing before being passed to a machine learning model.
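A hedged sketch of the time-alignment step described above, using pandas to merge two sensors reporting at different times; the column names and tolerance are illustrative assumptions.

```python
# Align two sensor streams onto a common clock before further processing.
import pandas as pd

temp = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 00:00:00.10", "2024-01-01 00:00:00.35"]),
    "temp_c": [71.2, 71.9],
})
vib = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 00:00:00.12", "2024-01-01 00:00:00.33"]),
    "vibration": [0.41, 0.57],
})

# Pair each vibration reading with the most recent temperature reading,
# tolerating up to 100 ms of clock skew between the sensors.
aligned = pd.merge_asof(vib.sort_values("time"), temp.sort_values("time"),
                        on="time", tolerance=pd.Timedelta("100ms"))
print(aligned)
```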
As these architectures and software stacks mature in areas like manufacturing, it is increasingly important to enable engineers and domain experts in this workflow to build and deploy the machine learning models and work with system architects on the system integration. This talk also highlights the benefit of using apps and exposing the functionality through API layers to help make these systems more accessible and extensible across the workflow.
This session will focus on building a system to address these challenges using MATLAB and Simulink. We will start with a physical model of an engineering asset and walk through the process of developing and deploying a machine learning model for that asset as a scalable and reliable cloud service.
Kathrin Melcher / Paolo Tamagnini - Deep Dive into Data Science with KNIME Analytics Platform
480 Mins
Workshop
Beginner
In this course we will cover the major steps in a data science project, from data access, data pre-processing, and data visualization to machine learning, model optimization, and deployment, using KNIME Analytics Platform.
Anil Arora - Building Machine Learning models from scratch and Deploying in downstream Applications
45 Mins
Demonstration
Beginner
The session will start with a brief introduction to the evolutionary transformation of the SAS platform (about 5–7 minutes), followed by a jump right into the more exciting part of the session: a demo of how to build machine learning models from scratch. The session will also cover the need for feature engineering before building any machine learning models. Many organizations still face resistance to building ML models due to the loss of model interpretability, so we will see how ML models can be interpreted in SAS with various out-of-the-box statistics. The demo will also cover the AutoML functionality to give data scientists a kick start in developing and refining (if needed) ML models. At the end, the demo will cover how to consume or deploy the models in downstream applications such as mobile apps and websites, along with model governance. For pure open-source data science people, the demo will conclude with how they can embrace and extend the power of open source with SAS.
Rudrani Ghosh / Rachna Gothi - Marketing Response Modeling – Business considerations in model development process
60 Mins
Workshop
Beginner
American Express, with its global footprint, has several hundred products catering to different types of customers – individuals, small and large businesses and merchants. We have over 100M Card Members and an ever-growing prospect base. Therefore, one of our biggest challenges is creating solutions to reach customers and prospects with the right product/offer through the right channels. Today, hundreds of response models in production determine our strategy for distributing various offers across channel combinations in different segments and markets.
As a part of the workshop, we will address the following in the response model development process:
- Procuring and curating the appropriate data
- Developing the best analytical solution that makes 'sense' to our colleagues and also regulators
- Implementing the solution, monitoring its performance, and updating the solution based on learnings
Rishu Gupta / Amit Doshi - Addressing Deep Learning Challenges
90 Mins
Tutorial
Intermediate
Deep learning is getting lots of attention lately, and for good reason: it's achieving results that were not possible before. However, getting started might not always be easy. MATLAB, being an integrated framework, allows you to accelerate building consumer and industrial applications while utilizing the capabilities of open-source frameworks like TensorFlow to train the deep learning networks.
Join us for a hands-on MATLAB workshop in which you will explore and learn about the deep learning workflow in MATLAB while working on the problem of speech command recognition, tackling key concepts and challenges such as:
- Accelerating/Automating ground truth labeling for data
- Designing and Validating deep neural networks
- Training and tuning deep learning algorithms
We will also talk about interoperability with different frameworks and the workflow for deploying your deep learning algorithms to embedded targets.
Anuj Gupta - NLP Bootcamp
480 Mins
Workshop
Beginner
Recent advances in machine learning have rekindled the quest to build machines that can interact with the outside environment as we humans do – using visual cues, voice, and text. An important piece of this trilogy is systems that can process and understand text in order to automate various workflows such as chatbots, named entity recognition, machine translation, information extraction, summarization, FAQ systems, etc.
A key step towards achieving any of the above tasks is using the right set of techniques to represent text in a form that a machine can understand easily. Unlike images, where directly using the intensity of pixels is a natural way to represent the image, in the case of text there is no such natural representation. No matter how good your ML algorithm is, it can do only so much unless there is a richer way to represent the underlying text data. Thus, whatever NLP application you are building, it's imperative to find a good representation for your text data.
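As a tiny illustration of the point above, here is one classical representation scheme, a TF-IDF bag-of-words, in scikit-learn; the documents are toy examples.

```python
# Turn raw text into numeric vectors a model can consume.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.toarray().round(2))                # each row: one document's vector
```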
In this bootcamp, we will understand the key concepts, maths, and code behind state-of-the-art techniques for text representation, covering mathematical explanations as well as implementation details. The bootcamp aims to demystify both the theory (key concepts, maths) and the practice (code) that go into building these techniques. At the end of this bootcamp, participants will have gained a fundamental understanding of these schemes, with the ability to implement them on datasets of their interest.
This would be a 1-day instructor-led hands-on training session to learn and implement an end-to-end deep learning model for natural language processing.