Bengaluru | Aug 10th, 10:00 AM - 06:00 PM IST | Venue: Mars

Modern statistics has become almost synonymous with machine learning, a collection of techniques that take advantage of today's immense computing power. This two-part course focuses on the methods available for implementing machine learning algorithms in R and examines some of the underlying theory behind the curtain. We start with the foundation of it all, the linear model. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net, and after that to boosted decision trees using xgboost. Along the way we learn modern techniques for preprocessing data.


Outline/Structure of the Workshop

Linear Models
Learn about the best-fit line
Understand the formula interface in R
Understand the design matrix
Fit models with lm
Visualize the coefficients with coefplot

Model Assessment
Assess model quality with traditional measures
Evaluate models with cross-validation

Elastic Net
Learn about penalized regression with the Lasso and Ridge
Fit models with glmnet
Understand the coefficient path
View coefficients with coefplot

Boosted Decision Trees
Learn how to perform classification (and regression) using recursive partitioning
Fit models with xgboost
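For reference, the glmnet documentation states the Elastic Net objective for a Gaussian response as the penalized least-squares problem below. Setting alpha = 1 gives the Lasso, alpha = 0 gives Ridge, and the coefficient path is the sequence of solutions traced out as lambda decreases. This is a sketch of the standard formulation, not material taken from the workshop itself:

$$
\min_{\beta_0,\,\beta}\; \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
$$

The L1 term is what drives some coefficients exactly to zero (feature selection), while the squared L2 term only shrinks them.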

Learning Outcome

  • Understand how to quickly and properly build design matrices for model training and prediction
  • Learn how to fit lasso and ridge penalized regressions with glmnet, using the lasso for automated feature selection
  • Learn how to fit boosted trees and forests with xgboost
  • Score new data
  • Visualize your models

Target Audience

Data science enthusiasts who want to explore the power of R for machine learning

Prerequisites for Attendees

Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the tidymodels, glmnet, xgboost, boot, ggplot2, UsingR and coefplot packages.

Submitted 4 years ago

  • Jared Lander

    Jared Lander - Making Sense of AI, ML and Data Science

    45 Mins

    When I was in grad school it was called statistics. A few years later I told people I did machine learning, and after seeing the confused looks on their faces I changed that to data science, which excited them. More years passed, and without changing anything I do, I now practice AI, which seems scary to some people and somehow involves ML. During this talk we will demystify buzzwords, technical terms and overarching ideas. We'll touch upon key concepts and see a little code in action to get a sense of what is happening in ML, AI or whatever else we want to call the field.

  • jyotsna mehta

    jyotsna mehta - Creating alerts for asthma patients using a machine learning model

    jyotsna mehta
    Keva Health
    45 Mins

    Digital health platforms help create personalized care experiences for patients with chronic diseases. A patient app can be built as a customizable mobile application with an AI-enabled user interface that keeps patients engaged. The app provides an intelligent decision support engine that helps patients follow physician-recommended guidelines. The platform provides cloud-based, data-driven machine learning models, powerful data analytics and real-time insights via dashboards to optimize remote patient monitoring.

    Use cases and examples for chronic diseases such as asthma will be presented. First, dummy data for users of the asthma mobile app will be shown. Next, the use of external data from other sources will be described. Finally, machine learning approaches for predicting the risk of an asthma attack will be explained.

    Each example will highlight the datasets and variables used, the analytic approaches considered and their pros and cons, and how machine learning can predict the risk of an asthma attack and help reduce emergency room visits and hospitalizations for patients with severe asthma.

    Data collection:

    The data include patients' peak flow readings, asthma zones (green, yellow or red), symptoms, medications, symptom severity and answers to a six-question survey (including number of hospital visits, reason for medication change, etc.), all entered by the patient. External data include air quality, geographical location and pollen data.


    Machine learning methods work by uncovering hidden relationships between the target and the features in order to classify or predict a particular outcome. In the context of telemonitoring via an app, supervised classification algorithms can be used to build a classifier that distinguishes a stable disease state from a disease trajectory indicative of incipient exacerbation, on the basis of patient characteristics collected during a predefined time frame.

    Thus, from a machine learning perspective, the telemonitoring data collected each day can be treated as features, and the corresponding day's exacerbation status (yes or no) as the outcome for predictive modeling. Within this framework, an initial predictive model can be continuously improved as the number of cases grows.
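    A minimal sketch of this framing, assuming a hypothetical per-day record schema (the DailyRecord class, its field names, the severity scale and the toy values are illustrative, not the talk's actual data model): each day becomes one numeric feature vector, with the asthma zone one-hot encoded, and that day's exacerbation status becomes a 0/1 label.

```python
from dataclasses import dataclass

ZONES = ("green", "yellow", "red")


@dataclass
class DailyRecord:
    # Hypothetical schema loosely modeled on the data listed above.
    peak_flow: float        # patient-entered peak flow reading
    zone: str               # asthma zone: "green", "yellow" or "red"
    symptom_severity: int   # assumed ordinal scale, e.g. 0 (none) to 3 (severe)
    pollen_index: float     # external pollen data for the patient's location
    exacerbation: bool      # outcome for that day: exacerbation yes/no


def to_training_data(records):
    """Turn each day's record into a numeric feature vector and a 0/1 label."""
    X, y = [], []
    for r in records:
        # One-hot encode the categorical zone so every feature is numeric.
        zone_one_hot = [1.0 if r.zone == z else 0.0 for z in ZONES]
        X.append([r.peak_flow, float(r.symptom_severity), r.pollen_index] + zone_one_hot)
        y.append(1 if r.exacerbation else 0)
    return X, y


# Made-up example days, for illustration only.
days = [
    DailyRecord(peak_flow=420.0, zone="green", symptom_severity=0, pollen_index=2.1, exacerbation=False),
    DailyRecord(peak_flow=250.0, zone="red", symptom_severity=3, pollen_index=8.7, exacerbation=True),
]
X, y = to_training_data(days)
print(X)  # [[420.0, 0.0, 2.1, 1.0, 0.0, 0.0], [250.0, 3.0, 8.7, 0.0, 0.0, 1.0]]
print(y)  # [0, 1]
```

    Because every new day simply appends one row to X and one label to y, refitting on the growing dataset is one straightforward way to improve the initial model as more cases arrive.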

    Examples of algorithms for building classification models include the naive Bayes classifier, the adaptive Bayesian network and the support vector machine.

    The naive Bayes classifier looks at historical data and calculates conditional probabilities for the class values by observing the frequency of attribute values and of combinations of attribute values.

    The second algorithm used, the adaptive Bayesian network, is based on Bayesian networks, which use a directed acyclic graph in which each node represents an attribute. Each node carries conditional probabilities for its values given its parent nodes, and these are estimated from the relative frequencies of the associated attribute values in the training data.

    The third algorithm we used is the support vector machine, which relies on a subset of the training data, the support vectors. These are the training instances closest to the maximum-margin hyperplane, the boundary that provides the greatest separation between the classes, and they are determined by constrained quadratic optimization.
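    The frequency-counting idea behind the naive Bayes classifier can be sketched in a few lines of plain Python. This is a minimal illustration, not the talk's implementation: the function names and toy rows are hypothetical, and the add-one (Laplace) smoothing is an assumption added so that an attribute value never seen with a class does not zero out the whole product.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each attribute value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> Counter of values
    attribute_values = defaultdict(set)   # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def predict_proba(model, row):
    """Return the normalized posterior P(class | row) for every class."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class) from class frequencies
        for i, value in enumerate(row):
            # Add-one smoothed estimate of P(value | class); attributes are
            # treated as independent given the class (the "naive" assumption).
            count = value_counts[(i, label)][value]
            score *= (count + 1) / (n_label + len(attribute_values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical toy history: (asthma zone, symptom severity) -> exacerbation?
rows = [("green", "low"), ("green", "low"), ("yellow", "medium"),
        ("red", "high"), ("yellow", "high"), ("green", "medium")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

model = train_naive_bayes(rows, labels)
posterior = predict_proba(model, ("red", "high"))
print(posterior)  # "yes" receives 6/7 of the probability mass
```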

    A receiver operating characteristic (ROC) curve will be shown to compare the performance of the classification algorithms for asthma exacerbation prediction across different training data sets. Our study demonstrates the significant potential of machine learning approaches that use telemonitoring data for early prediction of acute exacerbations of chronic health conditions.
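    As a sketch of how an ROC curve is built, the functions below sweep the decision threshold over a classifier's scores and record the false positive rate and true positive rate at each step; the area under the curve (AUC), computed here with the trapezoidal rule, is a common one-number summary for comparing classifiers. The function names and toy scores are illustrative assumptions, and the code assumes both classes are present.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) points from sweeping the threshold high to low.

    labels are 1 (exacerbation) or 0; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all cases tied at this score together, as one threshold step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy predicted risks for six days (1 = exacerbation occurred).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.2]
fpr, tpr = roc_curve(labels, scores)
area = auc(fpr, tpr)
print(round(area, 3))  # 0.889, i.e. 8/9 of positive/negative pairs are ranked correctly
```

    Plotting tpr against fpr for each trained classifier gives the comparative ROC curves; a curve closer to the top-left corner, and a larger AUC, indicates better discrimination.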

  • Nirav Shah

    Nirav Shah / Ananth Bala - Data Analysis, Dashboards and Visualization - How to create powerful visualizations like a Zen Master

    90 Mins

    In today’s data economy and disruptive business environment, data is the new oil, and data analysis combined with data visualization is vital for professionals and companies to stay competitive. Analyzing data and developing useful, interactive visualizations that provide insights may seem complex for a non-data professional. That need not be the case, thanks to various BI and data visualization tools. Tableau is one of the most popular and is widely used across industries, from individual users to enterprise-wide rollouts.

    In this hands-on training session you will learn to turn your data into interactive dashboards, create stories with data and share these dashboards with your internal and external stakeholders. We will begin with best practices for creating charts and telling stories with data. Whether your goal is to explain an insight or to let your audience explore the data, Tableau’s simple drag-and-drop user interface makes the task easy and enjoyable.

    You will learn to use functionality such as Table Calculations, Sets, Filters, Parameters and predictive analytics using forecast functions. You will also learn mapping and other visual features. We will demo a few charts, such as waterfall charts, Pareto charts, Gantt charts, control charts and box-and-whisker plots.

    We will focus on data visualization workflows and best practices using Zen Master techniques.