Telematics data has the potential to unlock an estimated 1.5 trillion in revenue. Unfortunately, most of this data remains untapped.

In this case study, Karthik Thirumalai will discuss how telematics data can be used to identify driver behaviour and enable preventive maintenance in automobiles.

Outline/Structure of the Case Study

- Outline of the problem

- Data Analytics Framework

- Exploratory Data Analysis

- Modelling

- Outcomes

Learning Outcome

How to use telematics data to infer driver behaviour and prevent failures. The presentation will cover in-depth analysis and visualization of the data, explain the different models that were applied and the outcomes of the solution, and touch upon data visualization, exploratory data analysis and machine learning.
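As a flavour of the kind of analysis involved, the sketch below shows one hypothetical driver-behaviour signal (harsh braking) derived from raw telematics speed readings; the file name, column names, one-second sampling assumption and the -8 km/h per second threshold are illustrative, not taken from the case study.

```python
# Hedged illustration of one driver-behaviour signal: flagging harsh-braking
# events from telematics speed readings. Columns and threshold are assumptions.
import pandas as pd

trips = pd.read_csv("telematics.csv", parse_dates=["timestamp"])   # device_id, timestamp, speed_kmh
trips = trips.sort_values(["device_id", "timestamp"])

# Per-reading change in speed as a crude deceleration proxy (assumes ~1 s sampling).
trips["decel_kmh_s"] = trips.groupby("device_id")["speed_kmh"].diff()
harsh_braking = trips[trips["decel_kmh_s"] < -8]

# Drivers with the most harsh-braking events.
print(harsh_braking.groupby("device_id").size().sort_values(ascending=False).head())
```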

Target Audience

All

Submitted 3 years ago

  • Kabir Rustogi

    Kabir Rustogi - Generation of Locality Polygons using Open Source Road Network Data and Non-Linear Multi-classification Techniques

    Kabir Rustogi
    Head - Data Sciences
    Delhivery
    3 years ago
    Sold Out!
    45 Mins
    Case Study
    Intermediate

    One of the principal problems in the developing world is the poor localization of its addresses. This inhibits discoverability of local trade, reduces the availability of amenities such as the creation of bank accounts and the delivery of goods and services (e.g., e-commerce), and delays emergency services such as fire brigades and ambulances. In general, people in the developing world identify an address based on neighbourhood/locality names and points of interest (POIs), which are neither standardized nor backed by official records that can help locate them systematically. In this paper, we describe an approach to build accurate geographical boundaries (polygons) for such localities.

    As training data, we are provided with two pieces of information for millions of address records: (i) a geocode, which is captured by a human for the given address, and (ii) the set of localities present in that address. The latter is determined either by manual tagging or by using an algorithm which is able to take a raw address string as input and output meaningful locality information present in that address. For example, for the address, “A-161 Raheja Atlantis Sector 31 Gurgaon 122002”, its geocode is given as (28.452800, 77.045903), and the set of localities present in that address is given as (Raheja Atlantis, Sector 31, Gurgaon, Pin-code 122002). Development of this algorithm is part of another project we are working on; details can be found here.

    Many industries, such as the food-delivery industry, the courier-delivery industry, and the KYC (know-your-customer) data-collection industry, are likely to have huge amounts of such data. Such crowdsourced data usually contain a large amount of noise, acquired either due to machine/human error in capturing the geocode, or due to error in identifying the correct set of localities from a poorly written address. For example, for the address, “Plot 1000, Sector 31 opposite Sector 40 road, Gurgaon 122002”, a machine may output the set of localities present in this address as (Sector 31, Sector 40, Gurgaon, Pin-code 122002), even though it is clear that the address does not lie in Sector 40.

    The solution described in this paper is expected to consume the provided data and output polygons for each of the localities identified in the address data. We assume that the localities for which we must build polygons are non-overlapping, e.g., this assumption is true for pin-codes. The problem is solved in two phases.

    In the first phase, we separate the noisy points from the points that lie within a locality. This is done by formulating the problem as a non-linear multi-classification problem. The latitudes and longitudes of all non-overlapping localities act as features, and their corresponding locality name acts as a label, in the training data. The classifier is expected to partition the 2D space containing the latitudes and longitudes of the union of all non-overlapping localities into disjoint regions corresponding to each locality. These partitions are defined as non-linear boundaries, which are obtained by optimizing for two objectives: (i) the area enclosed by the boundaries should maximize the number of points of the corresponding locality and minimize the number of points of other localities, (ii) the separation boundary should be smooth. We compare two algorithms, decision trees and neural networks for creating such partitions.
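    To make the first phase concrete, here is a minimal sketch of such a classifier using a scikit-learn decision tree on latitude/longitude features; the file name and columns are illustrative, and the smoothness objective described above would need regularisation beyond the simple depth/leaf-size limits shown here.

```python
# Hypothetical sketch of the phase-1 classifier: latitude/longitude as
# features, locality name as label. Column names are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("geocoded_addresses.csv")            # columns: lat, lng, locality
X, y = df[["lat", "lng"]], df["locality"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Limiting depth and leaf size acts as a crude smoothness control on the
# decision boundaries that partition the 2D space into locality regions.
clf = DecisionTreeClassifier(min_samples_leaf=50, max_depth=12)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```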

    In the second phase, we extract all the points that satisfy the partition constraints, i.e., lie within the boundary of a locality L, as candidate points, for generating the polygon for locality L. The resulting polygon must contain all candidate points and should have the minimum possible area while maintaining the smoothness of the polygon boundary. This objective can be achieved by algorithms such as concave hull. However, since localities are always bounded by roads, we have further enhanced our locality polygons by leveraging open source data of road networks. To achieve this, we solve a non-linear optimisation problem which decides the set of roads to be selected, so that the enclosed area is minimized, while ensuring that all the candidate points lie within the enclosed area. The output of this optimisation problem is a set of roads, which represents the boundary of a locality L.
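    As a simplified stand-in for the second phase, the sketch below builds a tight polygon around one locality's candidate points with a concave hull (assuming Shapely >= 2.0); the road-network optimisation described above is not reproduced here, and the coordinates are made up.

```python
# Illustrative phase-2 baseline: wrap the candidate points of one locality
# in a concave hull. Assumes Shapely >= 2.0, which provides concave_hull.
import shapely
from shapely.geometry import MultiPoint

candidate_points = [(77.0459, 28.4528), (77.0441, 28.4501), (77.0475, 28.4512),
                    (77.0430, 28.4530), (77.0468, 28.4495)]   # (lng, lat), illustrative
cloud = MultiPoint(candidate_points)

# ratio controls how closely the hull follows the points: 1.0 gives the
# convex hull, smaller values give tighter (more concave) boundaries.
polygon = shapely.concave_hull(cloud, ratio=0.3)
print(polygon.wkt)
```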

  • 45 Mins
    Case Study
    Intermediate

    Imitation Learning has been the backbone of robots learning from a demonstrator's behavior. Join us to learn how to train a robot to perform tasks like acrobatics.

    Two branches of AI, Deep Learning and Reinforcement Learning, are now responsible for many real-world applications. Machine Translation, Speech Recognition, Object Detection, Robot Control, and Drug Discovery are some of the numerous examples.

    Both approaches are data-hungry - DL requires many examples of each class, and RL needs to play through many episodes to learn a policy. Contrast this with human intelligence. A small child can typically see an image just once and instantly recognize it in other contexts and environments. We seem to possess an innate model/representation of how the world works, which helps us grasp new concepts and adapt to new situations fast. Humans are excellent one/few-shot learners. We are able to learn complex tasks by observing and imitating other humans (e.g. cooking, dancing or playing soccer), despite having a different point of view, sense modalities, body structure and mental faculties.

    Humans may be very good at picking up novel tasks, but Deep RL agents surpass us in performance. Once a Deep RL agent has learned a good representation [1], it is easy to surpass human performance in complex tasks like Go [2], Dota 2 [3], and StarCraft [4]. We are biologically limited by time, memory and computation (a computer can be made to simulate thousands of plays in a minute).

    RL struggles with tasks that have sparse rewards. Take an example of a soccer playing robot - controlled by applying a torque to each one of its joints. The environment rewards you when it scores a goal. If the policy is initialized randomly (we apply a random torque to each joint, every few milliseconds) the probability of the robot scoring a goal is negligible - it won't even be able to learn how to stand up. In tasks requiring long term planning or low-level skills, getting to that initial reward can prove impossible. These situations have the potential to greatly benefit from a demonstration - in this case showing the robot how to walk and kick - and then letting it figure out how to score a goal.

    We have an abundance of visual data on humans performing various tasks in the public domain, in the form of videos from sources like YouTube. On YouTube alone, 400 hours of video are uploaded every minute, and it is easy to find demonstration videos for any skill imaginable. What if we could harness this by designing agents that could learn how to perform tasks just by watching a video clip?

    Imitation Learning, also known as apprenticeship learning, teaches an agent a sequence of decisions through demonstration, often by a human expert. It has been used in many applications such as teaching drones how to fly [5] and autonomous cars how to drive [6], but it relies on domain-engineered features or extremely precise representations such as mocap [7]. Directly applying imitation learning to learn from videos proves challenging: there is a misalignment of representation between the demonstrations and the agent’s environment. For example, how can a robot sensing its world through a 3D point cloud learn from a noisy 2D video clip of a soccer player dribbling?

    Leveraging recent advances in Reinforcement Learning, Self-Supervised Learning and Imitation Learning [8][9][10], we present a technical deep dive into an end-to-end framework which:

    1) Has prior knowledge about the world acquired through Self-Supervised Learning - a relatively new area which seeks to build efficient deep learning representations from unlabelled data by training on a surrogate task. The surrogate task can be rotating an image and predicting the rotation angle, or cropping two patches of an image and predicting their relative positions, or a combination of several such objectives (a minimal sketch of the rotation task appears after this list).

    2) Has the ability to align the representation of how it senses the world, with that of the video - allowing it to learn diverse tasks from video clips.

    3) Has the ability to reproduce a skill from only a single demonstration, using applied techniques from imitation learning.
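    The sketch below illustrates the rotation-prediction surrogate task mentioned in (1), using PyTorch; the tiny backbone and random frames are placeholders, not the framework presented in the session.

```python
# Minimal sketch of a rotation-prediction surrogate task for self-supervised
# pre-training; architecture and data are illustrative stand-ins.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class RotationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 4)   # classes: 0, 90, 180, 270 degrees

    def forward(self, x):
        return self.head(self.backbone(x))

def rotation_batch(images):
    """Rotate each image by a random multiple of 90 degrees; return rotated images and labels."""
    labels = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([TF.rotate(img, 90 * int(k)) for img, k in zip(images, labels)])
    return rotated, labels

model, criterion = RotationNet(), nn.CrossEntropyLoss()
images = torch.rand(8, 3, 64, 64)                # stand-in for unlabeled video frames
inputs, targets = rotation_batch(images)
loss = criterion(model(inputs), targets)
loss.backward()                                  # backbone weights get a self-supervised signal
```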

    [1] https://www.cse.iitb.ac.in/~shivaram/papers/ks_adprl_2011.pdf

    [2] https://ai.google/research/pubs/pub44806

    [3] https://openai.com/five/

    [4] https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/

    [5] http://cs231n.stanford.edu/reports/2017/pdfs/614.pdf

    [6] https://arxiv.org/pdf/1709.07174.pdf

    [7] https://en.wikipedia.org/wiki/Motion_capture

    [8] https://arxiv.org/pdf/1704.06888v3.pdf

    [9] https://bair.berkeley.edu/blog/2018/06/28/daml/

    [10] https://arxiv.org/pdf/1805.11592v2.pdf

  • Suvro Shankar Ghosh

    Suvro Shankar Ghosh - Learning Entity Embeddings from Knowledge Graphs

    45 Mins
    Case Study
    Intermediate
    • Over a period of time, a lot of knowledge bases have evolved. A knowledge base is a structured way of storing information, typically in the form (Subject, Predicate, Object).
    • Such knowledge bases are an important resource for question answering and other tasks. But they often suffer from incompleteness, as they cannot represent all the data in the world, and thereby lack the ability to reason over their discrete entities and their unknown relationships. Here we introduce an expressive neural tensor network that is suitable for reasoning over relationships between two entities (a minimal sketch of such a scoring function follows this list).
    • With such a model in place, we can ask questions; the model will try to predict the missing links within the trained model and answer questions related to finding similar entities, reasoning over them, and predicting various relationship types between two entities not connected in the Knowledge Graph.
    • Knowledge Graph infoboxes were added to Google's search engine in May 2012
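    For illustration, here is a minimal sketch of a neural tensor network scoring function in the spirit of Socher et al. (2013); the dimensions, initialisation and single-relation setup are simplifications, not the model presented in the case study.

```python
# Hedged sketch of a Neural Tensor Network scoring function for one relation:
# score(e1, R, e2) = u_R^T tanh(e1^T W_R e2 + V_R [e1; e2] + b_R)
import torch
import torch.nn as nn

class NTNRelation(nn.Module):
    def __init__(self, emb_dim=100, slices=4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(slices, emb_dim, emb_dim) * 0.01)  # bilinear tensor
        self.V = nn.Linear(2 * emb_dim, slices)                              # standard feed-forward term
        self.u = nn.Linear(slices, 1, bias=False)                            # scoring vector

    def forward(self, e1, e2):
        # Bilinear term: one scalar per tensor slice for each pair in the batch.
        bilinear = torch.einsum("bi,kij,bj->bk", e1, self.W, e2)
        hidden = torch.tanh(bilinear + self.V(torch.cat([e1, e2], dim=-1)))
        return self.u(hidden).squeeze(-1)   # higher score = relation more plausible

entities = nn.Embedding(10_000, 100)        # learned entity embeddings
head, tail = entities(torch.tensor([1])), entities(torch.tensor([2]))
score = NTNRelation()(head, tail)
```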

    What is the knowledge graph?

    ▶ Knowledge in graph form!

    ▶ Captures entities, attributes, and relationships

    ▶ More specifically, the “knowledge graph” is a database that collects millions of pieces of data about keywords people frequently search for on the World Wide Web and the intent behind those keywords, based on the already available content

    ▶ In most cases, KGs are based on Semantic Web standards and have been generated by a mixture of automatic extraction from text or structured data, and manual curation work.

    ▶ Structured Search & Exploration
    e.g. Google Knowledge Graph, Amazon Product Graph

    ▶ Graph Mining & Network Analysis
    e.g. Facebook Entity Graph

    ▶ Big Data Integration
    e.g. IBM Watson

    ▶ Diffbot, GraphIQ, Maana, ParseHub, Reactor Labs, SpazioDati

  • Favio Vázquez

    Favio Vázquez - Complete Data Science Workflows with Open Source Tools

    90 Mins
    Tutorial
    Beginner

    Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that's not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a whole framework for data science that will allow you and your company to go further, beyond common sense and intuition, to solve complex business problems.
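    As a taste of such a workflow, here is a minimal PySpark sketch of the clean/transform/explore loop; Optimus layers higher-level column operations on top of Spark, which are omitted here, and the file path and column names are made up.

```python
# Minimal PySpark sketch of a clean -> transform -> explore loop.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("workflow-demo").getOrCreate()

df = spark.read.csv("sales.csv", header=True, inferSchema=True)      # load raw data
clean = (df.dropna(subset=["amount"])                                 # prepare: drop missing amounts
           .withColumn("amount", F.col("amount").cast("double"))      # transform: enforce types
           .filter(F.col("amount") > 0))                              # transform: keep valid rows
clean.groupBy("region").agg(F.avg("amount").alias("avg_amount")).show()  # explore: quick aggregate
```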

  • Aakash Goel

    Aakash Goel / Ankit Kalra - Detect Workout Pose for Virtual Gym using CNN

    45 Mins
    Talk
    Beginner

    Approximately 80% of people across the globe do not use their gym, yet they pay $30 to $125 per month. Attrition from gyms is linked to discouraging results and a lack of engagement. Traditional gym users do not know a proper exercise regimen, and users prefer workout regimens that are fun, customizable and social.

    To combat the above problem, we came up with the idea of providing customized fitness solutions using Artificial Intelligence. In this talk, we showcase how we can leverage deep learning architectures like CNNs to develop "workout pose detection" that tracks user movement, classifies it against specific trained workouts, and determines whether the performed pose is correct or wrong.
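    A minimal Keras sketch of the kind of pose-classification CNN described above is shown below; the input size, number of classes and architecture are illustrative, not the production model from the talk.

```python
# Hedged sketch of a small CNN that classifies workout-pose frames.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_POSES = 5   # illustrative classes, e.g. squat-correct, squat-wrong, plank-correct, ...

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_POSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_frames, train_labels, epochs=10, validation_split=0.1)  # frames from workout videos
```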


    Keywords: CNN, Deep Learning, Image classification Model, Computer Vision.

  • Indranil Chandra

    Indranil Chandra - Data Science Project Governance Framework

    Indranil Chandra
    Assistant Manager
    CITI
    3 years ago
    Sold Out!
    45 Mins
    Talk
    Executive

    The Data Science Project Governance Framework is a framework that can be followed by any new Data Science business or team. It will help in formulating strategies around how to leverage Data Science as a business, how to architect Data Science based solutions, team formation strategy, ROI calculation approaches, typical Data Science project lifecycle components, commonly available Deep Learning toolsets and frameworks, and best practices used by Data Scientists. I will use an actual use case while covering each of these aspects of building the team, and refer to examples from my own experience of setting up Data Science teams in a corporate/MNC setup.

    A lot of research is happening all around the world in various domains to leverage Deep Learning, Machine Learning and Data Science based solutions to solve problems that would otherwise be impossible to solve using simple rule-based systems. All the major players in the market and businesses are also getting started and setting up new Data Science teams to take advantage of modern state-of-the-art ML/DL techniques. Even though most Data Scientists have a great knowledge of mathematical modeling techniques, they lack the business acumen and management knowledge to drive Data Science based solutions in a corporate/MNC setup. On the other hand, management executives in most corporates/MNCs do not have first-hand knowledge of setting up a new Data Science team or of approaching business problems with Data Science. This session will help bridge the above-mentioned gap and give Executives and Data Scientists a common ground on which they can easily build any Data Science business/team from ground zero.

    GitHub Link -> https://github.com/indranildchandra/DataScience-Project-Governance-Framework

  • Suvro Shankar Ghosh

    Suvro Shankar Ghosh - Real-Time Advertising Based On Web Browsing In Telecom Domain

    45 Mins
    Case Study
    Intermediate

    The following section describes the Telco-domain real-time advertising based on web browsing use case in terms of:

    • Potential business benefits to earn.
    • Functional use case architecture depicted.
    • Data sources (attributes required).
    • Analytics to be performed.
    • Output to be provided and target systems to be integrated with.

    This use case is part of the monetization category. The goal of the use case is to provide a kind of data mart that gives either Telecom business parties or external third parties sufficient, relevant and customized information to produce real-time advertising for Telecom end users. The customer targets are all Telecom network end-users.

    The customization information to be delivered to advertisers is based on several dimensions:

    • Customer characteristics: demographic, telco profile.
    • Customer usage: Telco products or any other interests.
    • Customer time/space identification: location, zoning areas, usage time windows.

    Use case requirements are detailed in the description below as “Targeting methods”.

    1. Search Engine Targeting:

    The telco will use users' web history to track what they are looking at and to gather information about them. When a user goes onto a website, their web browsing history shows information about the user: what he or she searched for and where they are from (found via the IP address). A profile is then built around them, allowing the telco to target ads to the user more specifically.

    2. Content and Contextual Targeting:

    This is when advertisers put ads in a specific place based on the relevant content present. This targeting method can be used across different mediums: for example, an online article about purchasing homes would have an advert associated with this context, such as an insurance ad. This is achieved through an ad-matching system which analyses the contents of a page or finds keywords and presents a relevant advert, sometimes through pop-ups.

    3. Technical Targeting:

    This form of targeting is associated with the user’s own software or hardware status. The advertisement is altered depending on the user’s available network bandwidth, for example if a user is on their mobile phone that has a limited connection, the ad delivery system will display a version of the ad that is smaller for a faster data transfer rate.

    4. Time Targeting:

    This type of targeting is centered around time and focuses on the idea of fitting in around people’s everyday lifestyles. For example, scheduling specific ads at a timeframe from 5-7pm, when the

    5. Sociodemographic Targeting:

    This form of targeting focuses on the characteristics of consumers, including their age, gender, and nationality. The idea is to target users specifically, using the data collected about them, for example, targeting a male in the age bracket of 18-24. The telco will use this form of targeting by showing advertisements relevant to the user’s individual demographic profile. This can show up in the form of banner ads or commercial videos.

    6. Geographical and Location-Based Targeting:

    This type of advertising involves targeting different users based on their geographic location. IP addresses can signal the location of a user and can usually transfer the location through different cells.

    7. Behavioral Targeting:

    This form of targeted advertising is centered around the activity/actions of users and is more easily achieved on web pages. Information from browsing websites can be collected and used to find patterns in users' search history.

    8. Retargeting:

    Retargeting is where advertising uses behavioral targeting to produce ads that follow you after you have looked at or purchased a particular item. Advertisers use this information to ‘follow you’ and try to grab your attention so you do not forget.

    9. Opinions, attitudes, interests, and hobbies:

    Psychographic segmentation also includes opinions on gender and politics, sporting and recreational activities, views on the environment and arts and cultural issues.

  • Karthik Bharadwaj T

    Karthik Bharadwaj T - 7 Habits to Ethical AI

    Karthik Bharadwaj T
    Sr. Data Scientist
    Teradata
    3 years ago
    Sold Out!
    45 Mins
    Talk
    Beginner

    While AI is being put to use in solving great problems of the world, it is subject to questions about the morality of how it is constructed and used. Karthik Thirumalai addresses the 7 habits of building ethical AI solutions and how they can be put to use for a better world. These habits (Data Governance, Fairness, Privacy and Security, Accountability, Transparency, Education) help organizations successfully implement an AI strategy that reflects fundamental human principles and moral values.

  • Suvro Shankar Ghosh

    Suvro Shankar Ghosh - Attempt of a classification of AI use cases by problem class

    20 Mins
    Talk
    Intermediate

    There are many attempts on the internet to classify and structure the various AI techniques, produced by a variety of sources with specific interests in this emerging market, and the fact that some new technologies make use of multiple techniques does not make it easier to provide easy, top-down access to and guidance through AI for business decision makers. Most sources structure AI techniques by their core ability (e.g. supervised vs. unsupervised learning), but even this is sometimes controversial (e.g. genetic algorithms). The approach taken here is to find groups of use cases that represent similar problem-solving strategies (just like distinguishing "search" from "sort" without reference to a particular technique like "Huffman search" or "qsort"). Of course, most AI techniques are combinations, but with a different focus.

    There are many different sorting criteria to cluster use cases, and these criteria determine how well, and if at all, the above objectives may be achieved. The target is to find “natural” classes of problems that, in an abstract way, can be applied to all the corresponding use cases. Since the clustering is used to determine which AI techniques are applicable, the classes should correspond to the typical characteristics of AI techniques. The resulting classes are summarised in the table below.

    | Problem class | Core problem description | Sample use cases | Key measure | AI techniques |
    | --- | --- | --- | --- | --- |
    | Normalization | Pre-process and convert unstructured data into structured data (patterns) | Big data pre-processing; sample normalization (sound, face images, …); triggered time sequences; feature extraction | Conversion quality | |
    | Clustering | Detect pattern accumulations in a data set | Customer segment analysis; optical skin cancer analysis; music popularity analysis | Inter- and intra-cluster resolution | |
    | Feature Extraction | Detect features within patterns and samples | Facial expression analysis (eyes and mouth); scene analysis & surveillance (people identification) | Accuracy; completeness | |
    | Recognition | Detect a pattern in a large set of samples | Image/face recognition; speaker recognition; natural language recognition; associative memory | Accuracy; recognition speed; learning or storage speed; capacity | |
    | Generalization | Interpolation and extrapolation of feature patterns in a pattern space | Adaptive linear feature interpolation; fuzzy robot control/navigation in unknown terrain | Accuracy; prediction pattern range | Kohonen maps (SOM, SOFM); any backpropagation NN; fuzzy logic systems |
    | Prediction | Predict future patterns (e.g. based on past experience, i.e. observed sequences of patterns) | Stock quote analysis; heart attack prevention; next best action machines; weather/storm forecast; pre-fetching in CPUs | Accuracy; prediction time range | |
    | Optimization | Optimize a given structure (pattern) according to a fitness or energy function | (Bionic) plane or ship construction; agricultural fertilization optimization | Convergence; detection of local/global optimum; (heaviness) cost of optimization | Genetic programming |
    | Conclusion | Detect or apply a (correlative) rule in a data set | QM correlation analysis; next best action machines | Consistency; completeness | Rule-based systems; expert systems |

  • 45 Mins
    Demonstration
    Intermediate

    Recent advancements in AI are proving beneficial in the development of applications in various spheres of the healthcare sector, such as microbiological analysis, drug discovery, disease diagnosis, genomics, medical imaging and bioinformatics, translating large-scale data into improved human healthcare. Automation in healthcare using machine learning/deep learning assists physicians in making faster, cheaper and more accurate diagnoses.

    Due to increasing availability of electronic healthcare data (structured as well as unstructured data) and rapid progress of analytics techniques, a lot of research is being carried out in this area. Popular AI techniques include machine learning/deep learning for structured data and natural language processing for unstructured data. Guided by relevant clinical questions, powerful deep learning techniques can unlock clinically relevant information hidden in the massive amount of data, which in turn can assist clinical decision making.

    We have successfully developed three deep learning based healthcare applications using TensorFlow and are currently working on three more healthcare-related projects. In this demonstration session, we shall first briefly discuss the significance of deep learning for healthcare solutions. Next, we will demonstrate two deep learning based healthcare applications developed by us. The discussion of each application will include the precise problem statement, proposed solution, data collected & used, experimental analysis, and the challenges encountered & overcome to achieve this success. Finally, we will briefly discuss the other applications on which we are currently working and the future scope of research in this area.

  • SUDIPTO PAL

    SUDIPTO PAL - Use cases of Financial Data Science Techniques in retail

    SUDIPTO PAL
    STAFF DATA SCIENTIST
    Walmart Labs
    3 years ago
    Sold Out!
    20 Mins
    Talk
    Intermediate

    Financial domains like insurance and banking have uncertainty itself as an inherent product feature, and hence make extensive use of statistical models to develop, value and price their products. This presentation will showcase some of the techniques popularly used in financial products, such as survival models and cashflow prediction models, and how they can be used in retail data science, by drawing out analogies and similarities.

    Survival models were traditionally used for modeling mortality, and were later extended to model queues, waiting times and attrition. We showcase: 1) how the waiting-time aspect can be used to model customers' repeat-purchase behavior and to time product recommendations to particular intervals; 2) how the same survival or waiting-time problem can be solved using discrete-time binary-response survival models (as opposed to traditional proportional hazards and AFT models); and 3) a quick coverage of other use cases like attrition, CLTV (customer lifetime value) and inventory management.
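    As a sketch of point (2), a discrete-time survival model can be fit as a plain binary classifier on person-period data; the file and column names below are illustrative, not taken from the talk.

```python
# Hedged sketch: discrete-time survival via logistic regression on
# person-period data (one row per customer per week since last purchase;
# `repurchased` is 1 in the week the repeat purchase happens, else 0).
import pandas as pd
from sklearn.linear_model import LogisticRegression

pp = pd.read_csv("person_period.csv")
X = pp[["weeks_since_purchase", "avg_basket_value", "num_prior_orders"]]
y = pp["repurchased"]

model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted hazard for week 4; chaining (1 - hazard) across weeks gives the
# survival curve used to time a recommendation.
week_4 = pd.DataFrame({"weeks_since_purchase": [4], "avg_basket_value": [55.0], "num_prior_orders": [3]})
print(model.predict_proba(week_4)[0, 1])
```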

    We show a use case where survival models can be used to predict the timing of events (e.g. attrition/renewal, purchase, purchase order for procurement), and use that to predict the timing of cashflows associated with events (e.g. subscription fee received from renewals, procurement cost etc.), which are typically used for capital allocation.
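    A toy illustration of turning predicted event probabilities into an expected cashflow schedule follows; all figures are made up for the example.

```python
# Expected cashflow per month = subscription fee * probability the customer
# is still active (renews) in that month. Numbers are illustrative.
import numpy as np

monthly_fee = 20.0
renewal_prob = np.array([0.95, 0.90, 0.84, 0.77])   # predicted P(active in month t)
expected_cashflow = monthly_fee * renewal_prob
print(expected_cashflow, expected_cashflow.sum())    # schedule and total over the horizon
```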

    We also show how the backdated predicted cashflows can be used as a baseline to make causal inferences about a strategic intervention (e.g. a campaign launch for containing attrition) by comparing them with actual cashflows post-intervention. This can be used to retrospectively evaluate the impact of strategic interventions.

  • Dr Hari Krishna Maram

    Dr Hari Krishna Maram - Future of Technology

    Dr Hari Krishna Maram
    Chairman
    Vision Digital
    3 years ago
    Sold Out!
    20 Mins
    Talk
    Executive

    Future of Technology covers trends in technology across the globe and the innovations changing the future.

  • Nishat Korada

    Nishat Korada - Artificial Intelligence and Cybersecurity Whitepaper

    45 Mins
    Case Study
    Intermediate

    Artificial Intelligence has great potential for building a better, smarter world, but at the same time brings significant security risks. Because of the lack of security consideration in the early development of AI algorithms, attackers can manipulate data and results in ways that lead to misjudgments. In critical areas such as healthcare, transportation and surveillance, this gap can be devastating, leading to vulnerabilities and security risks. Malicious attacks on AI systems, misinformation, or misjudgments in decision-making algorithms can lead to asset loss, endangering property or even personal safety.
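    As one concrete example of such data manipulation (an illustration, not drawn from the whitepaper), a fast gradient sign method (FGSM) perturbation can flip a classifier's decision with a change that is nearly imperceptible; the model and input below are random stand-ins.

```python
# Hedged FGSM sketch: perturb an input in the direction of the loss gradient.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # stand-in classifier
image = torch.rand(1, 1, 28, 28, requires_grad=True)           # stand-in input
label = torch.tensor([3])

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.05                                                  # perturbation budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
print(model(adversarial).argmax(dim=1))                         # prediction may change despite a tiny edit
```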

    The main purpose of this paper is to investigate the various security risks and vulnerabilities posed by emerging Artificial Intelligence technologies as well as the ethical dimensions of these as they affect the corporate, industrial, academic and social aspects of society in the present day. We will address the various cybersecurity risks associated with Artificial Intelligence and how to mitigate them.
