Pre-Conf Workshop

Thu, Aug 30
09:30

    Registration - 30 mins

10:00
  • Favio Vázquez - Agile Data Science Workflows with Python, Spark and Optimus

    10:00 AM - 06:00 PM | Pluto | 25 Interested | Sold Out!

    Cleaning, preparing, transforming and exploring data is the most time-consuming and least enjoyable data science task, but one of the most important ones. With Optimus we’ve solved this problem for small and huge datasets alike, while also improving the whole data science workflow and making it easier for everyone. You will learn how the combination of Apache Spark and Optimus with the Python ecosystem can form a whole framework for Agile Data Science, allowing people and companies to go further, beyond their common sense and intuition, to solve complex business problems.

  • Kathrin Melcher / Vincenzo Tursi - Deep Dive into Data Science with KNIME Analytics Platform

    10:00 AM - 06:00 PM | Boardroom | 17 Interested | Sold Out!

    In this course we cover the major steps in a data science project, from data access, data pre-processing, and data visualization to machine learning, model optimization, and deployment using the KNIME Analytics Platform.

  • Nirav Shah - Advanced Data Analysis, Dashboards And Visualization

    10:00 AM - 06:00 PM | Jupiter 1 | 53 Interested | Sold Out!

    In these two training sessions (4 hours each, 8 hours total), you will learn to use the data visualization and analytics software Tableau Public (free to use) and turn your data into interactive dashboards. You will get hands-on training on how to create stories with dashboards and share these dashboards with your audience. The first session will begin with a quick refresher on the basics of design and information literacy, and discussions about best practices for creating charts, as well as a decision-making framework. Whether your goal is to explain an insight or let your audience explore data insights, Tableau's simple drag-and-drop user interface makes the task easy and enjoyable. You will learn what's new in Tableau, and the session will cover the latest and most advanced features of data preparation.

    In the follow-up second session, you will learn to create Table Calculations, Level of Detail Calculations and Animations, and to understand Clustering. You will learn to integrate R and Tableau and how to use R within Tableau. You will also learn mapping, using filters/parameters and other visual functionalities.

  • Drs. Tarry Singh / Aishwary Patil - Neural Networks Deep Dive

    10:00 AM - 06:00 PM | Neptune | 95 Interested | Sold Out!

    In the first half of the day we will conduct a comprehensive CNN theory lecture and discuss at length which specific neural network frameworks are most used, such as TensorFlow and PyTorch. In the second half we will build our own neural network from scratch (in PyTorch or TensorFlow) and, if time permits, also let learners play with the novel activation function that our researcher wrote a few weeks ago, called ARiA. While deepkapha.ai is very busy writing some cool new algorithms, it is very likely that we may reveal deeper insights into the new paper we are currently writing. Finally, the one-day workshop will end with a full Capsule Network lecture, the new neural network that is outperforming the CNN (Convolutional Neural Network).

    Student Discount: Students are eligible for a flat 75% discount on this workshop and would also get a participation certificate from deepkapha.ai. To get the discount code, please email indianteam@odsc.com with a copy of your valid student ID card.

  • Vishal Gokhale - Fundamental Math for Data Science

    10:00 AM - 06:00 PM | Mars | 28 Interested | Sold Out!

    By now it is evident that a solid math foundation is indispensable if one is to get into data science in an honest-to-goodness way. Unfortunately, for many of us math was just a means to get better scores at school and never really a means to understand the world around us.
    That failure of the education system causes many of us to feel a “gap” when learning data science concepts. It is high time that we acknowledge that gap and take remedial action.

    The purpose of the workshop is to develop an intuitive understanding of the concepts.
    We let go of the fear of rigorous notation and embrace the rationale behind it.
    The intended key takeaway for participants is the confidence to deal with math.

ODSC India 2018 Day 1

Fri, Aug 31
08:30

    Registration - 30 mins

09:00
  • Dr. Ananth Sankar - The Deep Learning Revolution in Automatic Speech Recognition

    09:00 AM - 09:45 AM | Grand Ball Room | 171 Interested

    In the last decade, deep neural networks have created a major paradigm shift in speech recognition. This has resulted in dramatic and previously unseen reductions in word error rate across a range of tasks. These improvements have fueled products such as voice search and voice assistants like Amazon Alexa and Google Home.

    The main components of a speech recognition system are the acoustic model, lexicon, and language model. In recent years, the acoustic model has evolved from using Gaussian mixture models to deep neural networks, resulting in significant reductions in word error rate. Recurrent neural network language models have also given improvements over traditional statistical n-gram language models. More recently, sequence-to-sequence recurrent neural network models have subsumed the acoustic model, lexicon, and language model into one system, resulting in a far simpler model that gives comparable accuracy to traditional systems. This talk will outline this evolution of speech recognition technology, and close with some key challenges and interesting new areas in which to apply this technology.
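    The statistical n-gram language models mentioned above can be illustrated with a tiny maximum-likelihood bigram model. This is a minimal sketch (the corpus and function names are our own, for illustration only); real systems add smoothing for unseen bigrams:

```python
from collections import defaultdict

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over sentence-marked token streams."""
    unigrams, bigrams = defaultdict(int), defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            unigrams[w1] += 1
            bigrams[(w1, w2)] += 1
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1); 0.0 for an unseen history."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

corpus = ["the cat sat", "the cat ran", "the dog sat"]
uni, bi = train_bigram_lm(corpus)
p = bigram_prob(uni, bi, "the", "cat")  # 2 of the 3 "the" contexts continue with "cat"
```

    A recurrent neural network language model replaces these count ratios with probabilities conditioned on the full history, which is where the accuracy gains described in the talk come from.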

09:45
  • Sheamus McGovern / Naresh Jain - Welcome Address

    09:45 AM - 10:05 AM | Grand Ball Room | 158 Interested

    This talk will help you understand the vision behind ODSC Conference and how it has grown over the years.

10:15

    Coffee/Tea Break - 15 mins

10:30
  • Dr. Ravi Vijayaraghavan / Dr. Sidharth Kumar - Analytics and Science for Customer Experience and Growth in E-commerce

    10:30 AM - 10:50 AM | Grand Ball Room 1 | 64 Interested

    In our talk, we will cover the broad areas where Flipkart leverages Analytics and Sciences to drive both human and machine-driven decisions. We will go deeper into one use case related to pricing in e-commerce.

  • Rajat Jain - AI and Machine Learning in Fraud Detection

    10:30 AM - 10:50 AM | Grand Ball Room 2 | 84 Interested

    Machine Learning is transforming the way companies are generating intelligence and making real-time decisions. This talk will cover American Express’s exploration of machine learning and how it was applied to detect and prevent fraud within its global network. You will learn the basics and complexities of fraud and how using credit card spend data can help surgically identify fraud and elevate the payment experience for millions of Card Members across the globe.

  • Ujjyaini Mitra - When the Art of Entertainment ties the knot with Science

    10:30 AM - 10:50 AM | Jupiter | 36 Interested

    On the face of it, entertainment is a pure art form, but there is a huge part of the art that science can back. AI can drive many human-intensive tasks in the media industry, turning gut-based decisions into data-driven decisions. Can we create a promo of a movie through AI? How about knowing which part of a video causes disengagement among our audiences? Could AI help content editors? How about assisting scriptwriters through AI?

    I will talk about a few specific experiments done on Voot Original content: binging, hooking, content editing, audience disengagement, etc.

  • Samiran Roy / Dr. Om Deshmukh - Reinforcement Learning: Demystifying the hype to successful enterprise applications

    10:30 AM - 11:15 AM | Neptune | 149 Interested

    In 2014, Google acquired DeepMind, a small, London-based AI startup, for $500 million. DeepMind was conducting research on AI that would learn to play computer games in a fashion similar to humans. In 2015, DeepMind published a paper in Nature describing a learning algorithm called Deep-Q-Learning, which was able to achieve superhuman performance on a diverse range of Atari 2600 games[1]. They achieved this without any domain-specific engineering: the algorithm took only the raw game images as input, and was guided by the game score. Believed by many to be the first steps toward Artificial General Intelligence, DeepMind achieved this by pioneering the fusion of two fields of research: Reinforcement Learning (RL) and Deep Learning.

    RL is a learning paradigm inspired by operant conditioning which closely mimics the human learning process. It shifts focus from ML-based pattern-recognition solutions to learning through trial and error via interaction with an environment, guided by a reward signal or reinforcement. Imagine an agent teaching itself how to steer by navigating the streets of Grand Theft Auto, and transferring this knowledge to a driverless car[2]. Think of a team of autonomous robots collaborating to outwit their opponents in a game of Robot Soccer[3]. Any practical real-world application suffers from the curse of dimensionality (a camera mounted on a robot feeding it a 64*64 grayscale image will have 256^4096 input possibilities). A Deep Neural Network automatically learns compact and efficient feature representations from noisy, high-dimensional sensory inputs in its hidden layers, giving RL algorithms the edge to scale up and give incredible results in dynamic and complex domains.

    The most notable example of this is AlphaGo Zero[4], the latest version of AlphaGo, the first computer program to defeat a world champion at the game of Go. AlphaGo Zero uses RL to learn by playing games against itself, starting from completely random play, and quickly surpasses human expert performance. Not only is the game extremely complex (a 19*19 Go board can represent 10^170 states of play), accomplished Go players often struggle to evaluate whether a certain move is good or bad. Most AI researchers were astonished by this feat, as it was speculated that it would take at least a decade for a computer to play Go at an expert human level.

    RL, which was largely confined to academia for several decades, is now beginning to see successful applications and products in industry, in fields such as robotics, automated trading systems, manufacturing, energy, dialog systems and recommendation engines. For most companies it is an exciting prospect due to the AI hype, but very few organizations have identified use cases where RL may play a valuable role. In reality, RL is best suited for a niche class of problems where it can help automate some tasks (or augment a human expert). The focus of this presentation will be to give a practical introduction to the RL setting, how to formulate problems in RL, and successful use cases in the industry.

    [1] https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
    [2] https://www.technologyreview.com/s/602317/self-driving-cars-can-learn-a-lot-by-playing-grand-theft-auto/
    [3] http://www.robocup.org/
    [4] https://deepmind.com/blog/alphago-zero-learning-scratch/

11:00
  • 11:00 AM - 11:45 AM | Grand Ball Room 1 | 63 Interested

    In evolutionary history, the evolution of sensory organs and the brain played a very important role in enabling species to survive and prosper. Extending humans’ abilities to achieve a better life and a more efficient, sustainable world is a goal of artificial intelligence. Although recent advances in machine learning enable machines to perform as well as, or even better than, humans in many intelligent tasks including automatic speech recognition, there are still many aspects to be addressed to bridge the semantic gap and achieve seamless interaction with machines. Auditory intelligence is a key technology for enabling natural man-machine interaction and expanding humans’ auditory ability. In this talk, I am going to address three aspects of it:

    (1) non-speech audio recognition,

    (2) video highlight detection,

    (3) one technology for surpassing humans’ auditory ability, namely source separation.

  • Dr. Tom Starke - Intelligent Autonomous Trading Systems - Are We There Yet?

    11:00 AM - 11:45 AM | Grand Ball Room 2 | 91 Interested

    Over the last two decades, trading has seen a remarkable evolution from open-outcry in the Wall Street pits to screen trading, all the way to current automation and high-frequency trading (HFT). The success of machine learning and artificial intelligence (AI) seems like a natural progression for the evolution of trading. However, unlike other fields of AI, trading has some domain-specific problems that leave the dream of set-it-and-forget-it money-making machines still some way in the future. This talk will describe the current challenges for intelligent autonomous trading systems and provide some practical examples where machine learning is already being used in financial applications.

  • Atin Ghosh - AR-MDN - Associative and Recurrent Mixture Density Network for e-Retail Demand Forecasting

    11:00 AM - 11:45 AM | Jupiter | 45 Interested

    Accurate demand forecasts can help on-line retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a neural network architecture called AR-MDN that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of a year’s worth of data over tens of thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.
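    The “learned mixture of Gaussian distributions” output head can be sketched as follows. In the actual AR-MDN the weights, means and variances are produced by the network per product and time step; here they are fixed, invented values for illustration:

```python
import math

def mixture_pdf(y, weights, means, sigmas):
    """Density of y under a mixture of Gaussians (the MDN output head)."""
    total = 0.0
    for w, mu, s in zip(weights, means, sigmas):
        total += w * math.exp(-0.5 * ((y - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return total

# In AR-MDN these parameters come out of the network; fixed here for illustration.
weights = [0.7, 0.3]      # mixing coefficients (softmax output, sum to 1)
means = [100.0, 400.0]    # e.g. a "baseline demand" mode and a "sale spike" mode
sigmas = [20.0, 80.0]

# The training loss is the negative log-likelihood of the observed demand.
nll = -math.log(mixture_pdf(120.0, weights, means, sigmas))
```

    Modelling the full density, rather than a single point forecast, is what lets the network express multi-modal demand (normal days versus promotion spikes) and its variance.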

11:30
  • Saurabh Deshpande - Introduction to reinforcement learning using Python and OpenAI Gym

    11:30 AM - 01:00 PM | Neptune | 128 Interested

    Reinforcement Learning algorithms are becoming more and more sophisticated every day, as is evident from the recent wins of AlphaGo and AlphaGo Zero (https://deepmind.com/blog/alphago-zero-learning-scratch/). OpenAI has provided the OpenAI Gym toolkit for research and development of Reinforcement Learning algorithms.

    In this workshop, we will focus on an introduction to the basic concepts and algorithms in Reinforcement Learning, along with hands-on coding.

    Content

    • Introduction to Reinforcement Learning concepts and terminologies
    • Setting up OpenAI Gym and other dependencies
    • Introducing OpenAI Gym and its APIs
    • Implementing simple algorithms using a couple of OpenAI Gym environments
    • Demo of Deep Reinforcement Learning using one of the OpenAI Gym Atari games
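    As a taste of the workshop material, tabular Q-learning (the idea behind Deep-Q-Learning, minus the neural network) can be demonstrated even without Gym, on a hypothetical five-cell corridor environment of our own invention:

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4          # corridor cells 0..4; reward waits at the right end
ACTIONS = [-1, +1]             # step left, step right

def step(state, action):
    """Move in the corridor; reward 1.0 only on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

def greedy(s):
    # Random tie-breaking so the untrained agent still explores both directions.
    return max(ACTIONS, key=lambda a: (Q[(s, a)], random.random()))

for _episode in range(500):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
        nxt, r, done = step(s, a)
        # Q-learning update: bootstrap from the best action in the next state.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(nxt, b)] for b in ACTIONS) - Q[(s, a)])
        s = nxt

# The learned greedy policy should walk right in every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

    A Gym environment exposes the same loop through `env.reset()` and `env.step(action)`; the workshop swaps this toy `step` function for Gym environments and, in the deep version, replaces the Q-table with a neural network.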

12:00
  • Sohan Maheshwar - It's All in the Data: The Machine Learning Behind Alexa's AI Systems

    12:00 PM - 12:45 PM | Grand Ball Room 1 | 85 Interested

    Amazon Alexa, the cloud-based voice service that powers Amazon Echo, provides access to thousands of skills that enable customers to voice-control their world, whether it’s listening to music, controlling smart home devices, listening to the news or even ordering a pizza. Alexa developers use advanced natural language understanding to build capabilities like built-in slot & intent training, entity resolution, and dialog management. This natural language understanding is powered by advanced machine learning algorithms that will be the focus of this talk.

    This session will tell you about the rise of voice user interfaces and give an in-depth look into how Alexa works. The talk will delve into natural language understanding and how utterance data is processed by our systems, and what a developer can do to improve the accuracy of their skill. The talk will also discuss how Alexa hears and understands you and how error handling works.

  • Gaurav Godhwani - A Time Series Analysis of District-wise Government Spending in India

    12:00 PM - 12:45 PM | Grand Ball Room 2 | 67 Interested

    About District Treasuries: District Treasuries are the nodal offices for all financial transactions of the Government within the district, managing both payments and receipts. They also monitor the activities of various sub-treasuries which work as an extension of the Treasuries at the Tehsil/Taluka level. Each district has various Drawing & Disbursing Officers who are authorized to draw money and can present their claims in the Treasury, which are then accounted for by the concerned authorities. Various states in India have developed Integrated Financial Management Systems which publish detailed information on the daily transactions happening at district treasuries within a state.

    About Time Series Analysis & Inferences: The detailed information on daily transactions at district treasuries can help us perform near real-time tracking of the flow and utilization of funds. This can be used to track expenditure on various schemes and social sectors, detect anomalies in fund disbursement, generate near real-time alerts, and predict timely utilization of budgets. In this talk, we will explore how we can harness time-series modeling and analysis to better understand the functioning of various district treasuries in India.
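    One simple way to flag anomalies in fund disbursement, as described above, is a trailing-window z-score over the daily series. This is a minimal sketch; the spend figures below are invented for illustration:

```python
def rolling_anomalies(series, window=7, threshold=3.0):
    """Indices whose value deviates from the trailing-window mean by > threshold sigmas."""
    flags = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean = sum(past) / window
        std = (sum((x - mean) ** 2 for x in past) / window) ** 0.5 or 1.0
        if abs(series[i] - mean) / std > threshold:
            flags.append(i)
    return flags

# Hypothetical daily treasury outflows (in lakh INR) containing one suspicious spike.
spend = [100, 102, 98, 101, 99, 103, 100, 97, 650, 101, 99]
anomalies = rolling_anomalies(spend)  # flags the spike at index 8
```

    Real treasury series also carry strong weekly and fiscal-year-end seasonality, which fuller time-series models (seasonal decomposition, ARIMA-family models) would account for before flagging anomalies.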

  • Mridul Mishra - Explainable Artificial Intelligence (XAI): Why, When, and How?

    12:00 PM - 12:45 PM | Jupiter | 73 Interested

    Machine learning models are rapidly conquering uncharted ground, proving themselves better than existing manual or software solutions. This has also given rise to a demand for Explainable Artificial Intelligence (XAI) that can be used by a human to understand the decisions made by a machine learning model. The need for XAI may stem from legal or social reasons, or from the desire to improve the acceptance and adoption of the machine learning model. The extent of explainability desired may vary with the aforementioned reasons and with the application domain, such as finance, defense, legal, and medical. XAI is achieved either by choosing a machine learning technique, such as decision trees, that lends itself well to explainability but compromises accuracy, or by putting in additional effort to develop a secondary machine learning model that explains the decisions of the primary model. Essentially this leads to a choice between the desired levels of explainability, accuracy, and development cost. In this talk, we present current thinking, challenges, and a framework that can be used to analyze and communicate the choices related to XAI, and to make the decisions that provide the best XAI solution for the problem at hand.
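    The “secondary model to explain the decisions of the primary model” approach can be sketched with a toy surrogate: a depth-1 threshold rule fitted to mimic a hypothetical black-box scorer (all names and numbers below are invented for illustration):

```python
def black_box(income):
    """Stand-in for a complex primary model: approves when a hidden condition holds."""
    return 1 if income * 0.8 + 12 > 40 else 0

# Probe the black box on a grid of inputs to collect (input, decision) pairs.
incomes = list(range(0, 100, 5))
labels = [black_box(x) for x in incomes]

def fit_threshold(xs, ys):
    """Pick the cut-off that best reproduces the black-box decisions (a depth-1 'tree')."""
    best_t, best_acc = None, -1.0
    for t in xs:
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

threshold, fidelity = fit_threshold(incomes, labels)
# Human-readable explanation: "approved because income > threshold",
# with `fidelity` measuring how faithfully the rule mimics the primary model.
```

    The surrogate's fidelity score makes the explainability/accuracy trade-off discussed above explicit: a simple rule that only mimics the primary model 80% of the time is cheap but may mislead.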

01:00

    Lunch - 60 mins

02:00
  • Vincenzo Tursi - Puzzling Together a Teacher-Bot: Machine Learning, NLP, Active Learning, and Microservices

    02:00 PM - 02:45 PM | Grand Ball Room 1 | 85 Interested

    Hi! My name is Emil and I am a Teacher Bot. I was built to answer your initial questions about using KNIME Analytics Platform. Well, actually, I was built to point you to the right training materials to answer your questions about KNIME.

    Puzzling together all the pieces to implement me wasn't that difficult. All you need are:

    • A user interface - web or speech based - for you to ask questions
    • A text parser for me to understand
    • A brain to find the right training materials to answer your question
    • A user interface to send you the answer
    • A feedback option - nice to have but not a must - on whether my answer was of any help

    The most complex part was, of course, my brain. Building my brain required: a clear definition of the problem, a labeled data set, a class ontology, and finally the training of a machine learning model. The labeled data set in particular was lacking. So, we relied on active learning to incrementally make my brain smarter over time. Some parts of the final architecture, such as understanding and resource searching, were deployed as microservices.

  • Dr. Manish Gupta / Radhakrishnan G - Driving Intelligence from Credit Card Spend Data using Deep Learning

    02:00 PM - 02:45 PM | Grand Ball Room 2 | 80 Interested

    Recently, we have heard success stories of how deep learning technologies are revolutionizing many industries. Deep learning has proven hugely successful on some problems in unstructured-data domains like image recognition, speech recognition and natural language processing. However, limited gains have been shown in traditional structured-data domains like BFSI. This talk will cover American Express’s exciting journey exploring deep learning techniques to generate the next set of data innovations by deriving intelligence from the data within its global, integrated network. Learn how using credit card spend data has helped improve credit and fraud decisions and elevate the payment experience of millions of Card Members across the globe.

  • Dr. Vikas Agrawal - Bring in the Lawyers: Explainable AI Driven Decision-making for the Enterprise

    02:00 PM - 02:45 PM | Jupiter | 64 Interested

    Daniel Dennett (Tufts University) says, “If it can’t do better than us at explaining what it’s doing, then don’t trust it.” Will I believe the machine's recommendation enough to make a serious decision? What if I need to explain my decision in court, or to my shareholders, or to individual customers? Are high precision and recall enough? We will see some examples where integrative AI models get better and better at providing actionable intelligence, such that to ignore the advice could be considered irresponsible, reckless or discriminatory. Who would be to blame if the advice given by the AI system is found erroneous or disregarded? Then the advice given by the AI system itself becomes confidential attorney-client privileged communication, and there are real debates around giving the privilege of plausible deniability to the senior leadership of corporations.

    Wouldn't it be better to provide an explanation for the recommendations, and let the humans decide whether the advice makes sense? Moreover, in some geographies like Europe (GDPR), and in industries like banking, credit cards and pharmaceuticals, the explanations for predictions (or decision rules derived from them) are required by regulatory agencies. Therefore, many of these industries limit their models to easily explainable white-box algorithms like logistic regression or decision trees. What kind of explanations would it take for regulatory agencies to be willing to accept black-box algorithms such as various types of NNs for detecting fraud or money laundering? How do we demonstrate to the end user what the underlying relationships between the inputs and outputs are, for traditionally black-box systems? How could we influence decision-makers enough to place trust in predictions made by a model? We could begin by giving reasons, explanations, and substantial insights into why a pump is about to fail in the next three days, how a sales opportunity is likely to be a win, or why an employee is leaving. Yet, if we don't make these relevant to your role, your work context, your interests, what is valuable to you and what you might lose if you make an incorrect decision, then we have not done our job as data scientists.

    Explanations are the core of the evolving relationship between humans and intelligent machines: this fosters trust. We need to be just as cautious of AI explanations as we are of each other's, no matter how clever a machine seems. This means that as a community we need to find ways of reliably explaining black-box models. David Gunning (DARPA) says: “It’s the nature of these machine-learning systems that they produce a lot of false alarms, so an intelligence analyst really needs extra help to understand why a recommendation was made."

    In this talk, we will examine what is required to explain predictions, the latest research in the area, and our own findings showing how this is currently being accomplished in practice for multiple real-world use cases in the enterprise.

  • Anuj Gupta - Sarcasm Detection: Achilles Heel of Sentiment Analysis

    02:00 PM - 02:45 PM | Neptune | 74 Interested

    Sentiment analysis has long been the poster-boy problem of NLP and has attracted a lot of research. However, despite so much work in this sub-area, most sentiment analysis models fail miserably at handling sarcasm. The rise in the usage of sentiment models for analyzing social data has only exposed this gap further. Owing to the subtlety of the language involved, sarcasm detection is not easy and has fascinated the NLP community.

    Most attempts at sarcasm detection still depend on hand-crafted features which are dataset-specific. In this talk we look at some very recent attempts to leverage advances in NLP for building generic models for sarcasm detection.

    Key takeaways:
    + Challenges in sarcasm detection
    + Deep dive into an end-to-end solution using DL to build generic models for sarcasm detection
    + Shortcomings and the road forward

02:55
  • Venkatraman J - Detection and Classification of Fake News using Convolutional Neural Networks

    02:55 PM - 03:15 PM | Grand Ball Room 1 | 108 Interested

    The proliferation of fake news and rumours on traditional news media sites, social media, feeds, and blogs has made it extremely difficult and challenging to trust any news in day-to-day life. False information has wide implications for both individuals and society. Even though humans can identify and classify fake news through heuristics, common sense and analysis, there is a huge demand for an automated computational approach to achieve scalability and reliability. This talk explains how neural probabilistic models using deep learning techniques are used to classify and detect fake news.

    This talk will start with an introduction to deep learning, TensorFlow (Google's deep learning framework), dense vectors (the word2vec model), feature extraction, data preprocessing techniques, feature selection, and PCA, and move on to explain how a scalable machine learning architecture for fake news detection can be built.
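    The dense-vector feature-extraction step can be sketched by averaging word vectors into a single document feature. The tiny 3-dimensional vectors below are made up for illustration; a real pipeline would load trained word2vec embeddings:

```python
# Toy 3-dimensional "word vectors"; a real pipeline loads trained word2vec vectors.
vectors = {
    "shocking": [0.9, 0.1, 0.0],
    "miracle":  [0.8, 0.2, 0.1],
    "report":   [0.1, 0.9, 0.2],
    "study":    [0.0, 0.8, 0.3],
}

def doc_vector(tokens, vectors, dim=3):
    """Average the vectors of known words into one dense document feature."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

headline = "shocking miracle cure".split()
features = doc_vector(headline, vectors)  # dense input for a downstream classifier
```

    In the architecture discussed in the talk, a convolutional network replaces this simple averaging, sliding filters over the sequence of word vectors so that word order also contributes to the classification.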

  • Kavita Dwivedi - Social Network Analytics to enhance Marketing Outcomes in Telecom Sector

    02:55 PM - 03:15 PM | Grand Ball Room 2 | 34 Interested

    This talk will focus on how SNA can help enhance the outcomes of marketing campaigns by using social network graphs.

    Social network analytics (SNA) is the process of investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. SNA is emerging as an important tool for understanding and influencing customer behavior. The talk will focus on the mathematics behind SNA and how SNA can help make marketing decisions for telecom operators.

    The SNA use case will use telecom consumer data to establish networks based on calling behavior (frequency, duration of calls, types of connections) and thus identify major communities and influencers. By identifying key influencers and active communities, marketing campaigns can be made more effective and viral. SNA helps improve adoption rates by targeting influencers with a large number of followers. The talk will also touch upon how SNA helps retention and spreads the impact of marketing campaigns. The tools used for the use case are SAS SNA and NodeXL, for demonstration purposes. It will show how SNA lifts the impact of campaigns.

    This use case will illustrate a project focused on building an SNA model using a combination of demographic/firmographic company variables and call-frequency details. Dimensions like the company you work for, the place you stay, your professional experience and position, industry type, etc. help add a lot more value to the social network graph. With the right combination of dimensions for the problem at hand (in our case, marketing analytics), we can identify the right influencers within a network. The more dimensions we add, the stronger the network gets and the more effective it is for running campaigns.
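    The influencer-identification step, finding subscribers with the largest number of distinct contacts, can be sketched on hypothetical call-detail records; the SAS SNA/NodeXL tooling named above does this (and much richer centrality measures) at scale:

```python
from collections import defaultdict

# Hypothetical call-detail records: (caller, callee, minutes).
calls = [
    ("A", "B", 12), ("A", "C", 5), ("A", "D", 9),
    ("B", "C", 3), ("E", "A", 7), ("F", "A", 2),
]

def degree_centrality(calls):
    """Count distinct contacts per subscriber (undirected degree)."""
    neighbours = defaultdict(set)
    for caller, callee, _minutes in calls:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)
    return {node: len(peers) for node, peers in neighbours.items()}

degrees = degree_centrality(calls)
influencer = max(degrees, key=degrees.get)  # the campaign seed candidate
```

    Degree is only the simplest centrality; weighting edges by call frequency and duration, as the use case describes, changes who surfaces as an influencer.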

    Looking forward to discussing the outcomes of this project with the audience and fellow speakers.

  • Asha Saini - Using Open Data to Predict Market Movements

    02:55 PM - 03:15 PM | Jupiter | 60 Interested

    As companies progress on their digital transformation journeys, technology becomes a strategic business decision. In this realm, consulting firms such as Gartner exert tremendous influence on technology purchasing decisions. The ability of these firms to predict the movement of market players will provide vendors with competitive benefits.

    We will explore how, with the use of publicly available data sources, IT industry trends can be mimicked and predicted.

    Big Data enthusiasts learned quickly that there are caveats to making Big Data useful:

    • Data source availability
    • Producing meaningful insights from publicly available sources

    Working with large data sets that are frequently changing can become expensive and frustrating. The learning curve is steep and the discovery process long. Challenges range from the selection of efficient tools to parse unstructured data, to the development of a vision for interpreting and utilizing the data for competitive advantage.

    We will describe how the archive of billions of web pages, captured monthly since 2008 and available for free analysis on AWS, can be used to mimic and predict trends reflected in industry-standard consulting reports.

    There is a potential opportunity in this process to apply machine learning to tune the models and let them self-learn so they optimize automatically. Gartner publishes reports on over 70 topic areas; an automated tool that can analyze all of those topic areas to help us quickly understand major trends across today's landscape, and plan for those to come, would be invaluable to many organizations.

  • schedule 02:55 PM - 03:15 PM place Neptune people 62 Interested

    The desire to reduce the cognitive load on human agents who process swathes of natural-language data is driving the adoption of machine learning solutions that extract structured information from unstructured text. Use cases range from monitoring Internet sites for potential terror threats to analyzing documents from disparate sources to identify potentially illegal transactions. These solutions rely on the ability to identify entities and the relationships between them using Natural Language Processing, which has benefitted immensely from progress in deep learning.

    The goal of this talk is to introduce relationship extraction, a cornerstone of natural language understanding, and its use in building knowledge graphs that represent structured information extracted from unstructured text. The talk demonstrates how deep learning lends itself well to the problem of relationship extraction and provides an elegant and simple solution.
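
    As a toy illustration of the structured output relationship extraction produces (not the talk's deep learning approach), the extracted (subject, relation, object) triples can be assembled into a knowledge graph. The triples below are hand-written stand-ins for what a trained extractor would emit.

```python
# Toy knowledge graph built from (subject, relation, object) triples.
# The triples are illustrative stand-ins for a trained extractor's output.
from collections import defaultdict

triples = [
    ("Acme Corp", "acquired", "Widget Ltd"),
    ("Widget Ltd", "based_in", "Berlin"),
    ("Acme Corp", "ceo", "Jane Doe"),
]

# Adjacency-list representation: subject -> list of (relation, object)
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

# Query the graph: everything known about "Acme Corp"
print(graph["Acme Corp"])
```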

03:15

    Coffee/Tea Break - 15 mins

03:30
  • Dr. Arun Verma - Extracting Embedded Alpha Factors From Alternative Data Using Statistical Arbitrage and Machine Learning

    schedule 03:30 PM - 04:15 PM place Grand Ball Room 1 people 96 Interested

    The high volume and time sensitivity of news and social media stories require automated processing to quickly extract actionable information. However, the unstructured nature of textual information presents challenges that are well addressed by machine learning techniques.

  • Mahesh Balaji - Deep Learning in Medical Image Diagnostics

    schedule 03:30 PM - 04:15 PM place Grand Ball Room 2 people 56 Interested

    Convolutional Neural Networks are revolutionizing the field of medical imaging analysis and computer-aided diagnostics. Medical images, from X-rays, CT, MRI, and retinal scans to digitized biopsy slides, are an integral part of a patient's EHR. Manual analysis and diagnosis by human radiologists and pathologists are prone to undue delays and erroneous diagnoses, and can therefore benefit from deep-learning-based AI that provides quantitative, standardized computer-aided diagnostic tools.

    In this session, we will review the state of the art in medical imaging and diagnostics, covering important tasks like classification, localization, detection, segmentation, and registration, along with the CNN architectures that enable them. Further, we will briefly cover data augmentation techniques and transfer learning, and walk through two case studies on Diabetic Retinopathy and Breast Cancer Diagnosis. Finally, we discuss inherent challenges, from sourcing training data to model interpretability.

  • Dr. Savita Angadi - Connected Vehicle – is far more than just the car…

    schedule 03:30 PM - 04:15 PM place Jupiter people 48 Interested


    For many IoT use cases there is a real challenge in streaming large amounts of data in real time, and the connected vehicle is no exception. Cars and trucks can generate terabytes of data daily, and connectivity can be spotty, especially in remote areas. To address this issue, companies will want to move the analysis to the edge, onto the device where the data is generated.

    We will walk through a case in which a streaming engine is installed on a gateway on a commercial vehicle. Data is analyzed locally on the vehicle as it is generated, and alerts are communicated via a cell connection. Models can be downloaded when a vehicle comes in for service, or over the air. The idea is to use data from the vehicle (model, horsepower, oil temperature, etc.) to build a decision tree that predicts our target, a turbo fault. Decision trees are attractive because they lay out the rules of your model clearly. In this case the model was predictive for certain engine horsepower ratings, time in service, model, and oil temperatures, and it achieved acceptable accuracy with a 30-day warning window, plenty of time to act on the alert. To capture the value of this insight we need to know immediately when a signal is detected, so the model runs natively on the vehicle, in the on-board analytics engine.
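
    A minimal sketch of the offline half of this workflow (not the speaker's actual pipeline): train a decision tree on historical telematics data, then export its rules for scoring on the edge. The feature names, thresholds, and synthetic data below are illustrative assumptions.

```python
# Hedged sketch: decision tree for a turbo-fault target, trained on
# synthetic telematics features. Real deployments would train on logged
# vehicle data and ship the exported rules to the on-vehicle engine.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.choice([300, 400, 500], size=n),  # engine horsepower rating
    rng.integers(0, 2000, size=n),        # time in service (days)
    rng.normal(95, 10, size=n),           # oil temperature (C)
])
# Illustrative label: turbo faults concentrated at high HP + high oil temp
y = ((X[:, 0] >= 500) & (X[:, 2] > 100)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# The learned rules are human-readable and portable to an edge scorer
rules = export_text(tree, feature_names=["horsepower", "days_in_service", "oil_temp"])
print(rules)
```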

  • Gunjan Juyal - Building a Case for a Standardized Data Pipeline for All Your Organizational Data

    schedule 03:30 PM - 03:50 PM place Neptune people 50 Interested

    Organizations of all sizes and domains today face a data explosion problem, driven by a proliferation of data management tools and techniques. A very common scenario is the creation of silos of data and data products, which increases the system's complexity across the whole data lifecycle, right from data modeling to the storage and processing infrastructure.

    High complexity = high system maintenance overheads = sluggish decision making. Another side-effect of this is divergence of the implemented system’s behaviour from high-level business objectives.

    In this talk we look at Zeta's experience as a case study in reducing this complexity by defining and tackling various concerns at well-defined stages, so as to prevent a build-up of complexity.

03:55
  • schedule 03:55 PM - 04:15 PM place Neptune people 59 Interested

    Generative models are important techniques in computer vision. Unlike other neural networks, which are used to make predictions from images, generative models can generate new images for specific objectives. This session will review several applications of generative modeling, such as artistic style transfer, image generation, and image translation using CNNs and GANs.

04:30
  • Swapan Rajdev - Conversational Agents at Scale: Retrieval and Generative approaches

    schedule 04:30 PM - 05:15 PM place Grand Ball Room 1 people 92 Interested

    Conversational agents (chatbots) are machine learning programs designed to hold a conversation with a human to help them fulfill a particular task. In recent years people have been using chatbots to communicate with businesses, get daily tasks done, and much more.

    With the emergence of open source software and online platforms, building a basic conversational agent has become easier, but making agents work across multiple domains and handle millions of requests is still a challenge.

    In this talk I will cover the different algorithms used to build good chatbots and the challenges faced in running them at scale in production.
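
    A minimal sketch of the retrieval approach named in the title (an illustrative toy, not the speaker's production system): pick the canned response whose associated question is most similar to the user's message, here by bag-of-words cosine similarity. The FAQ entries are made up.

```python
# Toy retrieval-based chatbot: match the user's message against known
# questions and return the answer of the closest one.
import math
from collections import Counter

faq = {
    "what are your opening hours": "We are open 9am-6pm, Monday to Saturday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "where is my order": "You can track your order from the Orders page.",
}

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def reply(message):
    best = max(faq, key=lambda q: cosine(message.lower(), q))
    return faq[best]

print(reply("I forgot my password, how can I reset it?"))
```

    Production systems replace the bag-of-words match with learned sentence embeddings and add a generative fallback, which is where the scaling challenges discussed in the talk arise.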

  • Praveen Srivatsa - Machine Learning for medical rehabilitation

    schedule 04:30 PM - 05:15 PM place Grand Ball Room 2 people 32 Interested

    Walking is one of the most common human activities, but the human gait varies by gender, age, culture, etc. How can we use pre-trained models to identify human gait across different images?

    In this session, we look at a real-world case study where we use deep learning models and vision algorithms like DeeperCut and ArtTrack to objectively measure human pose and gait, and use this as a measure to predict patients' rehabilitation, helping them get back on their feet in weeks instead of months.

    We will look at how we went about building and training the model to understand the human gait, the challenges we faced when applying a generic model to Indian patients, and the importance of trust and accuracy when working with machine learning algorithms in the healthcare space.

  • Anand Chitipothu - DevOps for Data Science: Experiences from building a cloud-based data science platform

    schedule 04:30 PM - 05:15 PM place Jupiter people 89 Interested

    Productionizing data science applications is non-trivial. Non-optimal practices, the people-heavy ways of traditional approaches, and developers' love of complex solutions for the sake of using cool technologies make the situation even worse.

    There are two key ingredients required to streamline this: “the cloud” and “the right level of devops abstractions”.

    In this talk, I’ll share the experiences of building a cloud-based platform for streamlining data science and how such solutions can greatly simplify building and deploying data science and machine learning applications.

  • Krishnakumar Shetti - Computer Vision at the Edge with OpenVINO

    schedule 04:30 PM - 04:50 PM place Neptune people 37 Interested

    OpenVINO™ - Open Visual Inference and Neural Network Optimization toolkit is free software that helps developers and data scientists speed up computer vision workloads, streamline deep learning inference and deployments, and enable easy, heterogeneous execution across Intel® platforms from edge to cloud.

    In this short demo, you'll learn how OpenVINO™ toolkit will fast-track the development and deployment of vision-oriented solutions from smart cameras and video surveillance to robotics, transportation, and more.

04:55
  • Akshay Bahadur - Recognizing Human features using Deep Networks.

    schedule 04:55 PM - 05:15 PM place Neptune people 39 Interested

    This demo covers some of the work I have done since starting my journey in machine learning. There are a lot of MOOCs out there for ML and data science, but the most important thing is to apply the concepts learned during the course to solve simple real-world use cases.

    • One of my projects was building a state-of-the-art facial recognition system [VIDEO]. I referred to several research papers; the foundation came from one of the courses itself, but it took a lot of effort to connect the dots, and that's the fun part.
    • In another project, I built an emoji classifier for humans [VIDEO] based on hand gestures, using a deep learning CNN model to achieve great accuracy. I took reference from several online resources, which made me realize that the data science community is very helpful and that we must make efforts to contribute back.
    • The other projects that I have done using machine learning:
      1. Handwritten digit recognition [VIDEO],
      2. Alphabet recognition [VIDEO],
      3. Apparel classification [VIDEO],
      4. Devnagiri recognition [VIDEO].

    With each project I have tried to apply something new to make my model a bit more efficient, whether hyperparameter tuning or just cleaning the data.

    In this demonstration, I would just like to point out that knowledge never goes to waste. The small computer vision applications I built in college have helped me take on deep learning computer vision tasks. It's always enlightening and empowering to learn new technologies.

    I was recently part of a session on 'Solving real world applications from Machine learning' for the Microsoft Advanced Analytics User Group of Belgium, broadcast across the globe (Meetup Link) [Session Recording].

05:30
  • Dr. Ravi Mehrotra - Seeking Order amidst Chaos and Uncertainty

    schedule 05:30 PM - 06:15 PM place Grand Ball Room people 134 Interested

    Applying analytics to determine an optimal answer to business decision problems is relatively easy when the future can be predicted accurately. When the business environment is very complex and the future cannot be predicted, the problem can become intractable using traditional modeling and problem-solving techniques. How do we solve such complex and intractable business problems to find globally optimal answers in highly uncertain business environments? The talk will discuss modeling and solution techniques that find optimal solutions without ignoring or underestimating uncertainty, applied to the revenue management and dynamic price optimization problems that arise in the airline and hospitality industries.

06:30

    Birds of a Feather (BoF) - 45 mins

07:15

    Reception Dinner and Networking - 135 mins

ODSC India 2018 Day 2

Sat, Sep 1
08:30

    Registration - 30 mins

09:00
  • schedule 09:00 AM - 09:45 AM place Grand Ball Room people 141 Interested

    Genomic data is outpacing traditional Big Data disciplines, producing more information than astronomy, Twitter, and YouTube combined. Genomic research has therefore leapfrogged to the forefront of Big Data and cloud solutions, using artificial intelligence and machine learning to generate insights from these unprecedented volumes of data. This talk showcases how we find the disease genes responsible for ALS using VariantSpark, a custom random forest implementation built on top of Spark to deal with the 80 million columns in genomic data. It also outlines how we use a serverless architecture to translate these insights into clinical practice by providing a decision support framework for clinicians to find actionable genomic insights and process medical records at a speed fit for point-of-care application. Finally, the talk touches on how to evolve serverless architectures more efficiently through a hypothesis-driven approach to DevOps, and how we keep data and functions secure in a serverless environment.

09:45

    Important Announcements - 15 mins

10:00

    Coffee/Tea Break - 15 mins

10:15
  • Dr. Dakshinamurthy V Kolluru - ML and DL in Production: Differences and Similarities

    schedule 10:15 AM - 11:00 AM place Grand Ball Room 1 people 100 Interested

    While architecting a data-based solution, one needs to approach the problem differently depending on the specific strategy being adopted. In traditional machine learning the focus is mostly on feature engineering, while in deep learning the emphasis shifts to tagging larger volumes of data with less focus on feature development. Similarly, synthetic data is far more useful in DL than in ML, so the data strategies can differ significantly.

    Both approaches require very similar approaches to error analysis, yet most development processes do not follow them, leading to substantial delays in reaching production. Hyperparameter tuning for performance improvement requires different strategies in ML and DL because of the longer training times of DL systems. Transfer learning is an important aspect to evaluate when building any state-of-the-art system, whether ML or DL. Last but not least is understanding the biases the system is learning: deeply non-linear models require special attention here, as they can learn highly undesirable features.

    In our presentation, we will focus on all the above aspects with suitable examples and provide a framework for practitioners for building ML/DL applications.

  • Willem Pienaar - Building a Feature Platform to Scale Machine Learning at GO-JEK

    schedule 10:15 AM - 11:00 AM place Grand Ball Room 2 people 60 Interested

    Go-Jek, Indonesia’s first billion-dollar startup, has seen an incredible amount of growth in both users and data over the past two years. Many of the ride-hailing company's services are backed by machine learning models. Models range from driver allocation, to dynamic surge pricing, to food recommendation, and process millions of bookings every day, leading to substantial increases in revenue and customer retention.

    Building a feature platform has allowed Go-Jek to rapidly iterate and launch machine learning models into production. The platform allows for the creation, storage, access, and discovery of features. It supports both low latency and high throughput access in serving, as well as high volume queries of historic feature data during training. This allows Go-Jek to react immediately to real world events.

    Find out how Go-Jek implemented their feature platform, and other lessons learned scaling machine learning.

  • Bargava Subramanian / Amit Kapoor - Deep Learning in the Browser: Explorable Explanations, Model Inference, and Rapid Prototyping

    schedule 10:15 AM - 11:00 AM place Jupiter people 56 Interested

    The browser is the most common endpoint for consuming deep learning models. It is also the most ubiquitous programming platform available. The maturity of the client-side JavaScript ecosystem across the deep learning process (DataFrame support with Arrow, WebGL-accelerated learning frameworks like deeplearn.js, declarative interactive visualization with Vega-Lite, etc.) has made it easy to start playing with deep learning in the browser.

    Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) for explanations, inference, and training done in the browser, using the emerging client-side JavaScript libraries for DL with three different types of data: tabular, text, and image. They also explain how the ecosystem of tools for DL in the browser might emerge and evolve.

    Demonstrations include:

    1. Explorable explanations: Explaining the DL model and allowing the users to build intuition on the model helps generate insight. The explorable explanation for a loan default DL model allows the user to explore the feature space and threshold boundaries using interactive visualizations to drive decision making.
    2. Model inference: Inference is the most common use case. The browser allows you to bring your DL model to the data and to test how the model works when executed on the edge. The demonstrated comment-sentiment application can identify and warn users about the toxicity of their comments as they type in a text box.
    3. Rapid prototyping: Training DL models is now possible in the browser itself, if done smartly. The rapid prototyping image classification example allows the user to play with transfer learning to build a model specific for a user-generated image input.

    The demos leverage the following libraries in JavaScript:

    • Arrow for data loading and type inference
    • Facets for exploratory data analysis
    • ml.js for traditional machine learning model training and inference
    • deeplearn.js for deep learning model training and inference
    • Vega and Vega-Lite for interactive dashboards

    The working demos will be available on the web and as open source code on GitHub.

  • Kathrin Melcher / Vincenzo Tursi - Sentiment Analysis with Deep Learning, Machine Learning or Lexicon Based

    schedule 10:15 AM - 11:45 AM place Neptune people 82 Interested

    Do you want to know what your customers, users, contacts, or relatives really think? Find out by building your own sentiment analysis application.

    In this workshop you will build a sentiment analysis application, step by step, using KNIME Analytics Platform. After an introduction to the most common techniques used for sentiment analysis and text mining, we will work in three groups, each one focusing on a different technique.

    • Deep Learning: This group will work with the visual Keras deep learning integration available in KNIME (completely code free)
    • Machine Learning: This group will use other machine learning techniques, based on native KNIME nodes
    • Lexicon Based: This group will focus on a lexicon based approach for sentiment analysis

11:15
  • schedule 11:15 AM - 12:00 PM place Grand Ball Room 1 people 33 Interested

    Various Chinese achievements in the field of Artificial Intelligence (AI) have been widely reported in the media recently. From a remotely controlled tank to a demonstration of swarms of autonomous UAVs during an air show, China is building these systems fast, and possibly in numbers, while preparing for 'Intelligized Warfare'. These technology demonstrations are not very far from field deployment and are a cause for worry for several countries. The Chinese State Council has also released a detailed plan for the development of AI as a tool for national development, including military applications. It is an ambitious plan, but China has a strong foundation in its academia, public-sector industry, and startups to make it possible and to enable it to become a global leader in AI. It is therefore important to examine this foundation of industry, the startup ecosystem, academia, and their mutual cooperation to truly understand China's potential and to be able to predict its military deployment of AI.

  • schedule 11:15 AM - 12:00 PM place Grand Ball Room 2 people 116 Interested

    Apache Spark is an amazing framework for distributing computations across a cluster in an easy and declarative way. It is becoming a standard across industries, so it would be great to add the amazing advances of deep learning to it. Parts of deep learning are computationally heavy, very heavy! Distributing these processes may be the solution, and Apache Spark is the easiest way I can think of to distribute them. Here I will talk about Deep Learning Pipelines, an open source library created by Databricks that provides high-level APIs for scalable deep learning in Python with Apache Spark, and how to distribute your DL workflows with Spark.

  • schedule 11:15 AM - 12:45 PM place Jupiter people 93 Interested

    Advancements in Deep Learning seem almost unstoppable, and research is the only way to make true improvements. Tarry and his team at deepkapha.ai are working relentlessly on papers covering Capsule Networks, automated swiping functions, and adaptations of optimizers and learning rates. In this lecture we will briefly touch on how research is transforming the field of AI, and finally reveal two papers: Neuroscience and the Impact of Deep Learning, and ARiA, a novel neural network activation function that has already proven its dominance over ReLU and Sigmoid.

11:45
  • Harish Kashyap K / Ria Aggarwal - Probabilistic Graphical Models, HMMs using PGMPY

    schedule 11:45 AM - 01:15 PM place Neptune people 79 Interested

    PGMs are generative models that are extremely useful for modeling stochastic processes. I shall talk about how fraud models and credit risk models can be built using Bayesian networks. Generative models are great alternatives to deep neural networks for problems that need causality explained, and PGMs offer features that enable such explanations. This talk focuses on Bayesian networks, Markov models, HMMs, and their applications. This will be a hands-on workshop where attendees learn the basics of graphical models and HMMs with the open source library pgmpy, to which we are contributors. HMMs are especially helpful to researchers and the ML community looking for solutions to state-space problems. Attendees will learn the basics needed for HMMs, including probability, generative models, and Markov theory, and will build various interesting models using pgmpy.
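
    As a library-free taste of what an HMM computes (the workshop itself uses pgmpy), here is the forward algorithm, which evaluates the probability of an observation sequence by summing over all hidden state paths. The two-state weather/umbrella example and all probabilities are illustrative assumptions.

```python
# Forward algorithm for a 2-state HMM (illustrative parameters).
import numpy as np

A = np.array([[0.7, 0.3],    # transition matrix over hidden states (rainy, sunny)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission matrix: P(observation | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial state distribution

def forward(obs):
    """Return P(observation sequence) by summing over hidden state paths."""
    alpha = pi * B[:, obs[0]]             # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # propagate and absorb next observation
    return alpha.sum()

p = forward([0, 0, 1])  # e.g. umbrella, umbrella, no umbrella
```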

12:15
  • Dr. Jennifer Prendki - Recognition and Localization of Parking Signs using Deep Learning

    schedule 12:15 PM - 01:00 PM place Grand Ball Room 1 people 60 Interested

    Drivers in large cities such as San Francisco often cause traffic jams when they slow down and circle the streets trying to decipher parking signs and avoid tickets. This endangers the safety of pedestrians and harms the overall transportation environment.

    In this talk, I will present an automated model, developed by the machine learning team at Figure Eight, that exploits multiple deep learning techniques to predict the presence of parking signs in street-level imagery and find their actual locations on a map. Multiple APIs are then applied to read and extract the rules from the signs. The resulting map of digitized parking rules, combined with a driver's GPS information, can ultimately be used to build products that help people drive and park more safely.

  • Kuldeep Jiwani - Topological space creation and Clustering at BigData scale

    schedule 12:15 PM - 01:00 PM place Grand Ball Room 2 people 42 Interested

    Every dataset has an inherent natural geometry associated with it. We are generally influenced by how the world visually appears to us, and we apply the same flat Euclidean geometry to data. But the data's geometry could be curved, may have holes, and distances may not be definable in all cases. If we still impose Euclidean geometry on it, we may distort the data space and destroy the information content inside it.

    In the Big Data world we regularly handle terabytes of data and must extract meaningful information from it, applying many unsupervised machine learning techniques. Two important steps in this process are building a topological space that captures the natural geometry of the data, and then clustering in that topological space to obtain meaningful clusters.

    This talk will walk through "data geometry" discovery techniques, first analytically and then via applied machine learning methods, so that listeners can take away hands-on techniques for discovering the real geometry of their data. Attendees will be presented with various Big Data techniques, including Apache Spark code showing how to build data geometry over massive data lakes.
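
    A small sketch of why the choice of geometry matters (synthetic data, not the speaker's Spark code): clustering the same points under Euclidean versus cosine distance can group them differently, because cosine distance sees only direction while Euclidean distance also sees magnitude.

```python
# Hierarchical clustering of the same 2-D points under two geometries.
# Points lie along two directions at mixed radii: cosine distance groups
# them by direction, Euclidean distance tends to group them by radius.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
angles = np.r_[np.full(20, 0.2), np.full(20, 1.8)] + rng.normal(0, 0.05, 40)
radii = rng.choice([1.0, 5.0], size=40)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

labels_euc = fcluster(linkage(pdist(X, "euclidean"), "average"), 2, "maxclust")
labels_cos = fcluster(linkage(pdist(X, "cosine"), "average"), 2, "maxclust")
```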

01:00

    Lunch - 60 mins

02:00
  • Joy Mustafi - The Artificial Intelligence Ecosystem driven by Data Science Community

    schedule 02:00 PM - 02:45 PM place Grand Ball Room 1 people 24 Interested

    Cognitive computing makes a new class of problems computable. To respond to the fluid nature of users' understanding of their problems, a cognitive computing system offers a synthesis not just of information sources but of influences, contexts, and insights. These systems differ from current computing applications in that they move beyond tabulating and calculating based on pre-configured rules and programs: they can infer and even reason based on broad objectives. In this sense, cognitive computing is a new type of computing whose goal is more accurate models of how the human brain or mind senses, reasons, and responds to stimulus. It is an interdisciplinary field in which a number of sciences and professions converge, including computer science, electronics, mathematics, statistics, psychology, linguistics, philosophy, neuroscience, and biology.

    Project features:

    • Adaptive: They MUST learn as information changes and as goals and requirements evolve. They MUST resolve ambiguity and tolerate unpredictability. They MUST be engineered to feed on dynamic data in real time.
    • Interactive: They MUST interact easily with users so that those users can define their needs comfortably. They MUST interact with other processors, devices, and services, as well as with people.
    • Iterative and stateful: They MUST aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They MUST remember previous interactions in a process and return information that is suitable for the specific application at that point in time.
    • Contextual: They MUST understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulation, user profile, process, task, and goal. They may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided).

    {A set of cognitive systems is implemented and demonstrated as the project J+O=Y}

  • Dr. Om Deshmukh - Moving from prototypes to products: How to build and deploy at-scale Data Science driven products

    schedule 02:00 PM - 02:45 PM place Grand Ball Room 2 people 75 Interested

    Have you ever wondered what it takes to demonstrate the efficacy of a data-driven approach to solve a particular problem, and then build it into a full-fledged product that can cater at big-data scale? Although frameworks like Six Sigma exist for the software development life cycle, there are no standard best practices for data-science-driven product development and its continuous improvement lifecycle. There is a tremendous amount of literature available on the internet about machine learning model training and inference, yet there is a lack of publicly accessible know-how on how to structure machine learning projects.

    At Envestnet | Yodlee, we have deployed several advanced state-of-the-art Machine Learning solutions which process millions of data points on a daily basis with very stringent service level commitments.

    In this session, we will elaborate on our framework for developing and deploying data-science-driven products in the context of our latest product: Transaction Data Enrichment (TDE). TDE caters to millions of financial transaction requests daily, originating from thousands of different sources. Learnings from this ML-driven product deployment will show ML practitioners and data science leaders how to structure their ML projects, how to build an optimal prototype, and how to quickly scale from the prototype stage to production deployment.

    Specifically, we will talk about best practices and challenges for each stage of data-driven product development: (a) data gathering, cleaning, and normalization; (b) incorporating domain expertise; (c) choice of model; (d) model training and evaluation; (e) design decisions for at-scale deployment; (f) planning for redundancy; (g) continuous monitoring for model deterioration; and (h) logging customer feedback.

    With the latest advancements in machine learning methods like deep learning and the availability of vast amount of data, there has never been a better time to build data science driven products. Come join us as we share our learning in this journey.

  • Dipanjan Sarkar - The Art of Effective Visualization of Multi-dimensional Data - A hands-on Approach

    schedule 02:00 PM - 02:45 PM place Jupiter people 107 Interested

    Descriptive analytics is one of the core components of the analysis life-cycle of any data science project or research effort. Data aggregation, summarization and visualization are the main pillars supporting this area of data analysis. However, dealing with multi-dimensional datasets, typically with more than two attributes, starts causing problems, since our medium of data analysis and communication is typically restricted to two dimensions. We will explore effective strategies for visualizing data in multiple dimensions (from 1-D up to 6-D) using a hands-on approach with Python and popular open-source visualization libraries like matplotlib and seaborn. Time permitting, we will also briefly cover excellent R visualization libraries like ggplot2.

    BONUS: We will also look at ways to visualize unstructured data with several dimensions including text, images and audio!
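
    The core trick is mapping extra dimensions beyond x and y onto visual channels such as colour and marker size. A minimal matplotlib sketch (the dataset and column meanings here are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 4-D dataset: two numeric axes, plus two extra
# dimensions encoded as colour (3rd) and marker size (4th).
rng = np.random.default_rng(42)
n = 200
x, y = rng.normal(size=n), rng.normal(size=n)
temperature = rng.uniform(0, 100, n)   # 3rd dimension -> colour
magnitude = rng.uniform(10, 200, n)    # 4th dimension -> size

fig, ax = plt.subplots(figsize=(6, 4))
sc = ax.scatter(x, y, c=temperature, s=magnitude, cmap="viridis", alpha=0.6)
fig.colorbar(sc, ax=ax, label="temperature")
ax.set(xlabel="x", ylabel="y", title="4-D scatter: position + colour + size")
fig.savefig("four_d_scatter.png")
```

    A 5th dimension is commonly encoded as marker shape (or a facet grid), and a 6th via small multiples, which is roughly the 1-D-to-6-D progression the session walks through.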

  • Ujjyaini Mitra - How to build a churn propensity model where churn is single digit, in a non-committal market

    schedule 02:00 PM - 02:45 PM place Neptune people 28 Interested

    When most well-known classification models fail to predict month-on-month telecom churn for a leading telecom operator, what can we do? Could there be an alternative?
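
    Single-digit churn rates mean a heavily imbalanced target, and a common first adjustment is to re-weight the minority class during training. A hedged sklearn sketch on synthetic data (not the operator's actual dataset or the speaker's method):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic dataset with ~5% positives, mimicking single-digit churn.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

# Re-weighting typically trades some precision for much better
# recall on the rare churner class.
print("churner recall (plain):   ", recall_score(y_te, plain.predict(X_te)))
print("churner recall (weighted):", recall_score(y_te, weighted.predict(X_te)))
```

    The talk's premise is that even such adjustments can fail in a non-committal market, which is what motivates looking for an alternative.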

02:55
  • Goda Ramkumar - Evolution to Systems Thinking from Model Thinking

    schedule 02:55 PM - 03:15 PM place Grand Ball Room 1 people 92 Interested

    The talk provides a peek into the different data science algorithms that come together to power a business like Ola. The focus is on how models built for different purposes combine to solve one problem, and why it is therefore important to view the entire system of predictive and optimization models, and its performance as a whole, rather than improving any one model in isolation. The talk will also touch upon the challenges of measuring and attributing the impact of data science algorithms in complex connected systems, and strategies for doing so.

  • Subramaniajeeva Kandasamy - Operating at scale with Elasticsearch in production

    schedule 02:55 PM - 03:15 PM place Grand Ball Room 2 people 54 Interested

    Having trouble debugging your application because the relevant error is a needle in a haystack? Unable to spot anomalies before they grow into incidents? Want to do behavioral analysis to learn about your customers, or find customers near you? Elasticsearch can be a one-stop solution. With its efficient full-text search and analytics, you can convert data into valuable insights by uncovering hidden patterns and correlations.

    Setting up a cluster at small scale is simple. But complexity grows with scale for any distributed system, and Elasticsearch is no exception. This presentation will cover:

    • Challenges met during the journey, and the learnings that made us stronger
    • Designing and building a highly scalable, resilient, reactive application that processes and ingests multiple billions of records per day into Elasticsearch

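
    High-throughput ingestion into Elasticsearch generally goes through the _bulk endpoint, which accepts newline-delimited JSON with an action line before each document. A minimal sketch of building such a payload (the index name and documents are made up for illustration):

```python
import json

def bulk_actions(index, docs):
    """Build the newline-delimited JSON body for Elasticsearch's _bulk
    API: each document is preceded by an 'index' action line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"   # _bulk requires a trailing newline

payload = bulk_actions("events", [
    {"id": 1, "level": "ERROR", "msg": "timeout contacting upstream"},
    {"id": 2, "level": "INFO", "msg": "request served"},
])
# POST this payload to <cluster>/_bulk with the
# Content-Type: application/x-ndjson header (e.g. via the official
# elasticsearch client's helpers.bulk, which batches this for you).
```

    At billions of records per day, batch sizing, back-pressure, and retry handling around this endpoint become the real engineering work, which is what the talk covers.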
  • Srijak Bhaumik - Let the Machine THINK for You

    schedule 02:55 PM - 03:15 PM place Jupiter people 43 Interested

    Every organization is now focused on its business or customer data and trying hard to extract actionable insights from it. Most are either hiring data scientists or up-skilling their existing developers, who understand the domain, the relevant data and its impact, but are not necessarily expert in data science programming or cognitive computing. To bridge this gap, IBM offers Watson Machine Learning (WML), a service for creating, deploying, scoring and managing machine learning models. WML's model creation, deployment, and management capabilities are key components of cognitive applications. Its essential feature is "self-learning" capability, personalized and customized for a specific persona - be it an executive or business leader, project manager, financial expert or sales advisor. WML makes cognitive prediction easy with model-flow capabilities, where machine learning and prediction can be applied with just a few clicks and work seamlessly without much coding, letting developers, data scientists and business analysts each work within their own boundaries. In this session, WML's capabilities will be demonstrated through a case study solving a real-world business problem, along with the challenges faced. For the developer community, the architecture of the platform will be highlighted to help aspiring developers understand the design of a large-scale product.

  • Tuhin Sharma - Hybrid Recommendation Systems in News Media using Probabilistic Graphical Models

    schedule 02:55 PM - 03:15 PM place Neptune people 55 Interested

    A typical task of recommender systems is to enhance customer experience by providing relevant content, learned from prior implicit feedback. These systems actively track different sorts of user behavior, such as buying patterns, watching habits and browsing activity, in order to model user preferences. Unlike the much more extensively explored explicit feedback, implicit feedback gives no direct input from users about their preferences. And where understanding the content is important, it is non-trivial to explain the recommendations to the users.

    When a new customer comes to the system, traditional state-of-the-art collaborative-filtering-based recommendation struggles to provide relevant recommendations, a problem content-based recommendation does not suffer from. On the other hand, content-based systems perform poorly when the user profile is not well defined, a problem collaborative filtering does not suffer from. So there is a need to combine the strengths of the two and create a hybrid recommendation system that addresses the problem more effectively and robustly. Large media and edtech companies in emerging markets are using a version of this approach.
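
    One simple way to combine the two signals, a generic weighted blend rather than the probabilistic-graphical-model formulation the talk covers, is to let the collaborative-filtering weight grow with the user's history:

```python
import numpy as np

def hybrid_scores(cf_scores, content_scores, n_user_ratings, k=20):
    """Blend collaborative-filtering and content-based item scores.
    The CF weight grows with the amount of user history, so cold-start
    users lean on content similarity (hypothetical weighting scheme)."""
    alpha = n_user_ratings / (n_user_ratings + k)  # 0 for brand-new users
    return alpha * cf_scores + (1 - alpha) * content_scores

cf = np.array([0.9, 0.2, 0.5])        # scores from collaborative filtering
content = np.array([0.1, 0.8, 0.5])   # scores from content similarity
new_user = hybrid_scores(cf, content, n_user_ratings=0)
heavy_user = hybrid_scores(cf, content, n_user_ratings=200)
assert np.allclose(new_user, content)      # cold start: content-based wins
assert abs(heavy_user[0] - cf[0]) < 0.1    # rich history: CF dominates
```

    A graphical-model treatment replaces this fixed weighting with learned latent variables, but the cold-start intuition is the same.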

03:15

    Coffee/Tea Break - 15 mins

03:30
  • Dr. Rohit M. Lotlikar - The Impact of Behavioral Biases on Real-World Data Science Projects: Pitfalls and Guidance

    schedule 03:30 PM - 04:15 PM place Grand Ball Room 1 people 57 Interested

    Data science projects, unlike their software counterparts, tend to be uncertain and rarely fit a standardized approach. Each organization has its unique processes, tools, culture, data and inefficiencies, and a templatized approach, more common in software implementation projects, rarely fits.

    In a typical data science project, a team is attempting to build a decision support system that will either automate human decision making or assist a human in making decisions. The dramatic rise of interest in data science means the typical project has a large proportion of relatively inexperienced members, whose learnings draw heavily from academics, data science competitions and general IT/software projects.

    These data scientists learn over time that the real world is very different from the world of data science competitions. In the real world, problems are ill-defined, data may not exist to start with, and it is not just model accuracy, complexity and performance that matter, but also the ease of infusing domain knowledge, interpretability and the ability to provide explanations, the level of skill needed to build and maintain the model, the stability and robustness of the learning, the ease of integration with enterprise systems, and ROI.

    Human factors play a key role in the success of such projects. Managers making the transition from IT/software delivery to data science frequently do not allow for sufficient uncertainty in outcomes when planning projects. Senior leaders and sponsors are under pressure to deliver outcomes but are unable to make a realistic assessment of payoffs and risks, and to set investment and expectations accordingly. This makes both the journey and the outcome sensitive to the behavioural biases of project stakeholders. Knowing the typical behavioural biases and pitfalls makes it easier to identify them upfront and take corrective action.

    The speaker brings nearly two decades of experience working at startups, in R&D and in consulting to lay out these recurring behavioural biases and pitfalls.

    Many of the biases covered are grounded in the speaker's first-hand experience. The talk will provide examples of these biases, along with suggestions on how to identify and correct for them.

  • Dr. Veena Mendiratta - Network Anomaly Detection and Root Cause Analysis

    schedule 03:30 PM - 04:15 PM place Grand Ball Room 2 people 88 Interested

    Modern telecommunication networks are complex: they consist of many components, generate massive amounts of data in the form of logs (volume, velocity, variety), and are designed for high reliability, since customers expect always-on network access. Network failures can be difficult to detect with typical KPIs, as the problems may be subtle, with mild symptoms such as a small degradation in performance. In this workshop on network anomaly detection, we present the application of multivariate unsupervised learning techniques for anomaly detection, and root cause analysis using finite state machines. Once anomalies are detected, the message patterns in the logs of the anomaly data are compared to those of normal data to determine where the problems are occurring. Additionally, the error codes in the anomaly data are analyzed to better understand the underlying problems. We will also present the data preprocessing methodology and feature selection methods used to determine the minimum set of features that provide information on the network state. The algorithms are developed and tested with data from a 4G network. Applying such methods enables proactive detection and root cause analysis of network anomalies, thereby improving network reliability and availability.
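
    To give a flavour of multivariate unsupervised anomaly detection on log-derived features, here is a generic Isolation Forest sketch on synthetic data (an illustration of the technique class, not the workshop's actual pipeline or feature set):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-interval log features: message rate, error-code
# rate, mean latency. Normal traffic plus a few injected anomalies.
rng = np.random.default_rng(7)
normal = rng.normal(loc=[100, 2, 50], scale=[10, 1, 5], size=(500, 3))
anomalies = rng.normal(loc=[100, 40, 200], scale=[10, 5, 20], size=(5, 3))
X = np.vstack([normal, anomalies])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)            # -1 = anomaly, 1 = normal
flagged = np.where(labels == -1)[0]
# Flagged intervals would then be drilled into: compare their log
# message patterns and error codes against those of normal intervals.
print("flagged intervals:", flagged)
```

    The root-cause step the workshop describes picks up where this leaves off, contrasting message patterns and error codes between flagged and normal intervals.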

  • Amit Kapoor / Bargava Subramanian - Architectural Decisions for Interactive Viz

    schedule 03:30 PM - 04:15 PM place Jupiter people 42 Interested

    Visualization is an integral part of the data science process and includes exploratory data analysis to understand the shape of the data, model visualization to unbox the model algorithm, and dashboard visualization to communicate the insight. This task of visualization is increasingly shifting from a static and narrative setup to an interactive and reactive setup, which presents a new set of challenges for those designing interactive visualization applications.

    Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity.

    • Rendering for data scale: Envisioning how the visualization can be displayed when data size is small is not hard. But how do you render interactive visualization when you have millions or billions of data points? Technologies and techniques include bin-summarise-smooth (e.g., Datashader and bigvis) and WebGL-based rendering (e.g., deck.gl).
    • Computation for interaction speed: Making the visualization reactive requires the user to have the ability to interact, drill down, brush, and link multiple visual views to gain insight. But how do you reduce the latency of the query at the interaction layer so that the user can interact with the visualization? Technologies and techniques include aggregation and in-memory cubes (e.g., hashcubes, imMens, and nanocubes), approximate query processing and sampling (e.g., VerdictDB), and GPU-based databases (e.g., MapD).
    • Adapting to data complexity: Choosing a good visualization design for a singular dataset is possible after a few experiments and iterations, but how do you ensure that the visualization will adapt to the variety, volume, and edge cases in the real data? Technologies and techniques include responsive visualization to space and data, handling high cardinality (e.g., Facet Dive), and multidimensional reduction (e.g., Embedding Projector).
    • Being responsive to data velocity: Designing for periodic query-based visualization refreshes is one thing, but streaming data adds a whole new level of challenge to interactive visualization. So how do you decide between the trade-offs of real-time and near real-time data and their impact on refreshing visualization? Technologies and techniques include optimizing for near real-time visual refreshes and handling event- and time-based streams.
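
    The bin-summarise idea behind the first trade-off can be sketched in a few lines: aggregate raw points into a fixed-size grid so the payload sent to the renderer stays constant no matter how many points come in (a bare-bones stand-in for what tools like Datashader do):

```python
import numpy as np

# A million raw points: far too many to draw as individual marks.
rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = rng.normal(size=1_000_000)

# Bin into a 400x300 grid of counts; the renderer then colours the
# grid (e.g. as an image) instead of plotting each point.
grid, x_edges, y_edges = np.histogram2d(x, y, bins=(400, 300))

# The payload to the client is now constant-size regardless of the
# number of raw points behind it.
assert grid.shape == (400, 300)
assert grid.sum() == 1_000_000
```
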

  • Santosh Vutukuri - Embedding Artificial Intelligence in Spreadsheet

    schedule 03:30 PM - 03:50 PM place Neptune people 36 Interested

    Today, every organization is growing its data science capabilities. Yet many organizations are comfortable in spreadsheets (e.g. Microsoft Excel, Google Sheets, IBM Lotus, Apache OpenOffice Calc, Apple Numbers) and seriously do not want to switch to coding in R or Python, or to any other analytics tool on the market. This session demonstrates how to embed various artificial intelligence and machine learning algorithms into a spreadsheet and get meaningful insights for business or research. This is especially helpful to small businesses for data analysis: a user-friendly interface creates real value in decision making.
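
    For reference, a model that embeds naturally in a spreadsheet is the least-squares trend line computed by LINEST()/TREND(). The equivalent fit in Python, on made-up monthly data, shows what those formulas do under the hood:

```python
import numpy as np

# Hypothetical 12 months of sales with a true underlying trend of
# ~3.2 units/month plus noise.
months = np.arange(1, 13)
sales = 50 + 3.2 * months + np.random.default_rng(3).normal(0, 2, 12)

# np.polyfit(deg=1) is the same least-squares fit a spreadsheet's
# LINEST(sales, months) returns: slope and intercept.
slope, intercept = np.polyfit(months, sales, 1)
forecast_next = slope * 13 + intercept   # spreadsheet: TREND(sales, months, 13)
assert 2.0 < slope < 4.5                 # recovers the ~3.2 trend
```
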

04:30
05:15

    Closing Talk - 15 mins

Post-Conf Workshop

Sun, Sep 2
09:30

    Registration - 30 mins

10:00
  • Dr. Jennifer Prendki / Kiran Vajapey - Introduction to Active Learning

    schedule 10:00 AM - 06:00 PM place Jupiter 1 people 46 Interested shopping_cart Sold Out!

    The greatest challenge when building a high-performance model isn't choosing the right algorithm or tuning hyperparameters: it is getting high-quality labeled data. Without good data, no algorithm, however sophisticated, will deliver the results needed for real-life applications. And with most modern algorithms (such as deep learning models) requiring huge amounts of training data, things aren't going to get better any time soon.

    Active learning is one possible solution to this dilemma, yet it is, quite surprisingly, left out of most data science conferences and computer science curricula. This workshop aims to address the machine learning community's lack of awareness of this important topic.

    Link to data used in this course: https://s3-us-west-1.amazonaws.com/figure-eight-dataset/active_learning_odsc_india/Active_Learning_Workshop_data.zip
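
    The core loop of pool-based active learning, one round of uncertainty sampling, can be sketched as follows (synthetic data for illustration; the workshop uses the dataset linked above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Start with a small labeled seed set and a large unlabeled pool.
X, y = make_classification(n_samples=2000, random_state=0)
labeled = np.arange(20)          # indices we have labels for
pool = np.arange(20, 2000)       # unlabeled pool

# Train on what we have, then score the pool by model uncertainty.
model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
proba = model.predict_proba(X[pool])[:, 1]
uncertainty = np.abs(proba - 0.5)              # 0 = maximally uncertain
query = pool[np.argsort(uncertainty)[:10]]     # 10 points sent for labelling

# The oracle (human annotator) labels the queried points; retrain.
labeled = np.concatenate([labeled, query])
model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
```

    The payoff is that labelling budget goes to the points the model is least sure about, instead of being spread uniformly over easy examples.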

  • schedule 10:00 AM - 06:00 PM place Jupiter 2 people 26 Interested shopping_cart Sold Out!

    This introductory level workshop will give you the ability to navigate the world of quantitative finance. It will focus on core principles of rigorous statistical research, and try to teach overall intuitions so you are comfortable learning more on your own. It will discuss the workflow of designing trading strategies and executing them in the market with practical examples based on the Quantopian platform.

  • Dr. Denis Bauer - Crunch Data and Deploy Serverless Architecture the Smart Way

    schedule 10:00 AM - 06:00 PM place Boardroom people 53 Interested shopping_cart Sold Out!

    The workshop will show how to perform machine learning analysis in notebooks: participants will run their own Jupyter or Databricks notebook to find predictive features in a dataset with many columns. It will also show how to deploy a serverless architecture using an AWS CloudFormation template. The workshop also provides an opportunity to discuss the differences between academic and commercial data science.

    Before the workshop:

    For the workshop:

  • schedule 10:00 AM - 06:00 PM place Mars people 17 Interested shopping_cart Sold Out!

    You have been hearing about machine learning (ML) and artificial intelligence (AI) everywhere. You have heard about computers recognizing images, generating speech, natural language, and beating humans at Chess and Go.

    The objectives of the workshop:

    1. Learn machine learning, deep learning and AI concepts

    2. Provide hands-on training so that students can write applications in AI

    3. Provide ability to run real machine learning production examples

    4. Understand programming techniques that underlie the production software

    The concepts will be taught in Julia, a modern language for numerical computing and machine learning, but they can be applied in any language the audience is familiar with.

    The workshop will be structured as laboratory exercises in a "reverse classroom" format, which has proven to be an engaging and effective learning device. Knowledgeable facilitators will help students learn the material and extrapolate to their own real-world situations.

  • Sohan Maheshwar - Build Voice-Enabled Experiences with Alexa

    schedule 10:00 AM - 06:00 PM place Pluto people 15 Interested shopping_cart Sold Out!

    This hands-on workshop will take you through the skill-building process starting with an introduction to Amazon Alexa and the Alexa Skills Kit while illustrating why Voice is the next major disruption in computing.

    Build your first skill, then move on to learning voice design and how it differs from designing for the web or mobile. The workshop will also cover advanced topics like storing data in session attributes and building multimodal skills.

    The workshop is hands-on, and everyone will have built a few skills by the end of the day. There will also be a mini open hack and demo time to showcase what you have built.