{ This is a hands-on workshop. The use case is traffic analysis. }

Most machine learning models assume independent and identically distributed (i.i.d.) data. Graphical models can capture almost arbitrarily rich dependency structures between variables by encoding conditional independence structure with graphs. A Bayesian network, a type of graphical model, describes a probability distribution over all variables by placing directed edges between variable nodes; each node, together with its incoming edges, corresponds to one conditional probability factor in the factorized joint distribution. Bayesian networks thus provide a compact representation for dealing with uncertainty, combining an underlying graph structure with probability theory. These models have a variety of applications, such as medical diagnosis, biomonitoring, image processing, turbo codes, information retrieval, document classification and gene regulatory networks. They are interpretable because they can capture causal relationships between features. They can also work efficiently with small data sets and handle missing data, which gives them an advantage over conventional machine learning and deep learning models in those settings.
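As a minimal sketch of the factorization described above, consider a toy traffic network with the structure Rain -> Jam <- Accident. The variable names and every probability below are hypothetical and chosen only for illustration; they do not come from the workshop material. The joint distribution is the product of one factor per node, and a query is answered here by brute-force summation over the unobserved variable.

```python
from itertools import product

# Hypothetical CPTs for a toy traffic network: Rain -> Jam <- Accident.
P_RAIN = {True: 0.2, False: 0.8}
P_ACCIDENT = {True: 0.1, False: 0.9}
# P(Jam=True | Rain, Accident), keyed by (rain, accident).
P_JAM_GIVEN = {
    (False, False): 0.1,
    (False, True): 0.6,
    (True, False): 0.4,
    (True, True): 0.9,
}


def joint(rain, accident, jam):
    """Factorized joint: P(R) * P(A) * P(J | R, A)."""
    p_jam = P_JAM_GIVEN[(rain, accident)]
    return P_RAIN[rain] * P_ACCIDENT[accident] * (p_jam if jam else 1.0 - p_jam)


def posterior_rain_given_jam():
    """P(Rain=True | Jam=True), summing the joint over the hidden variable."""
    numerator = sum(joint(True, a, True) for a in (True, False))
    evidence = sum(
        joint(r, a, True) for r, a in product((True, False), repeat=2)
    )
    return numerator / evidence


print(round(posterior_rain_given_jam(), 4))
```

With these made-up numbers, observing a jam raises the probability of rain from the prior of 0.2 to roughly 0.43. Three small tables are enough to define the full eight-entry joint distribution, which is the compactness the abstract refers to.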

In this session, we will discuss the concepts of conditional independence, d-separation, the Hammersley-Clifford theorem, Bayes' theorem, Expectation Maximization and Variable Elimination. There will be a code walkthrough of a simple case study.


Outline/Structure of the Workshop

1. Probability Primer

2. Bayesian Networks

3. Independence in Bayesian Networks (covers d-separation, Hammersley-Clifford)

4. Inference (covers Variable Elimination)

5. Missing data (Expectation Maximization)

6. Case Study using Bayesian networks
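To give a feel for the inference item in the outline, here is a hedged sketch of variable elimination on a three-node chain W -> C -> D (say weather, congestion, delay). The chain, the variable names and all numbers are assumptions made up for this example. Each variable is summed out as early as possible, so only small intermediate tables are built, and the result is checked against a brute-force sum over the full joint.

```python
# Chain W -> C -> D with binary variables; all numbers are hypothetical.
P_W = [0.7, 0.3]                         # P(W)
P_C_GIVEN_W = [[0.8, 0.2], [0.4, 0.6]]   # P(C | W), rows indexed by W
P_D_GIVEN_C = [[0.9, 0.1], [0.3, 0.7]]   # P(D | C), rows indexed by C


def eliminate(prior, cpt):
    """Sum out the parent: out[child] = sum_parent prior[parent] * cpt[parent][child]."""
    n_child = len(cpt[0])
    return [
        sum(prior[p] * cpt[p][c] for p in range(len(prior)))
        for c in range(n_child)
    ]


def marginal_delay():
    """P(D) by eliminating W first, then C."""
    p_c = eliminate(P_W, P_C_GIVEN_W)
    return eliminate(p_c, P_D_GIVEN_C)


def brute_force():
    """P(D) by summing the full joint, for comparison."""
    result = [0.0, 0.0]
    for w in range(2):
        for c in range(2):
            for d in range(2):
                result[d] += P_W[w] * P_C_GIVEN_W[w][c] * P_D_GIVEN_C[c][d]
    return result


print(marginal_delay(), brute_force())
```

On a chain this small the saving is negligible, but the same idea of pushing sums inside products is what keeps inference tractable on larger networks.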

Learning Outcome

At the end of the Bayesian Network workshop, one would be able to

  • understand the probabilistic principles of reasoning under uncertainty
  • have insight into algorithms for probabilistic reasoning in Bayesian networks

Target Audience

Data Scientists, Data Analysts, Deep Learning Engineers, Statisticians, Health-science professionals, Machine Learning Engineers

Prerequisites for Attendees

Basics of probability.


Submitted 4 years ago

  • 45 Mins

    Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML), a view that is increasingly shared by many, there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms – often called differentiable programming – has caught on.

    Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.

    This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.

  • Dipanjan Sarkar

    Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype

    45 Mins

    The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models like Capsule Networks do come into existence, but industry adoption of them usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.

    A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. There are some domains in the industry, especially in the world of finance like insurance or banking, where data scientists often end up having to use more traditional machine learning models (linear or tree-based). The reason is that model interpretability is very important for the business to explain each and every decision taken by the model. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature). We, however, end up being unable to have proper interpretations for model decisions.

    To address these gaps, I will take a conceptual yet hands-on approach in which we will explore some of these challenges around explainable artificial intelligence (XAI) and human-interpretable machine learning in depth, and showcase some examples using state-of-the-art model interpretation frameworks in Python!

  • Subhasish Misra

    Subhasish Misra - Causal data science: Answering the crucial ‘why’ in your analysis.

    45 Mins

    Causal questions are ubiquitous in data science. For example, questions such as whether changing a feature on a website led to more traffic, or whether digital ad exposure led to incremental purchases, are deeply rooted in causality.

    Randomized tests are considered the gold standard for estimating causal effects. However, experiments are in many cases unfeasible or unethical. In such cases one has to rely on observational (non-experimental) data to derive causal insights. The crucial difference between randomized experiments and observational data is that in the former, test subjects (e.g. customers) are randomly assigned a treatment (e.g. digital advertisement exposure). This helps curb the possibility that user response (e.g. clicking on a link in the ad and purchasing the product) differs across the treated and non-treated groups owing to pre-existing differences in user characteristics (e.g. demographics, geo-location etc.). In essence, we can then attribute divergences observed post-treatment in key outcomes (e.g. purchase rate) to the causal impact of the treatment.

    This treatment assignment mechanism, which makes causal attribution possible via randomization, is absent when using observational data. Thankfully, there are scientific (statistical and beyond) techniques that help us circumvent this shortcoming and get to causal reads.

    The aim of this talk is to offer a practical overview of the above aspects of causal inference, which as a discipline lies at the fascinating confluence of statistics, philosophy, computer science, psychology, economics, and medicine, among others. Topics include:

    • The fundamental tenets of causality and measuring causal effects.
    • Challenges involved in measuring causal effects in real world situations.
    • Distinguishing between randomized and observational approaches to measuring the same.
    • An introduction to measuring causal effects from observational data using matching and its extension, propensity-score-based matching, with a focus on a) the intuition and statistics behind it, b) tips from the trenches, based on the speaker's experience with these techniques, and c) practical limitations of such approaches.
    • Walk through an example of how matching was applied to get to causal insights regarding effectiveness of a digital product for a major retailer.
    • Finally, conclude with why having a nuanced understanding of causality is all the more important in the big data era we are in.
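    The intuition behind matching can be sketched on synthetic data. In the hypothetical setup below, every unit has one covariate `x` (say, prior spend), the outcome is `y = x + effect * treated`, and treated units tend to have higher `x`, so a naive comparison of group means is biased. The data, the names and the true effect of 2.0 are all assumptions made for illustration. Propensity score matching would match on an estimated probability of treatment instead of on `x` directly; this sketch uses plain nearest-neighbour matching on the covariate to keep it short.

```python
TRUE_EFFECT = 2.0  # assumed treatment effect baked into the synthetic data

# Controls cover x = 1..8; treated units only appear at x = 5..8 (selection on x).
units = [{"x": x, "treated": False, "y": float(x)} for x in range(1, 9)]
units += [{"x": x, "treated": True, "y": x + TRUE_EFFECT} for x in range(5, 9)]


def naive_difference(data):
    """Difference in mean outcome between treated and control groups."""
    treated = [u["y"] for u in data if u["treated"]]
    control = [u["y"] for u in data if not u["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


def matched_att(data):
    """Average effect on the treated via 1-nearest-neighbour matching on x.

    Each treated unit is paired (with replacement) with the control unit
    whose covariate is closest, and the outcome differences are averaged.
    """
    controls = [u for u in data if not u["treated"]]
    diffs = []
    for t in (u for u in data if u["treated"]):
        match = min(controls, key=lambda c: abs(c["x"] - t["x"]))
        diffs.append(t["y"] - match["y"])
    return sum(diffs) / len(diffs)


print(naive_difference(units), matched_att(units))
```

    Here the naive difference comes out at 4.0, double the assumed effect, while the matched estimate recovers 2.0. This works because a comparable control exists for every treated unit; when the groups overlap poorly, matching has less to work with.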
  • Dr. C.S.Jyothirmayee

    Dr. C.S.Jyothirmayee / Usha Rengaraju / Vijayalakshmi Mahadevan - Deep learning powered Genomic Research

    90 Mins

    Disease occurs when there is a slip in the finely orchestrated dance between physiology, environment and genes. Treatment with chemicals (natural, synthetic or a combination) cured some diseases, but others persisted and were propagated across generations. The molecular basis of disease became the prime focus of studies seeking to understand and analyze root causes. Cancer also showed that the origin, detection, prognosis, treatment and cure of a disease is not an uncomplicated process. Diseases had to be treated on a case-by-case basis (no one size fits all).

    With the advent of next-generation sequencing, high-throughput analysis and enhanced computing power, there are new aspirations to use neural networks to address this conundrum of complicated genetic elements (the structure and function of various genes in our systems). This requires extraction of genomic material, its (automated) sequencing, and analysis to map the strings of As, Ts, Gs and Cs, which yields a genomic dataset. These datasets are too large for traditional and applied statistical techniques. Consequently, the important signals are often incredibly small amid blaring technical noise. This requires far more sophisticated analysis techniques. Artificial intelligence and deep learning give us the power to draw clinically useful information from the genetic datasets obtained by sequencing.

    The precision of these analyses has become vital and is the way forward for detecting disease and predisposition to it, and it empowers medical authorities to make fair and situation-appropriate decisions about patient treatment strategies. This kind of genomic profiling, prediction and mode of disease management is useful for tailoring FDA-approved treatment strategies based on these molecular disease drivers and the patient’s molecular makeup.

    The present scenario encourages designing, developing and testing medicines based on existing genetic insights and models. Deep learning models are helping to analyze and interpret tiny genetic variations (like SNPs, Single Nucleotide Polymorphisms), which helps unravel crucial cellular processes like metabolism and DNA wear and tear. These models are also used to identify disease signatures, such as cancer risk signatures, from various body fluids. They have immense potential to revolutionize the healthcare ecosystem. Clinical data collection is not streamlined and is done in a haphazard manner; making the data uniformly retrievable and combinable with genetic information would increase its value for interpretation, for decisions on patient treatment modalities, and for their outcomes.

    There is a huge inflow of medical data from emerging wearable technologies; integrating it with other health data, along with the ability to quickly carry out complex analyses on rich genomic databases in the cloud, would revitalize humans' disease-fighting capability. A last, still emerging, area of application is direct-to-consumer genomics (as the success of 23andMe shows).

    This road map promises an end-to-end system to face disease in all its forms and nature. Medical research and its applications, like gene therapies, gene editing technologies like CRISPR, molecular diagnostics and precision medicine, could be revolutionized by tailoring high-throughput computing methods and applying them to enhanced genomic datasets.

  • Badri Narayanan Gopalakrishnan

    Badri Narayanan Gopalakrishnan / Shalini Sinha / Usha Rengaraju - Lifting Up: How AI and Big data can contribute to anti-poverty programs

    45 Mins
    Case Study

    Ending poverty and achieving zero hunger are the top two goals the United Nations aims to achieve by 2030 under its sustainable development program. Hunger and poverty are byproducts of multiple factors, and fighting them requires a multi-fold effort from all stakeholders. Artificial Intelligence and Machine Learning have transformed the way we live, work and interact. However, the economics of business has limited their application to a few segments of society. A much more conscious effort is needed to bring the power of AI to those who need it the most: people below the poverty line. Here we present our thoughts on how deep learning and big data analytics can be combined to enable effective implementation of anti-poverty programs. Advancements in deep learning and micro-diagnostics, combined with effective technology policy, are the right recipe for the progressive growth of a nation. Deep learning can help identify poverty zones across the globe from night-time images, in which the level of light correlates with economic growth. Once the areas of lower economic growth are identified, geographic and demographic data can be combined to establish micro-level diagnostics of these underdeveloped areas. The insights from the data can help plan an effective intervention program. Machine learning can further be used to identify potential donors, investors and contributors across the globe based on their skill set, interests, history, ethnicity, purchasing power and native connection to the location of the proposed program. Even adequate resource allocation and efficient program design will not guarantee the success of a program unless project execution is supervised at the grassroots level. Data analytics can be used to monitor project progress and effectiveness, and to detect anomalies in case of fraud or mismanagement of funds.

  • Akshay Bahadur

    Akshay Bahadur - Minimizing CPU utilization for deep networks

    Symantec Softwares
    45 Mins

    The advent of machine learning, along with its integration with computer vision, has enabled users to efficiently develop image-based solutions for innumerable use cases. A machine learning model consists of an algorithm that draws meaningful correlations from the data without being tightly coupled to a specific set of rules. It's crucial to explain the subtle nuances of the network along with the use case we are trying to solve. As technology has advanced, image quality has increased, which in turn has increased the resources needed to process images when building a model. The main question, however, is how to develop lightweight models while keeping the performance of the system intact.
    To connect the dots, we will talk about the development of applications specifically aimed at providing equally accurate results without using many resources. This is achieved by using image processing techniques along with optimizing the network architecture.
    These applications range from recognizing digits and letters that the user can 'draw' at runtime, to developing a state-of-the-art facial recognition system, predicting hand emojis, developing a self-driving system, detecting malaria and brain tumors, and Google's 'Quick, Draw' project of hand doodles.
    In this presentation, we will discuss the development of such applications with minimal CPU usage.

  • Favio Vázquez

    Favio Vázquez - Complete Data Science Workflows with Open Source Tools

    90 Mins

    Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that is not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a whole framework for data science that will allow you and your company to go further, beyond common sense and intuition, to solve complex business problems.

  • Pankaj Kumar

    Pankaj Kumar / Abinash Panda / Usha Rengaraju - Quantitative Finance: Global macro trading strategy using Probabilistic Graphical Models

    90 Mins

    { This is a hands-on workshop on the pgmpy package. Abinash Panda, the creator of the pgmpy package, will do the code demonstration. }

    Crude oil plays an important role in macroeconomic stability and heavily influences the performance of the global financial markets. Unexpected fluctuations in the real price of crude oil are detrimental to the welfare of both oil-importing and oil-exporting economies. Global macro hedge funds view the forecast price of oil as one of the key variables in generating macroeconomic projections, and it also plays an important role for policy makers in predicting recessions.

    Probabilistic Graphical Models can help improve the accuracy of existing quantitative models for crude oil price prediction, as they take into account many different macroeconomic and geopolitical variables.

    Hidden Markov Models are used to detect underlying regimes of the time-series data by discretising the continuous time-series data. In this workshop we use the Baum-Welch algorithm for learning the HMMs, and the Viterbi algorithm to find the sequence of hidden states (i.e. the regimes) given the observed states (i.e. monthly differences) of the time-series.

    Belief Networks are used to analyse the probability of a regime in crude oil given, as evidence, a set of different regimes in the macroeconomic factors. The Greedy Hill Climbing algorithm is used to learn the structure of the Belief Network, and the parameters are then learned by Bayesian estimation with a K2 prior. Inference is then performed on the Belief Networks to obtain a forecast of the crude oil markets, and the forecast is tested on real data.
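    As an illustration of the decoding step, the sketch below implements the Viterbi algorithm in plain Python for a two-regime HMM. The regime names (`calm`, `volatile`), the discretised observations (`up`, `flat`, `down`) and all probabilities are hypothetical; in the workshop setting the parameters would be learned with Baum-Welch rather than written by hand, and this is not the pgmpy implementation.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, probability of that path)."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(row)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical two-regime model over discretised monthly differences.
STATES = ["calm", "volatile"]
START_P = {"calm": 0.6, "volatile": 0.4}
TRANS_P = {
    "calm": {"calm": 0.7, "volatile": 0.3},
    "volatile": {"calm": 0.4, "volatile": 0.6},
}
EMIT_P = {
    "calm": {"up": 0.5, "flat": 0.4, "down": 0.1},
    "volatile": {"up": 0.1, "flat": 0.3, "down": 0.6},
}

path, prob = viterbi(["up", "flat", "down"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

    For this input the decoded regime sequence is calm, calm, volatile. On long series the products of probabilities underflow, so practical implementations usually work with log-probabilities instead.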
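    To make the parameter-learning step concrete, here is a minimal sketch of Bayesian estimation of one conditional probability table under a K2 prior, i.e. a Dirichlet prior that adds one pseudo-count to every cell. The regime labels and the sample counts are invented for the example, and the function is a hand-rolled illustration rather than pgmpy's estimator.

```python
from collections import Counter


def estimate_cpt_k2(samples, child_states):
    """Estimate P(child | parent) with a K2 prior (one pseudo-count per cell).

    samples: iterable of (parent_state, child_state) pairs.
    """
    counts = Counter(samples)
    parents = sorted({parent for parent, _ in counts})
    cpt = {}
    for parent in parents:
        # Observed count for this parent state plus one pseudo-count per child state.
        total = sum(counts[(parent, c)] for c in child_states) + len(child_states)
        cpt[parent] = {
            c: (counts[(parent, c)] + 1) / total for c in child_states
        }
    return cpt


# Hypothetical discretised regimes: a dollar regime as parent, an oil regime as child.
samples = (
    [("usd_up", "oil_down")] * 6
    + [("usd_up", "oil_up")] * 2
    + [("usd_down", "oil_up")] * 3
)
cpt = estimate_cpt_k2(samples, child_states=["oil_up", "oil_down"])
print(cpt)
```

    The combination (`usd_down`, `oil_down`) never occurs in these samples, yet its estimate is 0.2 rather than 0. Avoiding zero probabilities for unseen configurations is the main practical reason to prefer a prior over plain maximum likelihood when data is scarce.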

  • Akash Tandon

    Akash Tandon - Traversing the graph computing and database ecosystem

    Akash Tandon
    Akash Tandon
    Data Engineer
    schedule 4 years ago
    Sold Out!
    45 Mins

    Graphs have long held a special place in computer science’s history (and codebases). We're seeing the advent of a new wave of the information age; an age that is characterized by great emphasis on linked data. Hence, graph computing and databases have risen to prominence rapidly over the last few years. Be it enterprise knowledge graphs, fraud detection or graph-based social media analytics, there are a great number of potential applications.

    To reap the benefits of graph databases and computing, one needs to understand the basics as well as the current technical landscape and offerings. It is equally important to understand whether a graph-based approach suits your problem.
    These realizations are a result of my involvement in an effort to build an enterprise knowledge graph platform. I also believe that graph computing is more than a niche technology and has potential for organizations of varying scale.
    Now, I want to share my learning with you.

    This talk will touch upon the above points with the general premise being that data structured as graph(s) can lead to improved data workflows.
    During our journey, you will learn fundamentals of graph technology and witness a live demo using Neo4j, a popular property graph database. We will walk through a day in the life of data workers (engineers, scientists, analysts), the challenges that they face and how graph-based approaches result in elegant solutions.
    We'll end our journey with a peek into the current graph ecosystem and high-level concepts that need to be kept in mind while adopting an offering.

  • Shalini Sinha

    Shalini Sinha / Ashok J / Yogesh Padmanaban - Hybrid Classification Model with Topic Modelling and LSTM Text Classifier to identify key drivers behind Incident Volume

    45 Mins
    Case Study

    Incident volume reduction is one of the top priorities for any large-scale service organization, along with timely resolution of incidents within the specified SLA parameters. AI and machine learning solutions can help the IT service desk manage the incident influx as well as resolution cost by:

    • Identifying major topics from incident description and planning resource allocation and skill-sets accordingly
    • Producing knowledge articles and resolution summary of similar incidents raised earlier
    • Analyzing Root Causes of incidents and introducing processes and automation framework to predict and resolve them proactively

    We will look at different approaches that combine standard document clustering algorithms, such as Latent Dirichlet Allocation (LDA) and k-means clustering on doc2vec, with text classification to produce easily interpretable document clusters with semantically coherent text representations. These helped the IT operations of a large FMCG client identify the key drivers/topics contributing to incident volume and take the necessary action.

  • Saikat Sarkar

    Saikat Sarkar / Dhanya Parameshwaran / Dr Sweta Choudhary / Srikanth Ramaswamy / Usha Rengaraju - AI meets Neuroscience

    480 Mins

    This is a mixer workshop in which clinicians, medical experts, neuroimaging experts, neuroscientists, data scientists and statisticians will come together under one roof.

    The theme will be updated soon.

    Our celebrity and distinguished presenter Srikanth Ramaswamy, who is an advisor at Mysuru Consulting Group and also works on the Blue Brain Project at EPFL, will deliver an expert talk in the workshop.


    { This workshop will be a combination of panel discussions, an expert talk and a neuroimaging data science workshop (applying machine learning and deep learning algorithms to neuroimaging data sets) }

    { We are currently onboarding several experts from the neuroscience domain: neurosurgeons, neuroscientists and computational neuroscientists. Details of the speakers will be released soon. }

    Abstract for the Neuroimaging Data Science Part of the workshop:

    The study of the human brain with neuroimaging technologies is at the cusp of an exciting era of Big Data. Many data collection projects, such as the NIH-funded Human Connectome Project, have made large, high-quality datasets of human neuroimaging data freely available to researchers. These large data sets promise to provide important new insights about human brain structure and function, and to provide us the clues needed to address a variety of neurological and psychiatric disorders. However, neuroscience researchers still face substantial challenges in capitalizing on these data, because these Big Data require a different set of technical and theoretical tools than those that are required for analyzing traditional experimental data. These skills and ideas, collectively referred to as Data Science, include knowledge in computer science and software engineering, databases, machine learning and statistics, and data visualization.

    The workshop covers data analysis, statistics and data visualization, and applying cutting-edge analytics to complex and multimodal neuroimaging datasets. Topics covered in this workshop are statistics, associative techniques, graph theoretical analysis, causal models, nonparametric inference, and meta-analytical synthesis.

  • Chaitanya Krishna Thanneeru

    Chaitanya Krishna Thanneeru - Taxonomy Building using ML

    45 Mins
    Case Study

    Topic modeling is the art of extracting latent topics/themes that exist in a set of documents. In this talk we will discuss the use cases of topic modeling, particularly pertaining to Latent Dirichlet Allocation (LDA), and the implementation work by the Data Science Applications team at Meredith for the purposes of designing auto-taggers and classifiers for the topics in the custom enterprise taxonomy against hundreds of thousands of documents. We will talk about best practices for choosing the optimal number of topics for hundreds of thousands of documents, how named entity extraction is employed to derive context in the feature space, the alignment of machine learning techniques to support the work of taxonomists, and the integration with the enterprise architecture to support expert assessor population for curating training data for Google’s AutoML and other deep learning capabilities.
    Latent semantic analysis has been shown to be ideal for quickly clustering the document space. Applying it in a hierarchical manner on top-level clusters to derive child clusters, informed by inputs from subject matter experts and taxonomists (namely taxonomy terms and synonyms), makes it possible to get a sense of the coverage in the content space against the enterprise taxonomy model.
    Where there are shortcomings, additional training data needs to be obtained in order to effectively build auto-tagging solutions. One technique for data augmentation is query formulation, again utilizing entity extraction from owned content along with the taxonomy categories and synonyms, to construct social listening streams to surface new off-property content to become part of the training corpus.

  • Kshitij Srivastava

    Kshitij Srivastava / Manikant Prasad - Data Science in Containers

    45 Mins
    Case Study

    Containers are all the rage in the DevOps arena.

    This session is a live demonstration of how the data team at Milliman uses containers at each step in their data science workflow -

    1) How do containerized environments speed up data scientists at the data exploration stage

    2) How do containers enable rapid prototyping and validation at the modeling stage

    3) How do we put containerized models into production

    4) How do containers make it easy for data scientists to do DevOps

    5) How do containers make it easy for data scientists to host a data science dashboard with continuous integration and continuous delivery

  • Dr. Neha Sehgal

    Dr. Neha Sehgal - Open Data Science for Smart Manufacturing

    45 Mins

    Open Data offers a tremendous opportunity to transform today’s manufacturing sector into smarter manufacturing. Smart Manufacturing initiatives include digitalising production processes and integrating IoT technologies for connecting machines to collect data for analysis and visualisation.

    This talk will illustrate the linkages between various industries within the manufacturing sector through the lens of Open Data Science. Data on manufacturing sector companies, company profiles, officers and financials will be scraped from UK Open Data APIs. The work I plan to showcase at ODSC is part of the UK Made Smarter project, where it has been useful for major aerospace alliances in identifying the champions and strugglers (SMEs) within the manufacturing sector, based on open data gathered from multiple sources. The talk includes discussion of data extraction, data cleaning, data transformation (turning raw financial information about companies into key metrics of interest) and further data analytics to cluster manufacturing companies into "Champions" and "Strugglers". The talk will also showcase examples of powerful R Shiny based dashboards of interest to suppliers, manufacturers and other key stakeholders in the supply chain network.

    Further analysis includes network analysis for industries, clustering, and deploying the model as an API using Google Cloud Platform. The presenter will discuss the necessity of an 'Analytical Thinking' approach as an aid to handling complex big data projects, and how to overcome challenges while working on real-life data science projects.

  • Saurabh Jha

    Saurabh Jha / Rohan Shravan / Usha Rengaraju - Hands on Deep Learning for Computer Vision

    480 Mins

    Computer Vision has many applications, including medical imaging, autonomous vehicles, industrial inspection and augmented reality. The use of Deep Learning for Computer Vision can be grouped into multiple categories for both images and videos: classification, detection, segmentation and generation. Having worked in Deep Learning with a focus on Computer Vision, we have come across various challenges and learned best practices over a period of experimenting with cutting-edge ideas. This workshop is for Data Scientists and Computer Vision Engineers whose focus is deep learning. We will cover state-of-the-art architectures for image classification and segmentation, and practical tips and tricks for training deep neural network models. It will be a hands-on session where every concept will be introduced through Python code, and our deep learning frameworks of choice will be PyTorch v1.0 and Keras.

    Given that we have only 8 hours, we will cover the most important fundamentals and current techniques, and avoid anything that is obsolete or not used by state-of-the-art algorithms. We will start directly with building the intuition for Convolutional Neural Networks and focus on core architectural problems. We will try to answer some of the hard questions, like how many layers a network must have and how many kernels we should add. We will look at the architectural journey of some of the best papers and discover what each brought to the field of Vision AI, making today’s best networks possible. We will cover 9 different kinds of convolutions, which address a spectrum of problems like running DNNs on constrained hardware, super-resolution, image segmentation, etc. The concepts will be good enough for all of us to move on to harder problems like segmentation or super-resolution later, but we will focus on object recognition, followed by object detection. We will build our networks step by step, learning how optimization techniques actually improve our networks and exactly when we should introduce them. We hope to leave you with the confidence to read research papers as if it were second nature. Given that we have 8 hours and want the sessions to be productive, we will, instead of introducing all the problems and solutions, focus on the fundamentals of modern deep neural

  • Gopinath Ramakrishnan

    Gopinath Ramakrishnan - Five Key Pitfalls in Data Analysis

    45 Mins

    Data Science is all about deriving actionable insights through data analysis.
    There is no denying the fact that such insights have a tremendous business value.
    But what if:
    Some crucial data has been left out of consideration?
    Wrong inferences have been drawn during analysis?
    Results have been graphically misrepresented?
    Imagine the adverse impact on your business if you make wrong decisions based on such cases.

    In this talk we will discuss the following 5 key pitfalls to look out for in data analysis results before you make any decisions based on them:
    1. Selection Bias
    2. Survivor Bias
    3. Confounding Effects
    4. Spurious Correlations
    5. Misleading Visualizations

    These are some of the points most commonly overlooked by beginners in Data Science.

    The talk will draw upon many examples from real life situations to illustrate these points.
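    As one possible illustration of the confounding pitfall listed above, the sketch below reproduces Simpson's paradox with the success counts commonly quoted in textbook discussions of the kidney stone treatment comparison. The choice of this example and the helper names are assumptions made here and are not taken from the talk. Stone size acts as the confounder: it influences both which treatment is chosen and how likely the treatment is to succeed.

```python
# (successes, patients) per treatment and stone size, as commonly quoted
# in textbook discussions of Simpson's paradox; treat as illustrative.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rate(treatment, stratum):
    """Success rate of a treatment within one stone-size group."""
    successes, patients = DATA[treatment][stratum]
    return successes / patients


def overall_rate(treatment):
    """Success rate of a treatment when stone size is ignored."""
    successes = sum(s for s, _ in DATA[treatment].values())
    patients = sum(n for _, n in DATA[treatment].values())
    return successes / patients


for stratum in ("small", "large"):
    print(stratum, round(stratum_rate("A", stratum), 3), round(stratum_rate("B", stratum), 3))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```

    Treatment A has the higher success rate within each stone-size group, yet treatment B looks better once the groups are pooled, because A was used far more often on the harder large-stone cases. An analysis that ignores the confounder would recommend the wrong treatment.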

  • Vidhya Veeraraghavan

    Vidhya Veeraraghavan - Story Teller - Analytics in Banking & Financial Sector

    45 Mins
    Case Study

    As kids, we always enjoyed stories. Some scary, some holy, some imbibing moral values & some just for fun.

    Analytics is fun when you approach it with passion and curiosity. I know this because I have done this. With a few case studies, I wish to sharpen your understanding of analytics and show how it is actively used in the banking and financial sector.

    Come join me for a fun ride.

  • Shankar Somayajula

    Shankar Somayajula - Revisiting Market Basket Analysis (MBA) with the help of SQL Pattern Matching