Quantitative Finance: Global macro trading strategy using Probabilistic Graphical Models

{ This is a hands-on workshop on the pgmpy package. Abinash Panda, the creator of the pgmpy package, will do the code demonstration. }

Crude oil plays an important role in macroeconomic stability and heavily influences the performance of global financial markets. Unexpected fluctuations in the real price of crude oil are detrimental to the welfare of both oil-importing and oil-exporting economies. Global macro hedge funds view the forecast price of oil as one of the key variables in generating macroeconomic projections, and it also plays an important role for policy makers in predicting recessions.

Probabilistic Graphical Models can help improve the accuracy of existing quantitative models for crude oil price prediction, as they take into account many different macroeconomic and geopolitical variables.

Hidden Markov Models are used to detect the underlying regimes of the time-series data by discretising the continuous time-series data. In this workshop we use the Baum-Welch algorithm for learning the HMMs, and the Viterbi algorithm to find the sequence of hidden states (i.e. the regimes) given the observed states (i.e. monthly differences) of the time series.
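As a rough illustration of the decoding step, the sketch below discretises a short series of monthly differences into "down"/"up" symbols and runs the Viterbi algorithm on a two-state HMM. The price differences and all HMM parameters (`start_p`, `trans_p`, `emit_p`) are hypothetical, hand-picked values; in the workshop they would be learned with Baum-Welch, which is not shown here. It is written in plain Python rather than with pgmpy, so it is not the workshop's own code.

```python
import math

def discretise(diffs, threshold=0.0):
    """Map each monthly difference to a symbol: 0 = down, 1 = up."""
    return [1 if d > threshold else 0 for d in diffs]

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence of a discrete HMM, computed in log space."""
    n = len(start_p)
    # Log-probability of the best path ending in each state at time 0.
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o]))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical monthly price differences (not real data).
diffs = [-1.2, -0.4, -2.0, 0.3, -0.9, -1.1, 1.5, 2.1, 0.7, 1.1]
obs = discretise(diffs)

# Hypothetical parameters: state 0 = "bear" regime, state 1 = "bull" regime.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]   # regimes are persistent
emit_p = [[0.8, 0.2], [0.2, 0.8]]    # bear mostly emits "down", bull mostly "up"

regimes = viterbi(obs, start_p, trans_p, emit_p)
print(regimes)
```

With these assumed parameters the single "up" month in the middle of the falling stretch is treated as noise, and the decoded path switches regime only once, when the run of "up" months begins.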

Belief Networks are used to analyse the probability of a regime in crude oil given, as evidence, a set of different regimes in the macroeconomic factors. The Greedy Hill Climbing algorithm is used to learn the structure of the Belief Network, and the parameters are then learned by Bayesian Estimation with a K2 prior. Inference is then performed on the Belief Network to obtain a forecast of the crude oil markets, and the forecast is tested on real data.
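To make the parameter-learning step concrete, here is a minimal plain-Python sketch of Bayesian estimation with a K2 prior, i.e. a Dirichlet prior that adds one pseudo-count to every cell of a conditional probability table. The network structure (`usd` and `inventory` as parents of `oil`), the state names and the records are all hypothetical; in the workshop the structure would come from hill-climbing search and the estimation would be done with pgmpy, neither of which is shown here.

```python
from collections import Counter

def k2_cpt(records, child, parents, child_states):
    """Return a function giving P(child | parents) estimated with a K2 prior.

    K2 prior: one pseudo-count per (parent configuration, child state) cell,
    so P = (N_config,state + 1) / (N_config + number_of_child_states).
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    totals = Counter(tuple(r[p] for p in parents) for r in records)
    r_child = len(child_states)

    def prob(child_state, evidence):
        key = tuple(evidence[p] for p in parents)
        return (joint[(key, child_state)] + 1) / (totals[key] + r_child)

    return prob

# Hypothetical regime labels per month (not real data).
records = [
    {"usd": "weak", "inventory": "low", "oil": "up"},
    {"usd": "weak", "inventory": "low", "oil": "up"},
    {"usd": "weak", "inventory": "low", "oil": "down"},
    {"usd": "strong", "inventory": "high", "oil": "down"},
    {"usd": "strong", "inventory": "high", "oil": "down"},
    {"usd": "strong", "inventory": "low", "oil": "up"},
]

# Assumed structure: usd -> oil <- inventory.
p_oil = k2_cpt(records, "oil", ["usd", "inventory"], ["down", "up"])

# With every parent observed, the query reduces to a CPT lookup.
p_up = p_oil("up", {"usd": "weak", "inventory": "low"})
print(p_up)  # (2 + 1) / (3 + 2) = 0.6
```

One consequence of the prior is that a parent configuration never seen in the data, such as a weak dollar with high inventory, gets a uniform distribution (0.5 for each oil regime here) rather than an undefined one.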

 
 

Outline/Structure of the Workshop

Theory (30 minutes)

Brief Introduction to the Crude Oil Price Prediction Problem

Identification of Macro Economic Factors influencing the Energy Markets

Refresher: Hidden Markov Models and Bayesian Networks

Hands-on (1 hour) - pgmpy package

Data Retrieval from the EIA and FRED

Data Preprocessing

Regime detection model using Hidden Markov Models

Learning the macroeconomic structure of the oil markets using hill-climbing structural learning.

Testing the constructed model by simulating trades

Learning Outcome

The audience will learn how to construct a macro trading model for crude oil price forecasting by representing structural and macroeconomic changes in the oil market using Bayesian Networks and HMMs.

Target Audience

Quantitative Finance researchers, Algorithmic Trading practitioners, Financial Analysts, Data Scientists, financial data scientists, Probabilistic programmers, Statisticians, Machine Learning Engineers, Deep Learning Engineers, PGM experts.

Prerequisites for Attendees

A basic understanding of Bayesian Networks is preferred, though not mandatory.

Prior programming experience in Python preferred.

Submitted 1 month ago

Public Feedback

  • By Vikas Agrawal  ~  1 week ago
    Dear Usha: I understand that for this proposal making a comparison with the state of the art can get tricky, as you are showing a basic implementation and tools for people to get started. Advancements and refinements above that basic implementation will lead to better performance, and those are beyond the scope of this tutorial. Warm Regards, Vikas
  • By Dr. Vikas Agrawal  ~  2 weeks ago

    Dear Usha: It will be interesting to our audience at ODSC to highlight in the description how well we do with respect to the state of the art and what enables us to do that.

    Warm Regards

    Vikas


  • Liked Viral B. Shah

    Viral B. Shah - Growing a compiler - Getting to ML from the general-purpose Julia compiler

    45 Mins
    Keynote
    Intermediate

    Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML) - a view that is increasingly shared by many, there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms – often called differentiable programming – has caught on.

    Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.

    This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.

  • Liked Dr. Vikas Agrawal

    Dr. Vikas Agrawal - Non-Stationary Time Series: Finding Relationships Between Changing Processes for Enterprise Prescriptive Systems

    45 Mins
    Talk
    Intermediate

    It is too tedious to keep on asking questions, seek explanations or set thresholds for trends or anomalies. Why not find problems before they happen, find explanations for the glitches and suggest shortest paths to fixing them? Businesses are always changing along with their competitive environment and processes. No static model can handle that. Using dynamic models that find time-delayed interactions between multiple time series, we need to make proactive forecasts of anomalous trends of risks and opportunities in operations, sales, revenue and personnel, based on multiple factors influencing each other over time. We need to know how to set what is “normal” and determine when the business processes from six months ago do not apply any more, or only apply to 35% of the cases today, while explaining the causes of risk and sources of opportunity, their relative directions and magnitude, in the context of the decision-making and transactional applications, using state-of-the-art techniques.

    Real-world processes and businesses keep changing, with one moving part changing another over time. Can we capture these changing relationships? Can we use multiple variables to find risks on key interesting ones? We will take a fun journey culminating in the most recent developments in the field. What methods work well and which break? What can we use in practice?

    For instance, we can show a CEO that they would miss their revenue target by over 6% for the quarter, and tell them why, i.e. in what ways their business has changed over the last year. Then we provide prioritized, ordered lists of the quickest, cheapest and least risky paths to help them turn the tide, with estimates of relative costs and expected probability of success.

  • Liked Juan Manuel Contreras

    Juan Manuel Contreras - Beyond Individual Contribution: How to Lead Data Science Teams

    Juan Manuel Contreras
    Head of Data Science
    Even
    1 month ago
    45 Mins
    Talk
    Advanced

    Despite the increasing number of data scientists who are being asked to take on managerial and leadership roles as they grow in their careers, there are still few resources on how to manage data scientists and lead data science teams. There is also scant practical advice on how to serve as head of a data science practice: how to set a vision and craft a strategy for an organization to use data science.

    In this talk, I will describe my experience as a data science leader both at a political party (the Democratic Party of the United States of America) and at a fintech startup (Even.com), share lessons learned from these experiences and conversations with other data science leaders, and offer a framework for how new data science leaders can better transition to both managing data scientists and heading a data science practice.

  • Liked Favio Vázquez

    Favio Vázquez - Complete Data Science Workflows with Open Source Tools

    90 Mins
    Tutorial
    Beginner

    Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that is not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a whole framework for data science that will allow you and your company to go further, beyond common sense and intuition, to solve complex business problems.

  • Liked Badri Narayanan Gopalakrishnan

    Badri Narayanan Gopalakrishnan / Shalini Sinha / Usha Rengaraju - Lifting Up: Deep Learning for implementing anti-hunger and anti-poverty programs

    45 Mins
    Case Study
    Intermediate

    Ending poverty and achieving zero hunger are the top two goals the United Nations aims to achieve by 2030 under its sustainable development program. Hunger and poverty are byproducts of multiple factors, and fighting them requires a multi-fold effort from all stakeholders. Artificial Intelligence and Machine Learning have transformed the way we live, work and interact. However, the economics of business have limited their application to a few segments of society. A much more conscious effort is needed to bring the power of AI to the benefit of those who actually need it the most: people below the poverty line. Here we present our thoughts on how deep learning and big data analytics can be combined to enable effective implementation of anti-poverty programs. Advancements in deep learning and micro diagnostics, combined with effective technology policy, are the right recipe for the progressive growth of a nation. Deep learning can help identify poverty zones across the globe based on night-time images, where a higher level of light correlates with higher economic growth. Once the areas of lower economic growth are identified, geographic and demographic data can be combined to establish micro-level diagnostics of these underdeveloped areas. The insights from the data can help plan an effective intervention program. Machine Learning can further be used to identify potential donors, investors and contributors across the globe based on their skill set, interests, history, ethnicity, purchasing power and native connection to the location of the proposed program. Even adequate resource allocation and efficient program design will not guarantee the success of a program unless project execution is supervised at the grass-roots level. Data Analytics can be used to monitor project progress and effectiveness, and to detect anomalies in case of fraud or mismanagement of funds.

  • Liked Dipanjan Sarkar

    Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype

    Dipanjan Sarkar
    Data Scientist
    Red Hat
    4 months ago
    45 Mins
    Tutorial
    Intermediate

    The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as just a pure academic and research-oriented domain, we have seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models do come into existence like Capsule Networks, but industry adoption of the same usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more ‘applied’ rather than theoretical and effective application of these models on the right data to solve complex real-world problems is of paramount importance.

    A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. There are some domains in the industry, especially in the world of finance like insurance or banking, where data scientists often end up having to use more traditional machine learning models (linear or tree-based). The reason is that model interpretability is very important for the business to explain each and every decision being taken by the model. However, this often leads to a sacrifice in performance. This is where complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature). We, however, end up being unable to have proper interpretations for model decisions.

    To address and talk about these gaps, I will take a conceptual yet hands-on approach where we will explore some of these challenges in-depth about explainable artificial intelligence (XAI) and human interpretable machine learning and even showcase with some examples using state-of-the-art model interpretation frameworks in Python!

  • Liked Dat Tran

    Dat Tran - Image ATM - Image Classification for Everyone

    Dat Tran
    Head of AI
    Axel Springer AI
    3 months ago
    45 Mins
    Talk
    Intermediate

    At idealo.de we store and display millions of images. Our gallery contains pictures of all sorts. You’ll find there vacuum cleaners, bike helmets as well as hotel rooms. Working with a huge volume of images brings some challenges: How to organize the galleries? What exactly is in there? Do we actually need all of it?

    To tackle these problems you first need to label all the pictures. In 2018 our Data Science team completed four projects in the area of image classification. In 2019 there were many more to come. Therefore, we decided to automate this process by creating software we called Image ATM (Automated Tagging Machine). With the help of transfer learning, Image ATM enables the user to train a Deep Learning model without knowledge or experience in the area of Machine Learning. All you need is data and a spare couple of minutes!

    In this talk we will discuss the state-of-the-art technologies available for image classification and present Image ATM in the context of these technologies. We will then give a crash course on our product where we will guide you through different ways of using it - in shell, on Jupyter Notebook and on the Cloud. We will also talk about our roadmap for Image ATM.

  • Liked Akshay Bahadur

    Akshay Bahadur - Minimizing CPU utilization for deep networks

    Akshay Bahadur
    SDE-I
    Symantec Softwares
    3 months ago
    45 Mins
    Demonstration
    Beginner

    The advent of machine learning along with its integration with computer vision has enabled users to efficiently develop image-based solutions for innumerable use cases. A machine learning model consists of an algorithm which draws some meaningful correlation between the data without being tightly coupled to a specific set of rules. It's crucial to explain the subtle nuances of the network along with the use-case we are trying to solve. With the advent of technology, the quality of the images has increased which in turn has increased the need for resources to process the images for building a model. The main question, however, is to discuss the need to develop lightweight models keeping the performance of the system intact.
    To connect the dots, we will talk about the development of these applications specifically aimed to provide equally accurate results without using much of the resources. This is achieved by using image processing techniques along with optimizing the network architecture.
    These applications will range from recognizing digits and letters which the user can 'draw' at runtime, to developing a state-of-the-art facial recognition system, predicting hand emojis, developing a self-driving system, detecting malaria and brain tumors, along with Google's 'Quick, Draw' project of hand doodles.
    In this presentation, we will discuss the development of such applications with minimization of CPU usage.

  • Liked Dipanjan Sarkar

    Dipanjan Sarkar / Anuj Gupta - A Hands-on Introduction to Natural Language Processing

    480 Mins
    Workshop
    Intermediate

    Data is the new oil and unstructured data, especially text, images and videos contain a wealth of information. However, due to the inherent complexity in processing and analyzing this data, people often refrain from spending extra time and effort in venturing out from structured datasets to analyze these unstructured sources of data, which can be a potential gold mine. Natural Language Processing (NLP) is all about leveraging tools, techniques and algorithms to process and understand natural language-based data, which is usually unstructured like text, speech and so on. In this workshop, we will be looking at tried and tested strategies, techniques and workflows which can be leveraged by practitioners and data scientists to extract useful insights from text data.

    Being specialized in domains like computer vision and natural language processing is no longer a luxury but a necessity which is expected of any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will understand essential concepts in NLP along with extensive case studies and hands-on examples to master state-of-the-art tools, techniques and frameworks for actually applying NLP to solve real-world problems. We leverage Python 3 and the latest and best state-of-the-art frameworks including NLTK, Gensim, SpaCy, Scikit-Learn, TextBlob, Keras and TensorFlow to showcase our examples.

    In my journey in this field so far, I have struggled with various problems, faced many challenges, and learned various lessons over time. This workshop will contain a major chunk of the knowledge I’ve gained in the world of text analytics and natural language processing, where building a fancy word cloud from a bunch of text documents is not enough anymore. Perhaps the biggest problem with regard to learning text analytics is not a lack of information but too much information, often called information overload. There are so many resources, documentation, papers, books, and journals containing so much content that they often overwhelm someone new to the field. You might have had questions like ‘What is the right technique to solve a problem?’, ‘How does text summarization really work?’ and ‘Which are the best frameworks to solve multi-class text categorization?’ among many other questions! Based on my prior knowledge and learnings from publishing a couple of books in this domain, this workshop should help readers avoid the pressing issues I’ve faced in my journey so far and learn the strategies to master NLP.

    This workshop follows a comprehensive and structured approach. First it tackles the basics of natural language understanding and Python for handling text data in the initial chapters. Once you’re familiar with the basics, we cover text processing, parsing and understanding. Then, we address interesting problems in text analytics in each of the remaining chapters, including text classification, clustering and similarity analysis, text summarization and topic models, semantic analysis and named entity recognition, sentiment analysis and model interpretation. The last chapter is an interesting chapter on the recent advancements made in NLP thanks to deep learning and transfer learning and we cover an example of text classification with universal sentence embeddings.

  • 45 Mins
    Demonstration
    Intermediate

    Artificial Intelligence (AI) has been rapidly adopted in various spheres of medicine such as microbiological analysis, drug discovery, disease diagnosis, genomics, medical imaging and bioinformatics for translating biomedical data into improved human healthcare. Automation in healthcare using machine learning/deep learning assists physicians in making faster, cheaper and more accurate diagnoses.

    We have completed three healthcare projects using deep learning and are currently working on three more healthcare projects. In this session, we shall demonstrate two deep learning based healthcare applications developed using TensorFlow. The discussion of each application will include the following: problem statement, proposed solution, data collected, experimental analysis and challenges faced to achieve this success. Finally, we will briefly discuss the other applications on which we are currently working and the future scope of research in this area.

  • Liked Dr. Atul Singh

    Dr. Atul Singh - Endow the gift of eloquence to your NLP applications using pre-trained word embeddings

    45 Mins
    Talk
    Beginner

    Word embeddings are the plinth stones of Natural Language Processing (NLP) applications, used to transform human language into vectors that can be understood and processed by machine learning algorithms. Pre-trained word embeddings enable transfer of prior knowledge about the human language into a new application, thereby enabling rapid creation of scalable and efficient NLP applications. Since the emergence of word2vec in 2013, the word embeddings field has developed by leaps and bounds, with each successive word embedding outperforming the prior one.

    The goal of this talk is to demonstrate the efficacy of using pre-trained word embeddings to create scalable and robust NLP applications, and to explain to the audience the underlying theory of word embeddings that makes it possible. The talk will cover prominent word vector embeddings such as BERT and ELMo from the recent literature.

  • Liked Suvro Shankar Ghosh

    Suvro Shankar Ghosh - Real-Time Advertising Based On Web Browsing In Telecom Domain

    45 Mins
    Case Study
    Intermediate

    The following section describes the Telco domain use case of real-time advertising based on browsing, in terms of:

    • Potential business benefits to earn.
    • Functional use case architecture depicted.
    • Data sources (attributes required).
    • Analytics to be performed.
    • Output to be provided and target systems to be integrated with.

    This use case is part of the monetization category. The goal of the use case is to provide, through a kind of DataMart, either Telecom business parties or external third parties with sufficient, relevant and customized information to produce real-time advertising for Telecom end users. The customer targets are all Telecom network end-users.

    The customized information to be delivered for advertising is based on several dimensions:

    • Customer characteristics: demographic, telco profile.
    • Customer usage: Telco products or any other interests.
    • Customer time/space identification: location, zoning areas, usage time windows.

    Use case requirements are detailed in the description below as “Targeting method”.

    1. Search Engine Targeting:

    The telco will use users' web history to track what users are looking at and to gather information about them. When a user goes onto a website, their web browsing history will show information about the user, what he or she searched for and where they are from (found from the IP address), and a profile is then built around them, allowing the Telco to target ads to the user more specifically.

    2. Content and Contextual Targeting:

    This is when advertisers can put ads in a specific place, based on the relevant content present. This targeting method can be used across different mediums; for example, an online article about purchasing homes would have an advert associated with this context, like an insurance ad. This is achieved through an ad matching system which analyses the contents of a page or finds keywords and presents a relevant advert, sometimes through pop-ups.

    3. Technical Targeting:

    This form of targeting is associated with the user’s own software or hardware status. The advertisement is altered depending on the user’s available network bandwidth; for example, if a user is on a mobile phone that has a limited connection, the ad delivery system will display a version of the ad that is smaller for a faster data transfer rate.

    4. Time Targeting:

    This type of targeting is centered around time and focuses on the idea of fitting in around people’s everyday lifestyles. For example, scheduling specific ads at a timeframe from 5-7pm, when the

    5. Sociodemographic Targeting:

    This form of targeting focuses on the characteristics of consumers, including their age, gender, and nationality. The idea is to target users specifically, using the data collected about them, for example, targeting a male in the age bracket of 18-24. The telco will use this form of targeting by showing advertisements relevant to the user’s individual demographic profile. This can show up in the form of banner ads or commercial videos.

    6. Geographical and Location-Based Targeting:

    This type of advertising involves targeting different users based on their geographic location. IP addresses can signal the location of a user and can usually transfer the location through different cells.

    7. Behavioral Targeting:

    This form of targeted advertising is centered around the activity/actions of users and is more easily achieved on web pages. Information from browsing websites can be collected, which finds patterns in users' search history.

    8. Retargeting:

    This is where advertising uses behavioral targeting to produce ads that follow you after you have looked at or purchased a particular item. Advertisers use this information to ‘follow you’ and try to grab your attention so you do not forget.

    9. Opinions, attitudes, interests, and hobbies:

    Psychographic segmentation also includes opinions on gender and politics, sporting and recreational activities, views on the environment and arts and cultural issues.

  • Liked Tanuj Jain

    Tanuj Jain - Taming the Spark beast for Deep Learning predictions at scale

    45 Mins
    Talk
    Intermediate

    Predicting at scale is a challenging pursuit, especially when working with Deep Learning models. This is because Deep Learning models tend to have high inference time. At idealo.de, Germany's biggest price comparison platform, the Data Science team was tasked with carrying out image tagging to improve our product galleries.

    One of the biggest challenges we faced was to generate predictions for more than 300 million images within a short time while keeping the costs low. Moreover, a resolution for the scaling problem became critical since we intended to apply other Deep Learning models on the same big dataset. We ended up formulating a batch-prediction solution by employing an Apache Spark setup that ran on an AWS EMR cluster.

    Spark is notorious for being difficult to configure and tune. As a result, we had to carry out several optimisation steps in order to meet the scale requirements within our time and financial constraints. In this talk, I will present our Spark setup and focus on the journey of optimising the Spark tagging solution. Additionally, I will also talk briefly about the underlying deep learning model which was used to predict the image tags.

  • Liked Gaurav Shekhar

    Gaurav Shekhar - AIOps - Prediction of Critical Events

    45 Mins
    Case Study
    Beginner

    With the rise of cloud, distributed architectures, containers, and microservices, data overload is increasingly visible. The growing number of DevOps processes, alerts, repeated mundane jobs, etc. has created new demands to both synthesize meaning from this influx of information and connect it to broader business objectives.

    AIOps is the application of artificial intelligence for IT operations. AIOps uses machine learning and data science to give IT operations teams a real-time understanding of any issues affecting the availability or performance of the systems under their care. Rather than reacting to issues as they arise in the application environment, AIOps platforms allow IT operations teams to proactively manage performance challenges faster, and in real time.

    This case study focuses on solving the following business needs:

    1. With an ever-increasing rise in alerts, a large number of incidents were getting generated. There was a need to develop a framework that can generate correlations and identify correlated events, thereby reducing overall incident volume.

    2. For many incidents a reactive strategy does not work and can lead to a loss of reputation; there was a need to develop predictive capabilities that can detect anomalous events and predict critical events well in advance.

    3. Given the pressure to reduce the resolution time and the short window of opportunity available to the analysts, there was a need to provide search capabilities so that the analysts can have a head start as to how similar incidents were solved in the past.

    Data from multiple systems sending alerts, including traditional IT monitoring, log events in text format, application and network performance data, etc., were made available for the PoC.

    The solution framework had a discovery phase, where the base data was visualized and explored, and an NLP-driven text mining layer, where log data in text format was pre-processed and clustered and correlations were developed to identify related events using Machine Learning algorithms. Topic Mining was used to get a quick overview of a large number of events. Next, a temporal mining layer explored the temporal relationships between nodes and cluster groups, and the necessary features were developed on top of the associations generated by the temporal layer. Advanced Machine Learning algorithms were then developed on these features to predict critical events almost 12 hours in advance. Last but not least, a search layer was developed that computed the similarity of any incident with those in the Service Now database, giving analysts readily available information on similar incidents and how they were solved in the past, so that they do not have to reinvent the wheel.

  • Liked Shalini Sinha

    Shalini Sinha / Ashok J / Yogesh Padmanaban - Hybrid Classification Model with Topic Modelling and LSTM Text Classifier to identify key drivers behind Incident Volume

    45 Mins
    Case Study
    Intermediate

    Incident volume reduction is one of the top priorities for any large-scale service organization, along with timely resolution of incidents within the specified SLA parameters. AI and Machine Learning solutions can help the IT service desk manage the incident influx as well as resolution cost by:

    • Identifying major topics from incident description and planning resource allocation and skill-sets accordingly
    • Producing knowledge articles and resolution summary of similar incidents raised earlier
    • Analyzing Root Causes of incidents and introducing processes and automation framework to predict and resolve them proactively

    We will look at different approaches to combining standard document clustering algorithms, such as Latent Dirichlet Allocation (LDA) and K-means clustering on doc2vec embeddings, with text classification to produce easily interpretable document clusters with semantically coherent text representations. These helped the IT operations team of a large FMCG client identify the key drivers/topics contributing to incident volume and take the necessary action.
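
    As a rough illustration of the clustering step, the sketch below runs K-means on incident descriptions. It makes several simplifying assumptions: plain word-count vectors stand in for doc2vec embeddings, LDA and the classifier are not shown, the initial centroids are fixed by hand rather than chosen at random, and the ticket texts are invented.

```python
def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return [[d.lower().split().count(w) for w in vocab] for d in docs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=10):
    """Plain K-means; `init_indices` picks the documents used as starting centroids."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical incident descriptions, invented for illustration only.
tickets = [
    "password reset request",
    "reset password for account",
    "printer not working",
    "printer out of paper",
]

labels = kmeans(vectorize(tickets), init_indices=[0, 2])
print(labels)
```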

  • Liked Antrixsh Gupta

    Antrixsh Gupta - Creating Custom Interactive Data Visualization Dashboards with Bokeh

    90 Mins
    Workshop
    Beginner

    This hands-on workshop shows how to build a custom interactive dashboard application on your local machine or on any cloud service provider. You will also learn how to deploy the application with both security and scalability in mind.

    Powerful data visualization software solutions are extremely useful for building interactive dashboards. However, these solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. These libraries offer a lot of flexibility for building dashboards with tailored features and visualizations.

  • Liked Saikat Sarkar

    Saikat Sarkar / Dhanya Parameshwaran / Dr Sweta Choudhary / Raunak Bhandari / Srikanth Ramaswamy / Usha Rengaraju - AI meets Neuroscience

    480 Mins
    Workshop
    Advanced

    This is a mixer workshop in which clinicians, medical experts, neuroimaging experts, neuroscientists, data scientists and statisticians come together under one roof.

    The theme will be updated soon.

    Our distinguished presenter Srikanth Ramaswamy, who is an advisor at Mysuru Consulting Group and also works on the Blue Brain Project at EPFL, will deliver an expert talk in the workshop.

    https://www.linkedin.com/in/ramaswamysrikanth/

    { This workshop will be a combination of panel discussions, an expert talk and a neuroimaging data science workshop (applying machine learning and deep learning algorithms to neuroimaging data sets) }

    { We are currently onboarding several experts from the neuroscience domain: neurosurgeons, neuroscientists and computational neuroscientists. Details of the speakers will be released soon }

    Abstract for the Neuroimaging Data Science Part of the workshop:

    The study of the human brain with neuroimaging technologies is at the cusp of an exciting era of Big Data. Many data collection projects, such as the NIH-funded Human Connectome Project, have made large, high-quality datasets of human neuroimaging data freely available to researchers. These large data sets promise to provide important new insights about human brain structure and function, and to provide us the clues needed to address a variety of neurological and psychiatric disorders. However, neuroscience researchers still face substantial challenges in capitalizing on these data, because these Big Data require a different set of technical and theoretical tools than those that are required for analyzing traditional experimental data. These skills and ideas, collectively referred to as Data Science, include knowledge in computer science and software engineering, databases, machine learning and statistics, and data visualization.

    The workshop covers data analysis, statistics and data visualization, and applies cutting-edge analytics to complex, multimodal neuroimaging datasets. Topics covered include statistics, associative techniques, graph theoretical analysis, causal models, nonparametric inference, and meta-analytical synthesis.

  • Liked Raunak Bhandari

    Raunak Bhandari / Ankit Desai / Usha Rengaraju - Knowledge Graph from Natural Language: Incorporating order from textual chaos

    90 Mins
    Workshop
    Advanced

    Intro

    What if I told you that instead of the age-old saying that "a picture is worth a thousand words", it could be that "a word is worth a thousand pictures"?

    Language evolved as an abstraction of distilled information observed and collected from the environment for sophisticated and efficient interpersonal communication and is responsible for humanity's ability to collaborate by storing and sharing experiences. Words represent evocative abstractions over information encoded in our memory and are a composition of many primitive information types.

    That is why language processing is a much more challenging domain and has witnessed a delayed 'ImageNet' moment.

    One of the cornerstone applications of natural language processing is to leverage the language's inherent structural properties to build a knowledge graph of the world.

    Knowledge Graphs

    A knowledge graph is a rich form of knowledge base that represents information as an interconnected web of entities and their interactions with each other. This naturally manifests as a graph data structure, where nodes represent entities and the relationships between them are the edges.

    Automatically constructing and leveraging it in an intelligent system is an AI-hard problem, and an amalgamation of a wide variety of fields like natural language processing, information extraction and retrieval, graph algorithms, deep learning, etc.

    It represents a paradigm shift for artificial intelligence systems by going beyond deep learning driven pattern recognition and towards more sophisticated forms of intelligence rooted in reasoning to solve much more complicated tasks.

    To elucidate the difference between reasoning and pattern recognition, consider computer vision. The vision stack processes an image to detect shapes and patterns in order to identify objects: this is pattern recognition. Reasoning is much more complex: it associates the detected objects with each other in order to meaningfully describe a scene. For this to be accomplished, a system needs a rich understanding of the entities within the scene and their relationships with each other.

    To understand a scene where a person is drinking a can of cola, a system needs to understand concepts like people, that they drink certain liquids via their mouths, that liquids can be placed into metallic containers which can be held in a palm to be consumed, and the generational phenomenon that is cola, among others. A sophisticated vision system can then use this rich understanding to fetch details about cola in order to alert the user of their calorie intake, or to update preferences for a customer. A Knowledge Graph's 'awareness' of real-world phenomena can thus be used to augment a vision system to facilitate such higher-order semantic reasoning.

    In production systems though, reasoning may be cast into a pattern recognition problem by limiting the scope of the system for feasibility, but this may be insufficient as the complexity of the system scales or we try to solve general intelligence.

    Challenges in building a Knowledge Graph

    There are two primary challenges in integrating knowledge graphs into systems: first, acquiring the knowledge and constructing the graph; second, leveraging it effectively with robust algorithms to solve reasoning tasks. Creation of the knowledge graph can vary widely depending on the breadth and complexity of the domain, from purely manual curation to automatic construction from unstructured or semi-structured sources of knowledge, like books and Wikipedia.

    Many natural language processing tasks are precursors to building knowledge graphs from unstructured text, such as syntactic parsing, information extraction, entity linking, named entity recognition, relationship extraction, semantic parsing, semantic role labeling and entity disambiguation. Open information extraction is an active area of research on extracting semantic triplets of subject ('John'), predicate ('eats') and object ('burger') from plain text, which are used to build the knowledge graph automatically.

    A very interesting approach to this problem is the extraction of frame semantics. Frame semantics relates linguistic semantics to encyclopedic knowledge. The basic idea is that the meaning of a word is linked to all the essential knowledge that relates to it. For example, to understand the word "sell", it is necessary to also know about commercial transactions, which involve a seller, a buyer, goods, payment, and the relations between these, all of which can be represented in a knowledge graph.
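
    To make the triple representation concrete, here is a minimal sketch of a graph built from (subject, predicate, object) triples. The `KnowledgeGraph` class and the example triples are hypothetical and written only for illustration; they are not Embibe's implementation, and the extraction of triples from raw text is not shown.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph: nodes are entities, edges are labelled relations."""

    def __init__(self):
        # entity -> list of (predicate, target entity) outgoing edges
        self.edges = defaultdict(list)

    def add_triple(self, subject, predicate, obj):
        """Store one (subject, predicate, object) fact as a directed edge."""
        self.edges[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every entity linked from `subject` by `predicate`."""
        return [o for p, o in self.edges.get(subject, []) if p == predicate]

    def reachable(self, start):
        """Return all entities reachable from `start` by following edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hand-written triples, standing in for the output of an extraction pipeline.
kg = KnowledgeGraph()
kg.add_triple("John", "eats", "burger")
kg.add_triple("burger", "is_a", "food")
kg.add_triple("food", "provides", "calories")

print(kg.objects("John", "eats"))
print(sorted(kg.reachable("John")))
```

    Following edges transitively, as `reachable` does, is the simplest form of the multi-hop reasoning that a knowledge graph enables.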

    This workshop will focus on building such a knowledge graph from unstructured text.

    Learn good research practices like organizing code and modularizing output for productive data wrangling to improve algorithm performance.

    Knowledge Graph at Embibe

    We will showcase how Embibe's proprietary Knowledge Graph manifests and how it's leveraged across a multitude of projects in our Data Science Lab.

  • Liked Dr. C.S.Jyothirmayee

    Dr. C.S.Jyothirmayee / Usha Rengaraju / Vijayalakshmi Mahadevan - Deep learning powered Genomic Research

    90 Mins
    Workshop
    Advanced

    Disease occurs when there is a slip in the finely orchestrated dance between physiology, environment and genes. Treatment with chemicals (natural, synthetic or a combination) cured some diseases, but others persisted and were passed down the generations. The molecular basis of disease therefore became a prime focus of study for understanding and analysing root causes. Cancer, in particular, showed that the origin, detection, prognosis, treatment and cure of a disease are far from simple. Diseases have to be treated case by case (no one size fits all).

    The advent of next-generation sequencing, high-throughput analysis, enhanced computing power and neural networks brings new hope of addressing this conundrum of complicated genetic elements (the structure and function of the various genes in our systems). This requires extracting the genomic material, sequencing it (with automated systems) and analysing it to map the strings of As, Ts, Gs and Cs, which yields a genomic dataset. These datasets are too large for traditional and applied statistical techniques. Moreover, the important signals are often incredibly small amid blaring technical noise, which calls for far more sophisticated analysis techniques. Artificial intelligence and deep learning give us the power to draw clinically useful information from the genetic datasets obtained by sequencing.

    The precision of these analyses has become vital and is the way forward for detecting disease and predisposition to it; it empowers medical authorities to make fair, situation-appropriate decisions about patient treatment strategies. This kind of genomic profiling, prediction and disease management is useful for tailoring FDA-approved treatment strategies to these molecular disease drivers and the patient's molecular makeup.

    The present scenario encourages designing, developing and testing medicines based on existing genetic insights and models. Deep learning models are helping to analyze and interpret tiny genetic variations (like SNPs, Single Nucleotide Polymorphisms), which helps unravel crucial cellular processes like metabolism and DNA wear and tear. These models also help identify disease signatures, such as cancer risk, from various body fluids. They have immense potential to revolutionize the healthcare ecosystem. Clinical data collection, however, is not streamlined and is done in a haphazard manner. Making this data uniformly retrievable and combinable with genetic information would increase its value for interpretation and for decisions on patient treatment modalities and their outcomes.

    There is a huge inflow of medical data from emerging wearable technologies. Combined with other health data and the ability to quickly carry out complex analyses on rich genomic databases in the cloud, it would revitalize humanity's disease-fighting capability. A last, still emerging area of application is direct-to-consumer genomics (as in the success of 23andMe).

    This road map promises an end-to-end system to face disease in all its forms. Medical research and its applications, like gene therapies, gene editing technologies such as CRISPR, molecular diagnostics and precision medicine, could be revolutionized by tailoring high-throughput computing methods and applying them to enhanced genomic datasets.
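
    Before a neural network can consume strings of As, Ts, Gs and Cs, they have to be turned into numbers. One common choice, assumed here for illustration rather than taken from the workshop material, is one-hot encoding; the `one_hot` helper and the example sequence below are hypothetical.

```python
NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element 0/1 rows (column order A, C, G, T)."""
    rows = []
    for base in sequence.upper():
        # Symbols outside A/C/G/T (for example 'N' for an unknown base)
        # become an all-zero row.
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


encoded = one_hot("GATTACA")
for base, row in zip("GATTACA", encoded):
    print(base, row)
```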

  • Liked Shrutika Poyrekar

    Shrutika Poyrekar / kiran karkera / Usha Rengaraju - Introduction to Bayesian Networks

    90 Mins
    Workshop
    Advanced

    { This is a hands-on workshop. The use case is traffic analysis. }

    Most machine learning models assume independent and identically distributed (i.i.d.) data. Graphical models can capture almost arbitrarily rich dependency structures between variables by encoding the conditional independence structure as a graph. A Bayesian network, a type of graphical model, describes a probability distribution over all variables by putting directed edges between the variable nodes; each node contributes one conditional probability factor, given its parents, to the factorized probability distribution. Bayesian networks thus provide a compact representation for dealing with uncertainty, using an underlying graphical structure and probability theory. These models have a variety of applications, such as medical diagnosis, biomonitoring, image processing, turbo codes, information retrieval, document classification and gene regulatory networks, among many others. They are interpretable, as they can capture causal relationships between different features. They can also work efficiently with small data and deal with missing data, which gives them an advantage over conventional machine learning and deep learning models in such settings.

    In this session, we will discuss the concepts of conditional independence, d-separation, the Hammersley-Clifford theorem, Bayes' theorem, Expectation Maximization and Variable Elimination. There will be a code walkthrough of a simple case study.
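
    The factorization can be made concrete with a toy traffic network, Rain -> Jam <- Accident, where the joint distribution is P(Rain) * P(Accident) * P(Jam | Rain, Accident). The structure and all probabilities below are invented for illustration and are not the workshop's dataset; the sketch uses plain Python and brute-force enumeration rather than a graphical-models library.

```python
from itertools import product

# Hypothetical parameters, invented for illustration only.
p_rain = {1: 0.2, 0: 0.8}          # P(Rain)
p_accident = {1: 0.1, 0: 0.9}      # P(Accident)
p_jam_given = {                    # P(Jam = 1 | Rain, Accident)
    (0, 0): 0.1,
    (0, 1): 0.6,
    (1, 0): 0.5,
    (1, 1): 0.9,
}


def joint(rain, accident, jam):
    """Joint probability as the product of one factor per node."""
    p_jam = p_jam_given[(rain, accident)]
    return p_rain[rain] * p_accident[accident] * (p_jam if jam == 1 else 1 - p_jam)


def posterior_rain_given_jam():
    """P(Rain = 1 | Jam = 1), computed by summing the joint over Accident."""
    numerator = sum(joint(1, a, 1) for a in (0, 1))
    evidence = sum(joint(r, a, 1) for r, a in product((0, 1), repeat=2))
    return numerator / evidence


total = sum(joint(r, a, j) for r, a, j in product((0, 1), repeat=3))
posterior = posterior_rain_given_jam()
print(round(total, 6), round(posterior, 4))
```

    Observing a jam raises the probability of rain above its prior of 0.2, which is the kind of evidence-driven update the network is built to support.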
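
    As a small preview of Variable Elimination, the sketch below computes the marginal P(C) in a chain A -> B -> C by summing out A and then B, one variable at a time, and checks the result against brute-force enumeration. The chain and its probabilities are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical binary chain A -> B -> C; all numbers are invented.
p_a = [0.7, 0.3]                          # P(A)
p_b_given_a = [[0.9, 0.1], [0.4, 0.6]]    # row a holds P(B | A = a)
p_c_given_b = [[0.8, 0.2], [0.3, 0.7]]    # row b holds P(C | B = b)


def eliminate(prior, cpt):
    """Sum out the parent: P(child) = sum over parent of P(parent) * P(child | parent)."""
    n_child = len(cpt[0])
    return [
        sum(prior[p] * cpt[p][c] for p in range(len(prior)))
        for c in range(n_child)
    ]


p_b = eliminate(p_a, p_b_given_a)   # A has been eliminated
p_c = eliminate(p_b, p_c_given_b)   # B has been eliminated

# Brute-force check: sum the full joint over A and B for each value of C.
p_c_brute = [
    sum(
        p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
        for a in range(2)
        for b in range(2)
    )
    for c in range(2)
]

print(p_b, p_c, p_c_brute)
```

    Eliminating variables one at a time keeps each intermediate table small, whereas the brute-force sum grows exponentially with the number of variables.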