Cleaning, preparing, transforming, exploring and modeling data are what we hear about all the time in data science, and these steps may well be the most important ones. But they are not the whole of data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a complete framework for data science, one that allows you and your company to go beyond common sense and intuition to solve complex business problems.

 
 

Outline/Structure of the Tutorial

  • Intro
  • What is Data Science
  • Introduction to Apache Spark (PySpark oriented)
  • The need for Optimus
  • Introduction to Optimus
  • Building blocks of a Data Science Workflow
  • DataOps
  • Machine Learning with PySpark and Optimus
  • Deploying and monitoring models
  • Final words

Learning Outcome

You’ll learn how to build a complete data science workflow using Apache Spark (PySpark), Optimus, the Python ecosystem and Data Operations, and how to work with big data to solve real-life problems and complex business cases.
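
As a rough illustration of the clean, transform and explore steps of such a workflow, here is a minimal sketch in plain Python. The records, column names and function names are invented for this example. It deliberately avoids the Spark and Optimus APIs; in the tutorial the same steps would operate on Spark DataFrames rather than lists of dictionaries.

```python
from statistics import mean

# Hypothetical raw records; in the tutorial these would live in a Spark DataFrame.
raw_rows = [
    {"city": " Bogota ", "sales": "120"},
    {"city": "bogota", "sales": "80"},
    {"city": "Lima", "sales": None},   # missing value
    {"city": "LIMA", "sales": "200"},
]

def clean(rows):
    """Drop rows with missing sales and normalise the city text."""
    return [
        {"city": r["city"].strip().lower(), "sales": r["sales"]}
        for r in rows
        if r["sales"] is not None
    ]

def transform(rows):
    """Cast the sales column from string to float."""
    return [{**r, "sales": float(r["sales"])} for r in rows]

def explore(rows):
    """Compute average sales per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["sales"])
    return {city: mean(values) for city, values in by_city.items()}

# Each step takes rows and returns rows, so the steps compose into a pipeline.
summary = explore(transform(clean(raw_rows)))
print(summary)
```

Keeping each step a small function from rows to rows is one way to make a workflow testable step by step before moving it to a distributed engine.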

Target Audience

Data Scientists, Data Engineers, Data Specialists, Machine Learning Engineers, Data Science Enthusiasts.

Prerequisites for Attendees

  • Prior knowledge of Python is necessary.
  • Some familiarity with Spark would be great.
  • Familiarity with the principles of Data Science, Machine Learning and Programming.
  • There will be coding. You will need to bring your laptop and have an internet connection.

Install Optimus:

pip install optimuspyspark

Submitted 1 year ago



    • 45 Mins
      Keynote
      Intermediate

      Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML), a view that is increasingly shared by many, there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms, often called differentiable programming, has caught on.

      Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.

      This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.
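
To make the idea of differentiable programming concrete, the sketch below differentiates an ordinary Python function, loops included, using forward-mode automatic differentiation over dual numbers. It is a toy written for illustration and is not how Myia, Swift for TensorFlow or Flux are implemented; the `Dual` class and the `derivative` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    value: float  # f(x)
    deriv: float  # f'(x), carried along with the value

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (fg)' = f'g + fg'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)

def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(float(x), 1.0)).deriv

def poly(x):
    # Ordinary code with loops: computes x + 2*x**2 + 3*x**3
    total = 0.0
    for k in range(1, 4):
        term = x
        for _ in range(k - 1):
            term = term * x
        total = total + k * term
    return total

slope = derivative(poly, 2.0)  # analytic derivative: 1 + 4x + 9x**2
print(slope)
```

No graph is built here: the derivative is computed alongside the value as the program runs.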

    • Dat Tran - Image ATM - Image Classification for Everyone

      Dat Tran
      Head of AI
      Axel Springer AI
      1 year ago
      Sold Out!
      45 Mins
      Talk
      Intermediate

      At idealo.de we store and display millions of images. Our gallery contains pictures of all sorts. There you’ll find vacuum cleaners, bike helmets and hotel rooms. Working with a huge volume of images brings some challenges: How to organize the galleries? What exactly is in there? Do we actually need all of it?

      To tackle these problems you first need to label all the pictures. In 2018 our Data Science team completed four projects in the area of image classification. In 2019 there were many more to come. Therefore, we decided to automate this process by creating software we called Image ATM (Automated Tagging Machine). With the help of transfer learning, Image ATM enables the user to train a Deep Learning model without knowledge or experience in the area of Machine Learning. All you need is data and a spare couple of minutes!

      In this talk we will discuss the state-of-the-art technologies available for image classification and present Image ATM in the context of these technologies. We will then give a crash course in our product, guiding you through different ways of using it: in the shell, in a Jupyter Notebook and in the Cloud. We will also talk about our roadmap for Image ATM.

    • Dr. Vikas Agrawal - Non-Stationary Time Series: Finding Relationships Between Changing Processes for Enterprise Prescriptive Systems

      45 Mins
      Talk
      Intermediate

      It is too tedious to keep on asking questions, seeking explanations or setting thresholds for trends or anomalies. Why not find problems before they happen, find explanations for the glitches and suggest the shortest paths to fixing them? Businesses are always changing along with their competitive environment and processes. No static model can handle that. Using dynamic models that find time-delayed interactions between multiple time series, we need to make proactive forecasts of anomalous trends of risks and opportunities in operations, sales, revenue and personnel, based on multiple factors influencing each other over time. We need to know how to set what is “normal” and determine when the business processes from six months ago do not apply any more, or only apply to 35% of the cases today, while explaining the causes of risk and sources of opportunity, their relative directions and magnitude, in the context of the decision-making and transactional applications, using state-of-the-art techniques.

      Real-world processes and businesses keep changing, with one moving part changing another over time. Can we capture these changing relationships? Can we use multiple variables to find risks on the key variables of interest? We will take a fun journey culminating in the most recent developments in the field. Which methods work well and which break? What can we use in practice?

      For instance, we can show a CEO that they would miss their revenue target by over 6% for the quarter, and tell them why, i.e. in what ways their business has changed over the last year. Then we provide prioritized, ordered lists of the quickest, cheapest and least risky paths to help them turn the tide, with estimates of relative costs and expected probability of success.
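
One simple way to look for a time-delayed interaction between two series is to correlate one against lagged copies of the other and keep the lag with the strongest correlation. The sketch below does this on synthetic data. It is a static baseline chosen for illustration, not the dynamic models the talk describes, and the series names and values are invented.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

def best_lag(driver, target, max_lag):
    """Return (lag, correlation) where target[t] best tracks driver[t - lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = driver[: len(driver) - lag]  # driver values `lag` steps earlier
        ys = target[lag:]                 # target values they are paired with
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

# Synthetic example: revenue follows spend with a delay of two periods.
spend = [3, 5, 2, 8, 6, 1, 7, 4, 9, 2, 6, 5]
revenue = [0, 0] + [2 * s + 1 for s in spend[:-2]]

lag, corr = best_lag(spend, revenue, max_lag=4)
print(lag, round(corr, 3))
```

Recomputing `best_lag` over a sliding window would show when the delay or the strength of the relationship drifts, which is the kind of change a non-stationary method has to track.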

    • Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype

      45 Mins
      Tutorial
      Intermediate

      The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models such as Capsule Networks do come into existence, but industry adoption of them usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.

      A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. There are some domains in the industry, especially in the world of finance such as insurance or banking, where data scientists often end up having to use more traditional machine learning models (linear or tree-based). The reason is that model interpretability is very important for the business to explain each and every decision being taken by the model. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature). With them, however, we end up being unable to properly interpret model decisions.

      To address these gaps, I will take a conceptual yet hands-on approach in which we explore in depth some of the challenges of explainable artificial intelligence (XAI) and human-interpretable machine learning, and showcase examples using state-of-the-art model interpretation frameworks in Python!
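
One model-agnostic interpretation idea is permutation importance: permute one feature's column and measure how much the model's error grows. The sketch below is a pure-Python illustration with an invented stand-in model and invented data; it is not taken from the tutorial. A cyclic shift is used in place of random shuffling so the example is deterministic, which is an assumption made only for this sketch.

```python
def mse(model, rows, targets):
    """Mean squared error of model predictions against targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_features):
    """Error increase when each feature column is permuted in turn."""
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        shifted = column[1:] + column[:1]  # deterministic stand-in for a shuffle
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shifted)]
        importances.append(mse(model, permuted, targets) - baseline)
    return importances

def black_box(row):
    # Stand-in for an opaque model: relies heavily on feature 0,
    # weakly on feature 1, and ignores feature 2.
    return 3.0 * row[0] + 0.5 * row[1]

rows = [[1.0, 4.0, 7.0], [2.0, 1.0, 3.0], [3.0, 3.0, 9.0], [4.0, 2.0, 1.0]]
targets = [black_box(r) for r in rows]

scores = permutation_importance(black_box, rows, targets, n_features=3)
print(scores)
```

The ranking of the scores recovers which inputs the model depends on without looking inside it, which is what makes the approach usable on ensembles and neural networks.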

    • 90 Mins
      Workshop
      Intermediate

      Machine learning and deep learning have been rapidly adopted in various spheres of medicine, such as drug discovery, disease diagnosis, genomics, medical imaging and bioinformatics, for translating biomedical data into improved human healthcare. Healthcare applications based on machine learning and deep learning help physicians make faster, cheaper and more accurate diagnoses.

      We have successfully developed three deep learning based healthcare applications and are currently working on two more healthcare related projects. In this workshop, we will discuss one healthcare application titled "Deep Learning based Craniofacial Distance Measurement for Facial Reconstructive Surgery", which we developed using TensorFlow. Craniofacial distances play an important role in providing information related to facial structure. They include measurements of the head and face, which are to be measured from an image. They are used in facial reconstructive surgeries such as cephalometry, treatment planning of various malocclusions, craniofacial anomalies, facial contouring, facial rejuvenation and different forehead surgeries, in which reliable and accurate data are very important and cannot be compromised.

      Our discussion of the healthcare application will include the precise problem statement, the major steps involved in the solution (deep learning based face detection and facial landmarking, and craniofacial distance measurement), the data set, experimental analysis and the challenges faced and overcome to achieve this success. Subsequently, we will provide hands-on exposure to implementing this healthcare solution using TensorFlow. Finally, we will briefly discuss possible extensions of our work and the future scope of research in the healthcare sector.
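
Once landmarks have been located, the final measurement step reduces to geometry. The sketch below assumes a landmark detector has already returned pixel coordinates and that a pixel-to-millimetre calibration is known. The landmark names, coordinates and calibration value are all hypothetical and are not the authors' data or method.

```python
from math import dist

# Hypothetical landmark coordinates in pixels (x, y), as a detector might return.
landmarks = {
    "left_eye_outer": (120.0, 200.0),
    "right_eye_outer": (280.0, 200.0),
    "nasion": (200.0, 190.0),
    "chin": (200.0, 430.0),
}

MM_PER_PIXEL = 0.5  # assumed calibration factor, not a real-world value

def distance_mm(points, a, b, mm_per_pixel):
    """Euclidean distance between two named landmarks, converted to millimetres."""
    return dist(points[a], points[b]) * mm_per_pixel

eye_width = distance_mm(landmarks, "left_eye_outer", "right_eye_outer", MM_PER_PIXEL)
face_height = distance_mm(landmarks, "nasion", "chin", MM_PER_PIXEL)
print(eye_width, face_height)
```

In practice the accuracy of such distances depends on the landmark detector and on the calibration, which is why both steps appear in the solution outline.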

    • Dr. C.S.Jyothirmayee / Usha Rengaraju / Vijayalakshmi Mahadevan - Deep learning powered Genomic Research

      90 Mins
      Workshop
      Advanced

      Disease occurs when there is a slip in the finely orchestrated dance between physiology, environment and genes. Treatment with chemicals (natural, synthetic or a combination) solved some diseases, but others persisted and were propagated across generations. The molecular basis of disease became the prime focus of studies seeking to understand and analyze root causes. Cancer also showed that the origin, detection, prognosis, treatment and cure of a disease is not an uncomplicated process. Diseases have to be treated on a case-by-case basis (no one size fits all).

      The advent of next-generation sequencing, high-throughput analysis, enhanced computing power and new aspirations for neural networks makes it possible to address this conundrum of complicated genetic elements (the structure and function of various genes in our systems). This requires extracting the genomic material, sequencing it (with an automated system) and analyzing it to map the strings of As, Ts, Gs and Cs, which yields a genomic dataset. These datasets are too large for traditional and applied statistical techniques. Moreover, the important signals are often incredibly small amid blaring technical noise. This requires far more sophisticated analysis techniques. Artificial intelligence and deep learning give us the power to draw clinically useful information from the genetic datasets obtained by sequencing.
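
Before a neural network can consume a string of As, Ts, Gs and Cs, the sequence is commonly turned into numbers, for example by one-hot encoding each base. The sketch below shows that encoding in plain Python. The function name and the choice to map unknown bases to an all-zero row are assumptions made for this illustration, not details from the workshop.

```python
BASES = "ACGT"  # column order of the encoding

def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:  # unknown bases such as 'N' stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded

matrix = one_hot("GATN")
for row in matrix:
    print(row)
```

The result is a length-by-4 matrix, a shape that convolutional layers can scan in much the same way they scan the channels of an image.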

      The precision of these analyses has become vital and is the way forward for detecting disease and predisposition to it. It empowers medical authorities to make fair, situation-appropriate decisions about patient treatment strategies. This kind of genomic profiling, prediction and mode of disease management is useful for tailoring FDA-approved treatment strategies based on these molecular disease drivers and the patient’s molecular makeup.

      The present scenario encourages designing, developing and testing medicines based on existing genetic insights and models. Deep learning models are helping to analyze and interpret tiny genetic variations (such as SNPs, Single Nucleotide Polymorphisms), which helps unravel crucial cellular processes like metabolism and DNA wear and tear. These models also help identify disease signatures, such as cancer risk signatures, from various body fluids. They have immense potential to revolutionize the healthcare ecosystem. Clinical data collection, however, is not streamlined and is done in a haphazard manner. Making this data available in a uniform, retrievable form that can be combined with genetic information would strengthen its value, its interpretation, and decisions on patient treatment modalities and their outcomes.

      There is a huge inflow of medical data from emerging wearable technologies. Integrated with other health data, and with the ability to quickly carry out complex analyses on rich genomic databases in the cloud, it would revitalize the disease-fighting capability of humans. A last, still emerging area of application is direct-to-consumer genomics (as in the success of 23andMe).

      This road map promises an end-to-end system to face disease in all its forms and nature. Medical research and its applications, such as gene therapies, gene editing technologies like CRISPR, molecular diagnostics and precision medicine, could be revolutionized by tailoring high-throughput computing methods and applying them to enhanced genomic datasets.

    • Badri Narayanan Gopalakrishnan / Shalini Sinha / Usha Rengaraju - Lifting Up: How AI and Big data can contribute to anti-poverty programs

      45 Mins
      Case Study
      Intermediate

      Ending poverty and zero hunger are the top two goals the United Nations aims to achieve by 2030 under its sustainable development program. Hunger and poverty are byproducts of multiple factors, and fighting them requires a multi-fold effort from all stakeholders. Artificial Intelligence and Machine Learning have transformed the way we live, work and interact. However, the economics of business has limited their application to a few segments of society. A much more conscious effort is needed to bring the power of AI to those who actually need it the most: people below the poverty line.

      Here we present our thoughts on how deep learning and big data analytics can be combined to enable effective implementation of anti-poverty programs. Advancements in deep learning and micro diagnostics, combined with effective technology policy, are the right recipe for the progressive growth of a nation. Deep learning can help identify poverty zones across the globe based on night-time images, where the level of light correlates to higher economic growth. Once the areas of lower economic growth are identified, geographic and demographic data can be combined to establish micro-level diagnostics of these underdeveloped areas. The insights from the data can help plan an effective intervention program.

      Machine Learning can further be used to identify potential donors, investors and contributors across the globe based on their skill-set, interest, history, ethnicity, purchasing power and their native connection to the location of the proposed program. Even adequate resource allocation and efficient program design will not guarantee the success of a program unless project execution is supervised at the grass-roots level. Data Analytics can be used to monitor project progress and effectiveness, and to detect anomalies in case of fraud or mismanagement of funds.

    • Juan Manuel Contreras - How to lead data science teams: The 3 D's of data science leadership

      45 Mins
      Talk
      Advanced

      Despite the increasing number of data scientists who are asked to take on leadership roles as they grow in their careers, there are still few resources on how to lead data science teams successfully.

      In this talk, I will argue that an effective data science leader has to wear three hats: Diplomat (understand the organization and their team and liaise between them), Diagnostician (figure out which organizational needs can be met by their team and how), and Developer (grow their and their team's skills as well as the organization's understanding of data science to maximize the value their team can drive).

      Throughout, I draw on my experience as a data science leader both at a political party (the Democratic Party of the United States of America) and at a fintech startup (Even.com).

      Talk attendees will learn a framework for how to manage data scientists and lead a data science practice. In turn, attendees will be better prepared to tackle new or existing roles as data science leaders or be better able to identify promising candidates for these roles.

    • Ramanathan R / Gurram Poorna Prudhvi - Time Series analysis in Python

      240 Mins
      Workshop
      Intermediate

      “Time is precious, and so is Time Series Analysis”

      Time series analysis has been around for centuries, helping us solve everything from astronomical problems to business problems and advanced scientific research. Time stores precious information, which most machine learning algorithms don’t deal with. But time series analysis, which is a mix of machine learning and statistics, helps us get useful insights. Time series analysis can be applied to various fields like economic forecasting, budgetary analysis, sales forecasting, census analysis and much more. In this workshop, we will look at how to dive deep into time series data and make use of deep learning to make accurate predictions.

      The workshop is structured as follows:

      • Introduction to Time series analysis
      • Time Series Exploratory Data Analysis and Data manipulation with pandas
      • Forecast Time series data with classical methods (AR, MA, ARMA, ARIMA, GARCH, E-GARCH)
      • Introduction to Deep Learning and Time series forecasting using MLP and LSTM
      • Forecasting using XGBoost
      • Financial Time Series data

      Libraries Used:

      • Keras (with TensorFlow backend)
      • matplotlib
      • pandas
      • statsmodels
      • sklearn
      • seaborn
      • arch
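
As a taste of the classical methods in the outline, the sketch below fits the simplest autoregressive model, AR(1), by ordinary least squares and rolls it forward to forecast. It is written in plain Python for illustration and does not use the statsmodels or arch APIs the workshop relies on. The function names and the noise-free synthetic series are assumptions of this example.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in x[t] = c + phi * x[t-1]."""
    xs, ys = series[:-1], series[1:]  # pair each value with its successor
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps):
    """Iterate the fitted recurrence forward from the last observation."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Synthetic, noise-free series generated by x[t] = 2 + 0.5 * x[t-1].
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
next_two = forecast(series, steps=2)
print(round(c, 6), round(phi, 6), next_two)
```

On real, noisy data the estimates would only approximate the true coefficients, and checking stationarity before fitting becomes important.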

    • Anuj Gupta - Natural Language Processing Bootcamp - Zero to Hero

      480 Mins
      Workshop
      Intermediate

      Data is the new oil and unstructured data, especially text, images and videos contain a wealth of information. However, due to the inherent complexity in processing and analyzing this data, people often refrain from spending extra time and effort in venturing out from structured datasets to analyze these unstructured sources of data, which can be a potential gold mine. Natural Language Processing (NLP) is all about leveraging tools, techniques and algorithms to process and understand natural language based unstructured data - text, speech and so on.

      Being specialized in domains like computer vision and natural language processing is no longer a luxury but a necessity which is expected of any data scientist in today’s fast-paced world! With a hands-on and interactive approach, we will understand essential concepts in NLP along with extensive case studies and hands-on examples to master state-of-the-art tools, techniques and frameworks for actually applying NLP to solve real-world problems. We leverage Python 3 and the latest and best state-of-the-art frameworks including NLTK, Gensim, SpaCy, Scikit-Learn, TextBlob, Keras and TensorFlow to showcase our examples. You will be able to learn a fair bit of machine learning as well as deep learning in the context of NLP during this bootcamp.

      In our journey in this field, we have struggled with various problems, faced many challenges, and learned various lessons over time. This workshop is our way of giving back a major chunk of the knowledge we’ve gained in the world of text analytics and natural language processing, where building a fancy word cloud from a bunch of text documents is not enough anymore. You might have had questions like ‘What is the right technique to solve a problem?’, ‘How does text summarization really work?’ and ‘Which are the best frameworks to solve multi-class text categorization?’ among many other questions! Based on our prior knowledge and learnings from publishing a couple of books in this domain, this workshop should help readers avoid some of the pressing issues in NLP and learn effective strategies to master NLP.

      The intent of this workshop is to make you a hero in NLP so that you can start applying NLP to solve real-world problems. We start from zero and follow a comprehensive and structured approach to make you learn all the essentials in NLP. We will be covering the following aspects during the course of this workshop with hands-on examples and projects!

      • Basics of Natural Language and Python for NLP tasks
      • Text Processing and Wrangling
      • Text Understanding - POS, NER, Parsing
      • Text Representation - BOW, Embeddings, Contextual Embeddings
      • Text Similarity and Content Recommenders
      • Text Clustering
      • Topic Modeling
      • Text Summarization
      • Sentiment Analysis - Unsupervised & Supervised
      • Text Classification with Machine Learning and Deep Learning
      • Multi-class & Multi-Label Text Classification
      • Deep Transfer Learning and its promise
      • Applying Deep Transfer Learning - Universal Sentence Encoders, ELMo and BERT for NLP tasks
      • Generative Deep Learning for NLP
      • Next Steps

      With over 10 hands-on projects, the bootcamp will be packed with plenty of hands-on examples for you to go through, try out and practice, and we will try to keep theory to a minimum considering the limited time we have and the amount of ground we want to cover. We hope that at the end of this workshop you can take away some useful methodologies to apply for solving NLP problems in the future. We will be using Python to showcase all our examples.
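
Two items from the outline, bag-of-words text representation and text similarity, fit in a few lines of standard-library Python. The sketch below is an illustration only and does not use NLTK, Gensim, SpaCy or the other frameworks listed above. The documents, the tokenizer pattern and the function names are invented for this example.

```python
from collections import Counter
from math import sqrt
import re

def bag_of_words(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "sports": "the team won the match and the fans cheered",
    "finance": "the bank raised interest rates and the markets fell",
}

query = bag_of_words("fans cheered as the team won")
scores = {name: cosine(query, bag_of_words(text)) for name, text in docs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Raw counts let frequent words like "the" dominate the score, which is one motivation for the weighting schemes and embeddings covered later in the outline.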

    • Suvro Shankar Ghosh - Learning Entity Embeddings from Knowledge Graph

      45 Mins
      Case Study
      Intermediate

      • Over time, many knowledge bases have evolved. A knowledge base is a structured way of storing information, typically in the form (Subject, Predicate, Object).
      • Such knowledge bases are an important resource for question answering and other tasks. But they are often incomplete, unable to capture all the data in the world, and therefore lack the ability to reason over their discrete entities and the unknown relationships between them. Here we introduce an expressive neural tensor network that is suitable for reasoning over known relationships between two entities.
      • With such a model in place, we can ask questions. The model will try to predict the missing links in the data and answer the questions, for example by finding similar entities, reasoning over them and predicting relationship types between two entities that are not connected in the Knowledge Graph.
      • Knowledge Graph infoboxes were added to Google's search engine in May 2012.

      What is the knowledge graph?

      ▶Knowledge in graph form!

      ▶Captures entities, attributes, and relationships

      More specifically, the “knowledge graph” is a database that collects millions of pieces of data about keywords people frequently search for on the World Wide Web and the intent behind those keywords, based on the already available content.

      ▶In most cases, KGs are based on Semantic Web standards and have been generated by a mixture of automatic extraction from text or structured data, and manual curation work.

      ▶Structured Search & Exploration
      e.g. Google Knowledge Graph, Amazon Product Graph

      ▶Graph Mining & Network Analysis
      e.g. Facebook Entity Graph

      ▶Big Data Integration
      e.g. IBM Watson

      ▶Diffbot, GraphIQ, Maana, ParseHub, Reactor Labs, SpazioDati
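
The (Subject, Predicate, Object) form of a knowledge base can be sketched as a set of triples with a wildcard query. The entities and relations below are hypothetical, and the sketch shows only the storage form: it does not implement the neural tensor network or any link prediction.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
triples = {
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "berlin"),
    ("alice", "lives_in", "berlin"),
}

def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

employees = [s for s, _, _ in match(predicate="works_at", obj="acme")]
about_bob = match(subject="bob")
print(employees, about_bob)
```

Note that the graph holds no `lives_in` fact for `bob`. Scoring such a candidate triple as likely or unlikely is the missing-link problem that learned entity embeddings are meant to address.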

    • Dr. Atul Singh - Endow the gift of eloquence to your NLP applications using pre-trained word embeddings

      45 Mins
      Talk
      Beginner

      Word embeddings are the cornerstones of Natural Language Processing (NLP) applications, used to transform human language into vectors that can be understood and processed by machine learning algorithms. Pre-trained word embeddings transfer prior knowledge about human language into a new application, thereby enabling rapid creation of scalable and efficient NLP applications. Since the emergence of word2vec in 2013, the word embeddings field has developed by leaps and bounds, with each successive word embedding outperforming the prior one.

      The goal of this talk is to demonstrate the efficacy of using pre-trained word embeddings to create scalable and robust NLP applications, and to explain to the audience the underlying theory of word embeddings that makes this possible. The talk will cover prominent word vector embeddings such as BERT and ELMo from the recent literature.
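
The core operation on word vectors is comparing them, usually by cosine similarity. The sketch below uses made-up three-dimensional vectors purely to show the mechanics. They are not values from word2vec, ELMo, BERT or any real pre-trained model, and real embeddings have hundreds of dimensions.

```python
from math import sqrt

# Made-up 3-d vectors for illustration only; not from any real model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest(word):
    """Return the other vocabulary word whose vector is most similar."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine(embeddings[word], embeddings[w]),
    )

closest = nearest("king")
print(closest)
```

With pre-trained vectors loaded in place of the toy dictionary, the same lookup gives an application prior knowledge of word similarity without any training of its own.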

    • Srivalya Elluru - A Robust Approach to Open Vocabulary Image Retrieval with Deep Convolutional Neural Networks and Transfer Learning

      20 Mins
      Talk
      Beginner

      Enabling computer systems to respond to conversational human language is a challenging problem with wide-ranging applications in the field of robotics and human-computer interaction. Specifically, in image searches, humans tend to describe objects in fine-grained detail like color or company, for which conventional retrieval algorithms have shown poor performance. In this paper, a novel approach for open vocabulary image retrieval, capable of selecting the correct candidate image from among a set of distractions given a query in natural language form, is presented. Our methodology focuses on generating a robust set of image-text projections capable of accurately representing any image, with the objective of achieving high recall. To this end, an ensemble of classifiers is trained on ImageNet for representing high-resolution objects, CIFAR-100 for smaller-resolution images of objects and Caltech-256 for challenging views of everyday objects, to generate category-based projections. In addition to category-based projections, we also make use of an image captioning model trained on MS COCO and Google Image Search (GISS) to capture additional semantic/latent information about the candidate images. To facilitate image retrieval, the natural language query and projection results are converted to a common vector representation using word embeddings, with which query-image similarity is computed. The proposed model, when benchmarked on the RefCOCO dataset, achieved an accuracy of 68.8% while retrieving semantically meaningful candidate images.

    • Aakash Goel / Ankit Kalra - Detect Workout Pose for Virtual Gym using CNN

      45 Mins
      Talk
      Beginner

      Approximately 80% of people across the globe do not use the gym, yet they pay $30 to $125 per month. Attrition from gyms is linked with discouraging results and lack of engagement. Traditional gym users don’t know the proper exercise regimen, and users prefer workout regimens that are fun, customizable and social.

      To combat the above problem, we came up with the idea of providing customized fitness solutions using Artificial Intelligence. In this talk, we showcase how we can leverage Deep Learning based architectures like CNNs to develop "workout pose detection" that tracks user movement, classifies it against a specific trained workout and determines whether the performed pose is correct or wrong.


      Keywords: CNN, Deep Learning, Image classification Model, Computer Vision.

    • Karthik Bharadwaj T - Failure Detection using Driver Behaviour from Telematics

      45 Mins
      Case Study
      Beginner

      Telematics data have the potential to unlock revenue of 1.5 trillion. Unfortunately, this data has not been tapped by many users.

      In this case study, Karthik Thirumalai will discuss how we can use telematics data to identify driver behaviour and carry out preventive maintenance in automobiles.

    • Indranil Chandra - Data Science Project Governance Framework

      Indranil Chandra
      Assistant Manager
      CITI
      1 year ago
      Sold Out!
      45 Mins
      Talk
      Executive

      Data Science Project Governance Framework is a framework that can be followed by any new Data Science business or team. It will help in formulating strategies around how to leverage Data Science as a business, how to architect Data Science based solutions and team formation strategy, ROI calculation approaches, typical Data Science project lifecycle components, commonly available Deep Learning toolsets and frameworks and best practices used by Data Scientists. I will use an actual use case while covering each of these aspects of building the team and refer to examples from my own experiences of setting up Data Science teams in a corporate/MNC setup.

      A lot of research is happening all around the world in various domains to leverage Deep Learning, Machine Learning and Data Science based solutions to solve problems that would otherwise be impossible to solve using simple rule-based systems. All the major players in the market and businesses are also getting started and setting up new Data Science teams to take advantage of modern state-of-the-art ML/DL techniques. Even though most Data Scientists have strong knowledge of mathematical modeling techniques, they lack the business acumen and management knowledge to drive Data Science based solutions in a corporate/MNC setup. On the other hand, management executives in most corporates/MNCs do not have first-hand knowledge of setting up a new Data Science team or of approaches to solving business problems using Data Science. This session will help bridge this gap and give Executives and Data Scientists a common ground around which they can easily build any Data Science business or team from ground zero.

      GitHub Link -> https://github.com/indranildchandra/DataScience-Project-Governance-Framework

    • Suvro Shankar Ghosh - Real-Time Advertising Based On Web Browsing In Telecom Domain

      45 Mins
      Case Study
      Intermediate

      This section describes the Telco-domain use case of real-time advertising based on browsing, in terms of:

      • Potential business benefits to earn.
      • Functional use case architecture.
      • Data sources (attributes required).
      • Analytics to be performed.
      • Output to be provided and target systems to be integrated with.

      This use case is part of the monetization category. Its goal is to provide a kind of DataMart that gives either Telecom business parties or external third parties sufficient, relevant and customized information to produce real-time advertising for Telecom end users. The target customers are all Telecom network end users.

      The customization information delivered for advertising is based on several dimensions:

      • Customer characteristics: demographic, telco profile.
      • Customer usage: Telco products or any other interests.
      • Customer time/space identification: location, zoning areas, usage time windows.

      Use case requirements are detailed below as “targeting methods”:

      1. Search Engine Targeting:

      The telco will use users’ web history to track what they are looking at and to gather information about them. When a user visits a website, their browsing history shows what he or she searched for and where they are from (found via the IP address). The telco then builds a profile around the user, which allows it to target ads to that user more specifically.

      2. Content and Contextual Targeting:

      Here advertisers place ads in a specific location based on the related content present there. This targeting method can be used across different mediums. For example, an online article about purchasing homes would carry an advert associated with that context, such as an insurance ad. This is achieved through an ad matching system that analyses the contents of a page or finds keywords and presents a relevant advert, sometimes through pop-ups.

      3. Technical Targeting:

      This form of targeting is associated with the user’s own software or hardware status. The advertisement is altered depending on the user’s available network bandwidth. For example, if a user is on a mobile phone with a limited connection, the ad delivery system will display a smaller version of the ad for faster data transfer.

      4. Time Targeting:

      This type of targeting is centered around time and focuses on the idea of fitting in around people’s everyday lifestyles. For example, scheduling specific ads in a timeframe from 5-7pm, when the

      5. Sociodemographic Targeting:

      This form of targeting focuses on the characteristics of consumers, including their age, gender, and nationality. The idea is to target users specifically, using the data collected about them, for example targeting a male in the age bracket of 18-24. The telco will use this form of targeting by showing advertisements relevant to the user’s individual demographic profile. This can show up in the form of banner ads or commercial videos.

      6. Geographical and Location-Based Targeting:

      This type of advertising involves targeting different users based on their geographic location. IP addresses can signal the location of a user, and the location can usually be followed across different network cells.

      7. Behavioral Targeting:

      This form of targeted advertising is centered around the activity/actions of users and is more easily achieved on web pages. Information from browsing websites can be collected and analysed to find patterns in users’ search history.

      8. Retargeting:

      Retargeting uses behavioral targeting to produce ads that follow you after you have looked at or purchased a particular item. Advertisers use this information to ‘follow you’ and try to grab your attention so you do not forget.

      9. Opinions, attitudes, interests, and hobbies:

      Psychographic segmentation also includes opinions on gender and politics, sporting and recreational activities, views on the environment and arts and cultural issues.

    • 45 Mins
      Demonstration
      Intermediate

      Recent advancements in AI are proving beneficial for developing applications in various spheres of the healthcare sector, such as microbiological analysis, drug discovery, disease diagnosis, genomics, medical imaging and bioinformatics, translating large-scale data into improved human healthcare. Automation in healthcare using machine learning/deep learning helps physicians make faster, cheaper and more accurate diagnoses.

      Due to the increasing availability of electronic healthcare data (structured as well as unstructured) and the rapid progress of analytics techniques, a lot of research is being carried out in this area. Popular AI techniques include machine learning/deep learning for structured data and natural language processing for unstructured data. Guided by relevant clinical questions, powerful deep learning techniques can unlock clinically relevant information hidden in massive amounts of data, which in turn can assist clinical decision making.

      We have successfully developed three deep learning based healthcare applications using TensorFlow and are currently working on three more healthcare related projects. In this demonstration session, we shall first briefly discuss the significance of deep learning for healthcare solutions. Next, we will demonstrate two deep learning based healthcare applications developed by us. The discussion of each application will include a precise problem statement, the proposed solution, the data collected and used, experimental analysis, and the challenges encountered and overcome along the way. Finally, we will briefly discuss the other applications on which we are currently working and the future scope of research in this area.

    • Pankaj Kumar / Abinash Panda / Usha Rengaraju - Quantitative Finance: Global macro trading strategy using Probabilistic Graphical Models

      90 Mins
      Workshop
      Advanced

      { This is a hands-on workshop on the pgmpy package. Abinash Panda, the creator of the pgmpy package, will do the code demonstration. }

      Crude oil plays an important role in macroeconomic stability and heavily influences the performance of the global financial markets. Unexpected fluctuations in the real price of crude oil are detrimental to the welfare of both oil-importing and oil-exporting economies. Global macro hedge funds view the forecast price of oil as one of the key variables in generating macroeconomic projections, and it also plays an important role for policy makers in predicting recessions.

      Probabilistic Graphical Models can help improve the accuracy of existing quantitative models for crude oil price prediction, as they take into account many different macroeconomic and geopolitical variables.

      Hidden Markov Models are used to detect the underlying regimes of the time-series data after the continuous time-series data has been discretised. In this workshop we use the Baum-Welch algorithm for learning the HMMs, and the Viterbi algorithm to find the sequence of hidden states (i.e. the regimes) given the observed states (i.e. monthly differences) of the time series.
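
      As a rough illustration of the Viterbi step described above, here is a minimal pure-Python sketch. It is not the workshop's pgmpy code. The regime names ("calm", "volatile"), the discretised observation symbols ("down", "flat", "up") and all probabilities are hypothetical values chosen for the example; in the workshop setting the parameters would be learned with Baum-Welch rather than written by hand.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-regime model over discretised monthly differences.
states = ["calm", "volatile"]
start_p = {"calm": 0.6, "volatile": 0.4}
trans_p = {
    "calm": {"calm": 0.8, "volatile": 0.2},
    "volatile": {"calm": 0.3, "volatile": 0.7},
}
emit_p = {
    "calm": {"down": 0.1, "flat": 0.8, "up": 0.1},
    "volatile": {"down": 0.45, "flat": 0.1, "up": 0.45},
}

observed = ["flat", "flat", "down", "up", "flat"]
regimes = viterbi(observed, states, start_p, trans_p, emit_p)
print(regimes)
```

      With these made-up numbers, the large moves in the middle of the sequence are assigned to the "volatile" regime and the flat months to "calm", which is the kind of regime labelling the paragraph above refers to.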

      Belief Networks are used to analyse the probability of a regime in crude oil, given as evidence a set of regimes in the macroeconomic factors. A greedy hill climbing algorithm is used to learn the Belief Network, and the parameters are then learned using Bayesian estimation with a K2 prior. Inference is then performed on the Belief Networks to obtain a forecast of the crude oil markets, and the forecast is tested on real data.

    • Shalini Sinha / Ashok J / Yogesh Padmanaban - Hybrid Classification Model with Topic Modelling and LSTM Text Classifier to identify key drivers behind Incident Volume

      45 Mins
      Case Study
      Intermediate

      Incident volume reduction is one of the top priorities for any large-scale service organization, along with timely resolution of incidents within the specified SLA parameters. AI and Machine Learning solutions can help the IT service desk manage the incident influx as well as the resolution cost by:

      • Identifying major topics from incident description and planning resource allocation and skill-sets accordingly
      • Producing knowledge articles and resolution summary of similar incidents raised earlier
      • Analyzing Root Causes of incidents and introducing processes and automation framework to predict and resolve them proactively

      We will look at different approaches that combine standard document clustering algorithms, such as Latent Dirichlet Allocation (LDA) and K-means clustering on doc2vec, with text classification to produce easily interpretable document clusters with semantically coherent text representations. These helped the IT operations of a large FMCG client identify the key drivers/topics contributing to incident volume and take the necessary action on them.