In this digital era, when customer attention spans are shrinking drastically, it is imperative for marketers to understand the following four aspects, popularly known as "The 4R's of Marketing", if they want to increase their ROI:

- Right Person

- Right Time

- Right Content

- Right Channel

Only when we design and send our campaigns so that they reach the right customers at the right time, through the right channel, with content they like or are interested in, can we expect higher conversions at lower investment. This is a problem most organizations need to solve to stay relevant in this age of intense market competition.

Among these, we will put special focus on generating appropriate content for a targeted user base using Markov-based models, and do a quick hack session.
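As a taste of the hack session, a word-level Markov chain text generator can be sketched in a few lines of Python. This is a minimal illustration only; the toy corpus and function names are placeholder choices, not the session's actual code:

```python
import random
from collections import defaultdict

def build_markov_model(text, order=1):
    """Map each state (a tuple of `order` words) to the words that follow it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model

def generate(model, length=10, seed=None):
    """Random-walk the chain to produce up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(model.keys()))
    out = list(state)
    while len(out) < length:
        followers = model.get(state)
        if not followers:            # dead end: no observed continuation
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)

corpus = "big sale today only big sale ends today only today"
model = build_markov_model(corpus, order=1)
print(generate(model, length=8, seed=42))
```

Higher-order chains (order 2 or 3) trade variety for fluency; real campaign-copy generation would train on a much larger corpus of past marketing texts.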

The time breakup can be:

5 mins: Difference between martech and traditional marketing; the 4R's of marketing and why solving for them is crucial

5 mins: What Smart Segments are and how to solve for them, with a short demo

5 mins: How marketers use output from Smart Segments to execute targeted campaigns

5 mins: What STO (Send Time Optimization) is, how it can be solved, and the performance uplift clients see when they use it

5 mins: What Channel Optimization is, how it can be solved, and the performance uplift clients see when they use it

5 mins: Why sending the right message to customers is crucial, and an introduction to appropriate content creation

15 mins: Different text generation nuances, and a live demo with a walkthrough of a toy code implementation


Outline/Structure of the Demonstration

  • What is Mar-Tech and how is it different from traditional marketing
  • Understanding 4R's of Marketing and why solving for them is crucial
  • Preferred Channel - How it works and performance
  • Send Time Optimization - How it works and performance
  • Smart Segmentation - Pre-built segments and demo
  • Content Generation - Text Generation nuances, Short Demo/Hack Session using a Markov based model

Learning Outcome

  • Why to solve for the 4R's of Marketing
  • How to solve for the 4R's of Marketing
  • Text Generation
  • Markov Chain

Target Audience

Marketers, Analysts, Decision Makers, NLP Experts



Submitted 2 years ago

Public Feedback

    • 45 Mins

      Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML) - a view that is increasingly shared by many - there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms - often called differentiable programming - has caught on.

      Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.

      This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.

    • Subhasish Misra

      Subhasish Misra - Causal data science: Answering the crucial ‘why’ in your analysis.

      45 Mins

      Causal questions are ubiquitous in data science. For example, questions such as whether changing a feature on a website led to more traffic, or whether digital ad exposure led to incremental purchases, are deeply rooted in causality.

      Randomized tests are considered to be the gold standard when it comes to getting to causal effects. However, experiments in many cases are unfeasible or unethical. In such cases one has to rely on observational (non-experimental) data to derive causal insights. The crucial difference between randomized experiments and observational data is that in the former, test subjects (e.g. customers) are randomly assigned a treatment (e.g. digital advertisement exposure). This helps curb the possibility that user response (e.g. clicking on a link in the ad and purchasing the product) across the two groups of treated and non-treated subjects is different owing to pre-existing differences in user characteristics (e.g. demographics, geo-location, etc.). In essence, we can then attribute divergences observed post-treatment in key outcomes (e.g. purchase rate) as the causal impact of the treatment.

      This treatment assignment mechanism that makes causal attribution possible via randomization is absent though when using observational data. Thankfully, there are scientific (statistical and beyond) techniques available to ensure that we are able to circumvent this shortcoming and get to causal reads.

      The aim of this talk will be to offer a practical overview of the above aspects of causal inference, which as a discipline lies at the fascinating confluence of statistics, philosophy, computer science, psychology, economics, and medicine, among others. Topics include:

      • The fundamental tenets of causality and measuring causal effects.
      • Challenges involved in measuring causal effects in real world situations.
      • Distinguishing between randomized and observational approaches to measuring the same.
      • Provide an introduction to measuring causal effects using observational data via matching, and its extension of propensity score based matching, with a focus on a) the intuition and statistics behind it, b) tips from the trenches, based on the speaker's experience with these techniques, and c) practical limitations of such approaches.
      • Walk through an example of how matching was applied to get to causal insights regarding effectiveness of a digital product for a major retailer.
      • Finally, conclude with why having a nuanced understanding of causality is all the more important in the big data era we are in.
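      To make the matching intuition concrete, here is a minimal sketch (not the speaker's code) of nearest-neighbour matching on a single observed confounder, using synthetic data with a known treatment effect of +2; all variable names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observational data: treated customers skew older than controls,
# and the outcome depends on age AND a true treatment effect of +2.
age_t = rng.normal(40, 5, 200)
age_c = rng.normal(30, 5, 1000)
y_t = 0.5 * age_t + 2 + rng.normal(0, 1, 200)
y_c = 0.5 * age_c + rng.normal(0, 1, 1000)

# Naive comparison is confounded by age and overstates the effect.
naive = y_t.mean() - y_c.mean()

# 1-nearest-neighbour matching on the covariate: pair every treated
# unit with the closest control, then compare outcomes within pairs.
idx = np.abs(age_c[None, :] - age_t[:, None]).argmin(axis=1)
matched = (y_t - y_c[idx]).mean()

print(f"naive estimate:   {naive:.2f}")
print(f"matched estimate: {matched:.2f}")  # much closer to the true +2
```

      Propensity score matching generalizes this idea by first collapsing many covariates into a single estimated probability of treatment, then matching on that score.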
    • Juan Manuel Contreras

      Juan Manuel Contreras - How to lead data science teams: The 3 D's of data science leadership

      Data Science Manager
      45 Mins

      Despite the increasing number of data scientists who are asked to take on leadership roles as they grow in their careers, there are still few resources on how to lead data science teams successfully.

      In this talk, I will argue that an effective data science leader has to wear three hats: Diplomat (understand the organization and their team, and liaise between them), Diagnostician (figure out which organizational needs can be met by their team, and how), and Developer (grow their own and their team's skills, as well as the organization's understanding of data science, to maximize the value their team can drive).

      Throughout, I draw on my experience as a data science leader both at a political party (the Democratic Party of the United States of America) and at a fintech startup.

      Talk attendees will learn a framework for how to manage data scientists and lead a data science practice. In turn, attendees will be better prepared to tackle new or existing roles as data science leaders or be better able to identify promising candidates for these roles.

    • Indranil Basu

      Indranil Basu - Machine Generation of Recommended Image from Human Speech

      45 Mins


      Synthesizing audio for specific domains has many practical applications in creative sound design for music and film, but the applications are not restricted to the entertainment industry. We propose an architecture that converts audio (a human voice) to the voice owner's intended image; for the time being we restrict the intended images to two domains, object design and the human body. Often, human beings can visualise a design (say, a PowerPoint slide or the interior decoration of a house) or a known person, yet cannot convey it through verbally described attributes. The listener, in turn, may be unable to reconstruct the object or person from the speaker's verbal description, since he/she is not visualising the same thing. Complete communication thus takes much trial and error, and is cumbersome and time consuming overall. Examples of such situations: 1) While making a presentation, an executive or manager can visualise something and describe it to an employee, but the slides built from the manager's description may not match the intent. Similarly, a house or office owner may want the premises to have a certain design which he/she can visualise and describe to a vendor, yet the vendor may be unable to produce the same, and trial and error in this case is highly expensive. An automatically recommended image can address this problem. 2) A verbal description of a terrorist or criminal suspect (facial description and/or attributes) may not always be available to all security personnel at airports, railway stations, or other sensitive areas. A software system that produces a machine-generated image with ranked recommendations for such a suspect can immediately point to one or very few people even in a crowded airport or railway station. Security agencies can then frisk only those people or match their attributes against existing databases. This avoids the hazard of manually checking every person, and lets the agencies concentrate adequate checks on the recommended individuals.

      We can use a sequential architecture consisting of simple NLP and more complex Deep Learning algorithms, primarily based on the Generative Adversarial Network (GAN) and Neural Personalised Ranking (NPR), to help object designers and security personnel serve their specific purposes.

      The idea to combat the problem:

      I propose a combination of Deep Learning and Recommender System approaches to tackle this problem. The solution architecture consists of four major components: 1) Speech to Text; 2) Text Classification into Person or Design; 3) Text to Image Formation; 4) Recommender System.

      We address these four steps with consecutive applications of effective Machine Learning and Deep Learning algorithms. The Deep Learning community has already made significant progress in text-to-image generation and in ranking-based recommender systems.

      Brief Details about the four major pillars of this problem:

      Deep Learning based Speech Recognition – The primary technique for speech to text could be Baidu's DeepSpeech, for which a TensorFlow implementation is readily available. Google Cloud Speech-to-Text also enables developers to convert voice to text. The user's voice needs to be converted to a .wav file. Our steps for Deep-Speech-2 are: fix GPU memory, add batch normalization to the RNN, implement the row convolution layer, and generate text.

      Nowadays, we also have quite a few free speech-to-text tools, e.g. Google Docs voice typing, Windows Speech Recognition, Speechnotes, etc.

      Text Classification of Content – This is needed to classify the converted text into two classes, a) design description or b) human attribute description, because the two applications, and therefore the image types, are different. This may be the statistically easier part, but its importance is immense. A dictionary of words related to designs and personal attributes can be built using freely available online resources. Then a supervised algorithm using tf-idf and Latent Semantic Analysis (LSA) should be able to classify the text into the two classes, object and person. These are traditional techniques, proven in much NLP research.
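      As an illustration of the idea (a toy sketch, not the proposed system: the corpus, labels, and helper names are invented), a tf-idf representation with cosine similarity can already separate the two classes; LSA would add a dimensionality-reduction step on top of these vectors:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute tf-idf weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vecs

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tiny labeled corpus: design descriptions vs. person descriptions.
train = [
    ("design", "modern minimalist living room with wooden shelves".split()),
    ("design", "open plan office layout with glass partitions".split()),
    ("person", "tall man with short dark hair and a beard".split()),
    ("person", "woman with curly hair wearing round glasses".split()),
]
vecs = tfidf_vectors([doc for _, doc in train])

def classify(text):
    """Assign the label of the most similar training document."""
    query = tfidf_vectors([doc for _, doc in train] + [text.split()])[-1]
    sims = [(cosine(query, v), label) for (label, _), v in zip(train, vecs)]
    return max(sims)[1]

print(classify("short man with glasses and a beard"))
```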

      Text to Image Formation – This is the main component of this proposal. Today, one of the most challenging problems in the world of Computer Vision is synthesizing high-quality images from text descriptions. In recent years, GANs have been found to generate good results. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. A few approaches address this problem, all using GANs; one of them is the Stacked Generative Adversarial Network (StackGAN). At the heart of such approaches is the conditional GAN, an extension of the GAN in which both generator and discriminator receive additional conditioning variables c, yielding G(z, c) and D(x, c). This formulation allows G to generate images conditioned on c.

      In our case, we train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features. These text features are encoded by a hybrid character-level convolutional-recurrent neural network. Overall, DC-GAN uses text embeddings in which the context of a word is of prime importance. The class label determined in the earlier step helps here: it simply steers DC-GAN towards generating more relevant images rather than irrelevant ones. Details will be discussed during the talk.

      The most straightforward way to train a conditional GAN is to view (text, image) pairs as joint observations and train the discriminator to judge pairs as real or fake. The discriminator has no explicit notion of whether real training images match the text embedding context. To account for this, in GAN-CLS, in addition to the real/fake inputs to the discriminator during training, a third type of input consisting of real images with mismatched text is added, which the discriminator must learn to score as fake. By learning to optimize image/text matching in addition to the image realism, the discriminator can provide an additional signal to the generator. (details are in talk)
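      The three input types of GAN-CLS can be sketched as batch-construction logic. The code below is a schematic with stand-in arrays and a placeholder generator (all names and dimensions are illustrative assumptions), showing only how the (image, text, label) triples are assembled:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a real pipeline: images and text embeddings as vectors.
n, img_dim, txt_dim, z_dim = 8, 16, 4, 8
real_images = rng.normal(size=(n, img_dim))
text_embeds = rng.normal(size=(n, txt_dim))

def gan_cls_batches(real_images, text_embeds, generator, rng):
    """Assemble the three discriminator input types used in GAN-CLS:
      (real image, matching text)   -> label 1 (real)
      (fake image, matching text)   -> label 0 (fake)
      (real image, mismatched text) -> label 0 (fake)
    """
    n = len(real_images)
    fake_images = generator(rng.normal(size=(n, z_dim)), text_embeds)
    mismatched = np.roll(text_embeds, shift=1, axis=0)  # pair with wrong text
    return [
        (real_images, text_embeds, np.ones(n)),
        (fake_images, text_embeds, np.zeros(n)),
        (real_images, mismatched, np.zeros(n)),
    ]

# Placeholder generator; the real one is a text-conditioned DC-GAN.
toy_generator = lambda z, c: np.concatenate([z, c, c], axis=1)

batches = gan_cls_batches(real_images, text_embeds, toy_generator, rng)
for images, texts, labels in batches:
    print(images.shape, texts.shape, labels.mean())
```

      The third batch type is what gives the discriminator its image/text-matching signal: it must learn to reject real images paired with the wrong description.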

      Image Recommender System – In the last step, we propose personalised image recommendation for the user from the set of images generated by the GAN-CLS architecture. Image recommendation brings the number of candidate images down to a top N (N = 3, 5, or 10, typically), with a rank assigned to each, so the user finds it easier to choose. Here we propose Neural Personalized Ranking (NPR), a personalized pairwise ranking model over implicit feedback datasets, inspired by Bayesian Personalized Ranking (BPR) and recent advances in neural networks. NPR has since been improved to context-enhanced NPR; this enhanced model depends on implicit feedback from users and their contexts, and incorporates the idea of generalized matrix factorization. Contextual NPR significantly outperforms its competitors.
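      Since NPR is inspired by BPR, the underlying pairwise objective is worth seeing in code. Below is a minimal matrix-factorization BPR update in NumPy (a didactic sketch with invented dimensions, not NPR itself, which replaces the dot-product scorer with a neural network):

```python
import numpy as np

def bpr_update(U, V, user, pos, neg, lr=0.05, reg=0.01):
    """One BPR SGD step: raise the score of an item the user interacted
    with (pos) above a sampled item they did not (neg)."""
    u, vp, vn = U[user].copy(), V[pos].copy(), V[neg].copy()
    x_uij = u @ (vp - vn)                 # current score difference
    sig = 1.0 / (1.0 + np.exp(x_uij))     # gradient factor of -log sigmoid
    U[user] += lr * (sig * (vp - vn) - reg * u)
    V[pos]  += lr * (sig * u - reg * vp)
    V[neg]  += lr * (-sig * u - reg * vn)

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 10, 4
U = rng.normal(0, 0.1, (n_users, k))   # user latent factors
V = rng.normal(0, 0.1, (n_items, k))   # item latent factors

# User 0 interacted with item 3; all other items serve as sampled negatives.
others = [i for i in range(n_items) if i != 3]
for _ in range(300):
    bpr_update(U, V, user=0, pos=3, neg=int(rng.choice(others)))

scores = U[0] @ V.T
print(scores.round(2))
```

      After training, the interacted item scores well above the sampled negatives, which is exactly the ranking signal used to order the top-N generated images.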

      In the presentation, we shall describe the complete sequence in detail.

    • Gopinath Ramakrishnan

      Gopinath Ramakrishnan - Five Key Pitfalls in Data Analysis

      45 Mins

      Data Science is all about deriving actionable insights through data analysis.
      There is no denying the fact that such insights have tremendous business value.
      But what if:
      some crucial data has been left out of consideration?
      wrong inferences have been drawn during the analysis?
      results have been graphically misrepresented?
      Imagine the adverse impact on your business if you take wrong decisions based on such cases.

      In this talk we will discuss the following 5 key pitfalls to look out for in data analysis results before you take any decisions based on them:
      1. Selection Bias
      2. Survivor Bias
      3. Confounding Effects
      4. Spurious Correlations
      5. Misleading Visualizations

      These are some of the most common points overlooked by beginners in Data Science.

      The talk will draw upon many examples from real life situations to illustrate these points.
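      Pitfall 4 (spurious correlations) is easy to reproduce: two completely independent trending series can appear correlated simply because both trend. A small NumPy demonstration (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(7)

# Two completely independent random walks, e.g. two unrelated business
# metrics that both happen to drift over time.
a = np.cumsum(rng.normal(size=500))
b = np.cumsum(rng.normal(size=500))

r = np.corrcoef(a, b)[0, 1]
print(f"correlation of the levels:     {r:+.2f}")

# Correlating the step-to-step *changes* removes the trend illusion.
r_diff = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
print(f"correlation of the increments: {r_diff:+.2f}")  # near zero
```

      Differencing, detrending, or validating on held-out periods are standard guards against being fooled by shared trends.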

    • Pushker Ravindra

      Pushker Ravindra - Data Science Best Practices for R and Python

      20 Mins

      How many times have you felt that you could not understand someone else’s code, or sometimes not even your own? It’s mostly because of bad or missing documentation and not following best practices. Here I will be demonstrating some of the best practices in Data Science for R and Python, the two most important programming languages for Data Science, which help in building sustainable data products.

      - Integrated Development Environment (RStudio, PyCharm)

      - Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

      - Linter (lintR, Pylint)

      - Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

      - Unit testing (testthat, unittest)

      - Packaging

      - Version control (Git)

      These best practices significantly reduce technical debt in the long term, foster collaboration, and promote building more sustainable data products in any organization.
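      For instance, a unit test with Python's built-in unittest module (the function under test is an invented example, not from the talk) looks like this:

```python
import unittest

def normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

class TestNormalize(unittest.TestCase):
    def test_basic_range(self):
        self.assertEqual(normalize([0, 5, 10]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        # Edge case: constant input must not divide by zero.
        self.assertEqual(normalize([3, 3, 3]), [0.0, 0.0, 0.0])

if __name__ == "__main__":
    unittest.main(exit=False, argv=["normalize_test"])
```

      The R equivalent with testthat follows the same pattern: small, named expectations covering both the happy path and edge cases.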

    • Siboli Mukherjee

      Siboli Mukherjee - Real time Anomaly Detection in Network KPI using Time Series

      Data Analyst
      Vodafone Idea Ltd
      20 Mins
      Experience Report


      How to accurately detect Key Performance Indicator (KPI) anomalies is a critical issue in cellular network management. In this talk I shall introduce CNR (Cellular Network Regression), a unified performance anomaly detection framework for KPI time-series data. CNR realizes simple statistical modelling and machine-learning-based regression for anomaly detection; in particular, it specifically takes into account seasonality and trend components, and supports automated prediction-model retraining based on prior detection results. I demonstrate how CNR detects two types of anomalies of practical interest, namely sudden drops and correlation changes, based on a large-scale real-world KPI dataset collected from a metropolitan LTE network. I explore various prediction algorithms and feature selection strategies, and provide insights into how regression analysis can make automated and accurate KPI anomaly detection viable.

      Index Terms—anomaly detection, NPAR (Network Performance Analysis)


      The continuing advances of cellular network technologies make high-speed mobile Internet access a norm. However, cellular networks are large and complex by nature, and hence production cellular networks often suffer from performance degradations or failures due to various reasons, such as background interference, power outages, malfunctions of network elements, and cable disconnections. It is thus critical for network administrators to detect and respond to performance anomalies of cellular networks in real time, so as to maintain network dependability and improve subscriber service quality. To pinpoint performance issues in cellular networks, a common practice adopted by network administrators is to monitor a diverse set of Key Performance Indicators (KPIs), which provide time-series data measurements that quantify specific performance aspects of network elements and resource usage. The main task of network administrators is to identify any KPI anomalies, which refer to unexpected patterns that occur at a single time instant or over a prolonged time period.

      Today’s network diagnosis still mostly relies on domain experts manually configuring anomaly detection rules; such a practice is error-prone, labour intensive, and inflexible. Recent studies propose using (supervised) machine learning for anomaly detection in cellular networks.

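      A heavily simplified, illustrative version of such a detector (a rolling-statistics baseline, not CNR itself; the parameters and the synthetic KPI are invented) can be written as:

```python
import numpy as np

def detect_anomalies(kpi, window=24, k=3.0):
    """Flag points deviating from a rolling mean by more than k rolling
    standard deviations (a crude stand-in for regression-based detection)."""
    flagged = []
    for t in range(window, len(kpi)):
        hist = kpi[t - window:t]
        mu, sigma = hist.mean(), hist.std()
        if sigma > 0 and abs(kpi[t] - mu) > k * sigma:
            flagged.append(t)
    return flagged

# Synthetic hourly KPI: daily seasonality, noise, and one sudden drop.
rng = np.random.default_rng(1)
t = np.arange(24 * 14)
kpi = 100 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
kpi[200] -= 40  # injected sudden drop

print(detect_anomalies(kpi))  # the injected drop at index 200 is flagged
```

      A regression-based detector such as the one described above improves on this baseline by explicitly modelling the seasonal and trend components, so that the residual threshold can be much tighter without raising false alarms.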