Overcoming data limitations in real-world data science initiatives
“Is this the only data you have?” is an expression of surprise often heard when evaluating a new opportunity to apply data science. The suitability of the available data is a key factor in the abandonment of many otherwise well-considered data science initiatives.
"Could the folks who were responsible for the design of the business process and the supporting IT applications not have been more forward-thinking and captured more of the relevant data? To make it even worse, for the data that is being captured, the manual entries are not even consistent between operators."
Well, don't throw up your hands just yet. If you are a relatively newly minted data scientist, you are probably used to data being served to you on a platter! (Kaggle, UCI, ImageNet... add your favourite platter to the list.)
Generally, challenges of the following types are present:
- At one extreme: they are building a new app and want to incorporate a recommendation engine. The app is not released! There is no data: zero, nada, zilch.
- At the other extreme: they want us to build an up-sell engine. They have a massive database with a huge number of tables. If I just look for revenue-related fields, I see 10 different customer revenue fields! Which is the right one to use?
- The client wants me to build a promotion targeting engine. But they keep changing their offers every month! By the time I have enough data for a promotion, they are ready to kill that promotion and move on to some other one.
- They want to build a decision support engine. But the available attributes capture only 20-30% of what goes into making the decision. How is this going to be of any help?
Sound familiar? You are not alone. Using case studies from his own experience, the speaker will guide the audience on how to make the best of the situation and deliver a value-adding data science solution, or how to decide whether it is more prudent not to pursue it after all.
Outline/Structure of the Talk
1. The data suitability challenge
2. Why it is the way it is
3. What can we do about it
4. Case studies
5. Lessons learnt and what works
Learning Outcome
The audience will be able to apply the wider perspective needed when investigating opportunities or use cases where the suitability of data may seem to be a show-stopper.
Target Audience
Managers and Team Leads running data science projects, Sr. Executives
Prerequisites for Attendees
Experience with at least one real-world data science project is helpful to appreciate this talk.