Big Data to Big Intelligence - Generation of Actionable Insights from Open Source Data
As a data scientist, I have been lucky enough to be a part of highly critical, cutting-edge solutions for prestigious organizations such as Intel and the Indian Army. While each of them was an amazing experience in its own right, the challenges I faced and the knowledge I gained from building an Open Source Intelligence (OSINT) gathering and analytics/prediction tool for the Indian Army are unmatched.
An OSINT tool can have some powerful capabilities, such as:-
- Predict and estimate the location of a Twitter/Facebook user (who has obviously disabled location sharing!) through various metrics.
- Predict the occurrence of certain events (e.g. riots) based on information gathered from various open sources.
- Identify and predict accounts of people who may be potential suspects (security use case) or potential influencers (commercial use case).
- Contextual analysis of words to derive relevant insights.
I worked on this project for over a year, and since then I have been applying my OSINT experience in other sectors such as healthcare and pharma. While working in those sectors, I was appalled by how under-utilized OSINT information was.
I would like to share my experience of working with some great people on a very critical project for our military by discussing the problems I faced and how they can be overcome. At the same time, I hope to give you a guideline on how you can efficiently utilize the power of OSINT information in your respective field, be it consumer goods, healthcare or energy. I hope that attendees will pick up some valuable insights from my experience that will help them in projects ranging from NLP to time-series analytics.
Outline/Structure of the Case Study
This is an overview of how we will walk through the session. In the first part, I will explain what exactly OSINT is, how this project was obtained and how its scope was defined. Next, we will go over the various modules of the project and the challenges they presented, and end with some of my key takeaways:-
Intro: How this project came to be and overview of OSINT/Social Media Analytics
- How I was able to acquire this amazing project.
- Small Demo / Overview of the tool
- What I thought of OSINT/social media analytics before starting the project
- OSINT - The huge free data store which you never use
- Large Data vs Relevant Data
- How good are the social media tools that are currently available in the market
Module 1:- Location Estimation
- Why it is very difficult, and the main challenges faced.
- Social Media Data + OSINT = Your Location
- People Lie on the Internet [How I dealt with faulty data]
- Using Networks to refine location accuracy.
- Machine Learning techniques used and discussed :- SVM (radial basis function kernel), SGD classifier, Ward clustering
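To give a flavour of the "Using Networks to refine location accuracy" idea, here is a minimal sketch that guesses a user's location by majority vote over the locations declared by their connections. It is only an illustration under stated assumptions, not the method used in the project: the function name `estimate_location`, the `min_votes` threshold and the toy data are all hypothetical.

```python
from collections import Counter


def estimate_location(user, friends_of, declared_location, min_votes=2):
    """Guess a user's location from the most common declared location
    among their connections (hypothetical helper, illustration only).

    Returns (location, share_of_votes), or None when evidence is too thin.
    """
    votes = Counter(
        declared_location[f]
        for f in friends_of.get(user, ())
        if declared_location.get(f)  # skip friends with no declared location
    )
    if not votes:
        return None
    location, count = votes.most_common(1)[0]
    if count < min_votes:
        # A single friend is weak evidence; people also lie about location.
        return None
    return location, count / sum(votes.values())


# Toy, made-up data purely for illustration.
friends_of = {
    "user_a": ["u1", "u2", "u3", "u4"],
    "user_b": ["u5"],
}
declared_location = {
    "u1": "Pune",
    "u2": "Pune",
    "u3": "Delhi",
    "u4": "",  # location left blank
    "u5": "Mumbai",
}

print(estimate_location("user_a", friends_of, declared_location))
print(estimate_location("user_b", friends_of, declared_location))
```

In this toy data, `user_a` gets a guess because two of three usable connections agree, while `user_b` gets no guess because a single connection falls below the `min_votes` threshold. A real system would likely weight connections by interaction strength and combine this signal with the classifiers listed above.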
Module 2:- Contextual Analysis
- Things are rarely as they seem [Problem Statement Overview]
- The Regional Barrier & why techniques like transfer learning don't work too well.
- Why this is one of the most important and challenging modules.
- Different Languages, Different Problems, Same Solution.
- Machine Learning techniques used and discussed :- LDA (Latent Dirichlet Allocation) + some custom methods
Module 3:- Identifying Potential Suspects
- Why I found this impossible at first.
- Your tweets betray you [People reveal more than they think, and how I could exploit that]
- How I was able to implement machine learning on such a haphazard data set.
- Machine Learning techniques used and discussed :- LightGBM
Closing Remarks:-
- Good Data vs Bad Data and how I learnt to tell the difference.
- How what I learnt here is applicable in other fields.
- How you can use the power of Machine Learning and OSINT for a variety of tasks.
I hope that the key takeaways will be the following:-
- Develop an understanding of the power of OSINT data.
- Learn the common challenges faced while dealing with it.
- Understand how to use OSINT in your own projects.
AI enthusiasts, data scientists, managers, people working in NLP, people interested in exploring open source data
Prerequisites for Attendees
Basic knowledge of data wrangling techniques and machine learning concepts might help you absorb more from the talk :)