Big Data to Big Intelligence - Using AI to Generate Actionable Insights from Open Source Data
As a data scientist, I have been lucky enough to work on critical, cutting-edge solutions for prestigious organizations such as Intel and the Indian Army. While each was an amazing experience in its own right, the challenges I faced and the knowledge I gained from building an Open Source Intelligence (OSINT) gathering and analytics/prediction tool for the Indian Army are unmatched. This experience showed me how powerful open source data can be if it is used correctly.
An OSINT tool can have some powerful capabilities, such as:
- Predicting and estimating the location of a Twitter/Facebook user (who has disabled location sharing, obviously!) through various metrics.
- Predicting the occurrence of certain events (e.g. riots) based on information gathered from various open sources.
- Identifying accounts of people who may be potential suspects (security use case) or potential influencers (commercial use case).
- Contextual analysis of words to derive relevant insights.
Open source data is, however, very challenging to work with, for a vast array of reasons. This is the issue I aim to tackle in this talk. I will go over 3 exciting projects built on open source data, through which I shall demonstrate various techniques to find, modify, and model the data, and to apply machine learning techniques to it. While going over the projects, I shall also draw parallels to show how you can use similar techniques in your own endeavours.
Outline/Structure of the Case Study
This is an overview of how we will walk through the session. In the first part, I will explain what exactly OSINT is and how this data can be leveraged. Once we have cleared up how to gather and analyze open source data, we will get to the crucial part: deriving intelligence. Intelligence derived from analysis is what enables you, as an individual or an organization, to act on the analysis you get. I will go over 3 unique projects in which we use open source data and derive intelligence from it. With each project, we will discuss the main challenges that were faced (and that you might face) and how to overcome them.
Intro: Overview of OSINT/Social Media Analytics
- What people usually think of OSINT/social media analytics
- OSINT - The huge free data store which you never use
- Large Data vs Relevant Data
- How good are the social media tools that are currently available in the market?
Module 1:- Location Estimation
- Why it is very difficult, and the main challenges faced.
- Social Media Data + OSINT = Your Location
- People Lie on the Internet [How I dealt with faulty data]
- Using Networks to refine location accuracy.
- Machine Learning techniques used and discussed :- SVM (Radial Basis), SGD Classifier, Ward Clustering
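
As a rough illustration of the "using networks" idea in this module, here is a minimal sketch of a neighbour-vote baseline: guess a user's city from the self-declared cities of their contacts. This is not the method used in the project (the talk covers SVM, SGD and Ward clustering); the function name `estimate_location`, the `min_votes` threshold and the toy data are all hypothetical.

```python
from collections import Counter


def estimate_location(user, friends, known_locations, min_votes=2):
    """Guess a user's city as the most common declared city among their contacts.

    Returns (city, share_of_votes), or (None, 0.0) when there is too little evidence.
    """
    # Only contacts with a declared location get a vote.
    votes = Counter(
        known_locations[f] for f in friends.get(user, ()) if f in known_locations
    )
    if not votes:
        return None, 0.0
    city, count = votes.most_common(1)[0]
    # Refuse to guess from a single contact: declared locations can be wrong.
    if count < min_votes:
        return None, 0.0
    return city, count / sum(votes.values())


# Hypothetical toy data: contact lists and self-declared cities.
friends = {
    "user_a": ["u1", "u2", "u3", "u4"],
    "user_b": ["u4", "u5"],
}
known_locations = {"u1": "Pune", "u2": "Pune", "u3": "Delhi", "u4": "Pune"}

guess_a = estimate_location("user_a", friends, known_locations)
guess_b = estimate_location("user_b", friends, known_locations)
print(guess_a)  # ('Pune', 0.75)
print(guess_b)  # (None, 0.0)
```

The vote share doubles as a crude confidence score, and the `min_votes` cut-off is one simple way to avoid trusting a single, possibly false, declared location.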
Module 2:- Contextual Analysis
- Things are rarely as they seem [Problem Statement Overview]
- The Regional Barrier & why techniques like transfer learning don't work too well.
- Why this is one of the most important and challenging modules.
- Different Languages, Different Problems, Same Solution.
- Machine Learning techniques used and discussed :- LDA (Latent Dirichlet Allocation) + some custom methods
Module 3:- Identifying Potential Suspects
- Why I found this impossible at first.
- Your tweets betray you [People reveal more than they think, and how I could exploit that]
- How I was able to apply machine learning to such a haphazard data set.
- Machine Learning techniques used and discussed :- LightGBM
Closing Remarks :-
- Good Data vs Bad Data, and how I learnt to tell the difference.
- How what I learnt here is applicable in other fields.
- How you can use the power of Machine Learning and OSINT for a variety of tasks.
I hope that the key takeaways will be the following :-
- Develop an understanding of the power of OSINT data.
- Learn the common challenges faced while dealing with it.
- Understand how to use OSINT in your own projects.
AI Enthusiasts, Data Scientists, Managers, People working in NLP, People interested in exploring Open Source data
Prerequisites for Attendees
Basic knowledge of data wrangling techniques and machine learning concepts might help you absorb more from the talk :)