Big Data to Big Intelligence - Using AI to Generate Actionable Insights from Open Source Data

Aug 8th, 12:00 - 12:45 PM, Jupiter (189 Interested)

As a data scientist, I have been lucky enough to be part of highly critical, cutting-edge solutions for prestigious organizations such as Intel and the Indian Army. While each of them was an amazing experience in its own right, the challenges I faced and the knowledge I gained from building an Open Source Intelligence (OSINT) gathering and analytics/prediction tool for the Indian Army are unmatched. This experience showed me how powerful open source data can be when it is used correctly.

An OSINT tool can have some powerful capabilities, such as:

  • Predict and estimate the location of a Twitter/Facebook user (one who has, obviously, disabled location sharing!) through various metrics.
  • Predict the occurrence of certain events (e.g. riots) based on information gathered from various open sources.
  • Identify and predict accounts of people who may be potential suspects (security use case) or potential influencers (commercial use case).
  • Contextual analysis of words to derive relevant insights.

Open source data is, however, very challenging to work with, for a vast array of reasons. This is the issue I aim to tackle with this talk. I will go over 3 exciting projects built on open source data, through which I shall demonstrate various techniques to find, modify, and model the data and to apply machine learning to it. While going over the projects, I shall also try to draw parallels to how you can use similar techniques in your own endeavours.


Outline/Structure of the Case Study

This is an overview of how we will be walking through the session. In the first part, I will explain what exactly OSINT is and how this data can be leveraged. Once we have covered how to gather and analyze open source data, we will get to the crucial part: deriving intelligence. Intelligence derived from analysis is what enables you as an individual/organization to act upon the analysis that you get. I will go over 3 unique projects wherein we use open source data and derive intelligence from it. For each project, we will discuss the main challenges that were faced [and that you might face] and how to overcome them.

Intro: Overview of OSINT/Social Media Analytics

  • What people usually think of OSINT/social media analytics
  • OSINT - The huge free data store which you never use
  • Large Data vs Relevant Data
  • How good are the social media tools that are currently available in the market

Module 1:- Location Estimation

  • Why it is very difficult, and the main challenges faced.
  • Social Media Data + OSINT = Your Location
  • People Lie on the Internet [How I dealt with faulty data]
  • Using Networks to refine location accuracy.
  • Machine learning techniques used and discussed: SVM (radial basis function kernel), SGD classifier, Ward clustering
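
One way to picture the "using networks to refine location accuracy" idea is to cluster the self-reported locations of a user's connections and take the centre of the largest cluster, so that the few connections who list a false or distant location get outvoted. The sketch below is an illustration under stated assumptions, not the tool's actual implementation: the input format (a list of (latitude, longitude) pairs already geocoded from friends' profiles), the helper names (`ward_clusters`, `estimate_location`) and the default k are hypothetical, and coordinates are treated as points on a flat plane, which is only reasonable within a limited region. It implements plain bottom-up agglomerative clustering with Ward's merge criterion in pure Python.

```python
from typing import List, Tuple

# Hypothetical input format: (latitude, longitude), treated as planar coordinates.
Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def ward_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in total within-cluster sum of squares if clusters a and b are merged."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return len(a) * len(b) / (len(a) + len(b)) * ((ax - bx) ** 2 + (ay - by) ** 2)


def ward_clusters(points: List[Point], k: int) -> List[List[Point]]:
    """Naive agglomerative clustering with Ward's criterion, stopping at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Merge the pair of clusters whose union adds the least within-cluster variance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def estimate_location(friend_points: List[Point], k: int = 3) -> Point:
    """Hypothetical helper: centre of the largest Ward cluster of friends' reported locations."""
    if not friend_points:
        raise ValueError("need at least one located friend")
    clusters = ward_clusters(friend_points, max(1, min(k, len(friend_points))))
    return centroid(max(clusters, key=len))
```

For example, four friends reporting locations around Delhi, two around Mumbai and one claiming London yield an estimate near the centre of the Delhi group. This naive loop scales poorly with the number of points; on real data a library implementation such as scikit-learn's `AgglomerativeClustering(n_clusters=k, linkage="ward")` would be the usual replacement.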

Module 2:- Contextual Analysis

  • Things are rarely as they seem [Problem Statement Overview]
  • The Regional Barrier & Why techniques like transfer learning don't work too well.
  • Why this is one of the most important and challenging modules.
  • Different Languages, Different Problems, Same Solution.
  • Machine learning techniques used and discussed: LDA (Latent Dirichlet Allocation) + some custom methods

Module 3:- Identifying Potential Suspects

  • Why I found this impossible at first.
  • Your tweets betray you [People reveal more than they think, and how I could exploit that]
  • How I was able to implement machine learning on such a haphazard dataset.
  • Machine learning techniques used and discussed: LightGBM
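
LightGBM is a gradient-boosted decision tree library. To show the underlying idea without depending on it, the sketch below is plain gradient boosting for binary classification with depth-1 trees (stumps), each fitted to the negative gradient of the log loss. It is a stand-in, not LightGBM and not the author's model; the per-account features (a hypothetical `retweet_fraction` and `posts_per_day`) and the toy labels are invented purely for illustration.

```python
import math
from dataclasses import dataclass, field


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_stump(X, targets):
    """Return (feature, threshold, left_value, right_value) minimising squared error on targets."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, targets) if row[f] <= thr]
            right = [t for row, t in zip(X, targets) if row[f] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lmean) ** 2 for t in left) + sum((t - rmean) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lmean, rmean)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = sum(targets) / len(targets)
        return (0, math.inf, mean, mean)
    return best[1:]


@dataclass
class BoostedStumps:
    base: float  # initial log-odds of the positive class
    lr: float    # learning rate (shrinkage) applied to every stump
    stumps: list = field(default_factory=list)

    def raw_score(self, row) -> float:
        score = self.base
        for f, thr, left, right in self.stumps:
            score += self.lr * (left if row[f] <= thr else right)
        return score

    def predict_proba(self, row) -> float:
        return sigmoid(self.raw_score(row))


def fit_boosted_stumps(X, y, n_rounds=50, lr=0.3) -> BoostedStumps:
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    model = BoostedStumps(base=math.log(p0 / (1 - p0)), lr=lr)
    for _ in range(n_rounds):
        # The negative gradient of log loss with respect to the raw score is (y - p).
        residuals = [yi - model.predict_proba(row) for row, yi in zip(X, y)]
        model.stumps.append(fit_stump(X, residuals))
    return model
```

In practice one would use `lightgbm.LGBMClassifier` with its `fit`/`predict_proba` interface, which runs the same boosting loop but adds leaf-wise tree growth, histogram-based split finding and regularisation, and copes far better with noisy, heterogeneous features.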

Closing Remarks:-

  • Good Data vs Bad Data, and how I learnt to tell the difference.
  • How what I learnt here is applicable in other fields.
  • How you can use the power of Machine Learning and OSINT for a variety of tasks.
  • Questions

Learning Outcome

I hope the key takeaways will be the following:

- Develop an understanding of the power of OSINT data.

- Learn the common challenges faced while dealing with it.

- Understand how to use OSINT in your own projects.

Target Audience

AI Enthusiasts, Data Scientists, Managers, People working in NLP, People interested in exploring Open Source data

Prerequisites for Attendees

Basic knowledge of data wrangling techniques and machine learning concepts might help you absorb more from the talk :)

Submitted 4 months ago

Public Feedback

  • Ashay Tamhane
    By Ashay Tamhane  ~  4 months ago

    Thanks Yash for your proposal. It is indeed very interesting. Couple of points:

    1. Since the topic is sensitive, could you clarify if you have obtained the required clearance from relevant authorities for this talk?

    2. It will be helpful if you could post a short video on same/similar topic.

    • Yash Deo
      By Yash Deo  ~  3 months ago

      Hey Ashay,

      I do not have any video that I can share as of right now. I will be giving a talk on a similar topic at the end of April and will try to have that recorded.

      As for the sensitivity, yes, I can present a clearance from the required authorities if you require it. I have given talks before, and as long as I'm not going too technical about how I implemented it, there is no issue. My focus will be more on how OSINT can be used for various purposes and what challenges it may present, and while doing so, I will draw parallels from my experience on this project to support my point. If you have any other suggestions or tips on how I should go about it, please feel free to share.

      I will, however, be going over the structure of the talk with the concerned authorities at a later stage and may make some changes as required.

      Please feel free to reach out to me on my personal/work email if required. :)

  • Usha Rengaraju
    By Usha Rengaraju  ~  3 months ago

    Dear Yash,

    Thank you for the proposal submission. Could you please mention the machine learning techniques used in your experience report?

    Thanks and Regards,

    Usha Rengaraju


    • Yash Deo
      By Yash Deo  ~  3 months ago

      Hey Usha,

      Thanks a lot for your feedback. I have updated each of the modules to include the main ML techniques used in them [refer to the last point in each module].


      Thanks

      Yash Deo


  • Naresh Jain
    By Naresh Jain  ~  3 months ago

    Hi Yash,

    Thanks for your proposal. The topic is certainly very interesting. Where can I learn more about your tool? Is it open sourced? If one wants to try it where can I do so?

    • Yash Deo
      By Yash Deo  ~  3 months ago

      Hey Naresh,

      I would also like to point out that the main focus of my talk will be to enable the audience to effectively utilize open source data in their projects by arming them with knowledge of the problems they may run into, how to solve them, and what techniques to use (for data cleaning/ML) on their data.

      The demo of my tool was intended just to get the point across and show how powerful open source data can be and how much information we can really extract from it.

      Thanks

      Yash Deo


    • Yash Deo
      By Yash Deo  ~  3 months ago

      Hey Naresh,

      Unfortunately, there are several reasons (mainly security) because of which I can't keep this tool open source as of now. I have been working on making some of the modules accessible as separate files on GitHub, but it will take some time.

      As far as the tool goes, I can provide you a detailed description of some of the modules and their functions, and if needed, I can acquire permission to give a small demo of some of the modules. When giving a demo, I usually follow a set story/sequence to get my point across without revealing too much, and for this talk I had planned to demo only module 1 and part of module 2, with a video showing the other capabilities.

      Thanks

      Yash 

  • Deepti Tomar
    By Deepti Tomar  ~  4 months ago

    Dear Yash,

    Thanks for your submission. Request you to provide a trailer video of your session. You can record a short video on your phone or camera for this purpose. 

    Thanks,

    Deepti

    • Yash Deo
      By Yash Deo  ~  4 months ago

      Hey Deepti,

      Sure, will do! Just curious: what exactly do you want me to include in the trailer? An overview of everything I will talk about, or do you want me to pick a single small topic among the ones I'll be speaking on and give a brief about that?


      Thanks

      Yash Deo


      • Deepti Tomar
        By Deepti Tomar  ~  4 months ago

        Hello Yash,

        You can either talk about the overview or pick a single topic and give a brief about it. 


        Thanks,

        Deepti

        • Yash Deo
          By Yash Deo  ~  3 months ago

          Hey Deepti,

          Sorry for the delay; I had a cold till now, and still do actually :( But I have added a small video that should give you an idea of how I am going to go about it.

          Let me know if you have any suggestions for things I should add/remove.

          Thanks

          Yash 

  • Kuldeep Jiwani
    By Kuldeep Jiwani  ~  4 months ago

    Hi Yash,

    You have chosen an interesting topic for your proposal; good to know that you wish to share the importance of OSINT feeds with the public.

    2 quick questions:

    1. Are you going to focus on some aspect of OSINT feeds like internet (social media) or academic data or public government data ?
      • Or this would be a general talk focusing on all major sources
    2. As for the learning outcome for the audience
      • Clearly they will learn about handling various important information in OSINT feeds
      • Will you also be sharing some of your experiences with the data mining/ML techniques you applied on top of raw OSINT data?
    • Yash Deo
      By Yash Deo  ~  4 months ago

      Hey Kuldeep,

      1. I will be talking about most of the major sources of OSINT feeds, but will focus more heavily on the social media/internet aspect, as I believe it is relevant in most use cases.

      2. Of course, I will be sharing my experience of data mining/ML over the OSINT data, as data mining over this data is a major roadblock that most people new to this field face.

      Feel free to let me know if i should include something else or take a particular approach which would suit the audience better.

      Thanks!

      • Kuldeep Jiwani
        By Kuldeep Jiwani  ~  4 months ago

        Thanks for the clarification