Privacy preserving machine learning is an emerging field under active research. The most successful machine learning models today are built by aggregating all data at a central location. While centralised techniques work well, there are plenty of scenarios, such as user privacy, legal concerns, business competitiveness or bandwidth limitations, where data cannot be aggregated. Federated Learning can help overcome all these challenges with its decentralised strategy for building machine learning models. Paired with privacy preserving techniques such as encryption and differential privacy, Federated Learning presents a promising new way to advance machine learning solutions.
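
To make the decentralised strategy concrete, here is a minimal sketch of one federated averaging (FedAvg) round in PyTorch: each client trains a copy of the shared model on its own data, and only the resulting weights travel back to be averaged. This is an illustration only; the helper names (local_update, federated_averaging_round, client_loaders) are assumptions, not details from the talk.

    import copy
    import torch
    import torch.nn as nn

    def local_update(global_model, loader, epochs=1, lr=0.01):
        """Train a copy of the shared model on one client's private data."""
        model = copy.deepcopy(global_model)
        optimiser = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for features, labels in loader:
                optimiser.zero_grad()
                loss_fn(model(features), labels).backward()
                optimiser.step()
        return model.state_dict()

    def federated_averaging_round(global_model, client_loaders):
        """One FedAvg round: clients train locally, only weights are shared."""
        client_states = [local_update(global_model, loader) for loader in client_loaders]
        averaged = copy.deepcopy(client_states[0])
        for key in averaged:  # assumes floating point parameters only (e.g. a plain MLP)
            averaged[key] = torch.stack([state[key] for state in client_states]).mean(dim=0)
        global_model.load_state_dict(averaged)
        return global_model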

In this talk I’ll bring the audience up to speed with the progress in privacy preserving machine learning, discuss platforms for developing models, and present a demo on healthcare use cases.
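
As a flavour of the differential privacy technique mentioned above, the sketch below applies the classic Laplace mechanism: noise calibrated to a query's sensitivity and a privacy budget epsilon is added before a statistic is released. This is a generic textbook illustration, not the mechanism used in the demo.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        """Release a numeric query answer with Laplace noise of scale sensitivity/epsilon."""
        rng = rng or np.random.default_rng()
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Example: privately release a patient count. A counting query has sensitivity 1,
    # since adding or removing one record changes the count by at most 1.
    private_count = laplace_mechanism(true_value=412, sensitivity=1, epsilon=0.5)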

 
 

Outline/Structure of the Demonstration

In this talk I’ll introduce the audience to the emerging field of privacy preserving machine learning, which sits right at the intersection of decentralised machine learning and privacy preserving techniques. The talk will include an overview of current platforms along with a use case demonstration.

Timewise outline:

00:00 - 01:00: Intro

01:00 - 03:00: Need for Privacy Aware Machine Learning

03:00 - 07:00: Federated Learning intro + FL in healthcare

07:00 - 12:00: Privacy concerns + Tools & Platforms

12:00 - 18:00: Demo

Learning Outcome

This novel technique presents new opportunities for businesses to work with private and sensitive datasets. This demonstration will help the audience understand the building blocks of developing privacy preserving machine learning.

Target Audience

Machine Learning Engineers, Data Scientists, Product Managers

Prerequisites for Attendees

Knowledge of current machine learning techniques.

Submitted 8 months ago

Public Feedback

  • Ravi Balasubramanian
    By Ravi Balasubramanian  ~  6 months ago

    Hi Amogh,

    Thanks a lot for your interesting proposal. You have mentioned that a healthcare use case will be covered during this talk with a demo. It would be great if you could shed more light on the questions below:

    1. Can you please provide details of a specific use case? What kind of AI healthcare application is this?

    2. What are the DS models you are using for this use case? How does your DS pipeline look?
    3. What is the size of your datasets? How do different train/test data splits work in an FL setup?

    • Amogh Kamat Tarcar
      By Amogh Kamat Tarcar  ~  5 months ago

      Hi Ravi, 

      I have a couple of use cases from our Persistent Research Lab. For instance, there is one use case which analyses medical insurance claims.

      The privacy preserving technology stack is evolving. The use case I would be demonstrating uses a PyTorch model in a federated learning setting which utilises PySyft.

      The size of the dataset is ~50k-100k rows. In an FL setup, the data schema is consistent and the train data resides at the respective client nodes. For evaluation, a test dataset is curated at the coordinator node which is representative of the test data from each node; each node separately keeps its own test dataset as well.
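
      A rough sketch of this layout, assuming illustrative names and split ratios (split_node_data, test_fraction and the node contents below are not from the actual pipeline): train data stays on each node, each node keeps a local test split, and the coordinator only collects a small representative test sample from every node.

          import random

          def split_node_data(records, test_fraction=0.2, coordinator_fraction=0.1, seed=0):
              """Split one node's records into a local train/test set plus a small
              sample used by the coordinator for a representative evaluation set."""
              rng = random.Random(seed)
              shuffled = records[:]
              rng.shuffle(shuffled)
              n_test = int(len(shuffled) * test_fraction)
              test, train = shuffled[:n_test], shuffled[n_test:]
              coordinator_sample = test[: max(1, int(len(test) * coordinator_fraction))]
              return train, test, coordinator_sample

          # Three hypothetical client nodes sharing the same schema.
          nodes = {"node_a": list(range(1000)),
                   "node_b": list(range(1000, 2500)),
                   "node_c": list(range(2500, 3000))}
          coordinator_test = []
          for name, records in nodes.items():
              train, test, sample = split_node_data(records)
              coordinator_test.extend(sample)  # representative test set at the coordinator
              # train and test never leave this node in the federated setup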

  • Ashay Tamhane
    By Ashay Tamhane  ~  7 months ago

    Hi Amogh, thanks for an interesting proposal. Could you elaborate on the 'Demo' part of the outline? Is this a visual demo or will you be covering numerical results from your experiments/use case?

    • Amogh Kamat Tarcar
      By Amogh Kamat Tarcar  ~  7 months ago

      Hi Ashay, the 'Demo' part will be a visual demo of the experiment using a UI/notebook interface. As the full process cannot be demonstrated in the limited time, I will discuss the experiment results at the end while covering one representative walkthrough of the process flow.

      • Ashay Tamhane
        By Ashay Tamhane  ~  6 months ago

        Thanks Amogh. Another question: are you proposing to keep the privacy of individuals intact through a different way of storing data across servers, or is this more related to machine learning techniques themselves (fairness etc.)?

        • Amogh Kamat Tarcar
          By Amogh Kamat Tarcar  ~  6 months ago

          Hi Ashay, the privacy preserving techniques are inspired by decentralised machine learning techniques wherein the data is not aggregated on a central server. For instance, 4 cancer hospitals can build a common ML model without aggregating their data in a central place.

          These techniques are not directly related to the subset that addresses fairness, which checks whether models are developing bias against certain data points/feature groups.

           

  • Deepti Tomar
    By Deepti Tomar  ~  7 months ago

    Hello Amogh,

    Thanks for your time and efforts on the proposal! Could you answer the following questions to help the program committee understand your proposal better?

    • Are these demo(s) /use case(s) from your project work (industry-specific use cases)? Speaker's experience on the project helps people understand the concept better.
    • Did you use these techniques to help solve a particular problem? 
    • If yes, would you be sharing the challenges faced in the implementation of the technique in your application and the workarounds?

    Thanks,

    Deepti

    • Amogh Kamat Tarcar
      By Amogh Kamat Tarcar  ~  7 months ago

      Hi Deepti,

      • Are these demo(s) /use case(s) from your project work (industry-specific use cases)? Speaker's experience on the project helps people understand the concept better. 
        • Amogh: Yes. The demo/use cases are from my project work at Persistent Research Lab. They are healthcare domain specific use cases.
      • Did you use these techniques to help solve a particular problem? 
        • Amogh: Yes. These techniques are bespoke for sensitive datasets which require privacy preserving implementation.
      • If yes, would you be sharing the challenges faced in the implementation of the technique in your application and the workarounds?
        • Amogh: Yes. I'll be sharing our experience while implementing these use cases including our specific challenges and workarounds. 
      • Deepti Tomar
        By Deepti Tomar  ~  7 months ago

        Thanks for your response, Amogh! We will let you know in case we have more questions.

  • Natasha Rodrigues
    By Natasha Rodrigues  ~  8 months ago

    Hi Amogh,

    Thanks for your proposal! Requesting you to update the Outline/Structure section of your proposal with a time-wise breakup of how you plan to use the 20 mins for the topics you've highlighted.

    To help the program committee understand your presentation style, can you add the slides for your proposal and provide a link to a past recording, or record a small 1-2 min trailer of your talk and share the link to the same?

    Also, in order to ensure the completeness of your proposal, we suggest you go through the review process requirements.

    Thanks,

    Natasha

    • Amogh Kamat Tarcar
      By Amogh Kamat Tarcar  ~  7 months ago

      Hi Natasha, I have updated the proposal with slides and outline. Here is a short recording on a related topic, Intro to Differential Privacy: https://youtu.be/w5mL4YEf7pE

      • Natasha Rodrigues
        By Natasha Rodrigues  ~  7 months ago

        Hi Amogh,

        Thank you for the updates and the video. Requesting you to add your video link to the video section of your proposal.

        Thanks,

        Natasha

        • Amogh Kamat Tarcar
          By Amogh Kamat Tarcar  ~  7 months ago

          Thanks Natasha. Updated the video link.

           

