Privacy-preserving machine learning is an emerging field of active research. The most successful machine learning models today are built by aggregating all data in a central location. While centralised techniques work well, there are plenty of scenarios, driven by user privacy, legal concerns, business competitiveness or bandwidth limitations, in which data cannot be aggregated. Federated Learning can help overcome these challenges with its decentralised strategy for building machine learning models. Paired with privacy-preserving techniques such as encryption and differential privacy, Federated Learning offers a promising new way to advance machine learning solutions.
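To make the decentralised strategy concrete, here is a minimal Python sketch of federated averaging (FedAvg), one common way to train a shared model without pooling data. Each client trains the current global model on its own data, and only the resulting parameters are sent back and averaged, weighted by local dataset size. The three clients, the linear model y = w*x + b and all hyperparameters are illustrative assumptions, not the platforms or demo from this talk. The sketch also omits the encryption and differential privacy mentioned above.

```python
import random

def local_update(w, b, data, lr=0.05, epochs=20):
    """Plain gradient descent on one client's private (x, y) pairs for y ~ w*x + b."""
    for _ in range(epochs):
        n = len(data)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def federated_averaging(clients, rounds=50):
    """Each round: send the global model out, train locally, average weighted by data size."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Only parameters come back from each client; raw data stays local.
        updates = [(local_update(w, b, d), len(d)) for d in clients]
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b

# Three hypothetical "hospitals", each holding private noisy samples of y = 3x + 1.
rng = random.Random(0)
clients = []
for size in (20, 35, 50):
    xs = [rng.uniform(-1, 1) for _ in range(size)]
    clients.append([(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs])

w, b = federated_averaging(clients)
print(f"w={w:.2f}, b={b:.2f}")
```

The global model should end up close to w = 3, b = 1 even though no client ever shares its raw samples. Production systems typically layer secure aggregation or differential privacy on top of a loop like this.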

In this talk I’ll be bringing the audience up to speed with progress in privacy-preserving machine learning, discussing platforms for developing models, and presenting a demo on healthcare use cases.


Outline/Structure of the Demonstration

In this talk I’ll be introducing the audience to the emerging field of privacy-preserving machine learning, which sits right at the intersection of decentralised machine learning and privacy-preserving techniques. The talk will include an overview of current platforms along with a use case demonstration.

Timewise outline:

00:00 - 01:00: Intro

01:00 - 03:00: Need for Privacy-Aware Machine Learning

03:00 - 07:00: Federated Learning Intro + FL in Healthcare

07:00 - 12:00: Privacy Concerns + Tools & Platforms

12:00 - 18:00: Demo

Learning Outcome

This novel technique presents new opportunities for businesses to work with private and sensitive datasets. This demonstration will help the audience understand the building blocks of privacy-preserving machine learning.

Target Audience

Machine Learning Engineers, Data Scientists, Product Managers

Prerequisites for Attendees

Knowledge of current machine learning techniques.



Submitted 3 years ago

  • Bharati Patidar

    Bharati Patidar - AI/ML under the covers of modern Master Data Management

    20 Mins

    Data quality is of utmost importance in Master Data Management solutions. Data curation and standardisation involve multiple iterations of exchange between customers and their vendors. Rules written for validation and correction pile up, and their maintenance gets costlier over time. There can be anywhere from 500 to over 20K data quality rules, many of which become outdated but cannot be removed without risking regressions. To address these challenges, we turned to machine learning to autocorrect human errors and standardise content across onboarded products.

    This talk is about our journey to solve this problem, starting from a simple spell-check algorithm based on edit (Levenshtein) distance and moving on to more complex language models. We used state-of-the-art approaches such as a character-level sequence-to-sequence model with an encoder-decoder architecture, autoencoders, attention-based transformers and even BERT. The results from these models kept improving, but were not yet good enough to reach the expected quality. These experiments with the latest techniques helped us build a strong intuition and understanding of language models.
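As a rough illustration of the starting point described above, the sketch below computes Levenshtein distance with the standard dynamic-programming recurrence and uses it for nearest-word correction against a vocabulary. The vocabulary, the `correct` helper and the `max_distance=2` threshold are hypothetical choices made for illustration. They are not the speaker's actual rules, data or implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

def correct(token, vocabulary, max_distance=2):
    """Return the closest vocabulary entry, or the token unchanged if nothing is close enough."""
    best = min(vocabulary, key=lambda v: (levenshtein(token, v), v))
    return best if levenshtein(token, best) <= max_distance else token

# Hypothetical product-attribute vocabulary.
vocab = ["stainless steel", "galvanized", "aluminium", "copper"]
print(correct("aluminum", vocab))   # aluminium (distance 1)
print(correct("coper", vocab))      # copper (distance 1)
print(correct("titanium", vocab))   # titanium (nothing within distance 2)
```

Keeping only two rows of the table gives memory proportional to `len(b)`. The linear scan over the vocabulary is fine for a sketch, but it scales poorly to large catalogues. That limitation is part of why context-aware language models become attractive.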

    I will also touch upon data collection, its challenges and our workarounds. The key takeaway will be a performance comparison of the various techniques and approaches from our experiments (in the context of our use case), something I had once longed to see before starting on this journey. I will also share the intuitions we gained and common mistakes to be aware of.

    If anything blocks you today from trying new techniques, keeps you wondering how and where to start, or is something else I could help you with, please leave a comment and I will work to answer it in this talk (if the talk is accepted; if not, please reach out to me on LinkedIn and I will be happy to help).