Productionizing Machine Learning - Lessons learnt from trying to supercharge big data

Machine Learning (ML) systems are vulnerable to a host of unexpected behaviours, such as adversarial perturbations, backdoors/trojans and discrimination, to name a few. Consequently, there is a pressing need to evaluate ML systems comprehensively, beyond rudimentary metrics such as accuracy, to ensure the quality of the ML models that are put into production.

The broad question I would like to address is: “What are the important properties of production-level ML models, and how does one evaluate them?” In this talk, I will describe some of my salient work towards answering it.

As our first work towards this goal, we developed Aequitas [1].

  • Aequitas searches for, measures and mitigates violations of individual fairness in ML models. This matters because historical datasets are known to be often biased.
  • Moreover, simply removing the sensitive features (e.g. gender, race, religion or sexual orientation) is usually not effective at enforcing fairness, because other features can act as proxies for them.
  • Aequitas uses a probabilistic framework to search the input space of an ML model efficiently and uncover such violations.
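To make the idea of searching for individual fairness violations concrete, here is a minimal Python sketch. It is not Aequitas itself: it assumes a model over small integer-valued features, treats an input as discriminatory when changing only the sensitive attribute changes the prediction, and uses uniform random sampling followed by random one-step perturbations of known violations as a simplified stand-in for Aequitas's probabilistic search. The names `fairness_search`, `is_discriminatory`, `RANGES` and `biased_model` are hypothetical.

```python
import random
from typing import Callable, Sequence, Set, Tuple

Input = Tuple[int, ...]
Model = Callable[[Input], int]


def is_discriminatory(model: Model, x: Input, sensitive_idx: int,
                      sensitive_values: Sequence[int]) -> bool:
    """True if changing only the sensitive attribute of x changes the output."""
    outputs = set()
    for value in sensitive_values:
        variant = list(x)
        variant[sensitive_idx] = value
        outputs.add(model(tuple(variant)))
    return len(outputs) > 1


def fairness_search(model: Model, ranges: Sequence[Tuple[int, int]],
                    sensitive_idx: int, global_trials: int = 1000,
                    local_trials: int = 1000, seed: int = 0) -> Set[Input]:
    """Hypothetical two-phase random search for discriminatory inputs."""
    rng = random.Random(seed)
    lo_s, hi_s = ranges[sensitive_idx]
    sensitive_values = range(lo_s, hi_s + 1)
    found: Set[Input] = set()

    # Global phase: sample the input space uniformly at random.
    for _ in range(global_trials):
        x = tuple(rng.randint(lo, hi) for lo, hi in ranges)
        if is_discriminatory(model, x, sensitive_idx, sensitive_values):
            found.add(x)

    # Local phase (assumes violations tend to cluster): nudge one
    # non-sensitive attribute of a known discriminatory input by +/-1.
    seeds = list(found)
    non_sensitive = [i for i in range(len(ranges)) if i != sensitive_idx]
    for _ in range(local_trials):
        if not seeds:
            break
        candidate = list(rng.choice(seeds))
        i = rng.choice(non_sensitive)
        lo, hi = ranges[i]
        candidate[i] = min(hi, max(lo, candidate[i] + rng.choice((-1, 1))))
        x = tuple(candidate)
        if x not in found and is_discriminatory(model, x, sensitive_idx, sensitive_values):
            found.add(x)
            seeds.append(x)
    return found


# Toy example (hypothetical): features are (age, income bracket, gender).
RANGES = [(18, 60), (0, 10), (0, 1)]


def biased_model(x: Input) -> int:
    age, income, gender = x
    # Deliberately unfair: gender decides the outcome for older, lower-income applicants.
    return int(income > 5 or (gender == 1 and age > 40))


if __name__ == "__main__":
    violations = fairness_search(biased_model, RANGES, sensitive_idx=2)
    print(len(violations), "discriminatory inputs found")
```

Every input the sketch reports has income of at most 5 and age above 40, which is exactly the region where the toy model lets gender decide the outcome; a model that ignores gender yields no violations at all.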

Our second work, Ogma [2], encapsulates a large part of the input space of text-based ML models using a context-free grammar.

  • Using this grammar, we direct test generation to expose erroneous behaviour in text classifiers.
  • Ogma is unique because it does not rely on any labelled data for test generation.
  • Ogma solves the oracle problem through differential testing, i.e. by comparing the outputs of multiple classifiers on the same generated input.
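To illustrate how a grammar plus differential testing removes the need for labelled data, here is a minimal Python sketch. The toy grammar, the two keyword-based sentiment classifiers and the functions `expand` and `differential_test` are all hypothetical, and plain random expansion stands in for Ogma's directed test generation.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]
Classifier = Callable[[str], str]

# Toy grammar (hypothetical); symbols written as <...> are nonterminals.
GRAMMAR: Grammar = {
    "<S>": [["<NP>", "<VP>"]],
    "<NP>": [["the", "<N>"], ["a", "<ADJ>", "<N>"]],
    "<VP>": [["was", "<ADJ>"], ["<V>", "<NP>"]],
    "<N>": [["movie"], ["plot"], ["actor"]],
    "<ADJ>": [["great"], ["terrible"], ["boring"]],
    "<V>": [["loved"], ["hated"]],
}


def expand(symbol: str, grammar: Grammar, rng: random.Random) -> List[str]:
    """Randomly expand a symbol into a list of terminal words."""
    if symbol not in grammar:
        return [symbol]  # terminal word
    words: List[str] = []
    for sym in rng.choice(grammar[symbol]):
        words.extend(expand(sym, grammar, rng))
    return words


def differential_test(grammar: Grammar, start: str, model_a: Classifier,
                      model_b: Classifier, trials: int = 200,
                      seed: int = 0) -> Set[Tuple[str, str, str]]:
    """Return generated sentences on which the two classifiers disagree.

    No labels are needed: a disagreement means at least one of the
    two models is wrong, so the models act as each other's oracle.
    """
    rng = random.Random(seed)
    disagreements = set()
    for _ in range(trials):
        sentence = " ".join(expand(start, grammar, rng))
        a, b = model_a(sentence), model_b(sentence)
        if a != b:
            disagreements.add((sentence, a, b))
    return disagreements


# Two hypothetical keyword-based sentiment classifiers under test.
def classifier_a(text: str) -> str:
    negative = {"terrible", "hated"}
    return "negative" if negative & set(text.split()) else "positive"


def classifier_b(text: str) -> str:
    negative = {"terrible", "hated", "boring"}
    return "negative" if negative & set(text.split()) else "positive"


if __name__ == "__main__":
    for sentence, a, b in sorted(differential_test(GRAMMAR, "<S>", classifier_a, classifier_b)):
        print(f"{sentence!r}: A={a}, B={b}")
```

In this toy setup, every reported disagreement contains the word "boring", which pinpoints the vocabulary gap between the two classifiers without a single labelled example.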

In the domain of ML security, we developed Neo [3], a completely black-box tool that detects backdoors that may be present in ML systems.

  • Backdoors in ML are analogous to backdoors in traditional software.
  • When a poisoned model is presented with an input containing a specific trigger, it misclassifies that input, typically as a class of the attacker's choosing.
  • Our goal with Neo is to identify whether a model is poisoned, reconstruct the trigger and mitigate the attack.
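The black-box idea above can be sketched in Python on toy integer "images". This is a simplified illustration under stated assumptions, not Neo's actual algorithm: it assumes a square trigger of known size `k`, a blocker filled with a constant value, and a hypothetical `poisoned_model`. A region is reported as a reconstructed trigger if hiding it changes the suspect input's label and pasting it onto clean inputs forces that same label. The helpers `block`, `crop`, `paste` and `find_trigger` are likewise hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]
Model = Callable[[Image], int]


def block(img: Image, r: int, c: int, k: int, fill: int = 0) -> Image:
    """Copy of img with a k x k square at (r, c) overwritten by fill."""
    out = [row[:] for row in img]
    for i in range(r, r + k):
        for j in range(c, c + k):
            out[i][j] = fill
    return out


def crop(img: Image, r: int, c: int, k: int) -> Image:
    return [row[c:c + k] for row in img[r:r + k]]


def paste(img: Image, patch: Image, r: int, c: int) -> Image:
    out = [row[:] for row in img]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[r + i][c + j] = value
    return out


def find_trigger(model: Model, suspect: Image, clean: List[Image], k: int = 2,
                 fill: int = 0, min_flip_rate: float = 0.8
                 ) -> Optional[Tuple[Image, Tuple[int, int], int]]:
    """Search suspect for a k x k region that behaves like a backdoor trigger.

    Only calls model(...), i.e. treats the model as a black box.
    """
    label = model(suspect)
    others = [x for x in clean if model(x) != label]
    if not others:
        return None
    for r in range(len(suspect) - k + 1):
        for c in range(len(suspect[0]) - k + 1):
            # Step 1: does hiding this region change the prediction?
            if model(block(suspect, r, c, k, fill)) == label:
                continue
            # Step 2: does transplanting the region onto other inputs
            # force the same label? If so, treat it as the trigger.
            patch = crop(suspect, r, c, k)
            flips = sum(model(paste(x, patch, r, c)) == label for x in others)
            if flips / len(others) >= min_flip_rate:
                return patch, (r, c), label
    return None


def poisoned_model(img: Image) -> int:
    """Hypothetical poisoned classifier: any 2 x 2 square of 9s forces class 1."""
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            if img[r][c] == img[r][c + 1] == img[r + 1][c] == img[r + 1][c + 1] == 9:
                return 1  # backdoor target class
    return 2 if sum(map(sum, img)) > 20 else 0  # "clean" behaviour


if __name__ == "__main__":
    suspect = [[1, 1, 1, 1], [1, 9, 9, 1], [1, 9, 9, 1], [1, 1, 1, 1]]
    clean = [[[0] * 4 for _ in range(4)], [[2] * 4 for _ in range(4)]]
    print(find_trigger(poisoned_model, suspect, clean))
```

On the toy suspect input, the sketch recovers the 2 x 2 block of 9s at row 1, column 1 together with the target class 1. Once a trigger is reconstructed, one plausible mitigation (again an assumption, not a claim about Neo) is to block that region before the input reaches the model.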

In this talk, I aim to acquaint the audience with the cutting-edge research happening in my group at the Singapore University of Technology and Design.

[1] Sakshi Udeshi, Pryanshu Arora, and Sudipta Chattopadhyay. Automated directed fairness testing. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, pages 98–108, 2018. 

[2] Sakshi Udeshi and Sudipta Chattopadhyay. Grammar based directed testing of machine learning systems. IEEE Transactions on Software Engineering, 2019. URL: 

[3] Sakshi Udeshi, Shanshan Peng, Gerald Woo, Lionell Loh, Louth Rawshan, and Sudipta Chattopadhyay. Model agnostic defence against backdoor attacks in machine learning. 2019. URL: 02203.


Outline/Structure of the Talk

7 mins - General overview of research in the area + importance

3 mins - Introduction to Asset Research Group

9 mins - Aequitas research deep dive + impact

9 mins - Ogma research deep dive + impact

9 mins - Neo research deep dive + impact

3 mins - Closing thoughts and future work

5 mins - Q&A + Discussion

Learning Outcome

  • ML is a unique software paradigm, and the computing world is, no doubt, excited about its commercial and critical applications.
  • What limits larger-scale deployment of ML is quality. To the best of my knowledge, no engineering infrastructure (metrics and software) currently exists to ensure the quality of arbitrary ML models.
  • Through my research, I want to guide the design of this infrastructure so that engineers can produce models that are provably free of such bugs.
  • I hope to illustrate the importance of this problem to the attendees of Agile India and to inspire discussion about how best to solve it.

Target Audience

Developers, Leads




  • Karthik Gaekwad

    Karthik Gaekwad - Kube Me This! Kubernetes Ideas and Best Practices

    Karthik Gaekwad
    Principal Engineer
    20 Mins

    Kubernetes is a force to be reckoned with, and enterprises have made a major shift towards moving their infrastructure onto it. This session covers best practices to keep in mind when shifting from deploying applications on web servers to a microservices model on Kubernetes. You’ll learn about topics you should consider while moving to Kubernetes and principles you should follow when building out your Kubernetes-based applications or infrastructure. The ideas presented come from war stories and experience working in the trenches on the managed Kubernetes service for Oracle Cloud.

  • Damodharan Rajalingam

    Damodharan Rajalingam - Shrinking Production Incidents

    45 Mins

    Shrinking Production Incidents details an organized approach for reducing the overall time of production outages.

    Reliability is a key feature of a service that keeps users happy. Users do not stick with services that have frequent outages.

    In this session, you will learn an organized approach that helps your team manage production incidents so that your service has fewer, shorter and smaller outages, which in turn enables higher release velocity for your developers.

  • Shripad Agashe

    Shripad Agashe - Dishonest Organisation: How good intentions can lead to ethical fading

    Shripad Agashe
    Code Wrangler
    On My Own
    45 Mins

    Enterprise-wide data collection always has data-quality issues. Anyone who collected productivity data during the heyday of SEI-CMM will know what I mean. Top management always focused on making operations as predictable as possible. That meant getting a good handle on seemingly simple things, such as how long a task would take for a given technology in a given domain. So every time something new came up, more data points had to be collected for more accurate forecasts. Eventually, the people who were supposed to capture this data simply did not have enough time, alongside their routine day jobs, to truthfully gather what was being asked for. The data-collection requirements came to be seen as a way for top management to dissociate itself from any failure, simply by stating that it had not received enough data. This served only a political purpose.

    This is typical of the Taylorian approach of F. W. Taylor, in which everything is viewed as a machine to be orchestrated by a competent person in charge. It overlooks many possibilities that arise when strategy meets execution, namely the knowledge gap, the understanding gap and the execution gap. So for any undesirable outcome, more effort is spent on collecting more information, more reporting and so on, discounting the uncertainty in the external environment.

    But none of what I have described above is the most damaging part. The most damaging part is that the deluge of requirements forces people to lie, and managers to accept it as a white lie. Everybody knows that the data they are getting is rubbish (just as they themselves may have supplied rubbish when they were the ones collecting data). This “strategic lying” leads to what Ann Tenbrunsel termed “ethical fading”. In her words: “This paper examines the root of unethical decisions by identifying the psychological forces that promote self-deception. Self-deception allows one to behave self-interestedly while, at the same time, falsely believing that one’s moral principles were upheld. The end result of this internal con game is that the ethical aspects of the decision ‘fade’ into the background, the moral implications obscured.”

    In this talk, apart from the points mentioned above, I will cover how following “agile processes” and a “DevOps” culture can help an organisation avoid this “ethical fading”. To enable a truthful organisation, top leaders have to carefully calibrate and limit the reporting burden placed on subordinates. Essentially, it boils down to having a fixed bandwidth for reporting requirements: if the effort for a new requirement is likely to breach this “fixed reporting bandwidth limit”, management can add it only by removing an existing one. This approach was taken by the Prussian military in the 1870s and later adopted by various high-functioning organisations throughout the world.