Detecting Bias in AI: A Systems View & A Technique for Datasets
Modern machine learning (ML) offers a new way of creating software to solve problems, focused on learning structures, learning algorithms, and data. At every step of this process, from the specification of the problem, to the datasets chosen as relevant to the solution, to the choice of learning structures and algorithms, a variety of biases can creep in and compound each other. In this talk, we present a systems view of detecting bias in AI/ML systems as analogous to the software testing problem. To start, a variety of expectations of an AI/ML system can be specified given its intended goals and deployment. Different kinds of bias can then be mapped to different failure modes, which can then be tested for using a variety of techniques. We will also describe a new technique based on Topological Data Analysis to detect bias in source datasets. This technique uses a visualization based on persistent homology and is lightweight: the human in the loop does not need to select metrics or tune parameters, and can carry out this step before choosing a model. We’ll describe experiments on the German credit dataset using this technique to demonstrate its effectiveness.
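To make the testing analogy above concrete, here is a minimal sketch of one expectation written as a test. It is an illustration only, not the technique presented in the talk: it assumes the expectation is that approval rates should not differ between groups by more than a chosen tolerance (a demographic-parity style check). The function names, the `group` and `approved` fields, the toy records, and the tolerance are all hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(records, group_key, outcome_key):
    """Return the fraction of positive outcomes for each value of group_key."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(records, group_key, outcome_key):
    """Largest difference in positive-outcome rate between any two groups."""
    rates = positive_rate_by_group(records, group_key, outcome_key)
    return max(rates.values()) - min(rates.values())


def check_parity(records, group_key, outcome_key, tolerance):
    """Test-style check: True if the parity gap is within the tolerance."""
    return parity_gap(records, group_key, outcome_key) <= tolerance


# Hypothetical toy decisions (not drawn from any real dataset).
TOY_DECISIONS = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "A", "approved": True},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

if __name__ == "__main__":
    gap = parity_gap(TOY_DECISIONS, "group", "approved")
    passed = check_parity(TOY_DECISIONS, "group", "approved", tolerance=0.1)
    print(f"parity gap = {gap:.2f}, within tolerance: {passed}")
```

In this framing, a failed check plays the role of a failing test: it flags one failure mode (unequal outcome rates) without saying anything about its cause, and other kinds of bias would need their own checks.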
Outline/Structure of the Talk
The first part of the talk will provide an overview of bias in machine learning systems, including the different types of bias that can occur at various stages of the machine learning pipeline and the implications these biases have for different stakeholders. In the second part of the talk, we introduce a lightweight bias detection technique based on topological data analysis. This method can be applied as a pre-processing step and can serve as an accessible tool for people who are not domain experts to visualize bias due to various attributes in a dataset.
Learning Outcome
- A comprehensive overview of bias in machine learning systems
- Knowledge of lightweight tools to check for bias in datasets
Target Audience
Data Scientists, Data Engineers, Data Specialists, Machine Learning Engineers, Data Science Enthusiasts