Detecting Bias in AI: A Systems View & A Technique for Datasets

Aug 8th, 12:00 - 12:45 PM, Grand Ball Room 2

Modern machine learning (ML) offers a new way of creating software to solve problems, centered on learning structures, learning algorithms, and data. At every step of this process, from the specification of the problem, to the datasets chosen as relevant to the solution, to the choice of learning structures and algorithms, a variety of biases can creep in and compound each other. In this talk, we present a systems view of detecting bias in AI/ML systems as analogous to the software testing problem. To start, a variety of expectations of an AI/ML system can be specified given its intended goals and deployment. Different kinds of bias can then be mapped to different failure modes, which can in turn be tested for using a variety of techniques. We will also describe a new technique based on Topological Data Analysis to detect bias in source datasets. This technique uses a persistent-homology-based visualization and is lightweight: the human in the loop does not need to select metrics or tune parameters, and can carry out this step before choosing a model. We’ll describe experiments on the German credit dataset that demonstrate the technique’s effectiveness.
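To make the testing analogy concrete, here is a minimal sketch, not taken from the talk, of how one expectation could be written as an automated check. It assumes a binary decision, a single protected attribute, and demographic parity as the expectation; the function names, the toy records, and the 0.2 tolerance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the fraction of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(records, group_key, outcome_key):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(records, group_key, outcome_key)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group A is approved 3 times out of 4, group B 2 out of 4.
records = [
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 0},
    {"group": "A", "approved": 1},
    {"group": "B", "approved": 1},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 1},
]

TOLERANCE = 0.2  # assumed acceptance threshold, not a recommended value
gap = demographic_parity_gap(records, "group", "approved")
bias_test_passed = gap <= TOLERANCE
```

In this framing, a gap above the tolerance plays the role of a failing test case: the expectation (similar approval rates across groups) is the specification, and the disparity is the failure mode being tested for. Other expectations would need other checks.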


Outline/Structure of the Talk

The first part of the talk will provide an overview of bias in machine learning systems, including the different types of bias that can occur at various stages of the machine learning pipeline and the implications these biases have for different stakeholders. In the second part of the talk, we will introduce a lightweight bias detection technique based on topological data analysis. This method can be applied as a pre-processing step and can serve as an accessible tool that lets people without domain expertise visualize bias due to various attributes in a dataset.
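The abstract does not spell out the technique itself, so the following is only a rough illustration of what 0-dimensional persistent homology measures on a point cloud, not the speakers' method. It assumes Euclidean distance and a Vietoris-Rips style filtration, where each connected component "dies" at the distance at which it merges into another; the function name `h0_death_times` and the toy points are hypothetical.

```python
import math


def h0_death_times(points):
    """Distances at which connected components merge as the scale grows.

    For a Vietoris-Rips filtration these are the death times of the
    0-dimensional homology classes, and they coincide with the edge
    lengths of a minimum spanning tree (computed here Kruskal-style).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(dist)
    return deaths


# Hypothetical toy data: two tight pairs of points, far apart from each other.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
deaths = h0_death_times(points)
```

A death time much larger than the rest signals well-separated clusters. One could, as an assumption about how such a summary might be used, compute it separately for subgroups defined by a protected attribute and compare the results visually; whether the talk's technique does this is not stated in the abstract.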

Learning Outcome

  • A comprehensive overview of bias in machine learning systems
  • Knowledge of lightweight tools to check for bias in datasets

Target Audience

Data Scientists, Data Engineers, Data Specialists, Machine Learning Engineers, Data Science Enthusiasts

Submitted 2 months ago
