Estimating Minimum Viable Model Performance Baselines for Machine Learning Projects

In an ideal world we would prioritize machine learning projects according to their expected payoff, which in turn would require a reasonable estimate of both the probability of success and the return given that success. Creating such an estimate is difficult, so instead we tend to be guided by heuristics, implicit and explicit, about the size of the problem and its difficulty.

Very often the difficulty estimate is limited to a discussion of what it would take to deploy a model and integrate it into the business process. Estimating the difficulty of building the model itself is generally left to the expectations of the machine learning scientist and their experience (or lack thereof) in the problem domain.

In this talk, I will present the results of research into methods for estimating the minimum required performance characteristics of a model, given a set of information about the business problem to be solved. These techniques have been implemented in an open source application, which allows discussion of some of the implementation details. The result is a set of techniques that allow data scientists and managers to estimate whether a proposed machine learning project is likely to succeed before any modelling is done.
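As a rough illustration of the kind of estimate meant here, the sketch below computes a break-even precision for a binary classifier. It is not the method implemented in the open source application; it assumes a deliberately simple cost model in which each true positive earns a fixed benefit, each false positive incurs a fixed cost, and nothing else matters. The function and parameter names (`break_even_precision`, `tp_benefit`, `fp_cost`, and so on) are hypothetical.

```python
def break_even_precision(tp_benefit: float, fp_cost: float) -> float:
    """Smallest precision at which acting on positive predictions does not lose money.

    Assumed cost model: each flagged case is worth +tp_benefit if it is a true
    positive and -fp_cost if it is a false positive. Setting the expected value
    p * tp_benefit - (1 - p) * fp_cost to zero gives p = fp_cost / (tp_benefit + fp_cost).
    """
    if tp_benefit <= 0 or fp_cost < 0:
        raise ValueError("tp_benefit must be positive and fp_cost non-negative")
    return fp_cost / (tp_benefit + fp_cost)


def expected_value_per_case(base_rate: float, recall: float, precision: float,
                            tp_benefit: float, fp_cost: float) -> float:
    """Expected net value per scored case under the same assumed cost model."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    true_positives = base_rate * recall  # true positives per scored case
    # False positives per scored case implied by the precision.
    false_positives = true_positives * (1 - precision) / precision
    return true_positives * tp_benefit - false_positives * fp_cost


if __name__ == "__main__":
    # Hypothetical figures: a caught case is worth 100, a false alarm costs 25.
    print(break_even_precision(tp_benefit=100, fp_cost=25))
```

Under these assumptions, a model whose achievable precision is plausibly below the break-even value cannot pay for itself, whatever its recall, which is the sort of conclusion that can be reached before any modelling.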


Outline/Structure of the Talk

1. Introduction - Why we need to estimate the minimum performance requirement

2. Classification Business Requirements - How we capture and represent the costs/benefits of a machine learning implementation for a classification problem

3. Analytical Solutions - What can we directly calculate from the known costs/benefits

4. Numerical Solutions - What can we estimate about required performance through simulation

5. Regression Business Requirements - How we can capture and represent the costs and benefits of implementing a regression model

6. Methods for Estimating Impact of Regression Models

7. Conclusions
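To give a flavour of the simulation approach named in item 4, here is a minimal Monte Carlo sketch. It is an illustration under stated assumptions rather than the talk's actual technique: cases are independent, the classifier is summarised by a recall and a false positive rate, and value comes only from a fixed benefit per true positive and a fixed cost per false positive. All names and default values are hypothetical.

```python
import random


def simulate_net_value(recall, fpr, base_rate, tp_benefit, fp_cost, n_cases, rng):
    """Net value of one simulated batch of independent cases."""
    value = 0.0
    for _ in range(n_cases):
        if rng.random() < base_rate:      # the case is truly positive
            if rng.random() < recall:     # and the model catches it
                value += tp_benefit
        elif rng.random() < fpr:          # a negative case flagged in error
            value -= fp_cost
    return value


def min_recall_for_target(target, fpr, base_rate, tp_benefit, fp_cost,
                          n_cases=1000, n_trials=100, confidence=0.9, seed=0):
    """Scan recall from 0 to 1 in steps of 0.05 and return the first value at
    which at least `confidence` of the simulated batches reach `target` net value.
    Returns None if no recall on the grid is sufficient."""
    rng = random.Random(seed)
    for step in range(21):
        recall = step / 20
        hits = sum(
            simulate_net_value(recall, fpr, base_rate, tp_benefit,
                               fp_cost, n_cases, rng) >= target
            for _ in range(n_trials)
        )
        if hits / n_trials >= confidence:
            return recall
    return None


if __name__ == "__main__":
    # Hypothetical business figures, chosen only to exercise the code.
    print(min_recall_for_target(target=0.0, fpr=0.1, base_rate=0.05,
                                tp_benefit=100, fp_cost=25))
```

A simulation like this is useful where the cost structure is too awkward for a closed-form answer, since the per-case logic in `simulate_net_value` can be replaced with whatever the business process actually does.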

Learning Outcome

Attendees will come away understanding the efficiency savings of estimating the minimum required performance characteristics of machine learning models, along with a set of examples of how to do this for certain kinds of problems. These examples will also demonstrate general computational techniques for making these estimates, which attendees can then apply to their own bespoke problems.

Target Audience

Data scientists, machine learning engineers, analytics managers

Prerequisites for Attendees

Attendees will require an understanding of the general types of machine learning projects (supervised, unsupervised, classification, regression) and an understanding of basic arithmetic and probability theory. Attendees will benefit from having had some experience deploying a machine learning model into an automated decision process.

Submitted 1 year ago
