The Three-Rs of Data-Science - Repeatability, Reproducibility, and Replicability

May 6th, 11:25 - 11:55 AM, Green Room (60 interested)

Adoption of data-science in industry has been phenomenal over the last five years. The primary focus of this adoption has been on combining the three dimensions of machine-learning, i.e. the ‘data’, the ‘model architecture’ and the ‘parameters’, to predict an outcome. A slight change in any of these dimensions has the potential to skew the predicted outcomes. So how do we build trust in our models? And how do we manage the variance across multiple models trained on varied sets of data, model architectures and parameters? Why might the three Rs, i.e. “Repeatability, Reproducibility, and Replicability”, be relevant to industry applications of data-science?

This talk has the following goals:

  • Justify (with demonstrations) why “Repeatability, Reproducibility, and Replicability” are important in data-science even when the work goes beyond experimental research and is geared towards industry applications.
  • Discuss in detail the requirements for ensuring “Repeatability, Reproducibility, and Replicability” in data-science.
  • Discuss ways to achieve repeatability, reproducibility, and replicability through provenance and automated model management.
  • Present, compare and contrast various approaches and available tooling for provenance and model management.
 

Outline/Structure of the Talk

Outline of talk:

  • Explain how repeatability, reproducibility, and replicability are defined in data-science.
  • Briefly go over the research literature explaining the importance of repeatability, reproducibility, and replicability, and how they differ.
  • Justify (with demonstrations) why “Repeatability, Reproducibility, and Replicability” are important in data-science even when the work goes beyond experimental research and is geared towards industry applications.
  • Discuss in detail the requirements for ensuring “Repeatability, Reproducibility, and Replicability” in data-science.
  • Discuss ways to achieve repeatability, reproducibility, and replicability through provenance and automated model management.
  • Present, compare and contrast various approaches and available tooling for provenance and model management.
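As a rough illustration of the provenance idea in the outline above, the sketch below records a fingerprint of the data, the architecture name, the parameters and the random seed alongside a fingerprint of the resulting model. It is only a minimal sketch and not material from the talk: the toy `train` function, the `run_with_provenance` helper, the record fields and the "linear-1d" label are all hypothetical names chosen for this example, and it uses only the Python standard library rather than any particular model-management tool.

```python
import hashlib
import json
import random


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(data, params, seed):
    """Toy stand-in for training: fit y = w * x with seeded stochastic gradient descent."""
    rng = random.Random(seed)      # local RNG, so the run does not depend on global state
    w = rng.uniform(-1.0, 1.0)     # seeded random initialisation
    for _ in range(params["epochs"]):
        x, y = rng.choice(data)    # sample one training point
        w -= params["lr"] * 2 * (w * x - y) * x
    return w


def run_with_provenance(data, params, architecture, seed):
    """Train and return a record tying the result to everything that produced it."""
    w = train(data, params, seed)
    return {
        "data_hash": fingerprint(data),
        "architecture": architecture,   # hypothetical label, assumed for illustration
        "params": params,
        "seed": seed,
        "model_hash": fingerprint(round(w, 12)),
    }


data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]]
params = {"lr": 0.01, "epochs": 200}

# Same data, architecture, parameters and seed: the two records should match.
first = run_with_provenance(data, params, "linear-1d", seed=42)
second = run_with_provenance(data, params, "linear-1d", seed=42)

# Changing one dimension (here the learning rate) yields a different record.
changed = run_with_provenance(data, {"lr": 0.02, "epochs": 200}, "linear-1d", seed=42)
```

Comparing `first` and `second` is a simple repeatability check, while the `changed` record shows which dimension moved when two models disagree. Real pipelines would also need to capture code versions and library versions, which this sketch leaves out.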

Learning Outcome

  • This talk aims to provoke thought about why Repeatability, Reproducibility, and Replicability are important and still relevant in industry applications of data-science.
  • This talk will also cover the approaches and tooling options available for provenance (data/model/parameter sets) and model management.

Target Audience

Machine-Learning Engineers, Data-Scientists, Data Engineers, Architects, Product Owners

Prerequisites for Attendees

A basic understanding of machine-learning and data-science would be helpful.

Submitted 4 months ago