Metamorphic Testing for Machine Learning Models with Search Relevancy Example

Accuracy of a Model can be improved in several levels and multiple variables, boundaries and guidelines. With the well known problem statement and solution, it is difficult to evaluate for all the given cases the model would be predicting expected outcomes. Machine Learning Models are solving for the problems for which results are unknown, most of the times. This arises a problem of Test Oracle. Recent surveys and work have shown that this difficulty can be reduced by some of the blackbox testing techniques such as Metamorphic Testing, Fuzzing, Dual Coding et.,

Even though the output of a Model is not known, we can make few predictions based on the Metamorphic relations. A metamorphic relation refers to the relationship between the software input change and output change during multiple program executions. Many metamorphic relations are created based on the transformation from training data set or test data set. We further classify them into Coarse-grained Data transformation and Fine-grained data transformation.

We will discuss different transformations. Will go through the example of a Search relevancy problem and will analyse the application of Metamorphic testing to verify the Machine model built.

 
 

Outline/Structure of the Experience Report

  • Introduction - 2 min
    • Will discuss about the difficulty of Machine learning model testing
  • Test Oracle - 1 min
    • What is Test Oracle?
    • How it is constructed
  • Problem with Test Oracle - 2 mins
    • Why typical Test Oracle not applicable to Machine Learning
  • Pseduo Oracle - 1 mins
    • What is Pseudo Oracle and it's advantages
  • Metamorphic Testing - 3 mins
    • What is Metamorphic testing
    • Metamorphic testing with simple Example
  • Transformations - 3 mins
    • Coarse-grained Data transformation
    • Fine-grained data transformation
  • Search Relevancy - 3 mins
    • What is Search Relevancy
    • Machine Learning Model
    • Problem in Testing the Machine Learning Model
  • Application of Metamorphic testing in Search Relevancy - 4 mins
    • How we used Metamorphic testing for our Model
  • Advantages of Metamorphic Testing

Learning Outcome

Detail understanding Metamorphic testing.

How to improve the quality of the Machine Learning model using Metamorphic testing

How to use metamorphic relations for designing test cases for Machine Learning model

Target Audience

Software Engineer, Data Scientist, AI Enthusiast, QA Engineers, Project Managers

Prerequisites for Attendees

Basic knowledge of Machine Learning and development

schedule Submitted 7 months ago

Public Feedback

comment Suggest improvements to the Author
  • Kuldeep Jiwani
    By Kuldeep Jiwani  ~  6 months ago
    reply Reply

    Hi Vinayaka,

    You have listed the approach you wish to talk about. But there is no information on where you used it. Like you have mentioned it is hard to test the output of models so you have to use this approach. So please provide some details what kind of data science problem were you solving, which models were being trained and tested by your framework.

    The most important what did you achieve by using this approach. Can you share some details like earlier x% of test cases were matching and now (x+y)% cases started matching because of your approach.

    This will help the program committee get a better idea of your work.

    • Vinayaka Mayura G G
      By Vinayaka Mayura G G  ~  6 months ago
      reply Reply

      Hello Kuldeep,

      This approach was used for verifying Models built to solve the Search Relevancy problem for an e-commerce application. Models built were generic in nature and was deployed/trained to different tenants. Each tenant had different training data set. The model built was based on the popular XGBoost Algorithm.

      Model had multiple features for each tenants and result were varied drastically for the incremental change. With the complexity of huge input data set and different tenants, verifying quality of the each deployment was difficult. We planned for automation but it did not provide significant results as the trend of the search was always changing.

      From some of the available scholarly articles[1] we found that there has been already a research work done on this. We followed this approach as it was the similar kind of problem we were solving. Former approach was taking around ~80 Test cases per tenant to verify the deployment were reduced to ~20 Test cases. 


      [1]. https://ieeexplore.ieee.org/abstract/document/7254235

       

      Regards,

      Vinayaka Mayura

      • Kuldeep Jiwani
        By Kuldeep Jiwani  ~  6 months ago
        reply Reply

        Hi VInayak,

        Thanks for adding the details.

        You have mentioned the approached and its ML constituents. But I am still missing upon the first part of the question where you have used it in e-commerce domain to solve which problem.

        Like in the research paper you attached they clearly mentioned that they are solving the problem of testing a search engine as there are no objective ways to measure the test results. There results will be useful search engine developers and users.

        Similarly what problem are you solving and whom will it benefit, please elaborate further.

        • Vinayaka Mayura G G
          By Vinayaka Mayura G G  ~  6 months ago
          reply Reply

          Hello Kuldeep,

          Sorry, I missed this mail notification. Let me explain briefly about this.

          Our team was part of the E commerce product search team which focused on improving the search relevancy. Existing basic elastic search was not providing better product results to the user. We took the approach of solving this using the Machine Learning models. 

          We collected the user browser history and seperated it as different events. Example, search, product details page, product listing page, Advertisement, Affiliate redirections, Redirection etc., we identified the product which user is clicking. If we display 30 products in a results page, we identified which product's user is clicking. Example: whether user is clicking on the first five products displayed or is he navigating to the next page because he is not finding good results at the top. 

          There were multiple challenges like, for Example for search query "Iphone X" there was good search and conversion history, when "iphone 11" was released, more boost was given to the later based on the internal calculation. As we deployed this for multiple tenants, user language was also different. We can not use the same results set for all the tenants.

          The referred research paper was not exactly the same what we were doing but we referred this because it also focused on verifying the "correct ranking" and "consistent ranking". 

          Let me know if you need more details.

          Thanks,

          Vinayaka Mayura 

  • Deepti Tomar
    By Deepti Tomar  ~  7 months ago
    reply Reply

    Hello Vinayaka,

    Thanks for your time and efforts on the proposal! Could you answer the following questions to help the program committee understand your proposal better?

    • Are you going to share demo(s) /use case(s) from your project work (industry-specific use cases)? Speaker's experience on the project helps people understand the concept better.
    • Did you use these techniques to help solve a particular problem? 
    • If yes, would you be sharing the Challenges faced in the implementation of the technique in your application and the workarounds?

    Thanks,

    Deepti

    • Vinayaka Mayura G G
      By Vinayaka Mayura G G  ~  7 months ago
      reply Reply

      Hello Deepti,

      Please find the below answers for your queries.

      • Are you going to share demo(s) /use case(s) from your project work (industry-specific use cases)? Speaker's experience on the project helps people understand the concept better.
        • Yes, I will be showing a demo of this where we used this solution for solving one of our client requirements. Which is a widely used and high traffic website. Which collects TB's of data everyday. 
      • Did you use these techniques to help solve a particular problem? 
        • We used these technique to verify Machine learning models we built to solve the Search Relevancy use case for a retail application. Also will discuss where else this approach can be applied.
      • If yes, would you be sharing the Challenges faced in the implementation of the technique in your application and the workarounds?
        • Will discuss about the other solution's which helped with this and will share the drawbacks of this approach also.

      Let me know if you have any more questions or need info.

      Regards,

      Vinayaka Mayura

       

      • Deepti Tomar
        By Deepti Tomar  ~  7 months ago
        reply Reply

        Thanks for your response, Vinayaka! We will let you know in case if we have more questions.