Deep Learning Based Selenium Test Failure-triage Classification Systems

Problem Statement:

Thousands of automated test scripts run on every nightly schedule, producing a mix of passes and failures. The problem begins when failed tests pile up: we get caught in the test-automation trap, unable to finish triaging the failures from one automated run before the next testable build is released.

Deep Learning Model:

We achieved the classification by introducing Machine Learning in the initial months, followed by Deep Learning algorithms, into our Selenium and Appium automation tests. Failed test cases fall into four major classes: Script, Data, Environment, and Application Under Test, which between them contain hundreds of sub-classifications.

To overcome this problem, we built and trained an AI using Deep Learning that simulates a human categorizing the test result suite. For each failed test, the model predicts an outcome through an API, then categorizes and prioritizes the failure on a scale of 0 to 1. Based on the prediction, the algorithm takes an appropriate response action on the failed test cases, such as re-running the test or running it with different capabilities. We kick-started this with a historical data set of 1.6 million records collected over a 12-month period, covering the behavior of test case executions and the resulting suites.
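The step from a prediction to a response action can be sketched roughly as follows. This is a minimal illustration, not our production logic: the `Prediction` type, the `ACTIONS` policy table, the action names, and the 0.8 cutoff are all hypothetical and chosen only to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    test_id: str
    category: str  # one of the four top-level categories
    score: float   # classification score in [0, 1]


# Hypothetical policy table: which follow-up each category triggers.
ACTIONS = {
    "Environment": "rerun",
    "Script": "rerun_with_other_capabilities",
    "Data": "flag_for_review",
    "Application Under Test": "log_defect",
}


def choose_action(pred: Prediction, cutoff: float = 0.8) -> str:
    """Pick a follow-up action; low-confidence predictions go to a human."""
    if not 0.0 <= pred.score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if pred.score < cutoff:
        return "manual_triage"
    return ACTIONS.get(pred.category, "manual_triage")


def prioritize(preds):
    """Order predictions so the most confident ones are handled first."""
    return sorted(preds, key=lambda p: p.score, reverse=True)
```

Keeping the policy in a plain table means the mapping from category to action can change without touching the model itself.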

The Deep Learning-based algorithm breaks down new defects by category and assigns each a classification score on a scale of 0 to 1. We have also established a cutoff threshold, tuned as accuracy improves, and we group failed test cases by similarity. Test cases are classified at high granularity to support sophisticated analysis, and our statistical report shows defect classification improving to 87% accuracy over a year. The system is built on feedback-adapting models: each correct classification earns a reward and each wrong one a penalty, and whenever it receives a penalty the system automatically adjusts itself for the next execution.
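The reward and penalty idea can be illustrated with a small sketch. It assumes a very simple update rule (nudging the cutoff threshold by a fixed step), which is a stand-in for illustration and not the training procedure of the actual model; the class name `FeedbackLoop` and all its parameters are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    cutoff: float = 0.80       # current cutoff threshold
    step: float = 0.01         # how far one reward or penalty moves the cutoff
    min_cutoff: float = 0.50
    max_cutoff: float = 0.99
    total_reward: int = 0
    mistakes: Counter = field(default_factory=Counter)

    def record(self, predicted: str, confirmed: str) -> int:
        """Compare a prediction with the human-confirmed category."""
        if predicted == confirmed:
            reward = 1
            # Correct: trust the model slightly more by lowering the cutoff.
            self.cutoff = max(self.min_cutoff, self.cutoff - self.step)
        else:
            reward = -1
            # Wrong: demand more confidence and remember the confusion pair.
            self.cutoff = min(self.max_cutoff, self.cutoff + self.step)
            self.mistakes[(predicted, confirmed)] += 1
        self.total_reward += reward
        return reward
```

In a sketch like this, the `mistakes` counter shows which category pairs are confused most often, which is one plausible way to pick samples for the next retraining round.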

The algorithm includes a powerful model for detecting false-positive test results, calculated from snapshot comparisons, test step counts, script execution time, and log messages. The model also includes other features such as duplicate failure detection, retry algorithms, and a defect-logging API.
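Duplicate failure detection can be approximated by grouping failures whose log messages are nearly identical. The sketch below uses plain string similarity from Python's standard `difflib` module as a stand-in for the learned similarity; the function names and the 0.8 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two log messages."""
    return SequenceMatcher(None, a, b).ratio()


def group_failures(messages, threshold: float = 0.8):
    """Greedily group messages; the first message of a group represents it."""
    groups = []
    for msg in messages:
        for group in groups:
            if similarity(msg, group[0]) >= threshold:
                group.append(msg)  # near-duplicate of an existing group
                break
        else:
            groups.append([msg])   # no close match: start a new group
    return groups
```

For example, two `NoSuchElementException` messages that differ only in the element locator would land in one group, while a page-load timeout would start its own.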

The entire classification system is packaged and deployed in the cloud, where it can be used as a REST service. The application has its own reinforcement learning, which uses the classification score to improve itself and is programmed to operate on results that fall in an inconclusive range.
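A REST wrapper of this kind might look like the following sketch, built only on Python's standard library. Everything specific here is an assumption: the `/classify` path, the JSON field names, the inconclusive band of 0.4 to 0.6, and above all `stub_model`, a keyword rule that merely stands in for the trained network.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

INCONCLUSIVE_BAND = (0.4, 0.6)  # assumed range of scores treated as inconclusive


def stub_model(failure: dict) -> Tuple[str, float]:
    """Placeholder for the real model: a keyword rule, for illustration only."""
    log = str(failure.get("log", "")).lower()
    if "timeout" in log or "connection refused" in log:
        return "Environment", 0.9
    if "nosuchelement" in log:
        return "Script", 0.55
    return "Application Under Test", 0.5


def build_response(failure: dict) -> dict:
    """Classify one failure and mark whether the score is inconclusive."""
    category, score = stub_model(failure)
    lo, hi = INCONCLUSIVE_BAND
    return {"category": category, "score": score, "inconclusive": lo <= score <= hi}


class TriageHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            failure = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(failure, dict):
            self.send_error(400, "expected a JSON object")
            return
        body = json.dumps(build_response(failure)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the service; call this explicitly to run it."""
    HTTPServer(("", port), TriageHandler).serve_forever()
```

Results flagged as inconclusive are the natural candidates for human confirmation, and those confirmations are what a feedback loop would learn from.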

In sum, this deep learning solution can help Selenium testers classify their test results in no time and take the next steps automatically, so that the team can focus its efforts on new test failures.


Outline/Structure of the Talk

- Failures classification (Traditional Approach) and Pain points
- General Classification Types: Script, Data, Environment, or Application Under Test
- Deep Learning Model design for failure triage classification system
- Historical data set: features and challenges
- Predictive scoring and cutoff threshold
- Grouping, Attribute-based assertion
- Feedback adapting models (Rewards and Penalty)
- Package, deploy in the cloud, and expose as a REST service
- Our successes and failures during this deep learning implementation
- Reinforcement learning for false positives and true negatives, and the inconclusive range

Learning Outcome

Test automation engineers can learn the deep learning model and implement it at their workplace, where the algorithms can help them quickly classify failures based on their own data set and configured classifications and subtypes.

Target Audience

Every test automation engineer

Prerequisites for Attendees

A basic understanding of ML algorithms, and familiarity with the pain points of classifying failures from test automation results.

Submitted 3 weeks ago
