Using Computer Vision to Reduce Test Automation Blind Spots

The standard test automation toolkit handles web and mobile automation well, but it cannot detect elements in content-based desktop and mobile applications. Computer vision (CV) imitates human sight using deep learning and can identify objects in images, which helps machines orient themselves in space and perform repetitive detection tasks. Let's see how CV can help automate the testing of a much wider range of software products.

Anton Angelov will present a solution that combines functional tests written against the WebDriver W3C protocol with a CV engine based on SikuliX. You will see examples of how his teams created automated tests that verify complex content such as PDFs and charts. By the end of the presentation, you will know how to build a similar library in your own programming language and benefit from combining WebDriver and CV.
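
To make the idea of image-based verification more concrete, here is a minimal sketch of template matching in plain Java: it searches a screenshot for a reference image and reports where it was found. This is an illustration only, not SikuliX's implementation (SikuliX builds on OpenCV and is far more tolerant and much faster). The class name `NaiveTemplateMatcher` and its methods are hypothetical and exist only for this example.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

// Hypothetical, deliberately naive matcher: it compares pixels one by one.
// A real CV engine uses normalized correlation and handles scaling and noise.
final class NaiveTemplateMatcher {

    // Returns the top-left corner of the first position where the template
    // matches the screen with at least minSimilarity (0.0 to 1.0), or null
    // when no position qualifies.
    static Point find(BufferedImage screen, BufferedImage template, double minSimilarity) {
        int maxX = screen.getWidth() - template.getWidth();
        int maxY = screen.getHeight() - template.getHeight();
        for (int y = 0; y <= maxY; y++) {
            for (int x = 0; x <= maxX; x++) {
                if (similarity(screen, template, x, y) >= minSimilarity) {
                    return new Point(x, y);
                }
            }
        }
        return null; // also reached when the template is larger than the screen
    }

    // Fraction of template pixels whose RGB value equals the screen pixel
    // at the given offset.
    static double similarity(BufferedImage screen, BufferedImage template, int offX, int offY) {
        int total = template.getWidth() * template.getHeight();
        int equal = 0;
        for (int ty = 0; ty < template.getHeight(); ty++) {
            for (int tx = 0; tx < template.getWidth(); tx++) {
                if (screen.getRGB(offX + tx, offY + ty) == template.getRGB(tx, ty)) {
                    equal++;
                }
            }
        }
        return (double) equal / total;
    }
}
```

A test would capture a screenshot after a UI action, call `find` with a stored reference image of the expected chart or PDF page, and fail if the result is `null`. Lowering `minSimilarity` below 1.0 is the crude counterpart of the similarity threshold that CV engines expose.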

 
 

Outline/Structure of the Demonstration

1. Define the problem we had and what we needed to test, with examples: PDFs and charts from our customers' websites

2. Explain what computer vision is and give some examples

3. Describe the overall architecture of the solution we designed and how it solves our problem

4. Explain what SikuliX is

5. Demos: live coding of the design described above, combining WebDriver/WinAppDriver with SikuliX

6. Give real-world examples
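
The combination named in items 3 and 5 can be pictured roughly as follows. This is only a sketch of the general shape, not the library presented in the talk: the interfaces `UiDriver` and `ScreenVerifier`, the class `ChartExportFlow`, the locator `#export-chart`, and the image name `expected-chart.png` are all hypothetical. In a real implementation, `UiDriver` would presumably wrap WebDriver or WinAppDriver, and `ScreenVerifier` would delegate to the CV engine.

```java
// Hypothetical abstraction over WebDriver/WinAppDriver actions.
interface UiDriver {
    void open(String url);
    void click(String locator);
}

// Hypothetical abstraction over the CV engine (for example a SikuliX wrapper).
interface ScreenVerifier {
    // True when the reference image is found on screen with at least
    // the given similarity (0.0 to 1.0).
    boolean isVisible(String referenceImagePath, double minSimilarity);
}

// A functional test step that drives the page through the driver and then
// hands over to the CV engine for content a locator cannot reach.
final class ChartExportFlow {
    private final UiDriver driver;
    private final ScreenVerifier screen;

    ChartExportFlow(UiDriver driver, ScreenVerifier screen) {
        this.driver = driver;
        this.screen = screen;
    }

    boolean exportShowsExpectedChart(String url) {
        driver.open(url);                  // ordinary element-based automation
        driver.click("#export-chart");     // hypothetical locator
        // The rendered chart has no DOM elements to assert on,
        // so the check is done on the pixels instead.
        return screen.isVisible("expected-chart.png", 0.9);
    }
}
```

Keeping the two concerns behind separate interfaces means the same flow can be tested with fakes and ported to another language without depending on a specific driver or CV engine.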

Learning Outcome

understanding what computer vision is and how it can help test automation
practical knowledge of SikuliX and its uses
understanding the overall design of a solution that combines CV with functional WebDriver tests

Target Audience

automation QAs, software developers in test

Prerequisites for Attendees

basic WebDriver knowledge, intermediate Java, C# or similar OOP language

Submitted 2 months ago

Public Feedback

  • Pallavi R Sharma
    By Pallavi R Sharma  ~  1 month ago

    Hello Anton

    Integrating web/mobile automation drivers with the Windows driver and SikuliX, which automates based on images, to provide a complete end-to-end solution sounds like an interesting topic.

    But SikuliX is a well-known open source tool, and many people in the audience must already be aware of its usage with Selenium when a solution is needed; the same goes for WinAppDriver.

    These tools have been available in the market for quite some time now, have a good user base, and are quite popular.

    So in your talk, what extra are you proposing with the "CV" solution that would benefit the audience? Can you kindly elaborate, Anton?

    • Anton Angelov
      By Anton Angelov  ~  1 month ago

      Hello Pallavi, 

      Thank you for your questions! You are right that Sikuli is open source and popular. Some people use it separately where Selenium tools cannot cope. Can you share some popular free open source tools that use this approach and are well designed and well written? Which commercial tools do you have in mind? Applitools and similar tools use their own engines, as far as I know. In the .NET world there was just a single sample project, which is no longer supported and does not work in most cases. So I was going to share an open source GitHub repo and explain how it works and why it can be beneficial. Moreover, I will give real-world examples of where it is applicable, told as a story. The CV part will be a short introduction to Sikuli and this set of tooling, nothing on top of it.

      I added a lot more details about the talk in the answers to the questions of your colleague Robin. 

      • Pallavi R Sharma
        By Pallavi R Sharma  ~  1 month ago

        Anton

        I never said commercial tools. By the way, Opkey is one commercial tool; it is a tool-agnostic framework over multiple tools/solutions and provides an integrated solution in case one needs it. I am not sure whether Katalon does this or not. Both of these are commercial, not open source.

        By market I meant already available in the 'world' out there. Sikuli is not something new, and its integration with Selenium scripts, along with handling Windows, is also not new.

        I remember using these tools almost 7 years back to solve client problems, which was a long time ago. So I'm just curious to know: what is the new thing in the "CV" solution in this talk that the audience will not know already?

        I had already read the discussion between you and Robin, and only after that did I ask these questions.

        I'm sorry if you had to repeat some of your answers, but I am trying to understand the 'novelty' in the proposed solution.

        • Anton Angelov
          By Anton Angelov  ~  1 month ago

          Pallavi, 

          I understand where the confusion may come from. It is maybe my fault that I haven't explained it well, so I will try to do so now.

          The tools you shared are UI tools that don't require much coding skill. The same is valid for pure SikuliX if you use the SikuliX editor. The approach and C# library I am going to show wrap the SikuliX engine, but basically you do everything in C# code. For example, you perform an action on a web page with WebDriver, and at the end of your test you call a method from the CV library to verify the screen or perform another action. The novelty is that I will explain how to build it. Maybe you or other engineers in the world have done the same, but I haven't read any articles or seen any videos about it. It is always possible that the problem is with my research. :)

          The library, which I will demonstrate and explain with real-world examples, will be freely downloadable on GitHub.

          • Pallavi R Sharma
            By Pallavi R Sharma  ~  4 weeks ago

            Anton, I also understand I haven't been clear either. Actually, "text" is not a very effective mode of communication. I understood your solution. We built exactly the same in Java using Selenium, Sikuli jars, pure code, etc.

            I appreciate your skill set, and your knowledge in here. Thanks for your explanation. I do not have any further questions. 

  • Robin Gupta
    By Robin Gupta  ~  2 months ago

    I like these aspects of the submission, and they should be retained:

    • Using Sikuli and CV to detect/operate web elements seems like a good idea

    I think the submission could be improved by:

    • Adding a quick introduction to training sets for CV
    • Limitations of the platform
    • Cost implications and real life implementation examples
    • Comparison to contemporary tools/approaches
    • Anton Angelov
      By Anton Angelov  ~  2 months ago

      Thank you for the suggestions. Let me iterate over them so that we have a shared understanding.
      As mentioned in the proposed agenda- I will explain what CV is- is this what you meant by "Adding a quick introduction to training sets for CV"?

      About the limitations- I agree that they should be mentioned. I was going to cover them during the real-world examples, together with the problems we had. For example, handling different desktop resolutions is a bit tricky. Another tricky point is that it is hard to make the solution work in grid mode. It is not impossible, but additional code is needed- e.g., a Selenium Grid plugin.

      I don't entirely get what you mean by "cost implications"- do you mean license costs or hardware ones? The tools are free, and so is the library that I will demonstrate. Moreover, the library is even open-sourced. Also, as mentioned in the proposed agenda- a big part of the talk will be to look into real-life examples.

      I like the idea of a comparison to contemporary tools/approaches. I can compare the library/approach to tools such as Applitools and some custom-made image-comparison solutions.

      Let me know what you think.

      • Robin Gupta
        By Robin Gupta  ~  2 months ago

        Thanks for the response.

        By cost implications, I wanted to highlight the compute cost (cloud, hardware, GPU, etc.) and the time spent on setting up the solution (training, modelling, implementation and maintenance).

        • Anton Angelov
          By Anton Angelov  ~  2 months ago

          Thanks for the clarification. I can add some information on this, such as CPU and RAM consumption. There aren't any specific hardware or cloud requirements. About the implementation, I can mention what it costs if you use C#, where you can download the source code directly, and how much time you would probably need to create a similar port to other languages.
          From your response, should I understand that you like the other suggestions for improvements to the structure?

          • Robin Gupta
            By Robin Gupta  ~  2 months ago

            Yes, the response clarifies the improvements