A quick study on challenges with an automated Visual Testing Engine
Although our focus at Perfect Dashboard has recently shifted slightly towards Cross-Selling as a Service, AutoUpdater remains our crown jewel and is always on our minds. Today, I want to bring forward an interesting issue we encountered while working on our proprietary AI-powered visual testing engine. Contrary to what one might think, it is not an edge case at all.
I want to bring this case to your attention because dealing with such challenges is absolutely critical if automated tests are to work correctly and serve their purpose. After all, without them, safe automated WordPress updates would not be possible, and the best and most efficient way of securing websites against automated hacking would be off the table.
The rule of thumb for testing a website after a software update is: the website should always look and work the same way before and after the update. For instance, if you decide to update Jetpack from 7.1 to 7.1.1, the website should remain intact and unchanged. This is the foundation on which the entire concept of automated visual testing is built. Only the lack of difference between screenshots of a website taken before and after an update is a clear indication that the update was successful.
Well, this statement is true in 98% of cases, but there is one notable exception. Seeing thousands of website updates each month, we know that some updates bring new features and thus change the appearance of a website. In such cases, the website is supposed to look different after the update, and that does not mean the website is broken.
The latest example is WordPress 5.1, which brought a small and quite useful feature to the native WordPress commenting system: a checkbox below the comment box.
Now, if the system relied purely on the pixel difference between the screenshots, it would not stand a chance in such a case and would flag the website as broken after the update, since 6.78% of the pixels changed between the screens. That would be enough to cause a false positive, making the website owner waste a lot of time hunting for a bug that is not there, along with all the unnecessary stress.
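To make the naive approach concrete, here is a minimal sketch of pixel-difference scoring. It is not Perfect Dashboard's actual engine; it treats screenshots as flat lists of grayscale values, and the `changed_pixel_ratio` name, the toy 4x4 "screenshots", and the tolerance parameter are all illustrative assumptions.

```python
def changed_pixel_ratio(before, after, tolerance=0):
    """Fraction of pixels that differ between two equally sized
    grayscale screenshots (given as flat lists of 0-255 values)."""
    if len(before) != len(after):
        raise ValueError("screenshots must be the same size")
    changed = sum(1 for a, b in zip(before, after) if abs(a - b) > tolerance)
    return changed / len(before)

# Hypothetical 4x4 "screenshots": the update added a dark checkbox
# pixel in the bottom-right corner, so 1 pixel out of 16 changed.
before = [255] * 16
after = [255] * 15 + [0]

ratio = changed_pixel_ratio(before, after)
print(f"{ratio:.2%} of pixels changed")  # → 6.25% of pixels changed
```

A real comparison would work on decoded image buffers and apply a difference threshold, but the failure mode is the same: any legitimate UI addition, like the new comment checkbox, pushes the ratio above the alert threshold.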
‘This is exactly what AI is for!’ one might think. I’m sorry, but again, that is not entirely true. A simple application of AI is awesome when it comes to, e.g., recognizing dynamic content on a website and limiting false positives. Introducing this technology was a huge game-changer in the field of automated testing. But a simple AI setup will not handle this case correctly, and will most likely flag the website as broken after the update.
That is because siamese neural networks (which are usually used for such comparisons) disregard the time sequence of the screenshots. In other words, they cannot acknowledge that the first screenshot was taken before the update and the second one after it. For the system, there is no “first” or “second” screenshot; there are just two screenshots to compare. Consequently, for a siamese network, the disappearance of a checkbox that used to be there means precisely the same thing as a new checkbox appearing on the other screen. And that is usually enough to trigger an alert, as missing content (which is how the AI would classify this case) is a clear indication that something went wrong with a website after an update.
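The order-blindness is easy to demonstrate. The sketch below is a toy stand-in for a siamese comparison, not our production model: both inputs pass through the same (assumed) embedding function, and the distance between the embeddings is symmetric, so "checkbox appeared" and "checkbox disappeared" produce exactly the same score.

```python
import math

def embed(screenshot):
    # Stand-in for one siamese branch. In a real siamese network both
    # inputs go through the SAME weights, which is what makes the
    # resulting comparison symmetric.
    return [p / 255.0 for p in screenshot]

def siamese_distance(shot_a, shot_b):
    """Euclidean distance between the two embeddings."""
    ea, eb = embed(shot_a), embed(shot_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))

without_checkbox = [255] * 16          # toy "before" screenshot
with_checkbox = [255] * 15 + [0]       # toy "after": one new dark pixel

d_appeared = siamese_distance(without_checkbox, with_checkbox)
d_disappeared = siamese_distance(with_checkbox, without_checkbox)

# The network cannot tell which screenshot came first:
assert d_appeared == d_disappeared
```

Whatever threshold you put on that distance, it fires identically for new content and for missing content, which is why time-aware context has to come from outside the comparison itself.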
The issue described above is only one example of the endless challenges on the way to a fully automated Visual Testing Engine and to our ultimate goal: allowing the world to sleep peacefully at night while WordPress websites are updated and tested automatically, avoiding false negatives and positives so that web developers and site owners do not get a heart attack over their morning coffee.