AI in Software Testing: The Hype, The Facts, The Potential


The Promise of AI 

The promise of Artificial Intelligence (AI) in software testing is that an intelligent agent will one day replace human testers. Instead of the manual labor involved in endless unit and integration testing, machines will test computer systems without human intervention. Software quality will improve dramatically, delivery times will compress from minutes to seconds, and vendors and customers will experience a software renaissance of inexpensive and user-friendly computer applications (apps).

Inexpensive storage, fast processors, readily available AI training sets, and the Internet have converged to turn this promise into hype, however.

How can AI be used to improve software testing? Does the hype hold up against the facts about, and constraints on, AI in software testing? What is it about software testing that makes autonomous approaches challenging to develop and implement? And what can AI research realistically deliver for the software testing industry?


The Hype

A search on Google or any other search engine for "AI in software testing" reveals an assortment of magical solutions promised to potential buyers. Many offer to reduce the manual labor involved in software testing, increase quality, and reduce costs to the organization. The vendors promise that their AI solutions will solve the software testing "problem". The "Holy Grail" of software testing, the magical thinking goes, is to replace human beings: to take them and their mistakes and oversights out of the software development loop. The aim is to make the testing cycle shorter, more effective, and less cumbersome. But is that desirable or even possible?


The Reality

Of course, the reality is far more complex and daunting when it comes to taking humans out of a human-centered process such as software development. Software development is a process for and by humans. No matter the methodology (Waterfall, Rapid Application Development, DevOps, Agile, and others), humans remain central to the purpose of the activity.

Humans, then, define the boundaries and the potential of the software they create. The reality of software testing is that the "goal posts", to use a football metaphor, are always shifting. Business requirements may be unclear and may keep changing. Further, user demands for usability are seldom fixed, and even developers' expectations of what the software can do may change.

Indeed, the initial standards and methodologies for software testing (including the use of the term Quality Assurance) come from manufacturing product testing. Within the manufacturing context, products can (and should) be well-defined. And because manufacturing goods are far less malleable than software, testing is far more mechanical. Testing routines are "set in stone".

Software testing does not allow such uniform, robotic methods of assuring quality, though. In modern software development, you don't know what you don't know as a developer. For example, perhaps User Experience (UX) expectations have changed since the first iteration of the software. Or maybe users now expect faster screen loads or smoother scrolling, or no longer want to scroll through long pages because that design has fallen out of fashion.

Whatever the reason, AI can never anticipate or test for what it or, certainly, its creators did not see coming. The limits of testers' imagination (that is, their inability to know what they don't know) will also constrain the AI. So there can be no truly autonomous AI in software testing.

Creating a software testing "Terminator" may pique the interest of the media and prospective buyers, but deployment is a mirage. Instead, software testing autonomy makes more sense within the context of a staged maturation of artificial intelligence; one in which AI that works in tandem with humans is a viable outcome. 


AI Stages

AI in software testing matures through three stages: Operational, Process, and Systemic.

The overwhelming majority of AI-enabled software testing is currently at the Operational stage. Operational testing, at its most basic, involves creating scripts that mimic routines human testers might otherwise have to perform themselves hundreds of times. The "AI" in this instance is far from intelligent; it may help with tasks such as speeding up script creation, repeating executions, and storing results.

Process AI is a more mature version of Operational AI. Testers can use Process AI for test generation. Other uses may include test coverage analysis and recommendations, defect root cause analysis, effort estimation, and test environment optimization. Process AI can also facilitate synthetic data creation based on patterns and usage.

Further, Process AI can provide an additional set of "eyes" and resources to offset some of the risks that testers take on when they are setting up the test execution strategy. In actual application, Process AI may relieve testers' labor when it comes to testing after a change in the code. 

Manual testing often sees testers retesting the entire application, looking for unintended consequences of a code change. Process AI, on the other hand, may recommend testing a single unit (or a limited impact area) instead of a wholesale retest of the entire application. At this level of AI, we find a clear gain in development time and cost.

In the third stage, Systemic AI, the future can become a slippery slope of unfulfilled promises.
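The idea of testing only the impacted area can be illustrated with a minimal sketch. It assumes a coverage map from each test to the source files it exercises; the map, the file names, and the function select_impacted_tests are hypothetical and are not part of any particular tool. A real Process AI would presumably learn or infer such a map rather than have it written by hand.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical coverage map: test name -> source files the test exercises.
COVERAGE_MAP: Dict[str, Set[str]] = {
    "test_login": {"auth.py", "session.py"},
    "test_checkout": {"cart.py", "payment.py"},
    "test_profile": {"auth.py", "profile.py"},
}


def select_impacted_tests(
    changed_files: Iterable[str], coverage_map: Dict[str, Set[str]]
) -> List[str]:
    """Return the tests whose covered files overlap the changed files."""
    changed = set(changed_files)
    return sorted(
        name for name, files in coverage_map.items() if files & changed
    )


# A change to payment.py only calls for rerunning the checkout test.
print(select_impacted_tests(["payment.py"], COVERAGE_MAP))
```

If a changed file appears in no test's coverage, the function returns an empty list. In that case a tester would still need to decide whether a full retest is the safer choice.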


Systemic AI

One reason systemic (fully autonomous) AI testing is not possible, at least for now, is the overhead that training the AI would require. Humans can have a high degree of confidence when Process AI suggests that a single unit test is adequate to assure software quality. Autonomous AI, though, will not increase a human's confidence that the software meets all requirements, including those nobody knows about yet.

Truly autonomous AI would have to test for requirements that not even humans know. Humans, then, would have to engineer human-centric means of testing the autonomous AI's assumptions and conclusions. The "proofs" that autonomous AI had tested correctly for unknown conditions would themselves require a great deal of time and resources before humans could reach the 100% level of confidence needed to feel assured the AI was accurate.

The development of autonomous software testing is therefore "asymptotic": it can approach full autonomy but never reach it, because humans wouldn't trust it. Why, then, work toward full autonomy in the first place?


Training AI

Though fully autonomous AI is a chimera, developing AI that supports and extends human efforts at software quality is a worthwhile pursuit. This is where humans become reinforcing teachers of the AI: testers must consistently monitor, correct, and teach the AI with ever-evolving learning sets. The challenge is to train the AI to tell the various "bugs" in the software it is testing apart and to assign a risk to each. Training has to be an ongoing effort, in the same way that autonomous car makers have to train their AI systems on the difference between a person crossing a street and a bicycle rider.

In image recognition, a cat is always a cat. Software testing AI, though, is trained on past data to build testers' confidence in the agent, while truly autonomous AI in testing would need to project future conditions, both developer-induced and user-induced, which it cannot do based on historical data. Instead, trainers train AI on data sets laced with the trainers' own biases. These biases limit what the AI can explore, in the same way blinders keep a horse from wandering off an established path. Increasingly biased AI becomes increasingly untrustworthy, and confidence that the AI is performing as expected falls.

The best the AI can be trained to do is deal in risk probabilities and arrive at risk mitigation strategies ultimately assessed by humans.


Risk Mitigation

Ultimately, software testing is a matter of confidence. The tester weighs the probable outcomes of initial implementations and of code changes that could cause problems for developers and users alike. Confidence can never be 100% that software testing has fully explored every way an application might break down. Whether performed manually by humans or autonomously, all software testing is risk-based.

Testers have to decide the test coverage based on the probability that the code they cover may (or may not) create operational problems. They must also use risk analysis to decide what outside the coverage area they should be concerned about. AI is no different.

And even if AI determines and displays the relative probabilities of software failure at any point in the chain of user activity, a human still needs to confirm the calculation. At its most realistic, the probabilities that AI offers are "polluted" by historical biases. Even in the most utopian vision for AI, humans will still not have a high level of confidence in the AI's risk assessment and its prescriptions to mitigate risk.
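A minimal sketch of this kind of risk-based coverage decision follows. It assumes each area of the code has an estimated failure probability, an impact rating, and a testing effort in hours. Every name and number here (CodeArea, plan_coverage, the sample areas) is hypothetical and chosen for illustration only, and the greedy selection is just one simple way to spend a limited testing budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeArea:
    name: str
    failure_probability: float  # estimated, 0.0 to 1.0 (illustrative values)
    impact: int                 # cost of a failure, 1 (minor) to 5 (severe)
    effort_hours: float         # estimated time needed to test this area

    @property
    def risk(self) -> float:
        # Classic risk score: likelihood of failure times its impact.
        return self.failure_probability * self.impact


def plan_coverage(areas: List[CodeArea], budget_hours: float) -> List[CodeArea]:
    """Greedily pick the highest-risk areas that fit the time budget."""
    plan: List[CodeArea] = []
    remaining = budget_hours
    for area in sorted(areas, key=lambda a: a.risk, reverse=True):
        if area.effort_hours <= remaining:
            plan.append(area)
            remaining -= area.effort_hours
    return plan


areas = [
    CodeArea("payment", 0.30, 5, 4.0),
    CodeArea("search", 0.20, 2, 3.0),
    CodeArea("login", 0.10, 5, 2.0),
    CodeArea("help_page", 0.05, 1, 1.0),
]

for area in plan_coverage(areas, budget_hours=7.0):
    print(f"{area.name}: risk={area.risk:.2f}")
```

With a seven-hour budget, the sketch leaves the search area untested. Whether that residual risk is acceptable is exactly the kind of judgment that remains with a human tester.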


Katalon Forges On with Its Vision for AI

Katalon is committed to developing and delivering AI-enabled software testing tools that are practical and effective. The tools should produce realistic results for testers and require minimal work to use effectively on the Katalon platform. The testing software should also relieve testers of a great deal of manual labor.

Katalon believes the most exciting, and potentially disruptive, deployment of AI in software testing is at the second level of AI development maturity: Process AI. One Katalon researcher noted, "the biggest practical usage of AI applied for software testing is at that process level, the first stage of autonomous test creation. It would be when we are able to create automated tests that can be applied by and for me."

So, autonomous and self-directed AI that replaces all human involvement in the software testing process is hype. However, the expectation that AI can supplement and extend human efforts and shorten test times is realistic and desirable. It's also in the not-too-distant future.