Took me years to finally get a Roomba (those robot vacuums from iRobot that have been around since 2002). I was pretty skeptical about how well it would clean and how smart and autonomous it really was.
My wife was on board with the benefits from the start, but I needed some convincing. After plenty of back-and-forth (and my wife’s persistence), we went for it.
Honestly, the benefits weren’t obvious to me from the start. I’m Brazilian, living in London, where most homes are 2 or 3 stories. Ours has 3 floors and plenty of stairs.
Once the Roomba finishes one floor, my wife has to carry it to the next, and even then the battery can’t cover the whole house on a single charge. So, keeping the place clean still requires a bit of human effort.
It’s great for light cleaning, but for deep cleaning, we still have to do it ourselves and bring in a professional cleaner once in a while.
Plus, maybe it’s just my OCD talking, but its weird, zigzagging paths around the room drive me nuts. It looks way too random to actually be cleaning efficiently.
Source: CNET
Even though it has its quirks and requires some extra effort, my wife thinks it works great for her. At the end of the day, if she’s happy, I’m happy!
I think our disagreement comes from having different expectations. I imagined the robot would handle all the cleaning perfectly on its own, with minimal human help.
In hindsight, it is clear that my expectations were pretty unrealistic. I fell for the hype, thinking it would be a silver bullet for all our cleaning needs.
My wife, on the other hand, sees it for what it truly is: a useful assistant for light cleaning, similar to other appliances that help with household chores. This way, she avoids the repetitive tasks, freeing her to focus on more creative pursuits, like learning Italian as her third language (which she’s getting pretty good at).
I was focused on the negatives, seeing the glass as half empty, while my wife took a more realistic approach, viewing it as half full. She recognizes the robotic vacuum as a handy assistant with some autonomy, but not a substitute for human-driven, heavy-duty cleaning, given the limitations of current technology (maybe in the future, but not right now).
Switching gears to the software testing field, I think the same mindset applies. We’re caught up in the hype of some amazing, almost magical advancements in AI, with many companies promoting a compelling narrative about the benefits of AI for autonomous testing.
The reality is, we’re not there yet. The technology isn’t advanced or mature enough to achieve a higher level of autonomy.
Companies have poured tons of time and money into making self-driving cars a reality, but we’re still nowhere near full autonomy. It’s a gradual process, with each level of autonomy clearly defined by the Society of Automotive Engineers (SAE). They’ve even got a guide that breaks down each stage of the journey toward fully autonomous driving, see below:
Source: SAE
Autonomous testing follows a similar journey, a gradual, step-by-step evolution, and right now, we’re only at the very first stage of this path.
Remember, in testing, we depend on consistent, deterministic, and repetitive processes, which we all know AI/GenAI is not known for just yet.
And that’s perfectly okay. We should focus on the positives, adjust our expectations, and make the most of tools that can increase our productivity and simplify our lives now, all while keeping an eye on the future.
Honestly, there’s no agreed-upon definition of autonomous testing. It can be really broad or super narrow. However, when I read about autonomous testing, the discussion mostly focuses on the autonomous execution of tests and the tasks required to maintain it, such as flaky test detection, self-healing, and root cause analysis, among other capabilities.
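To make one of those capabilities concrete, here’s a minimal sketch of flaky test detection — my own illustration of the idea, not any vendor’s implementation. The heuristic: a test that both passes and fails on the same code revision can’t be blamed on a code change, so it’s a flakiness candidate.

```python
from collections import defaultdict

def find_flaky_tests(results):
    """results: iterable of (test_name, commit_sha, passed) tuples.

    A test with mixed outcomes on the SAME commit cannot be explained
    by a code change, so we flag it as a flakiness candidate.
    """
    outcomes = defaultdict(set)  # (test, commit) -> set of observed outcomes
    for test, commit, passed in results:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})

runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same commit, different outcome: flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),  # outcome changed with the code: not flaky
]
print(find_flaky_tests(runs))  # ['test_login']
```

Real tools layer much more on top (retry analysis, environment fingerprinting, statistical confidence), but the core signal is this simple.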
I see no problem with autonomous execution of testing; it’s a welcome addition that makes my life as a tester easier.
However, whether you use traditional approaches like the V-Model and W-Model, shift-left and shift-right testing, or any flavor of agile methods to align with modern development practices, many parts of the testing process still create bottlenecks due to manual and inefficient practices.
Tasks such as test design, test specification, coverage analysis, test prioritization, bug triage/troubleshooting, and test data generation, among many others, are often done manually and inefficiently, and AI can significantly enhance them.
There are tons of tasks and use cases where a bit of autonomy could really boost our productivity and simplify the testing process. Here are a few examples I can think of off the top of my head, but it's not a complete list:
Like I said, this isn’t a complete list, and there’s so much more out there; I’m just scratching the surface of all the potential use cases. You probably have other tasks you handle every day that could really benefit from a bit of autonomy (and automation).
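To give a flavor of what “a bit of autonomy” can look like, here’s a toy test-prioritization heuristic (the names and weights are mine, purely for illustration): run first the tests that have failed most often recently, weighting newer runs more heavily than older ones.

```python
def prioritize(history, recency_weight=0.7):
    """history: dict mapping test name -> list of booleans (True = passed),
    oldest run first. Score each test by its recency-weighted failure
    rate and return test names in descending score order."""
    scores = {}
    for test, runs in history.items():
        score, total, weight = 0.0, 0.0, 1.0
        for passed in reversed(runs):          # newest run first
            score += weight * (0.0 if passed else 1.0)
            total += weight
            weight *= recency_weight           # older runs count less
        scores[test] = score / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)

history = {
    "test_checkout": [True, False, False],   # failing lately
    "test_signup":   [False, True, True],    # failed long ago, stable now
    "test_profile":  [True, True, True],     # never fails
}
print(prioritize(history))  # ['test_checkout', 'test_signup', 'test_profile']
```

An AI-augmented tool would also factor in things like which code changed in the current build, but even a heuristic this simple can shorten the feedback loop on large suites.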
I suppose the question is no longer 'To AI or not to AI?' but rather, 'When and how should we use it?'.
As discussed earlier, with the right mindset and appropriate expectations for the outcomes produced by AI-augmented testing tools, even a small degree of autonomy can free us to focus on the most engaging, challenging, and creative aspects of testing.
By leveraging the right tools, there’s potential and opportunity to automate these tasks while incorporating a certain level of autonomy, allowing human testers to focus on reviewing results/outputs, applying their creativity and expertise to corner cases and more complex use cases, or even making critical risk decisions.
I’ve read several LinkedIn posts from people claiming that reviewing AI results or guiding the AI to produce the correct output is time-consuming and sometimes pointless; they argue it’s often better to just do the work manually. I can’t disagree: there are many cases where it’s frustrating and inefficient, especially when the problem is too complex or there’s no AI-augmented tool available for that specific task.
However, by keeping the right expectations in mind, as discussed in this article from Fast Company, we should approach AI tools like a smart intern. According to the article, these tools can enhance how users perform their daily tasks, but like any intern, they can make mistakes at times. Here’s a verbatim excerpt from the article that I’d like to highlight:
"In practice, an intern mentality encourages users to think about working with GenAI as the evolution of trust in a relationship. When you first start using GenAI, just like on an intern’s first day on the job, you’re going to want to check every bit of work it produces. Over time, analogous to being a couple of months into the summer internship, you may find some tasks that the AI intern performs well enough to accept as a first pass, but still need to check and make your own. There may be other tasks the intern performs so reliably that you don’t even need to check its work. And there may be still other tasks that you don’t want to entrust to the intern at all".
At the end of the day, this is one of the challenges of being an early adopter: the technology isn’t quite there yet to tackle our most pressing problems, but we’re willing to experiment and explore its boundaries to see what’s possible (or not).
Keep in mind, this is still an emerging field, and the underlying technology is in its early stages and actively evolving. New stuff comes out all the time, but most of it isn’t quite polished yet.
I think the latest hot-off-the-press tech is Anthropic’s 'computer use'. While it wasn’t built specifically for software testing, it’s a new foundational tool that could be leveraged for smarter, more efficient autonomous testing execution. Keep in mind, though, it’s still in public beta and, as Anthropic mentioned in their announcement, it’s experimental, sometimes cumbersome, and prone to errors. But with rapid improvements, it has the potential to be a real game-changer.
I have complete confidence that AI can help with the routine and mundane work in software testing, but the best outcomes come from the collaboration between humans and AI, a concept known as Human-in-the-Loop (HITL). Realizing the full advantages of AI demands skilled testers capable of identifying what truly matters amid the noise created by companies developing AI tools.
A word of caution: as I highlighted in my previous article, 'Don’t let AI be a distraction; if pen and paper are the best solution, use them'. Avoid falling for shiny object syndrome, choose the tool and approach that best suit the task at hand. Don’t try to solve every problem with AI, as these tools are often not mature enough and may not deliver the expected results.
I'm glad you made it to the end of the article! I’d like to ask a favor: please share in the comments which tasks and use cases in your daily routine could benefit from some level of autonomy driven by AI-augmented testing tools. Feel free to share your insights and perspective so we can all learn from our collective knowledge.
Cristiano Caetano is an entrepreneur and product expert with extensive experience in software testing, B2B SaaS, and marketplaces. Founder of Zephyr Scale, the top-selling app in the Atlassian ecosystem, he is now the VP of Product Management at Katalon, where he continues to drive innovation in the tech space.