Estimating Effort of Test Automation Projects

Written by Dr Vu Nguyen | Aug 16, 2017 3:25:18 AM

Software test estimation is the process of determining the testing effort, schedule, staffing, and other related metrics for a software project. Managers need to master it because the success of their projects depends on accurate and reasonable estimates. Business executives need reliable cost estimates to make investment decisions. Project managers need effort, schedule, and staffing estimates to plan and allocate resources and to decide on testing tools, strategies, and approaches.

This article describes a method for estimating the size and effort of test automation. The method is based on the qEstimation process introduced by Nguyen et al. [1] and on our experience estimating the time needed for test automation projects that use Katalon Studio, a tool for generating and executing automated tests of web and mobile applications.

The first step in our method is to estimate software testing size using Test Case Point Analysis (TCPA). The size estimated with TCPA is then used to compute the effort using a simple historical productivity ratio. Effort estimates for future test cycles are then refined through closed-loop feedback and are expected to become more accurate with each cycle. These steps are shown in the figure below.

1. Estimate Testing Size Using TCPA

Test plan, test design, test cases, test procedures, and test reports are the main outputs of software testing activities [2]. Of these, test cases are the most important artifact serving as a basis for software testing. Each test case usually includes inputs (test data), expected outputs, execution conditions (precondition), test procedures to be performed, pass/fail criteria, and results of a test item. Thus, the TCPA procedure uses test cases as the main input for measuring testing size. The size unit is Test Case Point (TCP).

TCP is measured using four elements: checkpoint, precondition, test data, and type of test case. The first three elements reflect the size of a test case, and the last element accounts for the differences in complexity among test types.

The checkpoint is the condition in which the tester verifies whether the result produced by the target function matches the expected criterion. One test case consists of one or more checkpoints. One checkpoint is counted as one Test Case Point.

The test case’s precondition specifies the condition to execute test cases, including environment and tool setups needed to generate and execute the test cases. Some preconditions may be related to data prepared for the test case. The ratings for test case’s precondition include None, Low, Medium, and High. Each rating level for precondition is assigned a TCP count.

Rating level	Number of TCP	Description
None	0	The precondition is not applicable or important to execute the test case. Or, the precondition is just reused from the previous test case to continue the current test case.
Low	1	The condition for executing the test case is available with some simple modifications required. Or, some simple setting-up steps are needed.
Medium	3	Some explicit preparations are needed to execute the test case. The condition for executing is available with some additional modifications required. Or, some additional setting-up steps are needed.
High	5	Heavy hardware and/or software configurations are needed to execute the test case.

Test data is used to execute test cases. It can be generated at execution time, sourced from previous tests, or generated by test scripts. Test data is either specific to one test case, shared by a group of test cases, or common to the whole system. In the latter two cases, the data can be reused across multiple test cases. The ratings for test data include None, Low, Medium, and High. Each rating level is assigned a TCP count.

Rating level	Number of TCP	Description
None	0	No test data preparation is needed.
Low	1	Test data is needed, but it is simple so that it can be created during the test case execution time. Or, the test case uses a slightly modified version of the existing test data, i.e., little effort required to modify the test data.
Medium	3	Test data is deliberately prepared in advance with extra effort to ensure its completeness, comprehensiveness, and consistency.
High	6	Test data is prepared in advance with considerable effort to ensure its completeness, comprehensiveness, and consistency, or it requires support tools to generate it and a database to store and manage it. Scripts may be required to generate test data.

UTCP for a test case = Checkpoint TCP + Precondition TCP + Test data TCP

This TCP count is considered unadjusted TCP or UTCP as it does not account for complexity differences in various types of test.
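As an illustration, the UTCP calculation can be sketched in a few lines of Python. The weights come from the two rating tables above; the function name, the dictionary names, and the example test case are hypothetical and only serve the illustration.

```python
# TCP weights taken from the precondition and test data rating tables.
PRECONDITION_TCP = {"None": 0, "Low": 1, "Medium": 3, "High": 5}
TEST_DATA_TCP = {"None": 0, "Low": 1, "Medium": 3, "High": 6}


def unadjusted_tcp(checkpoints: int, precondition: str, test_data: str) -> int:
    """Return UTCP = checkpoint TCP + precondition TCP + test data TCP."""
    if checkpoints < 1:
        raise ValueError("a test case consists of one or more checkpoints")
    # Each checkpoint counts as one TCP.
    return checkpoints + PRECONDITION_TCP[precondition] + TEST_DATA_TCP[test_data]


# Hypothetical login test: 3 checkpoints, simple setup, data prepared in advance.
login_utcp = unadjusted_tcp(3, "Low", "Medium")
print(login_utcp)  # 3 + 1 + 3 = 7
```

Summing this value over all test cases in a cycle gives the unadjusted size of that cycle.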

In practice, it is sometimes difficult to count checkpoints precisely, especially for complex or unclear test cases. In that case, the team can estimate TCP using a consensus-based approach such as Planning Poker or Wideband Delphi.

Adjust TCP for Automated Scripting

Test scripts written for one test case may be reused in other test cases, and they may themselves reuse existing test scripts. Scripts that are developed for reuse require extra effort to design, implement, and validate. Scripts that reuse others, on the other hand, are expected to take less effort to develop. To account for these effects, the Developed for Reuse and Reused Existing factors are used to adjust the TCP counts of test cases. The ratings for these factors are specified as follows:

Rating Levels for Developed for Reuse:

Rating level	Extra TCP count	Description
None	0	Test script is specific, and nothing may be reused from this test script. Or, this test script reuses most of the existing scripts.
Low	0.25	Parts (around 25%) of the script can be reused.
Medium	0.5	Half of the script can be reused.
High	0.75	Around 75% of the script can be reused.
Extra High	1	Almost all of the script can be reused in other scripts.

Rating Levels for Reused Existing:

Rating level	TCP discount	Description
None	0	Nothing is reused from other scripts. This test case/script is the first of the test area.
Low	0.25	Parts (around 25%) of the script are reused from other scripts.
Medium	0.5	Half of the script is reused from other scripts.
High	0.75	Around 75% of the script is reused from other scripts.
Extra High	1	Almost all of the script is reused from other scripts.

Adjusted TCP for a test case = UTCP + (DevelopedForReuse – ReusedExisting) * UTCP
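The adjustment can be sketched in Python as follows. The multipliers are those of the two rating tables above; the function and variable names are hypothetical, and the example UTCP of 8 is made up for illustration.

```python
# Multipliers from the rating tables; both factors use the same scale.
REUSE_RATING = {
    "None": 0.0,
    "Low": 0.25,
    "Medium": 0.5,
    "High": 0.75,
    "Extra High": 1.0,
}


def adjusted_tcp(utcp: float,
                 developed_for_reuse: str = "None",
                 reused_existing: str = "None") -> float:
    """Apply the reuse adjustment to an unadjusted TCP count."""
    extra = REUSE_RATING[developed_for_reuse]   # extra TCP for building reusable scripts
    discount = REUSE_RATING[reused_existing]    # TCP discount for reusing existing scripts
    return utcp + (extra - discount) * utcp


# Hypothetical test case with UTCP = 8.
print(adjusted_tcp(8, developed_for_reuse="Medium"))  # 8 + 0.5 * 8 = 12.0
print(adjusted_tcp(8, reused_existing="High"))        # 8 - 0.75 * 8 = 2.0
```

Note that, taken literally, the formula yields zero for a script rated Extra High on Reused Existing and None on Developed for Reuse, so a team may want to review such cases by hand.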

2. Estimate Testing Effort

Software testing effort is driven by the size discussed above. However, it is also influenced by other factors, such as:

Capability and experience of the testing team.
Experience with testing tools, frameworks, and environments.
Systems under test.
Types of test, e.g. desktop, web, mobile, or API testing.

It is very challenging to quantify how these factors affect effort. However, it is reasonable to assume that their effects do not change much from one test cycle to another within the same project. Based on this assumption, one can estimate testing effort for a cycle, a set of test cases, or even a whole project if a historical productivity (size/effort) ratio is available. If it is not, one can perform testing activities on a sample of test cases, collect the actual effort, and compute a productivity ratio from it. Effort then follows as:

Effort = TCP count / productivity ratio

where the productivity ratio is expressed in TCP per person-hour.
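A minimal sketch of the sampling approach, assuming the productivity ratio is measured in TCP per person-hour (the unit that makes the formula above dimensionally consistent). The function names and the pilot figures are hypothetical.

```python
def productivity_ratio(sample):
    """Compute TCP per person-hour from (tcp_count, actual_person_hours) pairs."""
    total_tcp = sum(tcp for tcp, _ in sample)
    total_hours = sum(hours for _, hours in sample)
    if total_hours <= 0:
        raise ValueError("the sample must contain some recorded effort")
    return total_tcp / total_hours


def estimate_effort(tcp_count, ratio):
    """Effort in person-hours = TCP count / productivity ratio."""
    return tcp_count / ratio


# Hypothetical pilot: three test cases automated, with actual effort recorded.
pilot = [(10, 20.0), (6, 12.0), (4, 8.0)]
ratio = productivity_ratio(pilot)      # 20 TCP / 40 person-hours = 0.5
print(estimate_effort(150, ratio))     # 150 / 0.5 = 300.0 person-hours
```

Recomputing the ratio after each test cycle from the accumulated actuals is one simple way to feed real data back into the next estimate.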

The productivity ratio differs from one project to another because each project has different characteristics. It may also change from one test cycle to the next as the testing team gains experience and testing activities become more stable. As a reference, our experience with automated testing projects using Katalon Studio indicates that effort ranges from 1.5 to 2.5 person-hours per TCP, averaging 2.0 person-hours per TCP. Because these figures are expressed as effort per TCP (the inverse of a size/effort ratio), effort is obtained by multiplying the TCP count by them.
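Because the reference figures are given in person-hours per TCP, a quick range estimate multiplies the TCP count by them. The sketch below assumes exactly that; the function name and the 120-TCP cycle are hypothetical.

```python
# Reference figures from the text, in person-hours per TCP.
HOURS_PER_TCP = {"low": 1.5, "average": 2.0, "high": 2.5}


def effort_range(tcp_count):
    """Return low, average, and high effort estimates in person-hours."""
    return {label: tcp_count * hours for label, hours in HOURS_PER_TCP.items()}


# Hypothetical test cycle sized at 120 adjusted TCP.
estimate = effort_range(120)
print(estimate)  # {'low': 180.0, 'average': 240.0, 'high': 300.0}
```

These figures are only a starting point; a ratio measured on the project itself should replace them as soon as actual effort data is available.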

We hope this post serves as a reference for Katalon Studio users who want to estimate and compare test automation effort across Katalon Studio and other test automation tools.

References:

[1] Nguyen, Vu, Vu Pham, and Vu Lam. “qEstimation: a process for estimating size and effort of software testing.” Proceedings of the 2013 International Conference on Software and System Process. ACM, 2013.

[2] IEEE 829-1998, “IEEE Standard for Software Test Documentation.” 1998.
