White box testing is a testing method where testers evaluate the quality of a system with full knowledge of its internal structures. Here, the testers have access to the system's source code and understand how it operates internally. They know not only what the software does but also how it achieves those results.
In this article, we'll explore white box testing in depth, the common techniques used, and white box testing best practices.
White box testing is a crucial testing technique because it examines the system from the developer's perspective. It requires testers to look inside the software's "white box" to understand its inner workings.
Testers can then verify that all internal operations function as expected and identify any security vulnerabilities, logic errors, or gaps in code coverage. This method ensures that the system performs not only its intended functions but does so efficiently and securely.
White box testing is the opposite of black box testing. With black box testing, testers do not need to know the internal workings of the system; with white box testing, they must understand those internal workings. The two approaches differ in how transparent (or opaque) the system's internals are to the tester.
Read More: What is Black Box Testing? Definition, Techniques, Best Practices
Here's a quick comparison of the 3 approaches, including grey box testing, which sits between the other two:
| Feature | Black Box Testing | White Box Testing | Grey Box Testing |
|---|---|---|---|
| Definition | Testing without knowing the internal code structure. Focuses on input-output and functionality. | Testing with full knowledge of the internal code structure. Focuses on code logic and coverage. | Testing with partial knowledge of the internal code structure. Combines elements of both black and white box testing. |
| Testers' Knowledge | No knowledge of the internal workings of the software. | Complete knowledge of the internal workings of the software. | Partial knowledge of the internal workings of the software. |
| Focus | Functionality, user interface, and user experience. | Code logic, paths, branches, and internal structures. | Both functionality and internal structures, with a focus on integration and interactions. |
| Advantages | Identifies missing functionalities. User-oriented. No need for programming knowledge. | Detailed and thorough testing. High code coverage. Optimizes code. | Balanced approach. More effective in finding defects in integrated systems. Better coverage than black box testing alone. |
| Disadvantages | Limited coverage of code paths. Can miss logical errors in the code. | Time-consuming. Requires programming skills. May not reflect user perspective. | Requires both functional and code knowledge. Can be complex to design. May not be as thorough as pure white box testing. |
| Typical Testers | Testers, QA engineers, end-users. | Developers, QA engineers with programming skills. | Testers with some programming knowledge, developers. |
| Testing Techniques | Equivalence partitioning, boundary value analysis, decision table testing. | Statement coverage, branch coverage, path coverage, condition coverage. | Matrix testing, regression testing, pattern testing, orthogonal array testing. |
| Tools Used | Selenium, QTP, LoadRunner, TestComplete. | JUnit, NUnit, CUnit, Emma, Clover, SonarQube. | Selenium, QTP, Rational Functional Tester, integration tools. |
| Use Cases | Acceptance testing, system testing, functional testing. | Unit testing, integration testing, security testing, code optimization. | Integration testing, penetration testing, system testing. |
The general white box testing process includes 7 steps:
The key benefits of white box testing include:
Control flow testing is a white-box testing technique that focuses on creating and executing test cases to cover pre-identified execution paths through the program code.
While control flow testing does grant testers a high degree of thoroughness, it has some drawbacks:
def calculate_grade(score):
    if score >= 90:
        grade = 'A'
    elif score >= 80:
        grade = 'B'
    elif score >= 70:
        grade = 'C'
    elif score >= 60:
        grade = 'D'
    # Missing path: for scores below 60, 'grade' is never assigned
    return grade
def calculate_inventory(order_dispatched, current_inventory):
    if order_dispatched:
        return current_inventory + 1
    # Missing path: nothing is returned when order_dispatched is False
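To see how control flow test cases expose these gaps, here is a minimal sketch that exercises one input per path of the two functions above (the chosen values are illustrative):

# One illustrative score per branch of calculate_grade; the last value exposes
# the missing path: 'grade' is never assigned for scores below 60, so the
# function raises UnboundLocalError.
for score in [95, 85, 75, 65, 50]:
    try:
        print(score, "->", calculate_grade(score))
    except UnboundLocalError as error:
        print(score, "-> missing path:", error)

# calculate_inventory has a similar gap: it implicitly returns None when
# order_dispatched is False, which a path-based test makes visible.
print(calculate_inventory(False, 10))  # prints None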
To do control flow testing, there are several key concepts you need to understand, starting with the Control Flow Graph.
A Control Flow Graph (CFG) is a graphical representation of all the paths that might be traversed through a program during its execution. It is used extensively in control flow testing to analyze and understand the program’s structure.
A control flow graph has 3 major components:
1. Node: a node represents an individual statement or block of code. For example, in the code snippet below we have 3 nodes:
int a = 0; // Node 1
if (b > 0) { // Node 2
    a = b; // Node 3
}
2. Edge: an edge is the flow of control from one node to another. It indicates the execution flow of the program. There are 2 types of edges: unconditional edges, where control always passes straight to the next node, and conditional edges, where control passes only when a decision evaluates a particular way.
In the snippet below, there is a conditional edge from the statement ‘if (b > 0)’ to the statement ‘a = b’.
int a = 0;
if (b > 0) {
    a = b;
}
3. Entry/Exit Points: these represent the start and end of a program in the CFG.
Let’s look at another example.
#include <stdio.h>

void exampleFunction(int x, int y) {
    if (x > 0) {
        if (y > 0) {
            printf("Both x and y are positive.\n");
        } else {
            printf("x is positive, y is non-positive.\n");
        }
    } else {
        printf("x is non-positive.\n");
    }
}
Here we have 3 possible outcomes: both x and y are positive; x is positive and y is non-positive; or x is non-positive.
There are 3 execution paths in total for this CFG, one for each outcome. Here's the CFG for the code:
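In code form, the same CFG can be modeled as an adjacency list and its entry-to-exit paths enumerated; the sketch below does exactly that (the node names are illustrative):

# Adjacency-list model of exampleFunction's CFG: 7 nodes and 8 edges.
cfg = {
    "entry":                 ["if x > 0"],
    "if x > 0":              ["if y > 0", "print x non-positive"],
    "if y > 0":              ["print both positive", "print y non-positive"],
    "print both positive":   ["exit"],
    "print y non-positive":  ["exit"],
    "print x non-positive":  ["exit"],
    "exit":                  [],
}

def all_paths(graph, node="entry", path=None):
    """Depth-first enumeration of every entry-to-exit path in the CFG."""
    path = (path or []) + [node]
    if not graph[node]:  # no successors: we reached the exit node
        return [path]
    paths = []
    for successor in graph[node]:
        paths.extend(all_paths(graph, successor, path))
    return paths

for p in all_paths(cfg):
    print(" -> ".join(p))  # prints the 3 execution paths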
Here's how you can find out all the necessary paths for Control Flow testing:
1. Draw the Control Flow Graph for the code under test.
2. Identify the decision points (conditions and loops) and the edges leaving them.
3. Trace every entry-to-exit route that those decisions create.
4. Design one test case per path, then execute the tests and compare actual behavior with expected behavior.
Of course, in practice, 100% coverage can be difficult to achieve. Some code is only executed in exceptional circumstances; it can often be found in try-catch blocks or in conditional statements that check for edge cases. Here are some examples:
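For instance, the sketch below (built around a hypothetical parse_config helper) contains two branches that only run under exceptional input, which ordinary test data rarely reaches:

import json

def parse_config(raw_text):
    # Hypothetical helper: both early returns only execute in exceptional circumstances.
    try:
        config = json.loads(raw_text)
    except json.JSONDecodeError:
        # Runs only for malformed input -- easy to miss without a dedicated test.
        return {}
    if not isinstance(config, dict):
        # Edge case: valid JSON that is not an object, e.g. "[1, 2, 3]".
        return {}
    return config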
1.2. Levels of Coverage
There are 8 levels (Level 0 - Level 7) of coverage in white box testing. Here is a quick comparison table.
| Level | Coverage Level | Description | Advantages | Disadvantages |
|---|---|---|---|---|
| 0 | Statement Coverage | Ensures each statement is executed | Simple to measure, basic confidence | May miss logical paths |
| 1 | Branch Coverage | Ensures each branch is executed | Tests decision points | May miss condition combinations |
| 2 | Condition Coverage | Ensures each condition is tested | Tests each part of conditions | May miss some combinations of conditions |
| 3 | Multiple Condition Coverage | Ensures all condition combinations are tested | Most thorough condition testing | Complex and time-consuming |
| 4 | Path Coverage | Ensures all control flow paths are executed | High confidence in code correctness | Infeasible for complex code |
| 5 | Function Coverage | Ensures each function is called | Simple to measure, ensures functions are invoked | Does not test internal function paths |
| 6 | Loop Coverage | Ensures loops are executed with different iteration counts | Targets loop-related errors | May not be comprehensive for very complex loops |
| 7 | Data Flow Coverage | Ensures all variable definitions and uses are tested | Focuses on variable lifecycle, catches data-related errors | Complex to track in large codebases |
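To make the difference between the first few levels concrete, here is a small sketch (the discount function and its inputs are illustrative). The first two calls already give full statement and branch coverage of the function, while condition coverage additionally requires that is_member and total >= 100 each evaluate to both True and False, which the third call helps complete:

def discount(is_member, total):
    # One decision made of two conditions. Branch coverage only needs the whole
    # 'if' to be True once and False once; condition coverage also needs each
    # individual condition to take both outcomes.
    if is_member and total >= 100:
        return total * 0.9
    return total

print(discount(True, 120))   # decision True:  is_member True, total >= 100 True
print(discount(True, 50))    # decision False: is_member True, total >= 100 False
print(discount(False, 80))   # decision False: is_member False (second condition short-circuited)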
Structured testing, also known as basis path testing, is a white box testing technique aimed at identifying and testing a set of linearly independent paths within the software. The goal is to ensure that each of these independent paths through the program is tested at least once.
A typical structured testing process happens in the following steps:
Cyclomatic Complexity is a software metric used to measure the complexity of a program's control flow. It was developed by Thomas J. McCabe, Sr. in 1976 and is a key indicator of the number of linearly independent paths through a program's source code. Here’s the formula:
C = Edges - Nodes + 2
Let’s look at this example once more. There are 8 edges (8 arrows) and 7 nodes. We have a Cyclomatic Complexity of 8 - 7 + 2 = 3. This means there are 3 linearly independent paths through the program.
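As a quick check, the same figure can be computed from the adjacency-list CFG sketched earlier (the cyclomatic_complexity helper and the reuse of the cfg dictionary are illustrative):

def cyclomatic_complexity(graph):
    # C = Edges - Nodes + 2, for a CFG given as an adjacency list of successor lists.
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2

print(cyclomatic_complexity(cfg))  # 8 edges - 7 nodes + 2 = 3 independent paths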
Interpretation of Cyclomatic Complexity (commonly used thresholds):
- 1-10: simple code with low risk.
- 11-20: moderately complex code with moderate risk.
- 21-50: complex, high-risk code.
- Above 50: very complex code that is effectively untestable.
We can now identify the 3 linearly independent paths through the system:
- Path 1: x > 0 and y > 0
- Path 2: x > 0 and y <= 0
- Path 3: x <= 0
From this, we can start designing the test cases:
Test case 1: x = 1, y = 1; expected output: "Both x and y are positive."
Test case 2: x = 1, y = -1; expected output: "x is positive, y is non-positive."
Test case 3: x = -1, y = 1 (any y value works); expected output: "x is non-positive."
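One minimal way to script these three cases is sketched below, using an illustrative Python translation of exampleFunction so that each path's expected message can be asserted:

def example_function(x, y):
    # Illustrative Python translation of the C exampleFunction above.
    if x > 0:
        if y > 0:
            return "Both x and y are positive."
        return "x is positive, y is non-positive."
    return "x is non-positive."

# One test per independent path.
assert example_function(1, 1) == "Both x and y are positive."
assert example_function(1, -1) == "x is positive, y is non-positive."
assert example_function(-1, 1) == "x is non-positive."
print("All 3 independent paths covered.")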
Control flow testing is particularly effective in finding logical errors and ensuring all conditions are tested. However, testers need to understand the code to identify paths and create appropriate test cases. Generating and executing test cases for all paths can also be labor-intensive, especially in complex systems with numerous control flow branches.
Data flow testing is a white-box testing technique that focuses on the flow of data within a program and its use and definition at different points in the program's execution. The primary goal of data flow testing is to identify and test the paths that data values take through the code, ensuring that data is correctly defined, used, and released.
Here are the data flow anomalies that data flow testing usually targets:
Here the variable x is used in the if condition without being defined.
#include <stdio.h>

int main() {
    int x;
    if (x == 42) { // Use of undefined variable x
        printf("x is 42\n");
    }
    return 0;
}
Here a variable is used multiple times without being redefined when a change was expected between uses.
#include <stdio.h>

void updateValue(int *val) {
    *val = 100; // Update the value
}

int main() {
    int z = 10; // Initial definition
    printf("%d\n", z); // First use
    // Forgot to update z here, e.g. by calling updateValue(&z)
    printf("%d\n", z); // Second use without redefinition
    return 0;
}
Here a variable is conditionally defined but used unconditionally.
#include <stdio.h>

int main() {
    int x;
    int some_condition = 0; // Placeholder flag so the snippet compiles
    if (some_condition) {
        x = 42; // Conditional definition
    }
    printf("%d\n", x); // Unconditional use
    return 0;
}
Another important concept to understand is that there are 3 stages in the life of a variable:
1. Defined (d): the variable is declared or assigned a value.
2. Used (u): the variable's value is read in a computation or a condition.
3. Killed (k): the variable is destroyed or goes out of scope.
These annotations can be used to construct data flow graphs that visually represent how variables are manipulated throughout the program's execution paths.
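For example, the short sketch below annotates each statement with the stage it represents for the variable total (the annotations are ordinary comments, purely for illustration):

def total_price(prices):
    total = 0                  # d: 'total' is defined (given its first value)
    for price in prices:
        total = total + price  # u then d: used on the right-hand side, then redefined
    print(total)               # u: used
    return total               # u: used; the local variable is killed (k) when the function returns

total_price([2, 3, 5])  # prints 10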
There are 2 approaches to data flow testing:
1. Static data flow testing
Static data flow testing analyzes the flow of data through a program's source code without executing it. We create the data flow diagrams described above and check that the define-use-kill patterns for each variable are correct. This static analysis means examining the diagrams either formally, through inspections, or informally, by reviewing them. Only afterwards do we perform dynamic tests on the module by creating and running test cases.
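As a rough illustration of the static side, the toy checker below uses Python's ast module to flag names that are read but never assigned, mirroring the first anomaly example; it deliberately ignores statement order and scoping, so it is a teaching sketch rather than a real analyzer:

import ast
import builtins

# Source modeled on the first anomaly example above: 'x' is used but never defined.
source = """
def main():
    if x == 42:
        print("x is 42")
"""

defined, suspicious = set(), []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Name):
        if isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif node.id not in defined and not hasattr(builtins, node.id):
            suspicious.append(node.id)

print("possible use before definition:", suspicious)  # ['x']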
The anomalies to look for are the define-use-kill violations illustrated earlier, such as a variable being used before it is defined, or defined only on some paths before an unconditional use.
2. Dynamic data flow testing
Dynamic data flow testing is a white-box testing method that checks how data moves and changes while the program runs. It tracks data as it flows through different parts of the program, helping to find errors that happen during execution, like when variables aren't properly set or when data is handled incorrectly. This type of testing is crucial for making sure that the program handles data correctly under different conditions and that it works as expected in real-world use.
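A very small dynamic counterpart is sketched below: an illustrative TrackedValue wrapper that logs every definition and use at runtime, so a test run prints the actual define-use sequence for a variable (here mirroring the "second use without redefinition" example):

class TrackedValue:
    # Illustrative wrapper that logs definitions and uses of a value at runtime.
    def __init__(self, name, value):
        self.name, self._value = name, value
        print(f"define {name} = {value!r}")

    def get(self):
        print(f"use    {self.name} -> {self._value!r}")
        return self._value

    def set(self, value):
        self._value = value
        print(f"define {self.name} = {value!r}")

z = TrackedValue("z", 10)
print(z.get())  # first use
print(z.get())  # second use without a redefinition in between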
Here's a quick comparison of the 2 approaches:
Aspect | Dynamic Data Flow Testing | Static Data Flow Testing |
Focus | Actual data values and transformations during runtime. | Potential data paths and dependencies without execution. |
Techniques | Execution tracing, input generation, dynamic analysis tools. | Static code analysis, control flow analysis, data flow analysis. |
Objectives | Validate data handling logic, detect runtime errors. | Identify issues like uninitialized variables, data misuse. |
Benefits | Insights into actual program behavior, early error detection. | Early issue detection, code quality improvement. |
Challenges | Execution overhead, non-deterministic behaviors. | Limited dynamic behavior coverage, potential false positives. |
Suitability | Best for detecting runtime errors, dynamic behavior analysis. | Best for early-stage code quality checks, static dependency analysis. |
Data flow testing builds upon and extends control flow testing methods. Similar to control flow testing, it is essential for testing modules of code that cannot be adequately verified through reviews and inspections alone. However, data flow testing requires testers to possess sufficient programming skills to comprehend the code, its control flow, and its variables. Like control flow testing, data flow testing can be highly time-consuming due to the numerous modules, paths, and variables involved in a system.