White Box Testing: All You Need To Know
White box testing is a testing method where testers evaluate the quality of a system with full knowledge of its internal structures. Here, the testers have access to the system's source code and understand how it operates internally. They know not only what the software does but also how it achieves those results.
In this article, we’ll learn more about white box testing in-depth, the common techniques used, and white box testing best practices.
The Nature of White Box Testing
White box testing is a crucial testing technique because it examines the system from the developer's perspective. It requires testers to look inside the software's "white box" to understand its inner workings.
Testers can then verify that all internal operations function as expected and identify any security vulnerabilities, logic errors, or gaps in code coverage. This method ensures that the system performs not only its intended functions but does so efficiently and securely.
White box testing is the opposite of black box testing: with black box testing, testers do not need to know the internal workings of the system, while with white box testing they must understand them. The difference lies in the opacity of the system to the tester: a black box is opaque, whereas a white box is transparent.
Read More: What is Black Box Testing? Definition, Techniques, Best Practices
Here's a quick comparison of the 3 approaches (grey box testing combines elements of the other two):
| Feature | Black Box Testing | White Box Testing | Grey Box Testing |
| --- | --- | --- | --- |
| Definition | Testing without knowing the internal code structure. Focuses on input-output and functionality. | Testing with full knowledge of the internal code structure. Focuses on code logic and coverage. | Testing with partial knowledge of the internal code structure. Combines elements of both black and white box testing. |
| Testers' Knowledge | No knowledge of the internal workings of the software. | Complete knowledge of the internal workings of the software. | Partial knowledge of the internal workings of the software. |
| Focus | Functionality, user interface, and user experience. | Code logic, paths, branches, and internal structures. | Both functionality and internal structures, with a focus on integration and interactions. |
| Advantages | Identifies missing functionalities; user-oriented; no need for programming knowledge. | Detailed and thorough testing; high code coverage; optimizes code. | Balanced approach; more effective in finding defects in integrated systems; better coverage than black box testing alone. |
| Disadvantages | Limited coverage of code paths; can miss logical errors in the code. | Time-consuming; requires programming skills; may not reflect user perspective. | Requires both functional and code knowledge; can be complex to design; may not be as thorough as pure white box testing. |
| Typical Testers | Testers, QA engineers, end-users. | Developers, QA engineers with programming skills. | Testers with some programming knowledge, developers. |
| Testing Techniques | Equivalence partitioning, boundary value analysis, decision table testing. | Statement coverage, branch coverage, path coverage, condition coverage. | Matrix testing, regression testing, pattern testing, orthogonal array testing. |
| Tools Used | Selenium, QTP, LoadRunner, TestComplete. | JUnit, NUnit, CUnit, Emma, Clover, SonarQube. | Selenium, QTP, Rational Functional Tester, integration tools. |
| Use Cases | Acceptance testing, system testing, functional testing. | Unit testing, integration testing, security testing, code optimization. | Integration testing, penetration testing, system testing. |
White Box Testing Process
The general white box testing process includes 7 steps (a short end-to-end sketch follows the list):
- Analysis of the SUT's implementation: testers begin with a thorough analysis of the source code, architecture, and internal workings of the system under test (SUT). They can work with the dev team or product team to gather information.
- Identification of paths through the SUT: once they have analyzed the code, testers identify all possible paths that data can take through the application. This requires them to understand the branches, loops, and any conditional statements that can affect the execution flow.
- Path sensitization: testers then choose specific inputs designed to traverse the identified paths. The goal is to ensure that each path is executed at least once during testing.
- Determination of expected results: for each selected input, testers determine what the expected outcome should be, based on the code's implementation.
- Execution of tests: the chosen inputs are fed into the SUT, and the test cases traverse the paths identified earlier, allowing testers to monitor how the system behaves internally.
- Comparison of actual and expected outputs: the actual output produced by the SUT is compared with the expected output determined in the previous step. Any discrepancies are noted, since they can indicate bugs.
- Determination of the SUT's proper functioning: based on the comparison, testers decide whether the software is functioning properly.
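Here is a minimal end-to-end sketch of steps 2 through 6 in Python. The apply_discount function and its pricing rules are hypothetical, invented purely to illustrate the process:

```python
def apply_discount(price, is_member):
    """Hypothetical SUT: members get 10% off, large carts get 5% off."""
    if is_member:
        return price * 0.90      # Path A
    if price > 100:
        return price * 0.95      # Path B
    return price                 # Path C

# Path sensitization: one input per identified path,
# with expected results derived from the implementation.
cases = [
    ((50, True), 45.0),    # Path A: member discount
    ((200, False), 190.0), # Path B: large-cart discount
    ((50, False), 50.0),   # Path C: no discount
]

# Execution of tests, then comparison of actual and expected outputs.
for (price, member), expected in cases:
    actual = apply_discount(price, member)
    assert actual == expected, f"Path failed for input {(price, member)}"
print("All identified paths behave as expected.")
```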
Advantages of White Box Testing
The key benefits of white box testing include:
- Code-centric: white box testing allows for a thorough examination of internal structures, logic, and code paths, giving testers a much deeper understanding of the system than black box testing, where they merely “scratch the surface”.
- Granularity: with complete knowledge of the system, testers can tailor test cases specifically to the internal logic. This leads to higher code coverage, including edge cases.
- Optimization: going beyond just “finding the bug”, testers can join the optimization process, for example by recommending the elimination of redundant code.
- Automation: white box testing opens up opportunities for test automation, saving a lot of time and resources.
Disadvantages of White Box Testing
- Large number of execution paths: for more complex systems, the number of execution paths can be astronomically large. Attempting to test all of those paths takes a great deal of time and resources.
- Control flow assumption: white box testers generally assume that the control flow is correctly designed and focus on verifying the implementation rather than validating those flows, yet bugs can certainly arise from a flawed design.
- Non-existent paths: since white box testing focuses on the paths that are present in the code, it cannot identify paths that are missing altogether. For example, if there is a logical condition that is never checked or a scenario that is never handled in the code, white box testing won't reveal this absence because it only tests what is already there.
- Technical skills: testers must have adequate knowledge and technical expertise to do white box testing properly.
White Box Testing Techniques
1. Control Flow Testing
Control flow testing is a white-box testing technique that focuses on creating and executing test cases to cover pre-identified execution paths through the program code.
While control flow testing does grant testers a high degree of thoroughness, it has some drawbacks:
- The number of paths can grow very large. Modern applications are usually complex enough to make exhaustive testing of all control flow paths practically impossible.
- White box testing only works with implemented paths; if a path is missing, it will never be found. For example, the function below is meant to calculate a grade based on a score, but it misses the path for scores below 60.
```python
def calculate_grade(score):
    if score >= 90:
        grade = 'A'
    elif score >= 80:
        grade = 'B'
    elif score >= 70:
        grade = 'C'
    elif score >= 60:
        grade = 'D'
    # Missing implementation path for scores below 60:
    # no branch assigns 'grade', so the return below fails for such inputs
    return grade
```
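A quick check makes the missing path visible at runtime. This is a minimal sketch; Python raises UnboundLocalError (a subclass of NameError) because no branch ever assigns grade for scores below 60:

```python
print(calculate_grade(75))  # 'C' -- an implemented path works fine
try:
    calculate_grade(45)     # no branch handles scores below 60
except NameError as error:
    print(f"Unhandled path exposed: {error}")
```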
- There can also be errors within the code despite a correct flow. For example, the function below is meant to update the inventory when an order is successfully dispatched. The flow is correct, but current_inventory is supposed to decrease (- 1) instead of increase (+ 1).
```python
def calculate_inventory(order_dispatched, current_inventory):
    if order_dispatched:
        # Bug: the inventory should decrease (- 1), not increase (+ 1)
        return current_inventory + 1
    return current_inventory
```
1.1. The Control Flow Graph
To do control flow testing, there is one key concept you need to understand: the Control Flow Graph.
A Control Flow Graph (CFG) is a graphical representation of all the paths that might be traversed through a program during its execution. It is used extensively in control flow testing to analyze and understand the program’s structure.
A control flow graph has 3 major components:
1. Node: these are individual statements or blocks of code. For example, in the code snippet below we have 3 nodes:
```c
int a = 0;    // Node 1
if (b > 0) {  // Node 2
    a = b;    // Node 3
}
```
2. Edge: this is the flow of control from one node to another. It indicates the execution flow of the program. There are 2 types of edges:
- Unconditional edge: direct flow from one statement to another without any condition
- Conditional edge: these are branches based on a condition (e.g. True/False outcomes of an if-statement)
Here we have a conditional edge from statement ‘if (b > 0)’ to statement ‘a=b’.
```c
int a = 0;
if (b > 0) {
    a = b;
}
```
3. Entry/Exit Points: these represent the start and end of a program in the CFG.
Let’s look at another example.
```c
void exampleFunction(int x, int y) {
    if (x > 0) {
        if (y > 0) {
            printf("Both x and y are positive.\n");
        } else {
            printf("x is positive, y is non-positive.\n");
        }
    } else {
        printf("x is non-positive.\n");
    }
}
```
Here we have:
- 7 nodes (1 entry point, 1 exit point, 2 decision nodes, and 3 print statements)
- 1: Entry point of exampleFunction.
- 2: if (x > 0).
- 3: if (y > 0).
- 4: printf("Both x and y are positive.\n");
- 5: printf("x is positive, y is non-positive.\n");
- 6: printf("x is non-positive.\n");
- 7: Exit point of exampleFunction.
- Edges:
- (1 -> 2)
- (2 -> 3) if x > 0
- (2 -> 6) if x <= 0
- (3 -> 4) if y > 0
- (3 -> 5) if y <= 0
- (4 -> 7)
- (5 -> 7)
- (6 -> 7)
There are 3 execution paths in total through this CFG, covering all 3 possible outcomes.
Here's how you can identify all the necessary paths for control flow testing (a small path-enumeration sketch follows the list):
- Identify where the module begins executing (entry point) and where it completes its execution or returns control (exit point).
- Start by tracing the leftmost path through the module from the entry point to the exit point. This path follows the sequence of statements and branches as defined in the code.
- Once you've traced the leftmost path, return to the entry point and vary the first branching condition. This means taking the alternative path(s) that were not taken in the leftmost path due to branching conditions (e.g., if-else statements, switch-case statements).
- Repeat this process for each subsequent branching condition in the module. For example, after varying the first branching condition, vary the second branching condition, then the third, and so on, until all possible control flow paths (or significant paths) through the module are covered.
- List down each distinct path taken through the module, including combinations of different branching conditions.
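Path enumeration like this can also be mechanized. Below is a minimal sketch, assuming the exampleFunction CFG is encoded as an adjacency list (the encoding is an assumption for illustration); a depth-first traversal yields every entry-to-exit path, leftmost first:

```python
# CFG of exampleFunction; node numbers match the list in the previous section.
cfg = {
    1: [2],     # entry
    2: [3, 6],  # if (x > 0): true -> 3, false -> 6
    3: [4, 5],  # if (y > 0): true -> 4, false -> 5
    4: [7],     # "Both x and y are positive."
    5: [7],     # "x is positive, y is non-positive."
    6: [7],     # "x is non-positive."
    7: [],      # exit
}

def enumerate_paths(graph, node, path=()):
    """Depth-first traversal yielding every entry-to-exit path."""
    path = path + (node,)
    if not graph[node]:  # exit node reached
        yield path
    for successor in graph[node]:
        yield from enumerate_paths(graph, successor, path)

for p in enumerate_paths(cfg, 1):
    print(" -> ".join(map(str, p)))
# 1 -> 2 -> 3 -> 4 -> 7
# 1 -> 2 -> 3 -> 5 -> 7
# 1 -> 2 -> 6 -> 7
```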
Of course, in practice, 100% coverage can be difficult to achieve, since some code only executes under exceptional circumstances. Such code can often be found in try-catch blocks or in conditional statements checking for edge cases. Here are some examples:
- A disk read/write operation fails due to a bad sector on a hard drive.
- Intermittent network connectivity or complete loss of network connection during a critical transaction.
- Deadlocks or race conditions in multi-threaded applications.
- Gradual memory leak leading to out-of-memory conditions after extended usage.
1.2. Levels of Coverage
There are 8 levels (Level 0 - Level 7) of coverage in white box testing. Here is a quick comparison table, followed by a short sketch contrasting the two lowest levels.
| Level | Coverage Level | Description | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| 0 | Statement Coverage | Ensures each statement is executed | Simple to measure, basic confidence | May miss logical paths |
| 1 | Branch Coverage | Ensures each branch is executed | Tests decision points | May miss condition combinations |
| 2 | Condition Coverage | Ensures each condition is tested | Tests each part of conditions | May miss some combinations of conditions |
| 3 | Multiple Condition Coverage | Ensures all condition combinations are tested | Most thorough condition testing | Complex and time-consuming |
| 4 | Path Coverage | Ensures all control flow paths are executed | High confidence in code correctness | Infeasible for complex code |
| 5 | Function Coverage | Ensures each function is called | Simple to measure, ensures functions are invoked | Does not test internal function paths |
| 6 | Loop Coverage | Ensures loops are executed with different iteration counts | Targets loop-related errors | May not be comprehensive for very complex loops |
| 7 | Data Flow Coverage | Ensures all variable definitions and uses are tested | Focuses on variable lifecycle, catches data-related errors | Complex to track in large codebases |
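To make the gap between the lowest levels concrete, here is a minimal sketch with a hypothetical safe_divide function, where a single test reaches 100% statement coverage (Level 0) yet leaves a branch untested (Level 1):

```python
def safe_divide(a, b):
    result = 0
    if b != 0:
        result = a / b
    return result

# Statement coverage: this one test executes every line of the function.
assert safe_divide(10, 2) == 5

# Branch coverage: the false branch (b == 0) was never taken above.
# A second test is needed -- and it surfaces the design question of
# whether silently returning 0 on division by zero is really intended.
assert safe_divide(10, 0) == 0
```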
2. Structured Testing (Basis Path Testing)
Structured testing, also known as basis path testing, is a white box testing technique that aims to identify and test all linearly independent paths within the software. The goal is to ensure that each of these basis paths is executed at least once.
A typical structured testing process happens in the following steps:
- Derive the control flow graph from the software module.
- Compute the graph's Cyclomatic Complexity (C).
- Select a set of C basis paths.
- Create a test case for each basis path.
- Execute these tests.
Cyclomatic Complexity is a software metric used to measure the complexity of a program's control flow. It was developed by Thomas J. McCabe, Sr. in 1976 and is a key indicator of the number of linearly independent paths through a program's source code. Here’s the formula:
C = Edges - Nodes + 2
Let’s look at the exampleFunction CFG once more. There are 8 edges (the 8 arrows in the graph) and 7 nodes, giving a Cyclomatic Complexity of 8 - 7 + 2 = 3. This means there are 3 linearly independent paths through the program.
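Since the formula is purely mechanical, it is easy to automate. Here is a minimal sketch, reusing the adjacency-list CFG encoding from the control flow section (valid for a single connected graph):

```python
def cyclomatic_complexity(graph):
    """C = Edges - Nodes + 2, computed from an adjacency-list CFG."""
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2

# The exampleFunction CFG: 8 edges, 7 nodes.
cfg = {1: [2], 2: [3, 6], 3: [4, 5], 4: [7], 5: [7], 6: [7], 7: []}
print(cyclomatic_complexity(cfg))  # 3 -> three basis paths to test
```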
Interpretation of Cyclomatic Complexity:
- C = 1-10: Simple program, low risk, straightforward to test and maintain.
- C = 11-20: Moderate complexity, requires more careful testing and review.
- C = 21-50: High complexity, increased risk of errors, needs thorough testing and documentation.
- C > 50: Very high complexity, high risk, challenging to test and maintain, likely needs refactoring.
We can now identify the independent paths through the system:
- Path 1: x > 0 → y > 0 → "Both x and y are positive"
- Path 2: x > 0 → y <= 0 → "x is positive, y is non-positive"
- Path 3: x <= 0 → "x is non-positive"
From this, we can start designing the test cases (an automated version follows below):
Test case 1:
- Inputs: x = 5, y = 2
- Expected Output: Both x and y are positive
Test case 2:
- Inputs: x = 5, y = -2
- Expected Output: x is positive, y is non-positive
Test case 3:
- Inputs: x = -2, y = 0 (y is irrelevant on this path)
- Expected Output: x is non-positive
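These cases translate directly into automated tests. Below is a minimal sketch in pytest style, with exampleFunction ported to Python for illustration (the port is an assumption, not the original C module):

```python
def example_function(x, y):
    # Python port of the C exampleFunction above (illustrative assumption)
    if x > 0:
        if y > 0:
            return "Both x and y are positive."
        return "x is positive, y is non-positive."
    return "x is non-positive."

def test_basis_path_1():  # x > 0 and y > 0
    assert example_function(5, 2) == "Both x and y are positive."

def test_basis_path_2():  # x > 0 and y <= 0
    assert example_function(5, -2) == "x is positive, y is non-positive."

def test_basis_path_3():  # x <= 0, y irrelevant
    assert example_function(-2, 0) == "x is non-positive."
```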
This technique is particularly effective at finding logical errors and ensuring all conditions are tested. However, testers need to understand the code to identify paths and create appropriate test cases, and generating and executing test cases for all paths can be labor-intensive, especially in complex systems with numerous control flow branches.
3. Data Flow Testing
Data flow testing is a white box testing technique that focuses on how data flows within a program: where values are defined, used, and released at different points in the program's execution. The primary goal of data flow testing is to identify and test the paths that data values take through the code, ensuring that data is correctly defined, used, and released.
Here are the data flow anomalies that data flow testing usually targets:
Use of an Undefined Variable (use before definition)
Here the variable x is used in the if condition without being defined.
```c
#include <stdio.h>

int main() {
    int x;
    if (x == 42) { // Use of undefined variable x
        printf("x is 42\n");
    }
    return 0;
}
```
Use-Use Without Redefinition (UU Pair)
Here a variable is used multiple times without being redefined when a change was expected between uses.
```c
#include <stdio.h>

void updateValue(int *val) {
    *val = 100; // Update the value
}

int main() {
    int z = 10;        // Initial definition
    printf("%d\n", z); // First use
    // Forgot to call updateValue(&z) here
    printf("%d\n", z); // Second use without redefinition
    return 0;
}
```
Conditional Definition and Use
Here a variable is conditionally defined but used unconditionally.
```c
#include <stdio.h>

int main() {
    int some_condition = 0; // assume this comes from input at runtime
    int x;
    if (some_condition) {
        x = 42; // Conditional definition
    }
    printf("%d\n", x); // Unconditional use: x may be undefined here
    return 0;
}
```
Another important concept is that a variable goes through 3 stages during its lifetime:
- Definition (d): Variables are defined when they are declared or first assigned a value.
- Use (u): Variables are used in computations or conditionals.
- Kill (k): Variables are destroyed when they go out of scope or when the program ends.
These annotations can be used to construct data flow graphs that visually represent how variables are manipulated throughout the program's execution paths.
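As a small illustration, here is a hypothetical function annotated with the d/u/k stages of its variables:

```python
def running_total(values):
    total = 0              # d: 'total' is defined (first assignment)
    for v in values:       # d: 'v' is (re)defined on each iteration
        total = total + v  # u, then d: 'total' is used, and the result redefines it
    return total           # u: final use of 'total'
                           # k: 'total' and 'v' are killed when the frame exits

print(running_total([1, 2, 3]))  # 6
```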
There are 2 approaches to data flow testing:
1. Static data flow testing
Static data flow testing analyzes the flow of data through a program's source code without executing it. Testers create data flow diagrams, annotated with define-use-kill information, and check that the patterns are correct. The diagrams are then examined through static analysis, either formally via inspections or informally via reviews. After that, dynamic tests are performed on the module by creating and running test cases.
Here are the data flow anomalies to consider (two of them are sketched in code after the list):
- dd (Defined and Defined again): Redefining a variable without using it in between.
- du (Defined and Used): Proper and expected usage where a variable is defined before its use.
- dk (Defined and Killed): Defining a variable but never using it.
- ud (Used and Defined): Using a variable and then redefining it, potentially for resetting its value.
- uu (Used and Used again): Multiple uses of a variable without redefinition.
- uk (Used and Killed): Using a variable and then destroying it, indicating proper cleanup.
- kd (Killed and Defined): Destroying a variable and then redefining it, possibly for re-initialization.
- ku (Killed and Used): Using a variable after it has been destroyed, which is usually an error.
- kk (Killed and Killed again): Redundant destruction of a variable, indicating potential logic flaws.
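Two of these patterns are easy to see in running code. The snippet below is a contrived sketch showing a dd anomaly (a definition overwritten without being used) and a ku anomaly (a use after an explicit kill via del):

```python
# dd -- defined and defined again with no use in between:
rate = 0.05  # this definition is dead...
rate = 0.07  # ...was the 0.05 ever meant to apply?

# ku -- used after being killed:
total = 100
del total        # k: the variable is explicitly destroyed
try:
    print(total) # u after k: raises NameError
except NameError:
    print("ku anomaly: 'total' was used after being killed")
```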
2. Dynamic data flow testing
Dynamic data flow testing is a white-box testing method that checks how data moves and changes while the program runs. It tracks data as it flows through different parts of the program, helping to find errors that happen during execution, like when variables aren't properly set or when data is handled incorrectly. This type of testing is crucial for making sure that the program handles data correctly under different conditions and that it works as expected in real-world use.
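As a rough illustration of the idea, the sketch below uses Python's built-in sys.settrace hook to log every local variable on each executed line of a toy compute function, so a tester can watch definitions and uses as they actually happen at runtime (the tracer and function are illustrative, not a production tool):

```python
import sys

def trace_locals(frame, event, arg):
    """Toy dynamic data flow monitor: log local variables line by line."""
    if event == "line" and frame.f_code.co_name == "compute":
        print(f"line {frame.f_lineno}: locals = {frame.f_locals}")
    return trace_locals

def compute(x):
    y = x * 2
    if y > 10:
        y = y - 1  # redefinition of y on this path
    return y

sys.settrace(trace_locals)
compute(7)          # watch x and y being defined and used
sys.settrace(None)  # always remove the trace hook afterwards
```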
Here's a quick comparison of the 2 approaches:
| Aspect | Dynamic Data Flow Testing | Static Data Flow Testing |
| --- | --- | --- |
| Focus | Actual data values and transformations during runtime. | Potential data paths and dependencies without execution. |
| Techniques | Execution tracing, input generation, dynamic analysis tools. | Static code analysis, control flow analysis, data flow analysis. |
| Objectives | Validate data handling logic, detect runtime errors. | Identify issues like uninitialized variables, data misuse. |
| Benefits | Insights into actual program behavior, early error detection. | Early issue detection, code quality improvement. |
| Challenges | Execution overhead, non-deterministic behaviors. | Limited dynamic behavior coverage, potential false positives. |
| Suitability | Best for detecting runtime errors, dynamic behavior analysis. | Best for early-stage code quality checks, static dependency analysis. |
Conclusion
Data flow testing builds upon and extends control flow testing methods. Similar to control flow testing, it is essential for testing modules of code that cannot be adequately verified through reviews and inspections alone. However, data flow testing requires testers to possess sufficient programming skills to comprehend the code, its control flow, and its variables. Like control flow testing, data flow testing can be highly time-consuming due to the numerous modules, paths, and variables involved in a system.