
White Box Testing: All You Need To Know

White box testing is a testing method where testers evaluate the quality of a system with full knowledge of its internal structures. Here, the testers have access to the system's source code and understand how it operates internally. They know not only what the software does but also how it achieves those results.

In this article, we’ll take an in-depth look at white box testing, its common techniques, and its best practices.

What is White Box Testing?


White box testing is a crucial testing technique because it examines the system from the developer's perspective. It requires testers to look inside the software's "white box" to understand its inner workings.

Testers can then verify that all internal operations function as expected and identify any security vulnerabilities, logic errors, or gaps in code coverage. This method ensures that the system performs not only its intended functions but does so efficiently and securely.

White Box Testing vs Black Box Testing vs Grey Box Testing

White box testing is the opposite of black box testing. With black box testing, testers do not need to know the internal workings of the system. With white box testing, testers need to understand them. The "box" metaphor describes how opaque the system is to the tester.


Here's a quick comparison of the 3 approaches:

Feature	Black Box Testing	White Box Testing	Grey Box Testing
Definition	Testing without knowing the internal code structure. Focuses on input-output and functionality.	Testing with full knowledge of the internal code structure. Focuses on code logic and coverage.	Testing with partial knowledge of the internal code structure. Combines elements of both black and white box testing.
Testers' Knowledge	No knowledge of the internal workings of the software.	Complete knowledge of the internal workings of the software.	Partial knowledge of the internal workings of the software.
Focus	Functionality, user interface, and user experience.	Code logic, paths, branches, and internal structures.	Both functionality and internal structures, with a focus on integration and interactions.
Advantages	- Identifies missing functionalities. - User-oriented. - No need for programming knowledge.	- Detailed and thorough testing. - High code coverage. - Optimizes code.	- Balanced approach. - More effective in finding defects in integrated systems. - Better coverage than black box testing alone.
Disadvantages	- Limited coverage of code paths. - Can miss logical errors in the code.	- Time-consuming. - Requires programming skills. - May not reflect user perspective.	- Requires both functional and code knowledge. - Can be complex to design. - May not be as thorough as pure white box testing.
Typical Testers	Testers, QA engineers, end-users.	Developers, QA engineers with programming skills.	Testers with some programming knowledge, developers.
Testing Techniques	Equivalence partitioning, boundary value analysis, decision table testing.	Statement coverage, branch coverage, path coverage, condition coverage.	Matrix testing, regression testing, pattern testing, orthogonal array testing.
Tools Used	Selenium, QTP, LoadRunner, TestComplete.	JUnit, NUnit, CUnit, Emma, Clover, SonarQube.	Selenium, QTP, Rational Functional Tester, integration tools.
Use Cases	Acceptance testing, system testing, functional testing.	Unit testing, integration testing, security testing, code optimization.	Integration testing, penetration testing, system testing.

Note: The traditional distinction between white-box and black-box testing is becoming less important. In modern testing, tests are derived from many different sources, not just the code or requirements. Testers now use abstract structures like:

Input spaces (possible inputs to the system)
Graphs (representing workflows or data flows)
Logical rules (to check how the system should behave)

These structures can come from source code, requirements, design models, or other documents. The key focus now is what level of abstraction the tests are designed from, not whether they are "white-box" or "black-box."

Therefore, the white-box vs. black-box distinction is less relevant in modern software testing practices.

Levels of White Box Testing

1. Unit Testing

This is the first level of testing, where individual pieces of code (called units) are tested by themselves to make sure they work correctly before being combined with other parts of the program. White-box testing at this stage helps catch mistakes early, which can prevent bigger problems later when the code is integrated into the rest of the system.

2. Integration Testing

Once individual units are tested, they are combined to see if they work together as expected. White-box testing at this stage checks how different parts of the program interact with each other. It ensures that the connections between different components work correctly, based on what the programmer knows about how they should behave.

3. Regression Testing

After updates or changes are made to the program, it’s important to make sure that nothing else broke as a result. White-box testing during regression testing reuses the same tests from unit and integration testing to ensure that the program still works as expected after changes are made.

White Box Testing Process

The general white box testing process includes 7 steps:

Analyze the SUT:
Testers examine the software’s source code, architecture, and internal workings, collaborating with the dev or product teams for more insights.
Identify Paths:
They map out all possible data paths through the system, focusing on branches, loops, and conditions that affect execution flow.
Path Sensitization:
Testers select inputs that trigger each identified path to ensure every route is tested at least once.
Determine Expected Results:
For each input, testers predict the expected outcomes based on the code.
Execute Tests:
They run the selected inputs through the system to observe how the software behaves internally.
Compare Outputs:
The actual output is compared to the expected results to spot any discrepancies.
Assess Functionality:
Based on the comparison, testers decide whether the software is working correctly.
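
As a rough illustration of these steps, the sketch below runs one input per identified path through a small function and collects any mismatch between expected and actual results. The function `shipping_fee`, its fee values, and the helper `run_path_tests` are hypothetical names invented for this example, not part of any real system.

```python
def shipping_fee(weight_kg, express):
    """Hypothetical function under test with three execution paths."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if express:
        return 10.0 + 2.0 * weight_kg
    return 5.0 + 1.0 * weight_kg


# Steps 2-4: one input per identified path, with the result predicted from the code.
PATH_CASES = [
    ("invalid weight", (0, False), ValueError),
    ("express", (2, True), 14.0),
    ("standard", (2, False), 7.0),
]


def run_path_tests(func, cases):
    """Steps 5-6: execute each input and collect mismatches against expectations."""
    discrepancies = []
    for name, args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:  # an exception can itself be the expected outcome
            actual = type(exc)
        if actual != expected:
            discrepancies.append((name, expected, actual))
    return discrepancies


# Step 7: an empty list means every exercised path behaved as predicted.
print(run_path_tests(shipping_fee, PATH_CASES))  # []
```

Keeping the path name next to each input makes it easier to see which route through the code a failing case was meant to exercise.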

Advantages of White Box Testing

The key benefits of white box testing include:

Code-centric: white-box testing allows for a thorough examination of internal structures, logic, and code paths, giving testers a much deeper understanding of the system as compared to black box testing where they merely “scratch the surface”.
Granularity: since testers now have complete knowledge of the system, they can tailor the test cases specifically to the internal logic. This leads to higher code coverage, including edge cases.
Optimization: going beyond just “finding the bug”, testers can now join the optimization process where they give recommendations to eliminate redundant code.
Automation: white-box testing opens up opportunities for automation testing, saving a lot of time and resources.

Disadvantages of White Box Testing

Large number of execution paths: for more complex systems, the number of execution paths can be astronomically huge. Attempting to test all of those paths takes up a lot of time and resources.
Control flow assumption: testers generally assume that the control flow itself is correctly designed and focus on exercising it rather than validating it, yet defects can originate in the design of those flows.
Non-existent paths: since white box testing focuses on the paths that are present in the code, it cannot identify paths that are missing altogether. For example, if there is a logical condition that is never checked or a scenario that is never handled in the code, white box testing won't reveal this absence because it only tests what is already there.
Technical skills: testers must have adequate knowledge and technical expertise to do white box testing properly.

White Box Testing Techniques

1. Control Flow Testing

1.1. Concept

Control flow testing is a white-box testing technique that focuses on creating and executing test cases to cover pre-identified execution paths through the program code.

Although control flow testing is very thorough, it has some drawbacks:

The number of paths can reach huge figures. Modern applications are usually complex enough to render an attempt at exhaustive testing of all control flow paths quite impossible.
White box testing only works with implemented paths. If a path is missing, it will never be found. For example, here we want to calculate the grade based on the score. The code here misses the path for scores below 60.

def calculate_grade(score):
    if score >= 90:
        grade = 'A'
    elif score >= 80:
        grade = 'B'
    elif score >= 70:
        grade = 'C'
    elif score >= 60:
        grade = 'D'
    # Missing implementation path for scores below 60:
    # grade is never assigned on that route.
    return grade
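
To make the missing-path problem concrete, here is a minimal sketch that restates the grade function and tests it twice: once with inputs derived from the implemented branches, and once with an input derived from the requirement that failing scores exist. The chosen scores and the variable names `implemented_paths` and `missing_path_detected` are assumptions made for illustration.

```python
def calculate_grade(score):
    if score >= 90:
        grade = 'A'
    elif score >= 80:
        grade = 'B'
    elif score >= 70:
        grade = 'C'
    elif score >= 60:
        grade = 'D'
    return grade  # no branch assigns grade when score < 60


# One input per implemented branch: every path in the code is covered and passes.
implemented_paths = {95: 'A', 85: 'B', 75: 'C', 65: 'D'}
for score, expected in implemented_paths.items():
    assert calculate_grade(score) == expected

# A requirement-derived input (a failing score) exposes the missing path.
try:
    calculate_grade(50)
    missing_path_detected = False
except UnboundLocalError:
    missing_path_detected = True

print(missing_path_detected)  # True
```

The branch-derived tests alone report no problem, which is why inputs taken from the requirements are still needed alongside code-derived ones.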

There can also be errors within the code despite the correct flow. For example, here we want to calculate the inventory when an order is successfully dispatched. The flow is correct, but current_inventory is supposed to decrease (- 1) instead of (+ 1).

def calculate_inventory(order_dispatched, current_inventory):
    if order_dispatched:
        # Bug: a dispatched order should reduce the inventory (- 1)
        return current_inventory + 1
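
The sketch below shows why executing a path is not the same as verifying it. It keeps the bug from the inventory example and adds a return for the non-dispatched case, which is an assumption since the original snippet does not show it. The helper names `covers_both_paths` and `check_expected_values` are hypothetical.

```python
def calculate_inventory(order_dispatched, current_inventory):
    if order_dispatched:
        return current_inventory + 1  # bug kept from the example: should be - 1
    return current_inventory  # assumed behaviour when nothing is dispatched


def covers_both_paths():
    """Executes both branches but never checks the returned values."""
    calculate_inventory(True, 10)
    calculate_inventory(False, 10)
    return True


def check_expected_values():
    """Compares each path's result with the value the requirement predicts."""
    failures = []
    if calculate_inventory(True, 10) != 9:
        failures.append("dispatch should reduce inventory by 1")
    if calculate_inventory(False, 10) != 10:
        failures.append("no dispatch should leave inventory unchanged")
    return failures


print(covers_both_paths())      # True
print(check_expected_values())  # ['dispatch should reduce inventory by 1']
```

Both helpers reach full path coverage, but only the one that asserts on expected results reveals the defect.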

1.2. Control Flow Graph

To do control flow testing, you first need to understand the Control Flow Graph.

A Control Flow Graph (CFG) is a graphical representation of all the paths that might be traversed through a program during its execution. It is used extensively in control flow testing to analyze and understand the program’s structure.

A control flow graph has 3 major components:

1. Node: these are individual statements or blocks of code. For example, in the code snippet below we have 3 nodes:

int a = 0;      // Node 1
if (b > 0) {    // Node 2
    a = b;      // Node 3
}

2. Edge: this is the flow of control from one node to another. It indicates the execution flow of the program. There are 2 types of edges:

Unconditional edge: direct flow from one statement to another without any condition
Conditional edge: these are branches based on a condition (e.g. True/False outcomes of an if-statement)

Here we have a conditional edge from statement ‘if (b > 0)’ to statement ‘a=b’.

int a = 0;
if (b > 0) {
    a = b;
}

3. Entry/Exit Points: these represent the start and end of a program in the CFG.

Let’s look at another example.

void exampleFunction(int x, int y) {
    if (x > 0) {
        if (y > 0) {
            printf("Both x and y are positive.\n");
        } else {
            printf("x is positive, y is non-positive.\n");
        }
    } else {
        printf("x is non-positive.\n");
    }
}

Here we have:

7 nodes (1 entry point, 1 exit point, 2 decisions, 3 print statements)
- 1: Entry point of exampleFunction.
- 2: if (x > 0).
- 3: if (y > 0).
- 4: printf("Both x and y are positive.\n");.
- 5: printf("x is positive, y is non-positive.\n");.
- 6: printf("x is non-positive.\n");.
- 7: Exit point of exampleFunction.
Edges:
- (1 -> 2)
- (2 -> 3) if x > 0
- (2 -> 6) if x <= 0
- (3 -> 4) if y > 0
- (3 -> 5) if y <= 0
- (4 -> 7)
- (5 -> 7)
- (6 -> 7)

There are 3 execution paths in total through this CFG, one for each of the 3 possible outcomes.

(Figure: Control Flow Graph for exampleFunction)
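
One way to check the path count is to store the graph as an adjacency list and walk it. The sketch below uses the node numbers and edges listed above. The names `CFG` and `enumerate_paths` are illustrative, and the walk assumes a graph without loops.

```python
# Adjacency list for the CFG of exampleFunction, using the node numbers listed above.
CFG = {
    1: [2],
    2: [3, 6],
    3: [4, 5],
    4: [7],
    5: [7],
    6: [7],
    7: [],
}


def enumerate_paths(graph, node, exit_node, prefix=()):
    """Depth-first walk that yields every entry-to-exit path (acyclic graphs only)."""
    prefix = prefix + (node,)
    if node == exit_node:
        yield prefix
        return
    for successor in graph[node]:
        yield from enumerate_paths(graph, successor, exit_node, prefix)


paths = list(enumerate_paths(CFG, 1, 7))
for path in paths:
    print(" -> ".join(map(str, path)))
# 1 -> 2 -> 3 -> 4 -> 7
# 1 -> 2 -> 3 -> 5 -> 7
# 1 -> 2 -> 6 -> 7
```

A graph with loops would need a bound on how often a node may be revisited, otherwise the enumeration would never finish.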

Here's how you can find out all the necessary paths for Control Flow testing:

Identify where the module begins executing (entry point) and where it completes its execution or returns control (exit point).
Start by tracing the leftmost path through the module from the entry point to the exit point. This path follows the sequence of statements and branches as defined in the code.
Once you've traced the leftmost path, return to the entry point and vary the first branching condition. This means taking the alternative path(s) that were not taken in the leftmost path due to branching conditions (e.g., if-else statements, switch-case statements).
Repeat this process for each subsequent branching condition in the module. For example, after varying the first branching condition, vary the second branching condition, then the third, and so on, until all possible control flow paths (or significant paths) through the module are covered.
List down each distinct path taken through the module, including combinations of different branching conditions.

Of course, in practice, 100% coverage can be difficult to achieve. Some code is executed only in exceptional circumstances. This kind of code is often found in try-catch blocks or in conditional statements that check for edge cases. Here are some examples:

A disk read/write operation fails due to a bad sector on a hard drive.
Intermittent network connectivity or complete loss of network connection during a critical transaction.
Deadlocks or race conditions in multi-threaded applications.
Gradual memory leak leading to out-of-memory conditions after extended usage.
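
Some of these conditions can be simulated rather than waited for. As a hedged sketch, the example below uses `unittest.mock.patch` from the Python standard library to make a file read fail so that the error-handling branch is executed. The function `read_config`, the file name, and the empty-string fallback are assumptions made for this example.

```python
from unittest import mock


def read_config(path):
    """Hypothetical function whose except branch runs only when the read fails."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return ""  # fallback path, rarely reached in normal runs


def read_with_simulated_disk_failure():
    # Replace the built-in open so that it raises, as a bad sector might.
    with mock.patch("builtins.open", side_effect=OSError("simulated read failure")):
        return read_config("settings.ini")


print(repr(read_with_simulated_disk_failure()))  # ''
```

Fault injection of this kind covers the handler's logic, but it does not prove that real hardware or network failures surface as the same exception.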

1.3. Levels of Coverage

There are 8 levels (Level 0 - Level 7) of coverage in White box testing. Here is a quick table of comparison.

Level	Coverage Level	Description	Advantages	Disadvantages
0	Statement Coverage	Ensures each statement is executed	Simple to measure, basic confidence	May miss logical paths
1	Branch Coverage	Ensures each branch is executed	Tests decision points	May miss condition combinations
2	Condition Coverage	Ensures each condition is tested	Tests each part of conditions	May miss some combinations of conditions
3	Multiple Condition Coverage	Ensures all condition combinations are tested	Most thorough condition testing	Complex and time-consuming
4	Path Coverage	Ensures all control flow paths are executed	High confidence in code correctness	Infeasible for complex code
5	Function Coverage	Ensures each function is called	Simple to measure, ensures functions are invoked	Does not test internal function paths
6	Loop Coverage	Ensures loops are executed with different iteration counts	Targets loop-related errors	May not be comprehensive for very complex loops
7	Data Flow Coverage	Ensures all variable definitions and uses are tested	Focuses on variable lifecycle, catches data-related errors	Complex to track in large codebases
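
The difference between the first two rows of the table can be shown with a small sketch. In the hypothetical function `scale` below, a single test executes every statement, yet the defect only appears when the False outcome of the `if` is also exercised.

```python
def scale(total, use_custom_divisor, divisor):
    d = 0
    if use_custom_divisor:
        d = divisor
    return total / d


# This single test executes every statement: 100% statement coverage.
assert scale(10, True, 5) == 2.0

# Branch coverage also requires the False outcome of the if, which exposes the defect.
try:
    scale(10, False, 5)
    false_branch_ok = True
except ZeroDivisionError:
    false_branch_ok = False

print(false_branch_ok)  # False
```

This is the sense in which statement coverage "may miss logical paths": the skipped branch contains no statement of its own, so it never shows up as uncovered.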

2. Structured Testing (Basis Path Testing)

Structured testing, also called basis path testing, is a white box testing technique that identifies and tests the linearly independent paths through a piece of software. Rather than attempting every possible execution path, the goal is to cover a basis set of paths, which ensures that every statement and every branch is executed at least once.

A typical structured testing process happens in the following steps:

Derive the control flow graph from the software module.
Compute the graph's Cyclomatic Complexity (C).
Select a set of C basis paths.
Create a test case for each basis path.
Execute these tests.

Cyclomatic Complexity is a software metric used to measure the complexity of a program's control flow. It was developed by Thomas J. McCabe, Sr. in 1976 and is a key indicator of the number of linearly independent paths through a program's source code. Here’s the formula:

C = Edges - Nodes + 2

Let’s look at this example once more. There are 8 edges (8 arrows) and 7 nodes. We have a Cyclomatic Complexity of 8 - 7 + 2 = 3. This means there are 3 linearly independent paths through the program.
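
The same arithmetic can be sketched in a few lines of Python. The edge list repeats the edges of the example graph. The function names are illustrative, and the second function is a cross-check that only holds when every decision has exactly two outcomes.

```python
EDGES = [(1, 2), (2, 3), (2, 6), (3, 4), (3, 5), (4, 7), (5, 7), (6, 7)]


def cyclomatic_complexity(edges):
    """C = Edges - Nodes + 2 for a single connected control flow graph."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


def decision_count_plus_one(edges):
    """Cross-check: number of nodes with more than one outgoing edge, plus 1."""
    outgoing = {}
    for source, _target in edges:
        outgoing[source] = outgoing.get(source, 0) + 1
    return sum(1 for count in outgoing.values() if count > 1) + 1


print(cyclomatic_complexity(EDGES))    # 3
print(decision_count_plus_one(EDGES))  # 3
```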


Interpretation of Cyclomatic Complexity:

C = 1-10: Simple program, low risk, straightforward to test and maintain.
C = 11-20: Moderate complexity, requires more careful testing and review.
C = 21-50: High complexity, increased risk of errors, needs thorough testing and documentation.
C > 50: Very high complexity, high risk, challenging to test and maintain, likely needs refactoring.

We can now identify the independent paths through the system:

Path 1: If x > 0 → if y > 0 → Both x and y are positive
Path 2: If x > 0 → if y <= 0 → x is positive, y is non-positive
Path 3: If x <= 0 → x is non-positive

From this, we can start designing the test cases:

Test case 1:

Inputs: x = 5, y = 2
Expected Output: Both x and y are positive

Test case 2:

Inputs: x = 5, y = -2
Expected Output: x is positive, y is non-positive

Test case 3:

Inputs: x = -2, y = any value (for example, 0)
Expected Output: x is non-positive
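
These three test cases could be automated along the following lines. This is a sketch that assumes a Python port of exampleFunction which returns the message instead of printing it. The port and the test function names are not part of the original example.

```python
def example_function(x, y):
    """Python port of exampleFunction; returns the message instead of printing it."""
    if x > 0:
        if y > 0:
            return "Both x and y are positive."
        return "x is positive, y is non-positive."
    return "x is non-positive."


def test_path_1_both_positive():
    assert example_function(5, 2) == "Both x and y are positive."


def test_path_2_y_non_positive():
    assert example_function(5, -2) == "x is positive, y is non-positive."


def test_path_3_x_non_positive():
    # y does not influence this path, so any value works.
    assert example_function(-2, 0) == "x is non-positive."


for test in (
    test_path_1_both_positive,
    test_path_2_y_non_positive,
    test_path_3_x_non_positive,
):
    test()
```

The basis set covers each path once. Boundary inputs such as x = 0 or y = 0 are worth adding separately, since they sit exactly on the decision conditions.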

Control flow testing is particularly effective in finding logical errors and ensuring all conditions are tested. However, testers need to understand the code to identify paths and create appropriate test cases. Generating and executing test cases for all paths can also be labor-intensive, especially in complex systems with numerous control flow branches.

3. Data Flow Testing


Data Flow Testing is a technique used by testers to track how information moves through a program and make sure it is handled correctly. Think of it as following the life of a package:

First, the package is created (Definition).
Next, it’s used for something, like being opened (Use).
Finally, the package is thrown away (Kill).

3.1. Common Mistakes in Data Flow Testing

Here are a few common mistakes (called data flow anomalies) that testers look for:

Using Data Before It's Ready
Imagine opening a package that hasn’t arrived yet. This would be like trying to use data that hasn’t been created or given a value. It causes confusion and can crash a program.
Using the Same Data Without Updating It
Think of writing a letter, sending it, and then sending the exact same letter again without updating any information. If you’re expecting a different result, this is a mistake.
Using Data That Was Only Created Under Certain Conditions
Picture a flashlight that only works if the batteries are inside. If you assume the flashlight will work without checking the batteries, you’ll be disappointed. Similarly, if a variable is only created under specific conditions but is later used as if it always exists, that’s a problem.

These mistakes can cause bugs, crashes, or unexpected behavior in programs. Data Flow Testing helps catch these errors before they become serious issues.

Another important concept to understand is that a variable goes through 3 stages:

Definition (d): Variables are defined when they are declared or first assigned a value.
Use (u): Variables are used in computations or conditionals.
Kill (k): Variables are destroyed when they go out of scope or when the program ends.

These annotations can be used to construct data flow graphs that visually represent how variables are manipulated throughout the program's execution paths.

There are 2 approaches to Data flow testing:

1. Static data flow testing

Static data flow testing analyzes the flow of data through a program's source code without executing it. Testers build data flow graphs for the module and check that the define-use-kill pattern of each variable is correct, either formally through inspections or informally through reviews. Anything that static analysis cannot settle is then followed up with dynamic tests, where test cases are created and run against the module.

Here are the nine possible define-use-kill pairs to check. Only some of them are anomalies:

dd (Defined and Defined again): Redefining a variable without using it in between.
du (Defined and Used): Proper and expected usage where a variable is defined before its use.
dk (Defined and Killed): Defining a variable but never using it.
ud (Used and Defined): Using a variable and then redefining it, potentially for resetting its value.
uu (Used and Used again): Multiple uses of a variable without redefinition.
uk (Used and Killed): Using a variable and then destroying it, indicating proper cleanup.
kd (Killed and Defined): Destroying a variable and then redefining it, possibly for re-initialization.
ku (Killed and Used): Using a variable after it has been destroyed, which is usually an error.
kk (Killed and Killed again): Redundant destruction of a variable, indicating potential logic flaws.
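
A static check of these pairs can be sketched as a scan over the sequence of events recorded for one variable along one path. The event-string encoding (such as "duuk"), the set of pairs treated as suspicious, and the function name `find_anomalies` are assumptions made for illustration, based on the descriptions in the list above.

```python
# Pairs flagged as suspicious, following the descriptions in the list above.
SUSPICIOUS_PAIRS = {
    "dd": "redefined without an intervening use",
    "dk": "defined but never used",
    "ku": "used after being killed",
    "kk": "killed twice",
}


def find_anomalies(events):
    """Scan a variable's event string (e.g. 'duuk') for suspicious adjacent pairs."""
    findings = []
    if events and events[0] == "u":
        findings.append((0, "u", "used before any definition"))
    for index in range(len(events) - 1):
        pair = events[index:index + 2]
        if pair in SUSPICIOUS_PAIRS:
            findings.append((index, pair, SUSPICIOUS_PAIRS[pair]))
    return findings


print(find_anomalies("duuk"))  # []
print(find_anomalies("dduku"))
# [(0, 'dd', 'redefined without an intervening use'), (3, 'ku', 'used after being killed')]
```

A finding here is a prompt for review rather than proof of a bug, since a pair such as dd can be intentional.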

2. Dynamic data flow testing

Dynamic data flow testing is a white-box testing method that checks how data moves and changes while the program runs. It tracks data as it flows through different parts of the program, helping to find errors that happen during execution, like when variables aren't properly set or when data is handled incorrectly. This type of testing is crucial for making sure that the program handles data correctly under different conditions and that it works as expected in real-world use.
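
As a rough sketch of the dynamic approach, the example below wraps a variable in a small recorder that logs define, use, and kill events while the code runs. The `TrackedVariable` class and the `apply_discount` function are hypothetical. Real tools usually instrument code automatically instead of relying on manual wrappers.

```python
class TrackedVariable:
    """Hypothetical wrapper that records define/use/kill events at runtime."""

    def __init__(self, name):
        self.name = name
        self.events = []
        self._value = None
        self._alive = False

    def define(self, value):
        self.events.append("d")
        self._value = value
        self._alive = True

    def use(self):
        self.events.append("u")
        if not self._alive:
            raise RuntimeError(f"{self.name} used while undefined or killed")
        return self._value

    def kill(self):
        self.events.append("k")
        self._alive = False


def apply_discount(price, is_member):
    """Code under test: the discount percentage is only defined for members."""
    percent = TrackedVariable("percent")
    if is_member:
        percent.define(10)
    try:
        return price * (100 - percent.use()) // 100, percent.events
    except RuntimeError:
        return None, percent.events


print(apply_discount(100, True))   # (90, ['d', 'u'])
print(apply_discount(100, False))  # (None, ['u'])
```

The second run shows a use with no preceding definition, which only becomes visible when the non-member path is actually executed.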

Here's a quick comparison of the 2 approaches:

Aspect	Dynamic Data Flow Testing	Static Data Flow Testing
Focus	Actual data values and transformations during runtime.	Potential data paths and dependencies without execution.
Techniques	Execution tracing, input generation, dynamic analysis tools.	Static code analysis, control flow analysis, data flow analysis.
Objectives	Validate data handling logic, detect runtime errors.	Identify issues like uninitialized variables, data misuse.
Benefits	Insights into actual program behavior, early error detection.	Early issue detection, code quality improvement.
Challenges	Execution overhead, non-deterministic behaviors.	Limited dynamic behavior coverage, potential false positives.
Suitability	Best for detecting runtime errors, dynamic behavior analysis.	Best for early-stage code quality checks, static dependency analysis.