Data Container Boundaries: where is the weakness?
All values used by computer programs are classified by type (integer, floating-point number, string, date, etc.), and every data type is internally encoded as numbers. Because computers are digital, those numbers are discrete and are stored in memory cells of a fixed, predefined size.
The capacity of a memory cell fixes how many distinct values it can hold, while the position of that range (for example, signed or unsigned) depends on how the bits are interpreted.
- One Byte data container – Byte, Unsigned Char (0 … 255), Signed Char (-128 … 127).
- Two Bytes data container – Word, Unsigned Short Integer (0 … 65,535), Signed Short Integer (-32,768 … 32,767).
- Four Bytes data container – Double Word, Unsigned Integer (0 … 4,294,967,295), Signed Integer (-2,147,483,648 … 2,147,483,647).
Note that for signed types a single bit of the memory cell (its smallest unit) is used to encode the sign of the number.
Generally, the memory cells allocated for storing values are either defined explicitly in the program code by the developer (in compiled programming languages) or chosen by default settings (in interpreted programming languages).
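The ranges listed above follow directly from the number of bits in the container. A minimal sketch, assuming the common two's complement representation for signed values (the function name container_range is only illustrative):

```python
def container_range(bits, signed):
    """Return (lowest, highest) value for a container that is `bits` wide."""
    if signed:
        # One bit is reserved for the sign, leaving bits - 1 for the magnitude.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32):
    print(bits, container_range(bits, False), container_range(bits, True))
```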
Now, what happens when a number doesn’t fit the boundaries?
1. Native machine code behavior: arithmetic overflow
Note. It won’t stop the CPU from executing further machine code.
Result. The memory cell will contain an unpredictable but definitely wrong value.
Consequences. Any further operation on the value from the cell will produce wrong results. If further execution uses the value from the cell as a condition of some sort, the decision made will most likely be wrong.
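An arithmetic overflow destroyed the Ariane 5 rocket (Flight 501): the conversion of a 64-bit floating-point value to a 16-bit signed integer produced a number too large for the container.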
2. Language-specific machine code behavior: wrapping around or truncation
Note. It won’t stop the CPU from executing further machine code.
- In case of wrapping around, the memory cell will contain a value that lies within the initial boundaries and is related to the original one, but is definitely wrong.
Example. If an Unsigned Short Integer data container is defined for the Amount on a User Account, and the Balance is 65,000, a deposit of 1,000 will change the Balance to 464 (66,000 - 65,536), which won’t bring down the program but might bring down the bank.
- In case of truncation (often called saturation or clamping), a result that exceeds the range is replaced with the maximum allowed number. In other words, 10 + 1 will be 10 if truncation is in effect and 10 is the top of the range.
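The two behaviors can be compared side by side. This is a minimal sketch that emulates a 16-bit unsigned container in Python (whose own integers do not overflow); the helper names are illustrative and non-negative inputs are assumed:

```python
U16_MAX = 0xFFFF  # top of the Unsigned Short Integer range (65,535)


def add_wrapping(a, b):
    """Add as a 16-bit unsigned container does when it wraps around."""
    return (a + b) & U16_MAX  # keep only the low 16 bits


def add_saturating(a, b):
    """Add, pinning the result to the top of the range."""
    return min(a + b, U16_MAX)


print(add_wrapping(65_000, 1_000))    # 464
print(add_saturating(65_000, 1_000))  # 65535
```

In both cases the program keeps running, and nothing in the stored value itself reveals that it is wrong.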
Consequences. Any further operation on the value from the cell will produce wrong results. If further execution uses the value from the cell as a condition of some sort, the decision made will most likely be wrong. If the value was supposed to be used for memory allocation or memory addressing, the entire operating system could potentially be brought down.
Integer overflow vulnerabilities are widely (and quite effectively) exploited in remote attacks against both UNIX and Windows operating systems.
3. System exception handling behavior: recovery
Note. System exception handling prevents the wrong operation from being executed in machine code.
Result. Typically, a program writes an error report to a log file, shows the user some kind of error message, and terminates. More advanced programs can recover by partially or completely reloading themselves.
Consequences. Typically, the current working session is closed and all unsaved data is lost. A server application might suffer from memory leaks, file/disk inconsistencies, or database inconsistencies. A client-side application that closes loses its connection to the server, and that connection might then be taken over.
Generally, system exception handling scenarios are added to a program automatically at compilation time.
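The pattern can be sketched by hand. In the example below the range check is written explicitly, whereas a compiler or runtime would normally insert it; add_checked and run_session are hypothetical names used only for illustration:

```python
import logging

U16_MAX = 0xFFFF  # top of the Unsigned Short Integer range


def add_checked(a, b):
    """Add two values; raise instead of storing an out-of-range result."""
    result = a + b
    if not 0 <= result <= U16_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 16 unsigned bits")
    return result


def run_session(balance, deposit):
    """Return the new balance, or None if the session had to be aborted."""
    try:
        return add_checked(balance, deposit)
    except OverflowError as exc:
        # Typical recovery: report the error and end the session.
        logging.error("session aborted: %s", exc)
        return None
```

The wrong value is never stored, but the work in progress is lost along with the session.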
4. Business logic behavior: avoidance
Note. In accordance with the defined business logic, an application avoids erroneous situations by rejecting wrong data or by handling those situations with programmatically defined scenarios.
- In case of data rejection, an application typically remains in the same state as it was.
- In case of an error-handling scenario, an application “rolls back”, typically to a preceding state (or to an idle state), and generates some kind of report.
Consequences. Typically, error handling based on business logic is the desired and expected behavior of an application, so there is no negative impact.
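Data rejection can be illustrated with a small sketch. The Account class and the MAX_BALANCE limit are assumptions made for the example, with the limit chosen to match a 16-bit unsigned container:

```python
MAX_BALANCE = 65_535  # assumed business limit, matching the container


class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        """Apply the deposit only if the result stays within the limit."""
        if amount <= 0 or self.balance + amount > MAX_BALANCE:
            return False  # rejected: the state stays exactly as it was
        self.balance += amount
        return True


account = Account(65_000)
print(account.deposit(1_000), account.balance)  # False 65000
```

Because the check runs before the value is stored, the container boundary is never reached.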
Testing the boundaries
While the boundaries of every data container type can easily be found in technical documentation, identifying which container an application actually uses for a given value is always a great challenge for a tester, especially if a design specification is not available (or there are simply no design rules mandatory for all developers on the project).
Another typical source of boundary defects is a drift of business requirements. For example, the maximum transaction amount was originally specified as 100 and was later changed to 1,000. Here two boundaries were crossed: 127 (Signed Char) and 255 (Unsigned Char).
Nowadays, various wizards and code generators, as well as external libraries and frameworks, bring even more chaos: developers lose control over the code and may not be aware of data container boundaries.
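When a requirement changes, the container boundaries that fall between the old and the new limit are the first candidates for testing. A rough sketch, with the boundary table taken from the container list earlier in this article and a hypothetical helper name:

```python
BOUNDARIES = {
    127: "Signed Char",
    255: "Unsigned Char",
    32_767: "Signed Short Integer",
    65_535: "Unsigned Short Integer",
    2_147_483_647: "Signed Integer",
    4_294_967_295: "Unsigned Integer",
}


def boundaries_crossed(old_max, new_max):
    """List container tops that held old_max but cannot hold new_max."""
    return [(top, name) for top, name in sorted(BOUNDARIES.items())
            if old_max <= top < new_max]


print(boundaries_crossed(100, 1_000))
# [(127, 'Signed Char'), (255, 'Unsigned Char')]
```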
1. Search for upper boundary
Business requirements and common sense might help identify the “starting point” of the search. If they don’t, trying “magic” numbers such as 128, 256, 32,768, and 65,536 might help.
Using a binary search between a value known to be handled correctly and a value known to fail, we can get to the threshold faster.
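A binary search for the threshold can be sketched as follows. The accepts callback stands for whatever check the tester performs against the real application; fake_app_accepts is only a stand-in that imitates a 16-bit unsigned field:

```python
def find_upper_boundary(accepts, low, high):
    """Return the largest value in [low, high] for which accepts() is True.

    Assumes accepts(low) is True, accepts(high) is False, and that the
    behavior switches from correct to wrong exactly once in between.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid   # mid is still handled correctly
        else:
            high = mid  # mid already fails
    return low


def fake_app_accepts(value):
    """Stand-in for the application: value survives 16-bit storage."""
    return (value & 0xFFFF) == value


print(find_upper_boundary(fake_app_accepts, 0, 1_000_000))  # 65535
```

About twenty probes are enough to cover a range of one million values, compared with tens of thousands for a linear walk.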
2. No immediate visible reaction
An application might accept an (invalid) input and still appear to be fine. Causing it to output the value or to use the value in some operation might help reveal the problem.
Boundary testing for a single value might require a whole scenario: post a transaction, run processing, verify the account balance.
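Such a scenario can be sketched against a toy system. The Ledger class below is a hypothetical application with a hidden 16-bit balance field; the input is accepted without complaint and the defect surfaces only after processing:

```python
class Ledger:
    """Hypothetical system under test with a hidden 16-bit balance."""

    def __init__(self, balance):
        self.balance = balance
        self.pending = []

    def post(self, amount):
        self.pending.append(amount)  # accepted, no visible reaction
        return True

    def process(self):
        for amount in self.pending:
            # Silent wrap-around inside the processing step.
            self.balance = (self.balance + amount) & 0xFFFF
        self.pending.clear()


def scenario_detects_defect(start, amount):
    """Post, process, then compare with an independently computed balance."""
    ledger = Ledger(start)
    ledger.post(amount)
    ledger.process()
    return ledger.balance != start + amount


print(scenario_detects_defect(65_000, 1_000))  # True
print(scenario_detects_defect(100, 50))        # False
```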
3. Internal transition
A single value may “travel” a pretty complicated path while being passed from one function to another inside the program. If somewhere within that chain the value doesn’t fit the memory cell, the end result will be wrong even though the application accepted the input. Or the value could be stored properly in memory but get corrupted while being saved to a file or a database record.
Test scenarios might be required to reveal those defects.
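One such scenario is a round trip through storage: write the value, read it back, and compare. In this sketch save and load are hypothetical stand-ins for a storage layer that keeps only two bytes per value:

```python
import struct


def save(value):
    """Hypothetical storage layer: keeps only the low 16 bits."""
    return struct.pack("<H", value & 0xFFFF)


def load(record):
    """Read the 16-bit unsigned value back from the stored bytes."""
    return struct.unpack("<H", record)[0]


def survives_round_trip(value):
    return load(save(value)) == value


print(survives_round_trip(65_535))  # True
print(survives_round_trip(70_000))  # False
```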
There are two main weaknesses in digital encoding technology: data container overflow (memory cell overflow, register overflow) and confusion (type mismatch) during conversion. Programs have multiple layers of defense to guard against these weaknesses. Sometimes the protective functionality interferes with the business functionality.
If you know that the protection is missing something, you can find a way to expose a defect.