
Test Result Codes

June 23, 2009

A test result is the output of executing a test method that makes a specific assertion. This post concentrates solely on qualitative results that can be identified by specific codes, such as pass or fail.

This post explores the test result codes used by existing test environments, which include not only test frameworks but also protocols and standards.

Ultimately, the objective is to define integration mechanisms to limit test result codes to a minimal subset encompassing all test environments.


The following table provides an overview of the relation between test environments and their corresponding result codes:

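Code         Checkbox  DejaGnu  Greg  JUnit  POSIX  Subunit  TAP  TETware
ERROR        -         -        -     x      -      x        -    -
FAIL         x         x        x     x      x      x        x    x
FIP          -         -        -     -      -      -        -    x
INCOMPLETE   -         -        -     -      x      -        -    -
KFAIL        -         x        -     -      -      -        -    -
KPASS        -         x        -     -      -      -        -    -
NOTINUSE     -         -        -     -      -      -        -    x
PASS         x         x        x     x      x      x        x    x
SKIP         x         -        -     -      -      x        x    -
TODO         -         -        -     -      -      -        x    -
UNINITIATED  -         -        -     -      -      -        -    x
UNREPORTED   -         -        -     -      -      -        -    x
UNRESOLVED   -         x        x     -      x      -        -    x
UNSUPPORTED  -         x        x     -      x      -        -    x
UNTESTED     -         x        x     -      x      -        -    x
UPASS        -         -        x     -      -      -        -    -
WARNING      -         -        -     -      -      -        -    x
XFAIL        -         x        x     -      -      x        -    -
XPASS        -         x        -     -      -      -        -    -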
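As a rough illustration of what a "minimal subset" could mean, the table can be expressed as sets and queried. This is only a sketch: the dictionary is transcribed from the table above, and the names SUPPORTED, common_codes and all_codes are made up for this example.

```python
# Result codes per test environment, transcribed from the table above.
SUPPORTED = {
    "Checkbox": {"FAIL", "PASS", "SKIP"},
    "DejaGnu": {"FAIL", "KFAIL", "KPASS", "PASS", "UNRESOLVED",
                "UNSUPPORTED", "UNTESTED", "XFAIL", "XPASS"},
    "Greg": {"FAIL", "PASS", "UNRESOLVED", "UNSUPPORTED", "UNTESTED",
             "UPASS", "XFAIL"},
    "JUnit": {"ERROR", "FAIL", "PASS"},
    "POSIX": {"FAIL", "INCOMPLETE", "PASS", "UNRESOLVED", "UNSUPPORTED",
              "UNTESTED"},
    "Subunit": {"ERROR", "FAIL", "PASS", "SKIP", "XFAIL"},
    "TAP": {"FAIL", "PASS", "SKIP", "TODO"},
    "TETware": {"FAIL", "FIP", "NOTINUSE", "PASS", "UNINITIATED",
                "UNREPORTED", "UNRESOLVED", "UNSUPPORTED", "UNTESTED",
                "WARNING"},
}


def common_codes(table):
    """Codes that every environment in the table supports."""
    return set.intersection(*table.values())


def all_codes(table):
    """Every code used by at least one environment."""
    return set.union(*table.values())


if __name__ == "__main__":
    print(sorted(common_codes(SUPPORTED)))
    print(len(all_codes(SUPPORTED)))
```

Only PASS and FAIL are shared by all eight environments, while the union contains 19 distinct codes. Any integration therefore has to decide what to do with the other 17.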

Test Environments

These are the test environments from the above table with some additional details:

  • Checkbox: Framework for integrating automated and manual test suites.
  • DejaGnu: Framework for testing other programs. Its purpose is to provide a single front end for all tests. Think of it as a custom library of Tcl procedures crafted to support writing a test harness. A test harness is the testing infrastructure that is created to support a specific program or tool. Each program can have multiple testsuites, all supported by a single test harness.
  • Greg: Framework for testing other programs and libraries. Its purpose is to provide a single front end for all tests and to be a small, simple framework for writing tests. Greg leverages off the Guile language to provide all the power (and more) of other test frameworks with greater simplicity and ease of use.
  • POSIX 1003.3: Information technology — Requirements and Guidelines for Test Methods Specifications and Test Method Implementations for Measuring Conformance to POSIX Standards.
  • JUnit: Simple framework for writing and running automated tests. As a political gesture, it celebrates programmers testing their own software.
  • Subunit: Streaming protocol for test results. The protocol is human readable and easily generated and parsed. By design all the components of the protocol conceptually fit into the xUnit TestCase->TestResult interaction.
  • TAP: The Test Anything Protocol (TAP) is a protocol to allow communication between unit tests and a test harness. It allows individual tests (TAP producers) to communicate test results to the testing harness in a language-agnostic way.
  • TETware: The TETware family of tools are Test Execution Management Systems that take care of the administration, sequencing, reporting and portability of all of the tests that you develop. This happens to be the framework used by the Linux Standard Base test suite.

Result Codes

These are the test result codes supported by the above test environments. Some of the descriptions have been taken directly from their test environment whereas others have been simplified for the more general context of this post:

  • ERROR: A test executed in an unexpected fashion; this outcome requires that a human being go over results, to determine if the test should have passed or failed.
  • FAIL: A test has produced the bug it was intended to capture. That is, it has demonstrated that the assertion is false.
  • FIP: Further information must be provided manually; this occurs when the test is unable to determine whether the test should pass or fail.
  • INCOMPLETE: The test of the assertion was unable to prove PASS but encountered no FAILs.
  • KFAIL: A bug in the implementation under test is causing a known false assertion.
  • KPASS: A test was expected to fail with KFAIL but passed instead.
  • NOTINUSE: A test might not be required in certain modes or, when there are multiple versions of the test, only one can be used.
  • PASS: A test has succeeded. That is, it demonstrated that the assertion is true.
  • SKIP: There is a test for this assertion but it was purposefully skipped.
  • TODO: A test represents a feature to be implemented or a bug to be fixed. The assertion is expected to be false.
  • UNINITIATED: The particular test in question did not start to execute.
  • UNREPORTED: A major error occurred during the test execution.
  • UNRESOLVED: A test produced indeterminate results. This essentially means the same as an ERROR.
  • UNSUPPORTED: An optional feature is not available or not supported in the implementation under test.
  • UNTESTED: There is no test for this assertion. This is a placeholder, used when there is no real test case yet.
  • UPASS: A test was expected to fail but passed instead.
  • WARNING: A true assertion is currently expected, but later revisions may change the requirements in this area.
  • XFAIL: A bug in the environment running the test is causing a known false assertion. This is outside the control of the test.
  • XPASS: A test was expected to fail with XFAIL but passed instead. Whatever bug that used to exist in the environment was corrected.

Types of Codes

The POSIX 1003.3 standard distinguishes two types of test result codes for test method implementations:

  1. An intermediate test result code is one that requires further processing to determine the final result code. Test method implementations may use additional intermediate codes to provide the user with as much information as possible. The intermediate result codes, as interpreted from the above definitions, would consist of: UNREPORTED, UNRESOLVED, INCOMPLETE and UNINITIATED.
  2. The final test result codes require no further processing to determine the result of testing an assertion. As opposed to intermediate test result codes, test method implementations should not give any other meaning to the final test result codes. These would basically consist of all the other codes.

In other words, these two types can be thought of as incomplete and complete codes. In the context of integrating test environments, this provides a mechanism to coerce any number of codes into two types which serve a clear purpose.
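The distinction between the two types could be sketched as follows. The set of intermediate codes is the one interpreted in this post, not a list quoted from the standard, and the names ResultType and result_type are hypothetical.

```python
from enum import Enum


class ResultType(Enum):
    INTERMEDIATE = "incomplete"  # needs further processing
    FINAL = "complete"           # needs no further processing


# Intermediate codes as interpreted in this post (an assumption, not a
# quotation from POSIX 1003.3). Every other code is treated as final.
INTERMEDIATE_CODES = frozenset(
    {"UNREPORTED", "UNRESOLVED", "INCOMPLETE", "UNINITIATED"}
)


def result_type(code: str) -> ResultType:
    """Coerce any result code into one of the two types."""
    if code.upper() in INTERMEDIATE_CODES:
        return ResultType.INTERMEDIATE
    return ResultType.FINAL


if __name__ == "__main__":
    for code in ("PASS", "UNRESOLVED", "XFAIL"):
        print(code, result_type(code).value)
```

Treating every unlisted code as final keeps the function total, at the price of silently classifying unknown codes as complete. A stricter version could raise an error instead.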

Groups of Codes

The TETware framework distinguishes two groups of test result codes for reporting purposes. Only considering the codes from this test framework, the groups consist of:


These groups are particularly relevant in the context of the TETware framework, where they simplify the management of such a large set of test result codes. They can prove even more relevant when integrating test environments, which must inevitably support an even larger set of codes.

When considering this mechanism in combination with the types of codes mentioned above, it should be noted that the TETware framework assumes that incomplete codes are failures. An alternative approach could be to only group complete codes as either passing or failing.
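The two approaches can be compared with a small sketch. The set of passing codes is supplied by the caller, and PASSING below is a deliberately minimal placeholder rather than the actual TETware grouping. The function name group and its incomplete_is_failure flag are also invented for this example.

```python
from typing import FrozenSet, Optional

# Incomplete (intermediate) codes as interpreted earlier in this post.
INTERMEDIATE_CODES = frozenset(
    {"UNREPORTED", "UNRESOLVED", "INCOMPLETE", "UNINITIATED"}
)

# Placeholder only: a real integration would list every passing code here.
PASSING = frozenset({"PASS"})


def group(code: str,
          passing: FrozenSet[str],
          incomplete_is_failure: bool = True) -> Optional[str]:
    """Return "pass" or "fail", or None if incomplete codes stay ungrouped."""
    if code in INTERMEDIATE_CODES:
        # First approach: incomplete codes count as failures.
        # Alternative approach: leave them out of both groups.
        return "fail" if incomplete_is_failure else None
    return "pass" if code in passing else "fail"


if __name__ == "__main__":
    print(group("UNRESOLVED", PASSING))
    print(group("UNRESOLVED", PASSING, incomplete_is_failure=False))
```

With the alternative approach, a caller has to handle the None case explicitly, which makes it harder to mistake an indeterminate result for a genuine failure.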

Coercing Codes

The DejaGnu framework prides itself on conforming to the POSIX standard explored in this post. However, it supports a few test result codes which are not mentioned in the standard. This is accomplished by coercing the following codes into the corresponding POSIX codes:


This type of coercion compresses multiple test result codes into a single code. This should be considered a lossy compression because the outcome loses granularity and inevitably becomes more ambiguous.

Another type of coercion translates the test result code from one test environment to an equivalent code from another environment. However, as with compression, some granularity can potentially be lost in the process. For example, if the SKIP and UNTESTED codes were translated into the same code, it would no longer be possible to determine whether the underlying test method existed or not.
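The loss of granularity in the SKIP and UNTESTED example can be made concrete with a small sketch. The translation table is hypothetical, and NOTRUN is an invented target code that belongs to none of the environments discussed here.

```python
# Hypothetical translation table; "NOTRUN" is an invented target code.
TRANSLATION = {
    "PASS": "PASS",
    "FAIL": "FAIL",
    "SKIP": "NOTRUN",      # a test exists but was purposefully skipped
    "UNTESTED": "NOTRUN",  # no test exists for the assertion
}


def translate(code):
    """Translate a source code into the target environment's code."""
    return TRANSLATION[code]


def sources_of(target):
    """All source codes that end up as the given target code."""
    return sorted(src for src, dst in TRANSLATION.items() if dst == target)


def is_lossless(mapping):
    """True if no two source codes share the same target code."""
    return len(set(mapping.values())) == len(mapping)


if __name__ == "__main__":
    print(translate("SKIP"), translate("UNTESTED"))
    print(sources_of("NOTRUN"))
    print(is_lossless(TRANSLATION))
```

Because two source codes map to NOTRUN, the translation cannot be reversed. Given only the translated code, there is no way to tell whether the underlying test method existed.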


This post has identified three integration mechanisms which can be used to limit a set of test result codes: types, groups and coercion. The common thread is that how far the set can be limited depends on the level of granularity required during integration. This is a sliding scale which essentially defines what can constitute the minimal subset of codes.

One Comment
  1. July 4, 2009 16:56

    In addition to test result codes, there is a similar concept of job status in HTC (High-Throughput Computing) environments such as Condor. The codes are: unexpanded (the job has never run), idle, running, removed, completed, held.
