
Unit Tests

Unit tests are an indispensable means of quality assurance for any non-trivial software project, especially for one like polymake with its high degree of modularization. Unit tests must be provided for any new autonomous piece of code: a client, a user function, a production rule, a template library class, etc. Data instances that have revealed bugs in existing code must be added as test cases as well, unless they are really huge and require more than 20-30 seconds of computation.

The whole collection of unit tests can be checked at once by executing the script run_testcases from the top directory of the git workspace:

polymake --script run_testcases

It should be a firm rule for each developer and contributor to execute the tests regularly during the development of new features or an upgrade of relevant third-party software interfaced by polymake.

If you want to constrain the tests to selected applications, call the driver script as follows:

polymake --script run_testcases --applications APPLICATION ...

You can also make a quick check of a few single tests or even of a single test script:

polymake --script run_testcases --applications APPLICATION --testgroups TESTGROUP ...
polymake --script run_testcases path/to/testsuite/test_SUBGROUP.pl

TESTGROUP can contain shell-style wildcards * and ? (not perl regular expressions).
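
For instance, to run all test groups in application polytope whose names start with lattice (the group pattern here is purely illustrative):

polymake --script run_testcases --applications polytope --testgroups 'lattice*'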

All testing scenarios can also be executed without leaving the polymake interactive shell, e.g.:

> script("run_testcases", "path/to/testsuite/test_SUBGROUP.pl");

Unit tests are at the heart of the continuous integration system (Jenkins) deployed on the polymake server. The tests can be executed there with different levels of sanity checking (assertions, address sanitizer, etc.) and with various compiler and perl versions. More details about automated testing can be found under Way of Working.

Examples embedded in the online help blocks in rulefiles and C++ clients are verified together with the unit tests. If you want to test only the examples but not the unit tests, you can run the driver script as follows:

polymake --script run_testcases --applications APPLICATION ... --examples TOPIC ...

where every TOPIC is the name of a user function, method, or property whose examples are to be tested. Shell-style wildcards are supported here too. To verify all examples in all applications, run

polymake --script run_testcases --examples '*'

The web tutorials can be transformed into jupyter notebooks using a python script currently located in a separate repository tools/ipynb2dokuwiki. The crawler.py script downloads and converts all tutorials listed on the tutorial overview page. You can also download a single tutorial using

python crawler.py tutorial_name

which will create the file notebooks/tutorial_name.ipynb, where tutorial_name has to match the name of the desired page. The validity of the commands in the notebook created this way can then be tested using the run_tutorials script in your polymake directory:

perl/polymake --script run_tutorials path/to/tutorial_name.ipynb

Running this script without arguments will make it test the notebooks located in polymake/resources/jupyter-polymake/tutorials/. In the future, this procedure will happen automatically.

The test driver script offers a number of options making the testing more rigorous or more relaxed. Most of them are designed specifically for Jenkins; developers might want to make occasional use of them as well:

--validate

Every data file involved in a test as input or expected result is validated against the general datafile schema as well as against a type-specific schema; then a copy of the data is saved in a scratch file, loaded again, and the two objects are checked for identity. This ensures that no test is executed on outdated or corrupted data.

--shuffle

The test suites are executed in a random order. The random seed value is displayed at the beginning of the test series execution.

--shuffle=SEED

Repeat the shuffled test series with the prescribed random seed value. Useful for debugging problems discovered during a previous shuffled test run.

--random-failures=ignore

Report failures of certain unit tests (those marked as random) separately and do not consider them for the overall test success status. Specifying hide instead of ignore completely suppresses the notifications about such test failures.

--cpperl-root=PATH

Redirects new C++/perl glue code (aka wrappers) generated during test execution to the specified location. If set to /dev/null, new wrapper generation is forbidden. By default, all wrapper updates are stored directly in the source code tree in your workspace.

--allow-exec-time=SEC

Raise the execution time limit for monitored tests. This option is useful for slowly running builds (debug, coverage, sanitizer).

--emacs-style

Produce the on-screen report without any cursor or color control characters; error messages are formatted adhering to the (x)emacs compilation mode convention.

Each polymake application has a testsuite directory where the unit tests are kept. Unit tests are grouped by theme (e.g. testing some special client, a family of similar rules, or a visualization method), each theme residing in a separate subdirectory. Each subdirectory contains one or more test scripts named test.pl or test_SUBGROUP.pl along with optional input data files and files containing expected test results, which can e.g. be created and saved from within the polymake shell. Test scripts are executed sequentially by the driver script run_testcases. A test script may contain any perl/polymake code preparing the tests, e.g. constructing or loading the input data and expected results, and one or more calls to the dedicated test checking functions described below, which decide about test success or failure. The test script code is evaluated in its application context; it is not allowed to switch applications there.

Every unit test must have a unique ID within its test script. This allows for easier problem tracking and more legible test reports. In most cases this ID is derived from the name of the file with expected results; in tests not using any files, the ID must be specified explicitly as an argument of the test checking function. Should you have several unit tests producing identical expected data stored in a file, please create symbolic links pointing to that file and give them unique names.

check_boolean('ID', expression);

The test succeeds if expression is in fact boolean and evaluates to TRUE.
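
A minimal sketch, assuming a test script in application polytope (the test ID and object are illustrative):

my $c = cube(3);
check_boolean('cube_is_simple', $c->SIMPLE);   # the 3-cube is a simple polytope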

compare_values('ID', expected, expression);

The test succeeds if expected and expression are of the same type and have equal values. For C++ objects, the overloaded operator == is used. String values are compared with operator eq. polymake `big' objects may not be tested this way; use compare_object as described below.

Caution: if you want to compare values of different types, e.g. a Rational result of expression with an integral constant, you must either create the expected value of the proper type:

compare_values('ID', new Rational(42), expression);

or write:

check_boolean('ID', 42==expression);

because otherwise the test will be treated as failed due to type mismatch.

There are two convenience wrappers for compare_values, taking the expected result from:

compare_data('ID', expression);

… the data file ID.OK . The file must be created using save_data.

compare_attachment(object, 'att_name', expression);

… the named attachment of the given object. The object name serves as test ID here.
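
Two hedged sketches; the test IDs, file names, attachment name, and computations are illustrative. The expected matrix for compare_data is assumed to have been stored beforehand with save_data in the file transposed.OK, and the attachment MY_RESULT is assumed to be present in the loaded object:

my $m = new Matrix<Rational>([[1,2],[3,4]]);
compare_data('transposed', transpose($m));          # expected value read from transposed.OK

my $p = load('some_polytope');                      # the object name serves as the test ID
compare_attachment($p, 'MY_RESULT', $p->F_VECTOR);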

check_rules <<'---', [ options ];
HEADER_1
files_1

HEADER_2
files_2
...
---

The first (and only mandatory) argument is a multi-line string containing production rule headers and object file patterns on adjacent pairs of lines. Empty lines and comment lines are allowed around the pairs.

Rule headers must match the definitions in the rulefiles exactly (up to white space), including all labels. The rules are applied to all objects contained in the listed files. Application-specific file suffixes like .poly can be omitted. You can use the usual shell wildcards as well, e.g. [1-7], F?, or *. The object names serve as test IDs, therefore they must be distinct. However, every rule being tested is considered a separate subgroup, thus the same object can be reused for several rules without introducing any symbolic links or copies.

During the test, only the given production rule and its preconditions are executed; therefore all required source and target properties must be present in the data files.

The following options are recognized:

on => "PROPERTY_NAME"

Apply the rule to a subobject under the given property name (a dotted path is also allowed) rather than to the topmost object of the data file.

with_multi => [ "PROPERTY_NAME", selecting expression ]

Select the instance of a multiple subobject the rule should operate on, for the case where there are several instances of the same type in every data file. A selecting expression can be anything understood by give(), that is, a subobject name, a property name with a distinctive value, or a subroutine evaluating an instance passed in $_ .

after_cleanup => 1

Remove all temporary properties created by the rule before comparing the object with the original state as stored in the data file.

permuted => [ "PROPERTY_NAME" ... ]

Expect the rule to produce data not exactly equal to those in the test object but in a permuted form. The specified properties must be targets of the rule being tested. Their values are used together with the corresponding values from the test object file to calculate the permutation; usually a single property like FACETS or RAYS suffices. The permutation is then applied to all results of the rule, and they are compared to the expected values.

expected_failure => "Error message"

Expect the rule to fail and emit the specified error message. Source file name and line number are stripped from the message; they should not be specified in the option.
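
For illustration, a hedged sketch of a complete call with options; the rule header, property names, and file pattern are placeholders, not actual polymake rules:

check_rules <<'---', [ permuted => [ "TARGET_PROPERTY" ] ];
TARGET_PROPERTY : SOURCE_PROPERTY
test_object*
---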

Note 1: The production rules must be tested in the application where they are defined, regardless of the provenance of the objects they are applied to.

Note 2: If there are several rules with identical headers, either defined for different object types or equipped with different preconditions, they will all be tested on suitable subsets of the test objects. In the end, every matching rule and every object must be involved in at least one successful execution, otherwise the test script will be treated as failed. If you want to restrict the testing to a specific rule instance, you can disambiguate the header by adorning it with a precondition and/or the name of the rulefile where the rule is defined:

check_rules <<'---';
CUBICAL_H_VECTOR : F_VECTOR && precondition: CUBICAL
file_patterns

---

check_rules <<'---';
INDEX_OF : DOMAIN  @ representations.rules
file_patterns
---

The rulefile name may be further disambiguated with a partial path if it occurs in several applications or bundled extensions. If both adornments are used, the precondition must come first.

compare_object( "file", expression );

file must contain the expected resulting object (the standard suffix may be omitted). The comparison is done very meticulously: the property sets of the expected and created objects must be equivalent, and even the textual descriptions must match verbatim.

If the construction involves random numbers, please be sure to provide a constant seed value to the function.

For properties without unique representation, you can request a permutation to be applied to the test result before comparing it with the specimen:

compare_object( "file", expression, permuted => [ "PROPERTY" ]);

will compute a permutation transforming the specified property of the test result into the corresponding property of the specimen, and apply it to all properties of the test result.

If there are several applicable permutations, the ambiguity must be resolved in one of two ways. Either you explicitly specify the permutation type:

compare_object( "file", expression, permuted => [ "PermType", "PROPERTY" ]);

Or you list several permutable properties, such that the intersection of all permutations affecting any of them leads to exactly one permutation type:

compare_object( "file", expression, permuted => [ "PROPERTY_1", "PROPERTY_2", ... ]);

All listed properties must be present in both the test result and the specimen object; which one is eventually used to compute the permutation is up to the rule scheduler.

In exceptional cases, when creation of the new object involves rule scheduling, which may lead to different sets of properties obtained on different platforms, you can suppress comparing some “volatile” properties:

compare_object( "file", expression, ignore => [ "PROP1", "SUBOBJ.PROP2", ... ] );

These should really be products of side effects only; please don't suppress comparing properties relevant for the construction being tested.

check_schedule( object, rule_list, "TARGET", ...);

For the given object, the cheapest rule chain providing all TARGETS is determined and compared with rule_list. The headers of the rules in the list should be cited exactly as obtained via print join("\n", $object->get_schedule("TARGET",...)->list); up to white space between property names and up to permutations of the rule order. (The latter relaxation indeed manifests a deficiency of the test driver script, since many possible permutations of the rules would yield an invalid schedule. Currently we have to put up with this fact.)

If several equivalent schedules can be expected, you should list them all, separating the rule lists with lines containing at least three dashes in a row: --- (and any optional comments behind them). The rule scheduler is known to be non-deterministic when considering schedules with equal total weights.
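
A hedged sketch with two equally acceptable schedules; the rule headers, the target, and the heredoc terminator are placeholders rather than actual polymake rules:

check_schedule($p, <<'.', "TARGET_PROPERTY");
INTERMEDIATE : SOURCE_1
TARGET_PROPERTY : INTERMEDIATE
--- equally cheap alternative
INTERMEDIATE : SOURCE_2
TARGET_PROPERTY : INTERMEDIATE
.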

rendering_engine( expression, File => diff_with( "file" ) );

rendering_engine is a function explicitly calling some visualization back-end, e.g. javaview or metapost. expression is the proper method to be tested. The expected output is stored in a file named file.OK.

You can pass filters to the function diff_with as trailing arguments after "file". A collection of useful filters can be loaded by calling script("test_filters"); at the very beginning of the test script. The purpose of the filters is to suppress false alarms caused by inherently volatile parts of the output, like timestamps, user names, or random point coordinates. The script collection resides in apps/common/scripts; look for numerous examples of their usage in application polytope, everywhere under VISUAL* .

compare_output { some code ... } "file";

compares what the perl code prints to STDOUT with the contents of file.OK . This can also be used to test C++ clients writing directly to cout.
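
A minimal sketch, assuming $p is an object prepared earlier in the script and the expected text has been stored in print_test.OK (names are illustrative):

compare_output {
   print $p->N_VERTICES, "\n";
} "print_test";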

compare_expected_error { some code ... } "file";

The code block is expected to raise an exception, either via the perl die or croak functions or via a C++ throw statement. file.OK contains the exact text of the error message, including the reference to the line of the test script where this code block appears. The top-level polymake source directory should be replaced with <TOP>. Look for examples in application polytope under readonly or canonical_coord.

check_completion( "partial input", "completion_1", "completion_2", ...);

simulates TAB completion in the interactive shell, as if the TAB key were pressed after entering the given partial input. completion_1 etc. are the expected completions of the last word of the expression; they should be given in alphabetical order. The very last string in the argument list may be the expected character to be appended if the completion is unique.
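
For instance, assuming two hypothetical user functions my_function_one and my_function_two are defined in the application being tested:

check_completion('print my_fu', 'my_function_one', 'my_function_two');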

check_context_help( "partial input", "help/topic/header", ... );

simulates the F1 context help function of the interactive shell. Expected help topic headers should be listed in the order they would appear in the real session, that is, describing incomplete expressions from right to left.

Each time you introduce an incompatible data model change together with upgrade rules, you should add testcases for them in the testgroup upgrade of the affected application. Preparing such a testcase is extremely simple: take a data file in the old (pre-conversion) format, store it in two copies, e.g. Name-OldVersion.poly and Name-OldVersion-in.poly, and load Name-OldVersion.poly into polymake, which automatically applies the upgrade rules and stores it in the updated form. Once you have verified the correctness of the data transformation, add the following line to test.pl:

compare_transformed_object('Name-OldVersion');

For more complicated test cases it is useful to include an additional file comments.txt in each test case folder. This file should explain how the data of the test case was created and why it was added (e.g. a bug reported in the forum, or testing the convex hull for unbounded polytopes with a nontrivial linear span). If the test case fails in later versions of polymake, this will help to understand what went wrong.

Test scripts must not introduce any side effects. All test objects should be kept in lexical variables (my $x) or localized package-scope variables (declare local $x); all settings changes should be localized using the local operator or the prefer_now command. All files created during the test must be made temporary (new Tempfile()) or placed in a temporary directory (new Tempdir()). Please keep in mind that Jenkins must be able to execute every testcase in several parallel instances (e.g. with different perl versions), therefore creating files with fixed names or making any other globally visible changes may lead to a test failure which would be extremely difficult to investigate, as it won't be reproducible in a single test execution.
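
A short sketch of these conventions; the data file, preference label, and saved path are illustrative:

my $p = load('some_object');     # test objects live in lexical variables
prefer_now("lrs");               # preference change is automatically reverted
my $tmp = new Tempfile();        # removed when the test script finishes
save($p, "$tmp");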

Another important design guideline for test scripts is to avoid changing the objects involved in different test steps. All comparisons of results and expected values (compare_values, compare_data, compare_object, etc.) are performed after execution of the complete test script body (this is when the black testcase IDs on the screen are painted green or red). If you supplied a complex object like a Matrix or a Set to compare_values() or compare_data() and changed the object further down in the script, the comparison would fail, because such objects are passed by reference! Hence, always pass immutable copies of objects to comparison functions.
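
A hedged sketch of the pitfall and its remedy (names and values are illustrative):

my $v = new Vector<Int>(1, 2, 3);
# pass a copy: the actual comparison happens only after the whole script body has run
compare_values('vector_snapshot', new Vector<Int>(1, 2, 3), new Vector<Int>($v));
$v->[0] = 42;    # this later change does not affect the deferred comparison of the copy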

Temporary files and directories are automatically destroyed when the test script body finishes, hence before the comparisons are done. If a test object has to be stored in a temporary file used in comparison operations, the lifetime of the file must be prolonged until the comparison finishes. To do so, use the test_cleanup function with a code block performing the necessary destruction:

my $temp_dir=new Tempdir();
...
save($x, "$temp_dir/filename");
...
compare_object("$temp_dir/filename", $y);
...
test_cleanup { undef $temp_dir; }   # executed only after all comparisons, so the directory survives until then

Testing a function which involves a construction based on the random number generator (RNG) is an additional challenge. Although the new implementation of the RNG uses the numerically stable MPFR library, and all such functions accept an optional seed argument, an unexpected deviation of test results may still happen after a move to a new hardware platform or a new compiler version. For a migration period, you can tell the script run_testcases to ignore differences detected in such testcases by running it with the option --random-failures=ignore described above.

To mark a test group as random, put the statement expect_random_failures(); at the beginning of its test script.

Note that this option is not used in automated runs on Jenkins, because otherwise some tests, once they come out of sync, will never be repaired!

For any single testcase, you can specify an upper limit for its execution time. Should the time measured in a test run exceed the limit, the test will be reported as failed, regardless of the computed results. All test checking functions accept an option max_exec_time => SEC specifying the user CPU time in seconds. Please be aware that the time budget accounts for all activity that has taken place since the execution of the preceding test checking function (for the very first testcase: since the beginning of the test script), that is, not only the productive code being tested but also the preparation of input data, reading the test expectations from data files, and the result comparison. For check_rules, the time is measured separately for every application of a rule to a single object.
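
For example, with a hypothetical long-running user function expensive_computation and a prepared object $p:

compare_data('expensive_result', expensive_computation($p), max_exec_time => 60);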

If a rule or a function being tested depends on third-party software, e.g. coming from a bundled extension, the test script must be protected against failures caused by the missing dependency. In particular, some Jenkins jobs are occasionally executed on nodes lacking some third-party packages. For checking the availability of a configurable feature, you can use the following function returning a boolean value:

check_if_configured("bundled:NAME")

check if the named bundled extension has been successfully configured and activated

check_if_configured("RULEFILE")

check if the specified rulefile (presumably having a CONFIGURE section) has been loaded; the rulefile must belong to the application being tested or to one of the USE'd or IMPORT'ed applications

check_if_configured("APPNAME::RULEFILE")

check if the specified rulefile belonging to a named application has been loaded

If several configurable features are necessary for the test execution, combine the check_if_configured calls with boolean AND. If one of the features is sufficient, list their names in a single call: check_if_configured("first.rules", "second.rules")

You can protect the entire test script or single testcases. In the first case, place an expression at the very beginning of the test script:

check_if_configured("FEATURE") or return;

Then the test subgroup will be reported as skipped in the test run summary.

If you protect single testcases, they will be silently skipped and not counted in any statistics. This still has the benefit of executing the rest of the script regardless of the actual configuration:

if (check_if_configured("bundled:ppl")) {
  check_rules ...;
}

On rare occasions, the expected results of a unit test may vary between hardware platforms (most of all when floating-point calculations are involved) or operating systems. Should you observe such a discrepancy, please provide several OK files with special suffixes like .x86_64 or .darwin.i386. The suffix must be the same as in the name of the build directory created by ./configure.

Extensions should provide their own unit tests too. They are organized in exactly the same way as the tests in the main source tree. To run all tests in an extension, change into its top directory and call make test. If you want to execute only some selected tests, call the driver script with the following arguments:

polymake --script run_testcases --extensions . --applications APPLICATION --testgroups TESTNAME ... 

Running tests from several extensions at once is also possible:

polymake --script run_testcases --extensions EXTENSION_DIR_1 EXTENSION_DIR_2 ... 

Finally, you may combine the core tests and the extension tests in a single run (although this is not recommended because of the possible influence of the extension on the results of some testcases like Scheduler or TAB completion):

polymake --script run_testcases --extensions ALL 

Unit tests for core library components are kept in testscenarios/core_lib_tests. They are based on the googletest framework, which has to be installed on your computer separately.

Please be aware that these tests must not involve any components depending on perl, which for the time being includes BigObjects.

1. Put the unit tests into a C++ file in testscenarios/core_lib_tests/src/.

2. Change into testscenarios/core_lib_tests, build and run the entire unit test suite once by issuing ./build_test.sh

3. Add more unit tests to the same c++ file and run them exclusively:

ninja -C work/build/Opt
work/build/Opt/all_tests --gtest_filter=SUITABLE_PATTERN

You can still use the script ./build_test.sh for repeated test runs; it will just take longer because it starts a full clean build every time.

4. If you want to debug failing tests, build them in Debug mode:

./build_test.sh --build-mode=Debug
gdb -args work/build/Debug/all_tests --gtest_filter=SUITABLE_PATTERN

Again, to avoid repeated full clean builds after fixing the library code or the tests, you can use ninja -C work/build/Debug

The driver script run_testcases always prints a summary of all tests run. If it states that all tests are successful, you can continue your work or commit the latest changes with a clear conscience. If it lists some tests as skipped, your test machine lacks some software required for those particular tests. You can ignore this if you are absolutely certain that these tests are not relevant for checking your latest changes, and rely on the full test coverage on Jenkins. For failed tests, a detailed explanation of the failure is printed; it can be a discrepancy between the expectations and the computed results, an exception raised in the production code, or a syntactic error in the test script itself. In any case, you'll have to investigate the reasons.

If there are many failing tests, the test report can be made more IDE-friendly by running the test driver with the option --emacs-style; then you can use the ability of emacs and other tools to browse compilation logs and jump to the places where the errors occurred with a single mouse click.

After having fixed the errors (either by debugging the productive code or by correcting the test data), it is important to repeat the whole test run, because sometimes a (bad) fix for one case introduces a regression for others. Besides, the pleasant feeling of deep contentment caused by the message all tests are successful should not be missed after all the hard work :-)

Tests failing because of a known bug or deficiency which is not going to be fixed immediately should be disabled in order to maintain clean builds on Jenkins and for all participating developers. The test script should not be deleted, renamed, or abridged. Instead, two lines must be inserted at its very beginning:

disable_test("Short reason. Ticket #NNN.");
return;

Every disabled test (or group of tests) must be linked to a Trac ticket dedicated to the problem. This ticket should have a phrase “disabled unit test” in its Keywords field; this allows for better monitoring and reminds the resolver to re-enable the tests once the problem is fixed.

Disabled tests appear in the “skipped” statistics of the test runs on Jenkins and on your display.

If you merely want to disable single steps in a test script, just comment them out. Please still remember to mention this in the corresponding Trac ticket and place the ticket number in the comments.
