====== Unit Tests ======
Unit tests are an indispensable means of quality assurance for any non-trivial software project, especially for one like ''polymake'' with its high degree of modularization. Unit tests must be provided for any new autonomous piece of code: a client, a user function, a production rule, a template library class, etc. Data instances that have revealed bugs in existing code must be added as test cases as well, unless they are really huge and require more than 20-30 seconds of computation.
The whole collection of unit tests can be checked at once by executing the script ''run_testcases'' from the top directory of the git workspace:
polymake --script run_testcases
It should be a firm rule for every developer and contributor to execute the tests regularly while developing new features or upgrading third-party software interfaced by ''polymake''.
If you want to constrain the test run to selected applications, call the driver script as follows:
polymake --script run_testcases --applications APPLICATION ...
You can also make a quick check of a few selected test groups or even of a single test script:
polymake --script run_testcases --applications APPLICATION --testgroups TESTGROUP ...
polymake --script run_testcases path/to/testsuite/test_SUBGROUP.pl
TESTGROUP may contain shell-style wildcards * and ? (//not// perl regular expressions).
All testing scenarios can also be executed without leaving the polymake interactive shell, e.g.:
> script("run_testcases", "path/to/testsuite/test_SUBGROUP.pl");
Unit tests are at the heart of the continuous integration system (Jenkins) deployed on the polymake server. The tests can be executed there with different levels of sanity checking (assertions, address sanitizer, etc.) and with various compiler and perl versions. More details about automated testing can be found under [[dev_corner:way_of_working#publish_the_changes_and_run_jenkins_tests|Way of Working]].
====Examples and Tutorials====
[[user_guide:extend:help_formatting#examples|Examples]] embedded in the online help blocks in rulefiles and C++ clients are verified together with the unit tests. If you want to test only the examples but not the unit tests, you can run the driver script as follows:
polymake --script run_testcases --applications APPLICATION ... --examples TOPIC ...
where every TOPIC is the name of a user function, method, or property whose examples are to be tested. Shell-style wildcards are supported here too. To verify all examples in all applications, run
polymake --script run_testcases --examples '*'
The [[user_guide:start|web tutorials]] can be transformed into jupyter notebooks using a python script currently located in a separate repository ''tools/ipynb2dokuwiki''. The ''crawler.py'' script downloads and converts all tutorials listed on the tutorial overview page. You can also download a single tutorial using
python crawler.py tutorial_name
which will create the file ''notebooks/tutorial_name.ipynb'', where ''tutorial_name'' has to match the name of the desired page. The validity of the commands in the notebook created this way can then be tested using the ''run_tutorials'' script in your polymake directory:
perl/polymake --script run_tutorials path/to/tutorial_name.ipynb
Running this script without arguments will make it test the notebooks located in ''polymake/resources/jupyter-polymake/tutorials/''.
In the future, this procedure will happen automatically.
==== Additional Options for Running Tests ====
The test driver script offers a couple of options making the testing more rigorous or more relaxed. Most of them are designed specifically for Jenkins, but developers may want to make occasional use of them as well:
? ''%%--validate%%''
:: Every data file involved in a test as input or expected result is validated against the general datafile schema as well as against a type-specific schema; then a copy of the data is saved in a scratch file, loaded again, and the two objects are checked for identity. This ensures that no test is executed on outdated or corrupted data.
? ''%%--shuffle%%''
:: The test suites are executed in a random order. The random seed value is displayed at the beginning of the test series execution.
? ''%%--shuffle=SEED%%''
:: Repeat the shuffled test series with the prescribed random seed value. Useful for debugging problems discovered during a previous shuffled test run.
? ''%%--random-failures=ignore%%''
:: Report failures of certain unit tests (those marked as //random//) separately and do not consider them for the overall test success status. Specifying ''hide'' instead of ''ignore'' will completely suppress the notifications about such test failures.
? ''%%--cpperl-root=PATH%%''
:: Redirects new C++/perl glue code (aka wrappers) generated during test execution to the specified location. If set to ''/dev/null'', new wrapper generation is forbidden. By default, all wrapper updates are stored directly in the source code tree in your workspace.
? ''%%--allow-exec-time=SEC%%''
:: Raise the execution time limit for monitored tests. This option is useful for slowly running builds (debug, coverage, sanitizer).
? ''%%--emacs-style%%''
:: Produce the on-screen report without any cursor or color control characters; error messages are formatted adhering to the (x)emacs compilation mode convention.
===== How to Write Unit Tests =====
Each polymake application has a ''testsuite'' directory where the unit tests are kept. Unit tests are grouped by themes (e.g. testing some special client, family of similar rules, or a visualization method), each residing in a separate subdirectory. Each subdirectory contains one or more test scripts named ''test.pl'' or ''test_SUBGROUP.pl'' along with optional input data files and files containing expected test results, which can e.g. be [[user_guide:tutorials:data| created and saved from within the polymake shell]]. Test scripts are executed sequentially by the driver script ''run_testcases''. A test script may contain any perl/polymake code preparing the tests, e.g. constructing or loading the input data and expected results, and one or more calls to dedicated test checking functions described below which conclude about test success or failure. The test script code is evaluated in its application context; it is not allowed to switch applications there.
Every unit test must have a unique ID within its test script. This allows for easier problem tracking and more legible test reports. In most cases this ID will be derived from the name of the file with the expected results; in tests not using any files, the ID must be specified explicitly as an argument of the test checking function. Should several unit tests produce identical expected data stored in a file, please create symbolic links pointing to that file and give them unique names.
==== Testing Expression Results ====
check_boolean('ID', expression);
The test succeeds if //expression// is indeed boolean and evaluates to TRUE.
compare_values('ID', expected, expression);
The test succeeds if //expected// and //expression// are of the same type and have equal values. For C++ objects, the overloaded operator ''=='' is used; string values are compared with the operator ''eq'' . polymake //big// objects may not be tested this way; use ''compare_object'' as described below.
**Caution:** if you want to compare values of different types, e.g. a Rational result of an expression with an integral constant, you must either create the expected value with the proper type:
compare_values('ID', new Rational(42), expression);
or write:
check_boolean('ID', 42==expression);
because otherwise the test will be treated as failed due to type mismatch.
There are two convenience wrappers for ''compare_values'', taking the expected result from:
compare_data('ID', expression);
... the data file ''ID.OK'' . The file must be created using ''save_data''.
compare_attachment(object, 'att_name', expression);
... the named attachment of the given object. The object name serves as test ID here.
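For illustration, a few such checks in a ''test.pl'' might look as follows. This is only a sketch: the test IDs are arbitrary, ''cube_f_vector.OK'' is assumed to have been created beforehand with ''save_data'', and ''N_FACETS'' is assumed to arrive as a plain perl integer.
my $c = cube(3);
check_boolean('cube_simple', $c->SIMPLE);
compare_values('cube_n_facets', 6, $c->N_FACETS);
compare_data('cube_f_vector', $c->F_VECTOR);   # expected value is read from cube_f_vector.OK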
==== Testing Production Rules ====
check_rules <<'---', [ options ];
HEADER_1
files_1
HEADER_2
files_2
...
---
The first (and only mandatory) argument is a multi-line string containing production rule headers and object file patterns on adjacent pairs of lines. Empty lines and comment lines are allowed around the pairs.
Rule headers must exactly match the definitions in the rulefiles (up to white space), including all labels. The rules are applied to all objects contained in the listed files. Application-specific file suffixes like ''.poly'' can be omitted. You can use the usual shell wildcards as well, e.g. ''[1-7]'', ''F?'', or ''*''. The object names serve as test IDs and therefore must be distinct. However, every rule being tested is treated as a separate subgroup, so the same object can be reused for several rules without introducing any symbolic links or copies.
During the test only the given production rule and its preconditions are executed, therefore all required source and target properties must be present in the data files.
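A sketch of a typical call (the rule header and the file patterns are illustrative only; a real header must be copied verbatim from the rulefile, the listed data files must exist in the testsuite directory, and the ''permuted'' option explained below accounts for the unspecified facet order):
check_rules <<'---', [ permuted => [ "FACETS" ] ];
cdd.convex_hull.primal: FACETS, LINEAR_SPAN : RAYS
cube* cross-[34]
---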
The following options are recognized:
? ''%%on => "PROPERTY_NAME"%%''
:: Apply the rule to a subobject under the given property name (a dotted path is also allowed) rather than to the topmost object of the data file.
? ''%%with_multi => [ "PROPERTY_NAME", selecting expression ]%%''
:: Select the instance of a multiple subobject the rule should operate on, for the case that there are several instances of the same type in every data file. The selecting expression can be anything understood by ''give()'', that is, a subobject name, a property name with a distinctive value, or a subroutine evaluating an instance passed in ''$_'' .
? ''%%after_cleanup => 1%%''
:: Remove all temporary properties created by the rule before comparing the object with the original state as stored in the data file.
? ''%%permuted => [ "PROPERTY_NAME" ... ]%%''
:: Expect the rule to produce data which are not exactly equal to those in the test object but come in a permuted form. The specified properties must be targets of the rule being tested. Their values are used together with the corresponding values from the test object file to calculate the permutation; usually a single property like FACETS or RAYS suffices. The permutation is then applied to all results of the rule before they are compared to the expected values.
? ''%%expected_failure => "Error message"%%''
:: Expect the rule to fail and emit the specified error message. The source file name and line number are stripped off the message; they should not be specified in the option.
**Note 1:** The production rules must be tested in the application where they are defined, regardless of the provenance of the objects they are applied to.
**Note 2:** If there are several rules with identical headers, either defined for different object types or equipped with different preconditions, they will all be tested on suitable subsets of the test objects. In the end, every matching rule and every object must be involved in at least one successful execution; otherwise the test script is treated as failed. If you want to restrict the testing to a specific rule instance, you can disambiguate the header by adorning it with a precondition and/or the name of the rulefile where the rule is defined:
check_rules <<'---';
CUBICAL_H_VECTOR : F_VECTOR && precondition: CUBICAL
file_patterns
---
check_rules <<'---';
INDEX_OF : DOMAIN @ representations.rules
file_patterns
---
The rulefile name may be further disambiguated with a partial path if it occurs in several applications or bundled extensions. If both adornments are used, the precondition must come first.
==== Testing Functions Creating New Objects ====
compare_object( "file", expression );
//file// must contain the expected resulting object (the standard suffix may be omitted). The comparison is very meticulous: the property sets of the expected and created objects must be equivalent, and even the textual descriptions must match verbatim.
If the construction involves random numbers, please be sure to provide a constant ''seed'' value to the function.
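For example, a construction involving the RNG might be tested like this (a sketch: it assumes that ''rand_sphere'' accepts a ''seed'' option and that ''rand_sphere_3_20.poly'' stores the object produced with exactly this seed):
compare_object( "rand_sphere_3_20", rand_sphere(3, 20, seed => 1) );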
For properties without unique representation, you can request a permutation to be applied to the test result before comparing it with the specimen:
compare_object( "file", expression, permuted => [ "PROPERTY" ]);
will compute a permutation transforming the specified property of the test result into the corresponding property of the specimen, and apply it to all properties of the test result.
If there are several applicable permutations, the ambiguity must be resolved in one of two ways.
Either you explicitly specify the permutation type:
compare_object( "file", expression, permuted => [ "PermType", "PROPERTY" ]);
Or you list several permutable properties, such that the intersection of all permutations affecting any of them leads to exactly one permutation type:
compare_object( "file", expression, permuted => [ "PROPERTY_1", "PROPERTY_2", ... ]);
All listed properties must be present in both the test result and the specimen object; which one is eventually used to compute the permutation is up to the rule scheduler.
In exceptional cases, when the creation of the new object involves rule scheduling and may therefore lead to different sets of properties on different platforms, you can exclude some "volatile" properties from the comparison:
compare_object( "file", expression, ignore => [ "PROP1", "SUBOBJ.PROP2", ... ] );
These should really be products of side effects only; please don't exclude properties relevant to the construction being tested.
==== Testing Rule Chains ====
check_schedule( object, rule_list, "TARGET", ...);
For the given //object//, the cheapest rule chain providing all //TARGETS// is determined and compared with //rule_list//. The headers of the rules in the list should be cited exactly as obtained via ''%%print join("\n", $object->get_schedule("TARGET",...)->list);%%'', up to white space between property names and up to permutations of the rule order. (The latter relaxation is in fact a deficiency of the test driver script, since many of the possible permutations of rules would yield an invalid schedule. Currently we have to put up with this fact.)
If several equivalent schedules can be expected, you should list them all, separating the rule lists with lines containing at least three dashes in a row: ''%%---%%'' (and any optional comments behind them). The rule scheduler is known to be non-deterministic when considering schedules with equal total weights.
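A sketch of such a test; the rule headers shown here are illustrative only and must be replaced by the verbatim output of ''get_schedule'' for the object at hand:
my $p = load("cube_no_facets");    # hypothetical data file lacking FACETS
check_schedule($p, <<'---', "FACETS", "VERTICES_IN_FACETS");
cdd.convex_hull.primal: FACETS, LINEAR_SPAN : RAYS
VERTICES_IN_FACETS : RAYS, FACETS
---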
==== Testing Visualization Methods ====
rendering_engine( expression, File => diff_with( "file" ) );
//rendering_engine// stands for a function explicitly calling some visualization back-end, e.g. ''javaview'' or ''metapost''. //expression// is the actual visualization method call to be tested. The expected output is stored in a file named //file//.OK.
You can pass filters to the function ''diff_with'' as trailing arguments after ''%%"file"%%''. A collection of useful filters can be loaded by calling ''%%script("test_filters");%%'' at the very beginning of the test script. The purpose of the filters is to suppress false alarms caused by inherently volatile parts of the output, like timestamps, user names, or random point coordinates. The script collection resides in ''apps/common/scripts''; look for numerous examples of their usage in application polytope, everywhere under ''VISUAL*'' .
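A sketch, assuming the ''metapost'' back-end and an expected output stored in ''cube_mp.OK'' (the file name is arbitrary; filters from ''test_filters'' would be appended as trailing arguments to ''diff_with''):
script("test_filters");    # load the common output filters once per test script

my $c = cube(3);
metapost($c->VISUAL, File => diff_with("cube_mp"));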
==== Testing Functions Producing Screen Output ====
compare_output { some code ... } "file";
compares what the perl code prints to ''STDOUT'' with the contents of //file//.OK . This can also be used to test C++ clients writing directly to ''cout''.
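For example (a sketch; it assumes ''print_constraints'' is available in the current application and that the expected output has been stored in ''print_constraints_cube.OK''):
compare_output {
  print_constraints(cube(2));
} "print_constraints_cube";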
==== Testing Error Detection ====
compare_expected_error { some code ... } "file";
The code block is expected to raise an exception, either via the perl ''die'' or ''croak'' functions or via a C++ ''throw'' statement.
//file//.OK contains the exact text of the error message, including the reference to the line of the test script where this code block appears. The top-level polymake source directory should be replaced with '''' . Look for examples in application polytope under ''readonly'' or ''canonical_coord''.
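A minimal sketch (the error message and the file name are made up; ''die_test.OK'' would contain the verbatim message as described above):
compare_expected_error {
  die "matrix dimensions do not match\n";    # any code expected to raise an exception
} "die_test";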
==== Testing Interactive Shell Functions ====
check_completion( "partial input", "completion_1", "completion_2", ...);
simulates the TAB completion in the interactive shell, as if the TAB key were pressed after entering the given //partial input//. //completion_1// etc. are the expected completions of the last word of the expression; they should be given in alphabetical order.
The very last string in the argument list may be the expected character to be appended if the completion is unique.
check_context_help( "partial input", "help/topic/header", ... );
simulates the F1 context help function of the interactive shell. Expected help topic headers should be listed in the order they would appear in the real session, that is, describing incomplete expressions from right to left.
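A sketch of both checks; the expected completions, the appended character, and the help topic header are hypothetical and depend on the current application and its help tree:
check_completion('$c = cub', 'cube', 'cuboctahedron');
check_completion('$c = cube', 'cube', '(');    # assuming the completion is unique here, '(' is expected to be appended
check_context_help('cube(', 'functions/Producing a polytope from scratch/cube');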
==== Testing Data Upgrade Rules ====
Each time you introduce an incompatible data model change together with upgrade rules, you should add testcases for them to the testgroup ''upgrade'' of the affected application. Preparing such a testcase is extremely simple: take a data file in the old (pre-conversion) format, store it in two copies, e.g. ''Name-OldVersion.poly'' and ''Name-OldVersion-in.poly'', and load ''Name-OldVersion.poly'' into polymake, which automatically applies the upgrade rules and stores the file in the updated form. Once you have verified the correctness of the data transformation, add the following line to ''test.pl'':
compare_transformed_object('Name-OldVersion');
==== Documentation of Tests ====
For more complicated test cases it is useful to include in each test case folder an additional file ''comments.txt''. This file should explain how the data of the test case was created and why it was added (e.g. a bug reported in the forum, or testing the convex hull for unbounded polytopes with nontrivial linear span). If the test case fails in a later version of polymake, this will help to understand what went wrong.
==== Side Effects in Test Scripts ====
Test scripts must not introduce any side effects. All test objects should be kept in lexical variables (''my $x'') or localized package-scope variables (''declare local $x''), and all settings changes should be localized using the ''local'' operator or the ''prefer_now'' command. All files created during the test must be temporary (''new Tempfile()'') or placed in a temporary directory (''new Tempdir()''). Please keep in mind that Jenkins must be able to execute every testcase in several parallel instances (e.g. with different perl versions); therefore, creating files with fixed names or making any other globally visible changes may lead to a test failure which would be extremely difficult to investigate, as it won't be reproducible in a single test execution.
Another important design guideline for test scripts is to avoid changing the objects involved in different test steps. All comparisons of results and expected values (''compare_values'', ''compare_data'', ''compare_object'', etc.) are performed //after// execution of the complete test script body (this is when the black testcase IDs on the screen are painted green or red). If you supply a complex object like a Matrix or a Set to ''compare_values()'' or ''compare_data()'' and then change the object further down in the script, the comparison will fail, because such objects are passed by reference! Hence, always pass immutable copies of objects to the comparison functions.
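For example, a sketch of the safe pattern: hand a copy over to the comparison function, so that later changes to the original cannot corrupt the deferred check (the matrix here serves merely as an arbitrary complex value):
my $m = new Matrix<Rational>([ [1,0], [0,1] ]);
# the comparison runs only after the whole script body has finished, and $m is passed by reference,
# hence pass a copy if $m is modified in later test steps
compare_values('id_matrix', new Matrix<Rational>([ [1,0], [0,1] ]), new Matrix<Rational>($m));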
The temporary files and directories are automatically destroyed when the test script body finishes, thus before the comparisons are done. If a test object has to be stored in a temporary file used in comparison operations, the lifetime of the file must be prolonged until the comparison finishes. To do so, use the ''test_cleanup'' function with a code block performing the necessary destruction:
my $temp_dir=new Tempdir();
...
save($x, "$temp_dir/filename");
...
compare_object("$temp_dir/filename", $y);
...
test_cleanup { undef $temp_dir; }
===== Random Unit Tests =====
Testing a function which involves a construction based on the random number generator (RNG) is an additional challenge. Although the new implementation of the RNG uses the numerically stable MPFR library and all such functions accept an optional ''seed'' argument, an unexpected deviation of test results may still occur after a move to a new hardware platform or a new compiler version.
For a migration period, you can tell the script ''run_testcases'' to ignore differences detected in such testcases by running it with the option ''%%--random-failures=ignore%%'' described above.
To mark a test group as random, put the statement ''expect_random_failures();'' at the beginning of its test script.
Note that this option is not used in automated runs on Jenkins, because otherwise some tests, once they come out of sync, will never be repaired!
===== Monitoring Execution Time =====
For any single testcase, you can specify an upper limit for its execution time. Should the time measured in a test run exceed the limit, the test will be reported as failed, regardless of the computed results. All test checking functions accept an option ''%%max_exec_time => SEC%%'' specifying the allowed user CPU time in seconds. Please be aware that the time budget accounts for all activity since the execution of the preceding test checking function (for the very first testcase: since the beginning of the test script), that is, not only the productive code being tested but also the preparation of input data, reading the test expectations from data files, and the result comparison.
For ''check_rules'', the time is measured separately for every application of a rule to a single object.
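For example, an individual testcase might be limited like this (a sketch; ''$big'' and the name of the ''.OK'' file are placeholders):
# report a failure if preparing, computing, and comparing FACETS takes more than 10 seconds of user CPU time
compare_data('big_hull_facets', $big->FACETS, max_exec_time => 10);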
===== Testing Configuration-Dependent Code =====
If a rule or a function being tested depends on third-party software, e.g. coming from a bundled extension, the test script must be protected against failures caused by missing dependencies. In particular, some Jenkins jobs are occasionally executed on nodes lacking some third-party packages. For checking the availability of a configurable feature, you can use the following function returning a boolean value:
? ''%%check_if_configured("bundled:NAME")%%''
:: check if the named bundled extension has been successfully configured and activated
? ''%%check_if_configured("RULEFILE")%%''
:: check if the specified rulefile (presumably having a CONFIGURE section) has been loaded;
.. the rulefile must belong to the application being tested or to one of the USE'd or IMPORT'ed applications
? ''%%check_if_configured("APPNAME::RULEFILE")%%''
:: check if the specified rulefile belonging to a named application has been loaded
If several configurable features are necessary for the test execution, combine the ''check_if_configured'' calls with boolean AND.
If //one of several features// is sufficient, list their names in a single call: ''%%check_if_configured("first.rules", "second.rules")%%''
You can protect the entire test script or single testcases. In the first case, place an expression at the very beginning of the test script:
check_if_configured("FEATURE") or return;
Then the test subgroup will be reported as skipped in the test run summary.
If you protect single testcases, they will be silently skipped and not counted in any statistics. This still has the benefit of executing the rest of the script regardless of the actual configuration:
if (check_if_configured("bundled:ppl")) {
check_rules ...;
}
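The same pattern applies when several features must be combined with boolean AND (a sketch with placeholder feature names):
if (check_if_configured("bundled:cdd") && check_if_configured("first.rules")) {
  check_rules ...;
}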
===== Testing in Different Environments =====
On rare occasions the expected results of a unit test may vary between hardware platforms (most of all when floating-point calculations are involved) or operating systems. Should you observe such a discrepancy, please provide several ''OK'' files with special suffixes like ''.x86_64'' or ''.darwin.i386'' . The suffix must be the same as in the name of the build directory created by ''./configure'' .
===== Testing in Extensions =====
Extensions should provide their own unit tests too. They are organized in exactly the same way as the tests in the main source tree. To run all tests in an extension, change into its top directory and call ''make test'' . If you want to execute only some selected tests, call the driver script with the following arguments:
polymake --script run_testcases --extensions . --applications APPLICATION --testgroups TESTNAME ...
Running tests from several extensions at once is also possible:
polymake --script run_testcases --extensions EXTENSION_DIR_1 EXTENSION_DIR_2 ...
Finally, you may combine the core and extension tests in a single run (although this is not recommended, because the extension may influence the results of some testcases, e.g. scheduler or TAB completion tests):
polymake --script run_testcases --extensions ALL
===== Testing core library C++ components =====
Unit tests for core library components are kept in ''testscenarios/core_lib_tests''. They are based on the [[https://github.com/google/googletest/blob/master/docs/index.md | googletest framework]], which must be installed on your computer separately.
Please be aware that these tests must not involve any components depending on perl, which for the time being includes BigObjects.
1. Put the unit tests in a C++ file in ''testscenarios/core_lib_tests/src/''.
2. Change into ''testscenarios/core_lib_tests'', build and run the entire unit test suite once by issuing ''./build_test.sh''
3. Add more unit tests to the same C++ file and run them exclusively:
ninja -C work/build/Opt
work/build/Opt/all_tests --gtest_filter=SUITABLE_PATTERN
You can still use the script ''./build_test.sh'' for repeated test runs; it will just take longer because it starts a full clean build every time.
4. If you want to debug failing tests, build them in Debug mode:
./build_test.sh --build-mode=Debug
gdb -args work/build/Debug/all_tests --gtest_filter=SUITABLE_PATTERN
Again, to avoid repeated full clean builds after fixing the library code or the tests, you can use ''ninja -C work/build/Debug''.
===== Investigating Failures =====
The driver script ''run_testcases'' always prints a summary of all tests run. If it states that //all tests are successful//, you can continue your work or commit the last changes with a clear conscience. If it lists some tests as //skipped//, your test machine lacks some software required for those particular tests. You can ignore this if you are absolutely certain that these tests are not relevant for checking your last changes and rely on the full test coverage on Jenkins. For failed tests, a detailed explanation of the failure is printed; it can be a discrepancy between the expectations and the computed results, an exception raised in the production code, or a syntax error in the test script itself. In any case, you will have to investigate the reasons.
If there are many failing tests, the test report can be made more IDE-friendly by running the test driver with the option ''%%--emacs-style%%''; then you can make use of the ability of emacs and other tools to browse compilation logs and jump to the places where errors occurred with a single mouse click.
After having fixed the errors (either by debugging the productive code or by correcting the test data), **it is important to repeat the whole test run**, because sometimes a (bad) fix for one case introduces a regression for others. Besides, the pleasant feeling of deep contentment caused by the message //all tests are successful// should not be missed after all the hard work :-)
===== Disabling Tests =====
Tests failing because of a known bug or deficiency which is not going to be fixed immediately should be disabled in order to maintain clean builds on Jenkins and for all participating developers. The test script should not be deleted, renamed, or abridged. Instead, two lines must be inserted at the very beginning:
disable_test("Short reason. Ticket #NNN.");
return;
Every disabled test (or group of tests) must be linked to a Trac ticket dedicated to the problem. This ticket should have the phrase "disabled unit test" in its ''Keywords'' field; this allows for better monitoring and reminds the resolver to re-enable the tests once the problem is fixed.
Disabled tests appear in the "skipped" statistics of the test runs on Jenkins and on your display.
If you merely want to disable single steps in a test script, just comment them out. Please still remember to mention this in the corresponding Trac ticket and to place the ticket number in the comments.