Testing is an important part, and a best practice, of software development. In many cases, testing frameworks are used to confirm that the software conforms to expected behavior. The tests can be run manually after a number of changes, or fully automated after every commit (think: Concourse CI, Jenkins, GitHub Travis CI, just to name a few). The latter approach is called “continuous integration”.
At the same time, another, more important part is often left out and not tested: coverage for faults.
What happens if there is a network outage, memory or disks are full, or the software encounters rare circumstances? Such code paths cannot be tested for conformance in the usual way. Instead, faults must be injected, and the test must then verify that the software behaves as expected.
Greenplum Database integrates a testing framework called “Fault Injector”. It can be used to simulate many different aspects of faults in the system, and then verify that the product handles the problems.
The Fault Injector framework is able to inject a number of different “problems” (fault types) into the running application. The full list of types is defined in src/backend/utils/misc/faultinjector.c, in FaultInjectorTypeEnumToString. A few common examples:
There are more possible faults which can be injected, but the list above gives a good picture of what is possible.
In order to inject and test faults, a new fault must first be created, using one of the existing fault types. This takes three changes. First, a new unique name must be added to the array in FaultInjectorIdentifierEnumToString, in src/backend/utils/misc/faultinjector.c. Second, a new unique name must be added to the FaultInjectorIdentifier_e typedef in src/include/utils/faultinjector.h. Third, the FaultInjector_NewHashEntry() function in faultinjector.c handles the different kinds of faults, so the new name must be added to the section that handles the desired fault type.
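The reason the name has to be added in two places is that the enum and the name array are looked up against each other. The following is a minimal, self-contained sketch of that pattern, not the Greenplum source: every Demo* and demo_* name is hypothetical, and only the fault name executor_run_high_processed is taken from this article. It assumes a C11 compiler for static_assert.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; the real list lives in faultinjector.h. */
typedef enum DemoFaultId
{
    DEMO_FAULT_NOT_SPECIFIED = 0,
    DEMO_FAULT_OTHER,                        /* made-up existing fault */
    DEMO_FAULT_EXECUTOR_RUN_HIGH_PROCESSED,  /* the newly added fault */
    DEMO_FAULT_MAX                           /* must stay last */
} DemoFaultId;

/* Name table indexed by enum value; the order must match the enum. */
static const char *const demo_fault_names[] = {
    "not_specified",
    "demo_other_fault",
    "executor_run_high_processed",
};

/* Build fails if a name was added to only one of the two places. */
static_assert(sizeof(demo_fault_names) / sizeof(demo_fault_names[0]) == DEMO_FAULT_MAX,
              "enum and name table are out of sync");

/* Map a fault name (as it would arrive from SQL) to its identifier. */
DemoFaultId
demo_fault_id_from_name(const char *name)
{
    for (int i = 0; i < DEMO_FAULT_MAX; i++)
    {
        if (strcmp(demo_fault_names[i], name) == 0)
            return (DemoFaultId) i;
    }
    return DEMO_FAULT_NOT_SPECIFIED;
}
```

In this sketch, an unknown name falls back to DEMO_FAULT_NOT_SPECIFIED. How the real framework reports an unknown name is not covered here.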
Finally, the fault injection code must be added to the software at the point where the fault is supposed to happen. Look for FaultInjector_InjectFaultIfSet() calls in the existing code for examples.
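To illustrate what such a call site looks like in principle, here is a small standalone sketch. It does not use the real FaultInjector_InjectFaultIfSet() signature, which is not shown in this article. All demo_* names, the fault types, and the fixed-size table are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fault types for this sketch only. */
typedef enum DemoFaultType { DEMO_TYPE_NONE, DEMO_TYPE_SKIP, DEMO_TYPE_ERROR } DemoFaultType;

typedef struct DemoFault
{
    const char   *name;
    DemoFaultType type;
} DemoFault;

#define DEMO_MAX_FAULTS 8

static DemoFault demo_armed[DEMO_MAX_FAULTS];
static int       demo_armed_count = 0;

/* Arm a fault by name; returns false if the table is full. */
bool
demo_arm_fault(const char *name, DemoFaultType type)
{
    assert(name != NULL);
    if (demo_armed_count >= DEMO_MAX_FAULTS)
        return false;
    demo_armed[demo_armed_count].name = name;
    demo_armed[demo_armed_count].type = type;
    demo_armed_count++;
    return true;
}

/* Return the armed fault type for this name, or DEMO_TYPE_NONE. */
DemoFaultType
demo_inject_fault_if_set(const char *name)
{
    for (int i = 0; i < demo_armed_count; i++)
    {
        if (strcmp(demo_armed[i].name, name) == 0)
            return demo_armed[i].type;
    }
    return DEMO_TYPE_NONE;
}

/* A code path with an injection point placed where the fault should occur. */
int
demo_write_block(int bytes)
{
    if (demo_inject_fault_if_set("demo_write_block") == DEMO_TYPE_ERROR)
        return -1;      /* simulated write failure */
    return bytes;       /* normal path: all bytes written */
}
```

Until the fault is armed, demo_write_block() behaves normally, so the injection point adds only a lookup to the regular code path. After demo_arm_fault("demo_write_block", DEMO_TYPE_ERROR), the same call returns -1 and the caller's error handling can be exercised.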
The gp_inject_fault extension is used to test certain error cases. It is intended for testing only and should not be used in production.
CREATE EXTENSION IF NOT EXISTS gp_inject_fault;
After adding the new fault injection, and recompiling the database, the new fault can be tested:
Some of the parameters are required, some are optional. There is a long version of this function, which requires 8 parameters, and a shorter, more convenient function which requires the 3 most used parameters.
SELECT gp_inject_fault('executor_run_high_processed', 'skip', dbid) FROM gp_segment_configuration WHERE role = 'p';
SELECT gp_inject_fault('executor_run_high_processed', 'skip', '', '', '', 0, 0, dbid) FROM gp_segment_configuration WHERE role = 'p';
Recently, the fault injection framework was used to test the SPI 64-bit counter. The counter of processed rows (ROW_COUNT) was changed from 32 bit to 64 bit, in order to hold more than 2 billion processed rows in a PL/pgSQL function. Testing this change is complicated, because a large number of rows (more than 4 billion rows, or 2^32+1 rows) must be created (inserted), updated, selected and deleted. This requires a huge amount of disk space and takes a while. It is nothing a developer can run on a laptop.
The fault injector framework is used to test this change. If the system encounters the code for the fault injector, and a certain number of rows is already processed (10000 rows), the number of processed rows is bumped up to a number just below the maximum integer range (2^32-1 rows). The next few operations exceed the integer range and, if everything works as expected, carry over into the bigint range (up to 2^64-1 rows). If the output is not the expected number and instead shows a negative number, something along the way is not using a 64-bit variable.
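The arithmetic behind this trick can be sketched in a few lines of standalone C. This is not the Greenplum implementation. The demo_* names are hypothetical, and the exact bump target (five below 2^32-1) is an assumption, since the article only says the counter is moved close to the limit. The sketch uses unsigned types, so a value that passes through a 32-bit variable shows up as a small wrapped number.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define DEMO_BUMP_THRESHOLD 10000u              /* row count from the text */
#define DEMO_BUMP_TARGET    (UINT32_MAX - 5u)   /* assumed: just below 2^32-1 */

static_assert(DEMO_BUMP_THRESHOLD < DEMO_BUMP_TARGET,
              "the bump must move the counter forward");

/*
 * Count processed rows in a 64-bit counter. When the fault is armed and the
 * threshold is reached, jump the counter to just below the 32-bit limit so
 * that the next few rows cross it.
 */
uint64_t
demo_count_rows(uint64_t rows, int fault_armed)
{
    uint64_t processed = 0;

    for (uint64_t i = 0; i < rows; i++)
    {
        processed++;
        if (fault_armed && processed == DEMO_BUMP_THRESHOLD)
            processed = DEMO_BUMP_TARGET;
    }
    return processed;
}

/* What the count would look like after passing through a 32-bit variable. */
uint32_t
demo_truncate_to_32(uint64_t processed)
{
    return (uint32_t) processed;
}
```

With 10010 rows and the fault armed, the 64-bit counter ends at 4294967300, which is above 2^32-1. The same value squeezed through a 32-bit variable wraps to 4, which is the kind of mismatch the test is designed to expose.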
The test creates a table and then inserts a number of rows under fault injector conditions. The test is valid only if the number of processed rows exceeds the integer range. The rows in the table are counted, then every row in the table is updated, and finally every row is deleted. In each of these steps, the number of processed rows is again artificially increased and exceeds the integer range.
Test results are compared against a result file. The test passes only if all results match the expectation.
Fault injection is a powerful tool for testing the system with respect to soft errors. It helps build confidence that the product can handle even those errors that can’t be tested using traditional methods.