To make an application fault-tolerant with CPPC, its code must be modified so that it communicates with the runtime library, passing information such as which variables must be dumped at the next checkpoint and where the state files are to be created. In addition, flow-control structures are inserted to re-execute certain critical portions of code at restart. This enables the recovery of non-portable data, such as MPI communicators or open files, which cannot simply be stored as binary data in a state file.
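The kind of restart flow control described above can be pictured with a small, self-contained C sketch. It does not use CPPC: `struct saved_state`, `expensive_setup()` and `run()` are hypothetical names chosen for illustration. On restart, the portable value is restored from the saved state and its costly setup is skipped. The non-portable resource, a `FILE *` obtained from `tmpfile()`, is always re-created, because a file handle has no meaning outside the process that opened it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the contents of a state file: only portable
 * values (plain numbers) are stored in it. */
struct saved_state {
    int valid;      /* nonzero if a previous run took a checkpoint */
    long counter;   /* portable data: restored verbatim */
};

/* Costly initialization whose result is portable, so it can be saved. */
static long expensive_setup(void) {
    long acc = 0;
    for (long i = 1; i <= 1000; i++)
        acc += i;
    return acc;
}

/* Returns the final counter; *setup_calls counts how often
 * expensive_setup() actually ran, showing that a restart skips it. */
static long run(const struct saved_state *ckpt, int *setup_calls) {
    assert(setup_calls != NULL);
    long counter;

    /* Non-portable resource: a FILE * is only meaningful in the process
     * that opened it, so this call is re-executed on every (re)start. */
    FILE *log = tmpfile();

    if (ckpt != NULL && ckpt->valid) {
        counter = ckpt->counter;    /* restart: restore, do not recompute */
    } else {
        counter = expensive_setup();
        (*setup_calls)++;
    }

    if (log != NULL) {
        fprintf(log, "counter=%ld\n", counter);
        fclose(log);
    }
    return counter + 1;
}
```

A first call with `NULL` runs the setup; a second call with a valid saved state returns the same result without running it again.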
Because inserting these function calls and flow-control structures by hand would require significant effort from the user, the CPPC runtime library is distributed together with a compiler that performs the necessary transformations on the original application code automatically.
The automatic communication analysis and checkpoint insertion available since v0.7.x of the compiler are fairly stable and a substantial improvement over v0.6.x, but they may still not work well with every application. If you experience problems with the analyses or with the generated code, you can deactivate both and rely on manual directive insertion instead.
The compiler provides the following directives for users who want to guide its operation manually:
cppc execute/end execute: Mark a block of code that must be re-executed when the application restarts. Use these directives when state should be recovered by re-execution rather than by saving it to and reading it from disk.
cppc checkpoint: Manually marks a point where the state is dumped to a state file. It must be inserted at a safe point in the application: a location where there are neither in-transit nor orphan messages between processes. For example, a checkpoint must not be placed between an MPI_Send() and its matching MPI_Recv(): upon restart the message would not be resent, but the destination process would still expect to receive it.
cppc checkpoint loop: Similar to the previous directive, except that it marks a loop in whose body the checkpoint should be inserted. The compiler takes communications between processes into account and inserts the checkpoint at the first safe point it finds inside the loop body.
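To show where the directives might go, the following C sketch assumes they are written as C pragmas (`#pragma cppc execute`, `#pragma cppc end execute`, `#pragma cppc checkpoint loop`); check the exact spelling against the CPPC documentation for your version. The function `relax()` and its arguments are hypothetical. A C compiler ignores pragmas it does not recognize (possibly with a warning), so the file also builds and runs without CPPC, just without any checkpointing.

```c
#include <assert.h>
#include <stdio.h>

/* Iterates x <- x/2 + 1 (converging toward 2.0) and logs each step. */
static double relax(int n_iters, const char *log_path) {
    assert(log_path != NULL);
    FILE *log = NULL;
    double x = 0.0;

    /* A file handle cannot be stored in a state file: recover it at
       restart by re-executing the fopen() call (append mode, so a
       restarted run does not truncate the existing log). */
#pragma cppc execute
    log = fopen(log_path, "a");
#pragma cppc end execute

    /* Ask the compiler to insert a checkpoint at the first safe point in
       the loop body. Placing "#pragma cppc checkpoint" by hand is also
       possible, but in an MPI code it must not sit between an MPI_Send()
       and its matching MPI_Recv(). */
#pragma cppc checkpoint loop
    for (int i = 0; i < n_iters; i++) {
        x = 0.5 * x + 1.0;
        if (log != NULL)
            fprintf(log, "%d %f\n", i, x);
    }

    if (log != NULL)
        fclose(log);
    return x;
}
```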
Decide where you want to dump the state and place checkpoint or checkpoint loop directives at those points. If the communication analysis is not used, make sure that these points are safe points as defined above. Bear in mind that the code between a checkpoint and the end of the application is the code that will be executed upon restart.
Compile the application and link it against the appropriate CPPC dynamic library.