Providing Reproducibility for Uncovering Non-deterministic Errors in
Runs on Supercomputers
Reproducibility is highly desirable for parallel applications, but as they
run on increasingly large and heterogeneous platforms, reproducibility of
numerical results and code behavior is becoming harder to obtain. The
same code can produce different results or occasional failures, such as
crashes, on different systems or even across different runs on the same hardware.
PRUNERS is a research and development project that aims to create scalable
tools and techniques that help applications achieve reproducibility.
Our strategy is to develop a multi-level analysis and control toolset, called
the PRUNERS toolset, which combines static and dynamic analysis techniques to
detect, control, and eliminate targeted sources of non-determinism introduced
through parallel programming libraries and APIs.
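
As a minimal illustration of one such source of non-determinism, the sketch below shows how the order in which partial values are combined can change a floating-point result. It is not part of the PRUNERS toolset: the function name `reduce_in_order` and the sample values are hypothetical, and the fixed orderings stand in for the arrival orders that a parallel reduction might see on different runs.

```python
# Illustrative sketch only (not PRUNERS code): floating-point addition is not
# associative, so combining the same values in a different order can change
# the result. The orderings below are hypothetical stand-ins for the order in
# which contributions might arrive in a parallel reduction.

def reduce_in_order(values, order):
    """Sum `values` by visiting their indices in the given `order`."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Hypothetical per-process contributions with very different magnitudes.
values = [1e16, 1.0, -1e16, 1.0]

# "Run 1": contributions are combined in index order.
# The first 1.0 is absorbed when added to 1e16, so only the last 1.0 survives.
in_order = reduce_in_order(values, [0, 1, 2, 3])

# "Run 2": the two large values cancel first, so both 1.0 terms survive.
reordered = reduce_in_order(values, [0, 2, 1, 3])

print(in_order, reordered)
print("results differ:", in_order != reordered)
```

Both calls use the same code and the same inputs; only the combination order differs, which is the kind of run-to-run variation that makes numerical results hard to reproduce.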