Performance Modeling and Prediction for Dense Linear Algebra
Outline
The accurate prediction of performance for numerical algorithms has so far proven elusive. The difficulties come from the ever-increasing complexity of the memory hierarchies and CPUs architecture. Extremely involved scheduling and caching policies, due to the sharing of resources among multiple cores, have the effect that multiple executions of the same program on the same data result in significantly different measurements even in the case of serial executions. Also, the execution time of a sequence of operations does not equal the sum of the parts. These facts make general analytical models well beyond our reach, and render both the extrapolation-based and the composition-based approaches unreliable.
We look at the problem from a different angle. Instead of attempting accurate predictions for one target algorithm, our interest lays in algorithm ranking: Given a set of (mathematically equivalent) algorithms, the objective is to rank them, separating those that are not competitive from one or more best-performing candidates. Because of the aforementioned uncertainties in the measurements, in most cases no definite winner exists, and the ranking must be accompanied with a corresponding likelihood.