EU Regional School - Quintana-Orti Seminar

Location: AICES Seminar Room 115, 1st floor, Schinkelstr. 2, 52062 Aachen

Prof. Dr. Quintana-Orti - Modern Linear Algebra Libraries for Graphics Processors

Department of Engineering and Computer Science
Universidad Jaume I 


In response to the combined hurdles of maximum power dissipation, large memory latency, and little instruction-level parallelism left to be exploited, all major chip manufacturers have adopted multi-core designs as the only means to exploit the increasing number of transistors dictated by Moore's Law. Desktop systems equipped with general-purpose four-core processors and graphics processors (GPUs) with hundreds of fine-grained cores are thus routine today. While these new architectures can potentially yield much higher performance, this usually comes at the cost of tuning codes in a few cases and a complete rewrite in many others. Dense linear algebra, which is ubiquitous in scientific and engineering applications, is currently undergoing this change.
In this course we will review practical aspects of existing sequential and parallel dense linear algebra libraries for these new architectures. In particular, we will briefly examine the traditional approach to computing dense matrix operations on shared-memory multiprocessors, which relies on LAPACK and a multi-threaded implementation of BLAS. We will also briefly address more recent efforts to improve the scalability of the algorithms on multi-core processors by reviewing the proposals in libflame and PLASMA, and the general-purpose parallelizing tool SMP Superscalar.
The course will focus especially on GPUs, inspecting the implementation of BLAS for NVIDIA processors and evaluating the implementation of LAPACK on top of these kernels. We will also describe how dynamic data-driven scheduling yields a higher degree of parallelism on multi-GPU platforms, and how to hide the PCI-e latency by borrowing cache coherence techniques that are well known in computer architecture. Finally, we will offer a glimpse of the parallelization of dense linear algebra libraries for clusters of nodes equipped with GPUs.

Lecture Material