Dr. Zoltán Péter Szebenyi
Advisor / Co-Advisor
Prof. Wolf / Prof. Behr
Capturing Parallel Performance Dynamics
M.Sc. Zoltán Péter Szebenyi
52056 Aachen, Germany
Tel. (0241) 80 28497
Fax (0241) 80 626498
since 09/2007 Student of Aachen Institute for Advanced Study in Computational Engineering Science (AICES), RWTH Aachen
10/2006 - 02/2007 Master Courses in Computer Science, Erasmus Scholarship, Friedrich-Alexander University of Erlangen-Nürnberg, Germany
09/2002 - 06/2007 Program Designer Mathematician Diploma, University of Szeged, Hungary
07/2009 - 11/2009 Student Intern at Lawrence Livermore National Laboratory, USA
09/2005 - 01/2006 Tutor at the Faculty of Science and Informatics, University of Szeged, Hungary
Trace-based Performance Analysis of Long-running Applications
Often, a significant fraction of the time parallel applications spend in communication or synchronization operations can be attributed to wait states that occur when processes or threads fail to reach synchronization points in a timely manner or when access to shared resources is temporarily denied. Especially when trying to scale communication-intensive applications to thousands of processors, such wait states can present severe challenges to achieving good performance. As a first step in reducing their impact, application developers need a diagnostic method that allows them to localize, classify, and quantify wait states, especially at larger scales.
Such wait states can be effectively measured by analyzing event traces, which record timestamped runtime events, such as entering a function or sending a message. However, as these events occur frequently, the amount of trace data generated can easily exceed the processing capabilities of trace processing tools. Storage and bandwidth restrictions of the underlying file system impose further limits on the overall trace file size. The total amount of trace data usually depends on both the number of application processes and the number of events generated by individual processes. Whereas larger numbers of application processes can be matched by larger numbers of analysis processes, limiting the amount of trace data per process is more difficult.
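As a rough illustration of how a wait state might be quantified from timestamped events, the following minimal sketch computes, for a single barrier, how long each process waits for the last one to arrive. The Event record, the "barrier" region name, and the barrier_wait_times helper are hypothetical and chosen for illustration only; they are not the trace format or the analysis used by any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace record; real trace formats carry more fields."""
    process: int      # rank of the process that recorded the event
    timestamp: float  # seconds since the start of the measurement
    kind: str         # "enter" or "exit"
    region: str       # name of the function or operation


def barrier_wait_times(trace, region="barrier"):
    """Return {process: wait time} for the first instance of `region`.

    Assumption: a process waits from the moment it enters the barrier
    until the last process has entered it.
    """
    enters = {}
    for ev in trace:
        first_enter = ev.process not in enters
        if ev.kind == "enter" and ev.region == region and first_enter:
            enters[ev.process] = ev.timestamp
    if not enters:
        return {}
    last_arrival = max(enters.values())
    return {p: last_arrival - t for p, t in enters.items()}


# Three processes reach the same barrier at different times.
trace = [
    Event(0, 1.0, "enter", "barrier"),
    Event(1, 1.5, "enter", "barrier"),
    Event(2, 3.0, "enter", "barrier"),
    Event(0, 3.1, "exit", "barrier"),
    Event(1, 3.1, "exit", "barrier"),
    Event(2, 3.1, "exit", "barrier"),
]

waits = barrier_wait_times(trace)
for process, wait in sorted(waits.items()):
    print(f"process {process}: waited {wait:.1f} s")
```

Even this toy trace needs two events per process for one barrier, which hints at why the data volume per process grows quickly in long-running applications.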
One option is to exploit redundancy in the event data caused by the iterative behavior of many scientific and engineering codes. This, however, requires profound knowledge of the target application, which often has to be obtained manually in time-consuming trial-and-error procedures. The aim of the research is to develop automatic approaches to reduce the amount of collected trace data per process needed for the successful analysis of parallel applications.
Zoltán Szebenyi, Felix Wolf, Brian J. N. Wylie: Space-Efficient Time-Series Call-Path Profiling of Parallel Applications. In Proc. of the ACM/IEEE Conference on Supercomputing (SC09), Portland, Oregon, ACM, November 2009.
Zoltán Szebenyi, Brian J. N. Wylie, Felix Wolf: Scalasca Parallel Performance Analyses of PEPC. In Proc. of the 1st Workshop on Productivity and Performance (PROPER) in conjunction with Euro-Par 2008, volume 5415 of Lecture Notes in Computer Science, pages 305-314, Springer, 2009.
Zoltán Szebenyi, Brian J. N. Wylie, Felix Wolf: SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications. In Proc. of the 1st SPEC Int'l Performance Evaluation Workshop (SIPEW), volume 5119 of Lecture Notes in Computer Science, pages 99-123, Darmstadt, Germany, Springer, June 2008.
Zoltán Szebenyi, János Csirik: String Classification Using Medians. XXVIII. National Students' Scientific Conference, p. 30, Miskolc, Hungary, April 2007.