XSEDE16
Thursday, July 21 • 10:30am - 11:00am
AD: Towards a Methodology for Cross-Accelerator Performance Profiling

The computing requirements of scientific applications have influenced processor design and have motivated the introduction and use of accelerator architectures for high performance computing (HPC). Consequently, it is now common for the compute nodes of HPC clusters to comprise multiple processing elements, including accelerators. Although execution time can be used to compare the performance of different processing elements, there is no standard way to analyze application performance across processing elements with very different architectural designs and thus to understand why one outperforms another. Without this knowledge, a developer is handicapped when trying to tune application performance, as is a hardware designer when trying to understand how best to improve the design of processing elements. In this paper, we use the LULESH 1.0 proxy application to compare and analyze the performance of three accelerators: the Intel Xeon Phi and the NVIDIA Kepler and Fermi GPUs. Our study shows that LULESH 1.0 exhibits similar runtime behavior across the three accelerators but runs up to 7x faster on the Kepler. Despite the significant architectural differences between the Xeon Phi and the GPUs, and the differences in the metrics used to characterize the performance of these architectures, we were able to quantify why the Kepler outperforms both the Fermi and the Xeon Phi. To do this, we compared their achieved instructions per cycle and vectorization efficiency, as well as their memory behavior and their power and energy consumption.

Thursday July 21, 2016 10:30am - 11:00am EDT
Sevilla InterContinental Miami