XSEDE16 has ended
Wednesday, July 20 • 3:30pm - 4:00pm
TECH: A Quantitative Analysis of Node Sharing on HPC Clusters Using XDMoD Application Kernels

In this investigation, we study how application performance is affected when jobs are permitted to share compute nodes. A series of application kernels consisting of a diverse set of benchmark calculations were run in both exclusive and node-sharing modes on the Center for Computational Research’s high-performance computing (HPC) cluster. Very little increase in runtime was observed due to job contention among application kernel jobs run on shared nodes. The small differences in runtime were quantitatively modeled in order to characterize the resource contention and attempt to determine the circumstances under which it would or would not be important. A machine learning regression model applied to the runtime data successfully fitted the small differences between the exclusive and shared node runtime data; it also provided insight into the contention for node resources that occurs when jobs are allowed to share nodes. Analysis of a representative job mix shows that runtime of shared jobs is affected primarily by the memory subsystem, in particular by the reduction in the effective cache size due to sharing; this leads to higher utilization of DRAM. Insights such as these are crucial when formulating policies proposing node sharing as a mechanism for improving HPC utilization.
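The comparison described above can be illustrated with a minimal sketch. The abstract does not specify which machine learning regression model was used or which features it took, so this example substitutes plain one-variable least squares as a stand-in. The function names, the `neighbor_cache_pressure` predictor, and all numbers are hypothetical and chosen only for illustration; none of them are measurements or methods from the study.

```python
from statistics import mean


def relative_slowdown(exclusive_runtimes, shared_runtimes):
    """Fractional increase in mean runtime when the node is shared."""
    base = mean(exclusive_runtimes)
    return (mean(shared_runtimes) - base) / base


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up runtimes in seconds for one application kernel (NOT study data).
exclusive = [100.0, 102.0, 98.0]
shared = [103.0, 105.0, 101.0]
slowdown = relative_slowdown(exclusive, shared)

# Hypothetical predictor: fraction of the shared cache used by neighboring jobs.
neighbor_cache_pressure = [0.0, 0.25, 0.5, 1.0]
observed_slowdown = [0.00, 0.01, 0.02, 0.04]
slope, intercept = fit_line(neighbor_cache_pressure, observed_slowdown)

print(f"relative slowdown: {slowdown:.3f}")
print(f"fitted slope: {slope:.3f}, intercept: {intercept:.3f}")
```

In a sketch like this, a slope near zero would correspond to the abstract's finding that sharing adds very little runtime, while a larger slope would point to contention from the chosen resource.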

Wednesday July 20, 2016 3:30pm - 4:00pm EDT
Sevilla InterContinental Miami