Tuesday, July 19 • 11:30am - 12:00pm
TECH: Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet

Big Data problems involving a variety of large data sets are now common across many domain science research areas, such as bioinformatics, social science, astronomical image processing, weather and climate dynamics, and economics. In some cases, the data generation and computation are done on high performance computing (HPC) resources, which creates an incentive to develop and optimize big data middleware and tools that take advantage of existing HPC infrastructure. Data-intensive computing middleware (such as Hadoop and Spark) can potentially benefit greatly from hardware already designed for high performance and scalability, with advanced processor technology, large memory per core, and storage and filesystems. SDSC Comet is such a resource, with a large number of compute nodes, fast node-local SSD storage, and high performance Lustre filesystems. This paper discusses experiences and benefits of using optimized Remote Direct Memory Access (RDMA) Hadoop and Spark middleware on the XSEDE Comet HPC resource, including some performance results from Big Data benchmarks and applications. Because Comet is a general-purpose HPC resource, some work is needed to integrate the middleware so that it runs within the HPC scheduling framework. This aspect of the implementation is also discussed in detail.

Sevilla InterContinental Miami

