Jetstream is undergoing its acceptance review by the National Science Foundation (NSF) at the beginning of May. We expect the system to be accepted by the NSF in short order, and this abstract is written with the assumption that acceptance will be complete by the time final versions of papers are due. Here, we present the acceptance test criteria and results that define the key performance characteristics of Jetstream, describe the experiences of the Jetstream team in standing up and operating an OpenStack-based cloud environment, and describe some of the early scientific results that have been obtained by researchers and students using this system. Jetstream is a distributed production cloud resource and, as such, is a first-of-a-kind system for the NSF; it is scaled, in both investment and computational capability (0.5 PetaFLOPS peak), in a manner consistent with this status. While the computational capability does not stand out within the spectrum of resources funded by the NSF and supported by XSEDE, the functionality does. Jetstream offers interactive virtual machines (VMs) provided through the user-friendly Atmosphere interface. The software stack, consisting of Globus for authentication and data transfer, OpenStack as the basic cloud environment, and Atmosphere as the user interface, has proved very effective, although the OpenStack change cycle and intentional lack of backwards compatibility create certain implementation challenges. Jetstream is a multi-region deployment that operates as a single integrated system and is proving effective in supporting modes and subdisciplines of research traditionally underrepresented on larger XSEDE-supported clusters and supercomputers. Already, researchers in biology, network science, economics, earth science, and computer science have used it to perform research – much of it research in the “long tail of science.”
Hardware virtualization has been gaining a significant share of computing time in recent years. Using virtual machines (VMs) for parallel computing is an attractive option for many users. A VM gives users the freedom to choose an operating system, software stack, and security policies, leaving the physical hardware, OS management, and billing to the physical cluster administrators. The well-known cloud computing solutions, both commercial (Amazon Cloud, Google Cloud, Yahoo Cloud, etc.) and open-source (OpenStack, Eucalyptus), provide platforms for running a single VM or a group of VMs. With all the benefits, there are also some drawbacks, including reduced performance when running code inside a VM, increased complexity of cluster management, and the need to learn new tools and protocols to manage the clusters. At SDSC, we have created a novel framework and infrastructure by providing virtual HPC clusters to projects using the NSF-sponsored Comet supercomputer. Managing virtual clusters on Comet is similar to managing a bare-metal cluster in terms of the processes and tools that are employed. This is beneficial because such processes and tools are familiar to cluster administrators. Unlike platforms such as AWS, Comet's virtualization capability supports installing VMs from ISOs (i.e., a CD-ROM or DVD image) or over an isolated management VLAN (PXE). At the same time, we are helping projects take advantage of VMs by providing an enhanced client tool, Cloudmesh client, for interacting with our management system. Cloudmesh client can also be used to manage virtual machines on OpenStack, AWS, and Azure.
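The PXE-over-management-VLAN approach implies that each virtual node keeps a persistent identity (hostname, MAC address, IP) on the management network, just as a bare-metal node would. The bookkeeping behind that could be sketched as follows; this is purely illustrative, with the function name, naming scheme, and address range all invented for the example (Comet's actual provisioning tooling differs):

```python
import ipaddress

def cluster_inventory(name, nodes, subnet="10.1.0.0/24"):
    """Assign each virtual node a stable hostname, MAC, and IP on a
    management VLAN, as a bare-metal provisioning system would.
    All values here are hypothetical examples."""
    hosts = list(ipaddress.ip_network(subnet).hosts())
    inventory = []
    for i in range(nodes):
        inventory.append({
            "host": f"{name}-{i:02d}",
            # Locally administered MAC address derived from the node index,
            # so the PXE/DHCP server can recognize each node on boot.
            "mac": "02:00:00:00:00:%02x" % i,
            "ip": str(hosts[i]),
        })
    return inventory

inv = cluster_inventory("vc", 4)
for node in inv:
    print(node["host"], node["mac"], node["ip"])
```

With fixed MAC-to-IP mappings like these, a virtual cluster can be administered with the same DHCP/PXE workflow that cluster administrators already use for physical nodes, which is the familiarity benefit the abstract describes.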
Big Data problems dealing with a variety of large data sets are now common in a wide range of domain science research areas such as bioinformatics, social science, astronomical image processing, weather and climate dynamics, and economics. In some cases, the data generation and computation are done on high performance computing (HPC) resources, presenting an incentive for developing and optimizing big data middleware and tools to take advantage of existing HPC infrastructures. Data-intensive computing middleware (such as Hadoop and Spark) can potentially benefit greatly from hardware already designed for high performance and scalability, with advanced processor technology, large memory per core, and fast storage/filesystems. SDSC's Comet is such a resource, with a large number of compute nodes featuring fast node-local SSD storage and high performance Lustre filesystems. This paper discusses experiences and benefits of using optimized Remote Direct Memory Access (RDMA) Hadoop and Spark middleware on the XSEDE Comet HPC resource, including performance results for Big Data benchmarks and applications. Comet is a general-purpose HPC resource, so some work is needed to integrate the middleware to run within the HPC scheduling framework. This aspect of the implementation is also discussed in detail.
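Running Hadoop or Spark inside an HPC scheduler essentially means translating the batch system's node allocation into middleware configuration at job start. A minimal sketch of one such step, expanding a compact Slurm-style nodelist into hostnames and designating a master node, is shown below; the function and hostnames are our own illustration, not Comet's actual integration scripts (which handle more cases):

```python
import re

def expand_nodelist(nodelist):
    """Expand a compact Slurm-style nodelist such as 'comet-01-[03-05,07]'
    into individual hostnames. Handles a single bracket group only;
    real Slurm nodelists can be more complex."""
    m = re.match(r"^(.*)\[(.*)\](.*)$", nodelist)
    if not m:
        return [nodelist]  # already a single hostname
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            width = len(lo)  # preserve zero-padding, e.g. '03' -> 'comet-01-03'
            hosts.extend(f"{prefix}{i:0{width}d}{suffix}"
                         for i in range(int(lo), int(hi) + 1))
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts

# In a batch job this string would come from the SLURM_JOB_NODELIST
# environment variable; here it is a made-up example.
nodes = expand_nodelist("comet-01-[03-05,07]")
master, workers = nodes[0], nodes[1:]
print("spark master:", master)
print("workers:", workers)
```

A job-start script would then point the Spark master at the first host and launch workers on the rest, so the middleware runs entirely within the scheduler's allocation rather than as a persistent service.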