Diverse areas of science and engineering are increasingly driven by high-throughput automated data capture and analysis. Modern acquisition technologies, used in many scientific applications (e.g., astronomy, physics, materials science, geology, biology, and engineering) and often running at gigabyte per second data rates, quickly generate terabyte to petabyte datasets that must be stored, shared, processed, and analyzed at similar rates. The largest datasets are often multidimensional, such as volumetric and time series data derived from various types of image capture. Cost-effective and timely processing of these data requires system and software architectures that incorporate on-the-fly processing to minimize I/O traffic and avoid latency limitations. In this paper, we present the Virtual Volume File System, a new approach to on-demand processing with file system semantics, combining these principles into a versatile and powerful data pipeline for dealing with some of the largest 3D volumetric datasets. We give an example of how we have started to use this approach in our work with massive electron microscopy image stacks. We end with a short discussion of current and future challenges.
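The core idea of on-demand processing behind file system semantics can be sketched as follows. This is an illustrative toy, not the Virtual Volume File System's actual code: a reader asks for a block through an ordinary read call, and the processed result is computed lazily and cached so I/O-heavy recomputation is avoided.

```python
# Illustrative sketch (hypothetical class, not the actual VVFS implementation):
# expensive on-the-fly processing hidden behind simple read semantics, with
# each block computed only when first requested and cached thereafter.

class VirtualVolume:
    def __init__(self, raw_blocks, process):
        self.raw = raw_blocks          # e.g. raw microscopy slices
        self.process = process         # on-the-fly transform
        self.cache = {}                # avoids recomputation on repeat reads
        self.computed = 0              # counts how many blocks were processed

    def read_block(self, z):
        """File-like read: the caller never sees the processing step."""
        if z not in self.cache:
            self.cache[z] = self.process(self.raw[z])
            self.computed += 1
        return self.cache[z]

vol = VirtualVolume([[1, 2], [3, 4]], process=lambda b: [x * 10 for x in b])
first = vol.read_block(0)   # computed on demand
again = vol.read_block(0)   # served from cache, no second computation
```

The design choice this illustrates is that consumers keep ordinary file-read semantics while the pipeline decides when and whether processing actually runs.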
Brown Dog is an extensible data cyberinfrastructure that provides a set of distributed data conversion and metadata extraction services to enable access to, and search within, unstructured, uncurated, and otherwise inaccessible research data across the sciences and social sciences, ultimately supporting reproducibility of results. We envision Brown Dog as an essential service within a comprehensive cyberinfrastructure that includes data services, high performance computing services, and more, enabling scholarly research in a variety of disciplines that is not yet possible today. Brown Dog focuses on four initial use cases, addressing the conversion and extraction needs of ecology, civil and environmental engineering, library and information science, and the general public. In this paper, we describe an architecture that supports user contribution of data transformation tools and automatic deployment of those tools as Brown Dog services on diverse infrastructures, such as cloud or high performance computing (HPC) resources, based on user demand and system load. We also present results validating the performance of the initial implementation of Brown Dog.
Reliable mesh-based PDE simulations are needed to solve complex engineering problems. Mesh adaptivity can increase reliability by reducing discretization errors, but requires two or more software components to exchange information. Often, components exchange information by reading and writing a common file format. On massively parallel computers, filesystem bandwidth is a critical performance bottleneck. Our data stream and component interface approaches avoid the filesystem bottleneck. In this paper we present the approaches and discuss their use within the PHASTA computational fluid dynamics solver and Albany Multiphysics framework. Information exchange performance results are reported on up to 2048 cores of a BlueGene/Q system.
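The contrast between the two exchange styles can be sketched in a few lines. This is a hypothetical illustration (not PHASTA or Albany code): the file-based path serializes the mesh and parses it back, while a component interface hands the live data structure between components directly, which is what removes the filesystem from the critical path.

```python
# Hypothetical sketch of file-based exchange vs. a component interface.
# The "file" here is an in-memory buffer standing in for a shared file format.

import io
import json

def exchange_via_file(mesh):
    """Baseline: serialize to a (simulated) file, then parse it back."""
    buf = io.StringIO()
    json.dump(mesh, buf)       # component A writes the common format
    buf.seek(0)
    return json.load(buf)      # component B reads and re-parses it

def exchange_in_memory(mesh):
    """Component interface: pass the live structure, no serialization."""
    return mesh

mesh = {"vertices": [[0, 0], [1, 0], [0, 1]], "elements": [[0, 1, 2]]}
via_file = exchange_via_file(mesh)
in_memory = exchange_in_memory(mesh)
```

Both paths deliver equivalent data, but only the file-based one pays serialization and parsing costs, which on a real machine also become filesystem traffic.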
With clouds becoming a standard target for deploying applications, it is more important than ever to be able to seamlessly utilize resources and services from multiple providers. Proprietary vendor APIs make this challenging and lead to conditional code being written to accommodate various API differences, requiring application authors to deal with these complexities and to test their applications against each supported cloud. In this paper, we describe an open source Python library called CloudBridge that provides a simple, uniform, and extensible API for multiple clouds. The library defines a standard ‘contract’ that all supported providers must implement, and an extensive suite of conformance tests to ensure that any exposed behavior is uniform across cloud providers, thus allowing applications to confidently utilize any of the supported clouds without any cloud-specific code or testing.
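The 'contract plus conformance tests' pattern described above can be sketched with hypothetical names (this is not CloudBridge's real API): an abstract base class defines what every provider must implement, and a conformance check verifies that each implementation exposes uniform behavior.

```python
# Illustrative only: hypothetical provider names and methods, not the actual
# CloudBridge interface. The pattern shown is an abstract 'contract' that all
# providers implement, plus a conformance test applied to every provider.

from abc import ABC, abstractmethod

class CloudProvider(ABC):
    """The contract: every supported cloud must implement these methods."""
    @abstractmethod
    def list_instances(self):
        """Return instance names uniformly, whatever the vendor API returns."""

class FakeAWS(CloudProvider):
    def list_instances(self):
        return ["web-1", "db-1"]      # would wrap the vendor SDK in reality

class FakeOpenStack(CloudProvider):
    def list_instances(self):
        return ["vm-a"]

def conforms(provider):
    """Minimal conformance test: identical shape of result on every cloud."""
    result = provider.list_instances()
    return isinstance(result, list) and all(isinstance(n, str) for n in result)
```

Application code written against `CloudProvider` then needs no cloud-specific branches, because the conformance suite, not the application, is what polices vendor differences.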
Optimization and design of nuclear fusion devices is a complex task with large computational requirements. The complexity stems from the number of parameters involved in each possible optimization function, each focusing on a different aspect of plasma confinement. This paper presents a possible optimization of an existing nuclear fusion device. The optimization process is carried out by a parallel algorithm specifically designed to work with large scale problems. While the focus of the paper is fusion, the approach can be applied to any other large scale problem. We have run our experiments on an HPC cluster. The results show the validity of our approach and how complex scientific problems can benefit from the outcomes of this work.
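The embarrassingly parallel core of such an optimization can be sketched as follows. The objective function below is a toy stand-in, not a real plasma-confinement metric, and the worker pool stands in for the HPC cluster: candidate parameter sets are independent, so they can be evaluated concurrently.

```python
# Hedged sketch of a parallel parameter sweep. The two parameters and the
# quadratic cost are fabricated for illustration; a real objective would
# evaluate a confinement-related quantity per candidate device configuration.

from concurrent.futures import ThreadPoolExecutor
from itertools import product

def objective(params):
    coil_current, field_strength = params
    return (coil_current - 3) ** 2 + (field_strength - 5) ** 2  # toy cost

grid = list(product(range(7), range(9)))   # candidate parameter sets

# Each evaluation is independent, so workers can run them concurrently;
# Executor.map preserves input order, so costs align with grid entries.
with ThreadPoolExecutor(max_workers=4) as pool:
    costs = list(pool.map(objective, grid))

best = grid[costs.index(min(costs))]
```

On a cluster the same structure applies, with the thread pool replaced by distributed workers and the grid replaced by whatever search strategy the parallel algorithm uses.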
Historically, large scale computing and interactivity have been at odds. This is a particularly sore spot for data analytics applications, which are typically interactive in nature. To help address this problem, we introduce a new client/server framework for the R language. This framework allows the R programmer to remotely control anywhere from one to thousands of batch servers running as cooperating instances of R, all from the user's local R session. Additionally, no specialized software environment is needed; the framework is a series of R packages, available from CRAN. The communication between client and server(s) is handled by the well-known ZeroMQ library. To handle computations, we use the established pbdR packages for large scale distributed computing. These packages utilize HPC standards like MPI and ScaLAPACK to handle complex, tightly-coupled computations on large datasets. In this paper, we outline the client/server architecture, discuss the pros and cons of this approach, and provide several example workflows which bring interactivity to large scale computing.
This paper describes an XSEDE Extended Collaboration Support Service (ECSS) effort on scaling a campus developed online BLAST service (BLASTer) into an XSEDE gateway, bridging the gap between genomic researchers and advanced computing and data environments like those found in the Extreme Science and Engineering Discovery Environment (XSEDE) network. Biologists and geneticists all over the world use the suite of Basic Local Alignment Search Tools (BLAST) developed by the National Center for Biotechnology Information (NCBI) throughout the full spectrum of genomic research. It has become one of the de facto bioinformatics applications, used in a wide variety of computing environments. BLASTer allows researchers to complete these analyses faster and without expert computing knowledge by converting BLAST jobs to a parallel model. It handles all of the details of computation submission, execution, and database access for users through an intuitive web-based interface provided by the unique features of the HUBzero gateway platform. This paper details the core development of BLASTer for campus computing resources at Purdue University, some of its successes among the user community, and the current efforts by an ECSS scientific gateways project from XSEDE to include data-intensive use of resources like Wrangler at the Texas Advanced Computing Center (TACC) within the XSEDE network. The lessons learned from this project will be used to bring other XSEDE computing resources to BLASTer in the future and other programs like BLASTer to XSEDE users.
Geospatial data, also known as spatial data or geographic information, represents physical objects with an explicit geographic component on or near the surface of the Earth. The increasing volume and diversity of geospatial data have caused serious usability challenges for researchers in various scientific domains, which calls for a cyberGIS solution. To address these issues, this paper presents a cyberGIS community data service framework to facilitate big geospatial data access, processing, and sharing based on a hybrid supercomputer architecture. Through a collaboration between the CyberGIS Center at the University of Illinois at Urbana-Champaign (UIUC) and the U.S. Geological Survey (USGS), a community data service named TopoLens was developed for accessing, customizing, and sharing digital elevation model (DEM) data and datasets derived from the 10m national elevation dataset. TopoLens demonstrates the pipelined integration of big geospatial data sources, the computation needed to customize the original dataset for end-user needs, and a highly usable online user environment. It provides online access to both precomputed and on-demand computed high-resolution elevation data by leveraging the ROGER supercomputer. The need for building such services for GIScientists and the usability of this prototype service have been acknowledged in community evaluation.
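The precomputed-versus-on-demand serving logic at the heart of such a service can be sketched briefly. All names below (tiles, product keys, filenames) are hypothetical, not TopoLens internals: a request is served from the precomputed store when possible, otherwise the customized product is computed once and cached.

```python
# Illustrative sketch of serve-from-cache-or-compute logic with fabricated
# product names. In the real service, the on-demand branch would dispatch
# the DEM customization to the supercomputer rather than build a string.

precomputed = {("tile_42", "hillshade"): "hillshade_42.tif"}
computed_on_demand = 0

def get_product(tile, derivative):
    global computed_on_demand
    key = (tile, derivative)
    if key not in precomputed:
        computed_on_demand += 1                 # would run on HPC resources
        precomputed[key] = f"{derivative}_{tile}.tif"
    return precomputed[key]

a = get_product("tile_42", "hillshade")   # served from the precomputed store
b = get_product("tile_7", "slope")        # computed on demand, then cached
c = get_product("tile_7", "slope")        # second request hits the cache
```

The trade-off this encodes is standard for data services: precompute the popular products for latency, compute the long tail on demand for coverage.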
The SEAGrid Science Gateway provides researchers and educators unified, seamless access to computational resources. SEAGrid evolved from the Computational Chemistry Grid infrastructure, which initially provided computational chemistry tools, and is celebrating 10 years of service to the community. Its scope has since broadened to encompass other areas of science and engineering. SEAGrid is built on the Apache Airavata science gateway framework. This paper describes the technical architecture and internal underpinnings of the SEAGrid Science Gateway.
For the past few years, OpenACC has been the primary directive-based API for programming accelerator devices like GPUs. OpenMP 4.0 is now a competitor in this space, with support from different vendors. In this paper, we describe an algorithm to convert (a subset of) OpenACC to OpenMP 4; we implemented this algorithm in a prototype tool and evaluated it by translating the EPCC Level 1 OpenACC benchmarks. We discuss some of the challenges in the conversion process and propose what parts of the process should be automated, what should be done manually by the programmer, and what future research and development is necessary in this area.
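The core of such a translator can be sketched as a table of directive correspondences. The mappings below are the standard OpenACC-to-OpenMP 4 equivalences; the lookup-table framing is an illustration, since a real tool must also rewrite clauses, handle variable lists generically, and flag constructs (such as `acc kernels`) with no direct counterpart.

```python
# Sketch of directive translation as table lookup. The correspondences are
# standard; the simplistic exact-match lookup (fixed variable name 'a') is
# purely illustrative of the idea, not a workable translator.

ACC_TO_OMP = {
    "#pragma acc parallel loop":
        "#pragma omp target teams distribute parallel for",
    "#pragma acc data copyin(a)":
        "#pragma omp target data map(to: a)",
    "#pragma acc data copyout(a)":
        "#pragma omp target data map(from: a)",
}

def translate(line):
    stripped = line.strip()
    return ACC_TO_OMP.get(stripped, line)   # leave unknown lines untouched

out = translate("#pragma acc parallel loop")
```

The hard cases motivating the paper's manual-versus-automatic split are exactly the ones a table cannot express: clause semantics that differ between the two models and directives whose best translation depends on the surrounding loop nest.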
The Open OnDemand Project is an open-source software project, based on the proven OSC OnDemand platform, to allow HPC centers to install and deploy advanced web and graphical interfaces for their users. The Open OnDemand team is completing the first year of the project and releasing its first version this summer. In this paper, we describe the user experience and design of Open OnDemand and discuss next steps for the open source project.
Batch environments are notoriously unfriendly because it is not easy to interactively diagnose the health of a job. A job may be terminated without warning when it reaches the end of its allotted runtime slot, or it may terminate even sooner due to an unsuspected bug that occurs only at large scale. Two strategies are proposed that take advantage of DMTCP for system-level checkpointing. First, we describe how to easily implement extended batch sessions that overcome the typical limitation of 24 hours maximum for a single batch job on large HPC resources. This removes the necessity for the application-specific checkpointing found in many long-running codes. Second, we describe a three-phase debugging strategy that permits one to interactively debug long-running MPI applications that were developed for non-interactive batch environments.
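The scheduling logic behind an extended batch session can be sketched independently of DMTCP itself. This is a minimal illustration under assumed numbers (a 24-hour cap, a safety margin before the limit): checkpoint shortly before walltime expires, resubmit, and restart from the checkpoint, so a computation far longer than one allocation spans several jobs. The actual checkpoint and restart would be taken at the system level by DMTCP.

```python
# Illustration of the decision logic only; the checkpoint/restart mechanism
# itself is DMTCP's, not shown here.

def should_checkpoint(elapsed, walltime, margin):
    """Trigger a checkpoint once less than `margin` hours remain."""
    return walltime - elapsed <= margin

def sessions_needed(total_runtime, walltime, margin):
    """How many batch jobs a long computation needs under a walltime cap."""
    useful = walltime - margin          # productive hours per session
    sessions, done = 0, 0
    while done < total_runtime:
        done += useful
        sessions += 1
    return sessions

# Example: a 60-hour computation under a 24-hour cap with a 1-hour margin.
n = sessions_needed(60, 24, 1)
```

The payoff claimed in the abstract follows directly: the application itself never needs checkpoint code, because the session wrapper decides when to snapshot and the checkpointer captures the whole process state.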
The XSEDE User Portal (XUP) is a web interface providing a set of user specific XSEDE services and documentation to a diverse audience. The XUP architecture started out depending on monolithic services provided by large Java libraries, but continues to evolve to use an application programming interface (API) powered by a set of microservices. The goal is to have the XUP API provide development and deployment environments that are agile, sustainable, and capable of handling feature changes. In making this transition, we have developed guidelines for API services that balance complexity and reuse needs with flexibility requirements. In doing so, we have also created our own set of best practices on how to convert to using microservices. In this paper we will use the XSEDE User Portal API development as a case study to explain our rationale, approach, and experiences in working with microservices in a real production environment to provide better and more reliable science services for end users.
The CIPRES Science Gateway (CSG) is a public resource created to provide access to community phylogenetics codes on high performance computing resources. The CSG has been in operation since 2009, and has a large and growing user base. As a popular resource, the CSG provides an opportunity to study user behavior and job submissions in a Gateway environment. Here we examine CSG user and data turnover, job submission success rates, and causes of job failures. The results of our investigation provide a better understanding of the populations that use the CSG, and point to areas where improvements can be made in meeting user needs and using resources more efficiently.
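The kind of aggregation behind such a study can be sketched in a few lines. The sample records below are fabricated for illustration (they are not CSG data): compute the overall success rate, then break failures down by cause.

```python
# Hedged sketch with fabricated job records; field names are hypothetical.

from collections import Counter

jobs = [
    {"status": "ok"}, {"status": "ok"}, {"status": "ok"},
    {"status": "failed", "cause": "walltime"},
    {"status": "failed", "cause": "bad_input"},
]

succeeded = sum(j["status"] == "ok" for j in jobs)
success_rate = succeeded / len(jobs)

# Tally failure causes to see which problems dominate.
failure_causes = Counter(j["cause"] for j in jobs if j["status"] == "failed")
```

In a real analysis the records would come from the gateway's job database, and the cause taxonomy (user error, resource limits, scheduler issues) is itself a design decision.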
In this paper, we first present a brief summary of the Neuroscience Gateway (NSG), which has been in operation since 2013. NSG provides computational neuroscientists with access to Extreme Science and Engineering Discovery Environment (XSEDE) high performance computing (HPC) resources. As part of running the NSG, we have interacted closely with the neuroscience community, which has given us the opportunity to receive input and feedback from neuroscience researchers regarding their cyberinfrastructure needs. This is all the more important in the context of the BRAIN (Brain Research through Advancing Innovative Neurotechnologies) Initiative, a national initiative announced in 2013. Based on this interaction with the neuroscience community and the input we have received over the last three years, we analyze the comprehensive cyberinfrastructure needs of the neuroscience community in the second part of the paper.
Rapid secure data sharing and private online discussion are requirements for coordinating today’s distributed science teams using High Performance Computing (HPC), visualization, and complex workflows. Modern HPC infrastructures do a good job of enabling fast computation, but the data produced remains within a site’s storage and network environment tuned for performance rather than broad easy access. To share data and visualizations among distributed collaborators, manual efforts are required to move data out of HPC environments, stage data locally, bundle data with metadata and descriptions, manage versions, build animations, encode videos, and finally post it all online somewhere for secure access and discussion among project colleagues. While some of these tasks can be scripted, the effort remains cumbersome, time-consuming, and error prone. A more streamlined approach is needed. In this paper we describe SeedMe – the Stream Encode Explore and Disseminate My Experiments platform for web-based scientific data sharing and discussion. SeedMe provides streamlined and application-controlled automatic data movement from HPC and desktop environments, metadata management, data descriptions, video encoding, secure data sharing, threaded discussion, and, optionally, public access for education and outreach.
As the computational movement gains more traction in the scientific community, there is an increasing need to understand what drives adoption and diffusion of tools. This investigation reveals what makes a computational tool more easily adopted by users within the e-science community. Guided by Rogers's [1] Diffusion of Innovations theory, we set out to identify the innovation attributes of a range of computational tools across domains. Based on 135 interviews with domain scientists, computational technologists, and supercomputer center administrators across the U.S., and a small number from Europe, systematic analysis revealed 10 key attributes. They are: driven by needs, organized access, trialability, observability, relative advantage, simplicity, compatibility, community-driven, well-documented, and adaptability. We discuss the attributes in the form of questions stakeholders should keep in mind while designing and promoting the tools. We also present strategies associated with each attribute. The 10 attributes and associated questions can serve as a checklist for e-science projects that aim to promote their computational tools beyond the incubators. This paper is submitted to the "Software and Software Environments" track because it has implications for engagement of user communities.
Thursday July 21, 2016 11:00am - 11:30am EDT
Brickell
HPC centers run a diverse set of applications from a variety of scientific domains. Every application has different resource requirements, but it is difficult for domain experts to find out what these requirements are and how they impact performance. In particular, the utilization of shared resources such as parallel file systems may influence application performance in significant ways that are not always obvious to the user. We present a tool, REMORA, that is designed to provide the information that is most critical for running an application efficiently on HPC systems. REMORA collects runtime resource utilization data for a particular job execution and presents a user-friendly summary on completion. The information provided forms a complete view of the application's interaction with system resources, which is typically missing from other profiling and analysis tools. Setting up and running REMORA requires trivial effort, and can be done as a regular user with no special permissions. This enables both users and administrators to download the tool and identify a particular application's resource requirements within minutes, helping in the diagnosis of errors and performance issues. REMORA is designed to be scalable and have minimal impact on application performance, and includes support for NVIDIA GPUs and Intel Xeon Phi coprocessors. It is open source, modular, and easy to modify to target a large number of HPC resources.
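The sample-and-summarize pattern such a tool relies on can be sketched compactly. This is a deliberately simplified illustration, not REMORA's implementation: a metric is read at intervals while the application runs, and a summary is produced at completion. The "application" and "metric" below are stand-ins.

```python
# Illustrative sketch of periodic resource sampling with a summary report.
# The real tool samples many resources (memory, file systems, network, GPUs)
# out-of-band; here a fake workload grows a fake memory counter per step.

def monitor(run_step, read_metric, steps):
    """Interleave application steps with lightweight metric sampling."""
    samples = []
    for _ in range(steps):
        run_step()
        samples.append(read_metric())
    return {"max": max(samples), "avg": sum(samples) / len(samples)}

usage = [0]
def step():                 # stand-in for one unit of application work
    usage[0] += 100         # pretend resident memory grows 100 MB per step

summary = monitor(step, lambda: usage[0], steps=4)
```

The design constraint the abstract emphasizes, minimal impact on the application, is why the sampling must stay cheap relative to the work between samples.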