For research, a strong cyberinfrastructure (CI) that supports advanced data acquisition, storage, management, integration, mining, visualization, and computational processing services can be vital. However, building cyberinfrastructures, especially ones that aim to support multiple varied and complex scientific facilities, is a challenge.
In 2018, a team of researchers from institutions across the country came together to launch a pilot program aimed at creating a model for a cyberinfrastructure center of excellence for the National Science Foundation’s (NSF) Major Facilities (MFs). The goal was to identify how the center could serve as a forum for the exchange of cyberinfrastructure knowledge across varying fields and facilities, establish best practices for different NSF Major Facilities’ cyberinfrastructure, provide CI expertise, and address CI workforce development and sustainability.
“Over the past few years, my colleagues and I have worked to provide expertise and support for the NSF Major Facilities in a way that accelerates the data lifecycle and ensures the integrity and effectiveness of the cyberinfrastructure,” said Ewa Deelman, research professor of computer science and research director at the University of Southern California’s Information Sciences Institute and lead principal investigator. “We are proud to contribute to the overall NSF cyberinfrastructure ecosystem and to work with the NSF Major Facilities on solving their cyberinfrastructure challenges together, understanding that our work may help support the sustainability and progress of the MFs’ ongoing research and discovery.”
Five NSF Major Facilities were selected for the pilot: the Arecibo Observatory, the Geodetic Facility for the Advancement of Geoscience, the National Center for Atmospheric Research, the National Ecological Observatory Network, and the Seismological Facilities for the Advancement of Geoscience and EarthScope. As the pilot progressed, the program expanded to engage additional NSF Major Facilities.
The pilot found that MFs differ in types of data captured, scientific instruments used, data processing and analyses conducted, and policies and methods for data sharing and use. However, the study also found that there are commonalities between the various MFs in terms of the data lifecycle (DLC). As a result, the pilot developed a DLC model that captured the stages that data within an MF goes through. The model includes stages for 1) data capture; 2) initial processing near the instrument(s); 3) central processing at data centers or clouds; 4) data storage, curation, and archiving; and 5) data access, dissemination, and visualization. Finding these commonalities helped the pilot program identify shared challenges, standardize practices for establishing overarching CI requirements, and develop a blueprint for a CI center of excellence that can address the pressing MF DLC challenges.
Now, with a new NSF award, the pilot program has begun phase two and become CI CoE: CI Compass, an NSF Center of Excellence dedicated to navigating the Major Facilities’ data lifecycle. CI Compass will apply its three years of initial evaluation and analysis to improve cyberinfrastructure as needed for the NSF’s Major Facilities.
“Cyberinfrastructure is a critical element for fulfilling the science missions for the NSF Major Facilities and a primary goal of CI Compass is to partner with MFs to enhance and evolve their CI,” said Anirban Mandal, assistant director for network research and infrastructure at the Renaissance Computing Institute at the University of North Carolina at Chapel Hill, and co-principal investigator of the project. “In the process, CI Compass will not only act as a ‘knowledge sharing’ hub for brokering connections between CI professionals at MFs, but also will disseminate the knowledge to the broader NSF CI community.”
Angela P. Murillo, program director of applied data and information science, assistant professor in the School of Informatics and Computing at Indiana University-Purdue University Indianapolis, and co-principal investigator, continued, “By advising on, curating, and preserving the data collected by the NSF Major Facilities, we will help safeguard critical data for current and future scientists, enabling and ensuring scientific research and discovery for generations to come.”
CI Compass will enhance the overall NSF CI ecosystem by providing expertise where needed to enhance and evolve the MF CI, capturing and disseminating CI knowledge and best practices that power MF scientific breakthroughs, and brokering connections to enable knowledge sharing between and across MF CI professionals and the broader CI community.
“To accomplish their mission, the NSF Major Facilities need to rely on an efficient and reliable cyberinfrastructure that incorporates the difficult balance between offering access to most advanced technologies available while providing the robust services that stakeholders expect from a production environment,” said Valerio Pascucci, director of the Center for Extreme Data Management Analysis and Visualization and the John R. Parks Endowed Chair in Computer Science at the University of Utah, and co-principal investigator of the project. “The CI Compass team has proven the ability to reach this difficult balance by partnering with the NSF Major Facilities and providing the critical expertise needed to collaboratively design a solution that each facility needs for its particular use case.”
“Having a state-of-the-art cyberinfrastructure and related computational tools is necessary for each NSF Major Facility to conduct their day-to-day work and deliver data to a broader scientific community, both nationally and internationally,” said Jarek Nabrzyski, director of the University of Notre Dame’s Center for Research Computing, concurrent professor of computer science and engineering, and co-principal investigator of the project. “This project brings together a diverse group of experts who are able to assess the data lifecycle challenges and other related needs of each NSF Major Facility in order to help them accomplish their goals.”
Kerk F. Kee, associate professor of media and communication at Texas Tech University and senior personnel on the project, echoed that sentiment, stating, “CI Compass will serve as the hub for knowledge sharing between and across NSF Major Facilities and the broader cyberinfrastructure community about technical solutions and best practices. At the heart of knowledge sharing is the inter-organizational communication and cross-disciplinary collaborations that will lead to scientific discoveries and impactful innovations in our society.”
The research institutions collaborating on CI Compass include Indiana University, Texas Tech University, the University of North Carolina at Chapel Hill, the University of Notre Dame, the University of Southern California, and the University of Utah.
This project is funded by the NSF Office of Advanced Cyberinfrastructure in the Directorate for Computer and Information Science and Engineering under grant number 2127548. The pilot effort was funded by CISE/OAC and the Division of Emerging Frontiers in the Directorate for Biological Sciences under grant number 1842042.
— Brandi Wampler and Joanne Fahey, Notre Dame Research