Award Abstract # 1931174
Collaborative Research: Environmental Data Initiative: Sustaining the Legacy of Scientific Data

NSF Org: DBI
Div Of Biological Infrastructure
Recipient: UNIVERSITY OF WISCONSIN SYSTEM
Initial Amendment Date: July 29, 2019
Latest Amendment Date: March 7, 2023
Award Number: 1931174
Award Instrument: Standard Grant
Program Manager: Steven Ellis
stellis@nsf.gov
 (703)292-7876
DBI Div Of Biological Infrastructure
BIO Direct For Biological Sciences
Start Date: August 1, 2019
End Date: July 31, 2023 (Estimated)
Total Intended Award Amount: $1,986,265.00
Total Awarded Amount to Date: $1,986,265.00
Funds Obligated to Date: FY 2019 = $1,986,265.00
History of Investigator:
  • Paul Hanson (Principal Investigator)
    pchanson@wisc.edu
  • Margaret O'Brien (Co-Principal Investigator)
  • Corinna Gries (Former Principal Investigator)
  • Paul Hanson (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Wisconsin-Madison
21 N PARK ST STE 6301
MADISON
WI  US  53715-1218
(608)262-3822
Sponsor Congressional District: 02
Primary Place of Performance: Center for Limnology
680 N Park St
Madison
WI  US  53706-1413
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): LCLSJAGTNZQ7
Parent UEI:
NSF Program(s): SABI: Sustained Availability o
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165
Program Element Code(s): 086Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

The Environmental Data Initiative (EDI) facilitates the publication of environmental data generated by publicly funded research projects. With a mission to ensure the long-term viability and legacy of publicly funded scientific data, EDI is committed to making environmental data Findable, Accessible, Interoperable, and Reusable (FAIR). EDI provides support, training, and resources to help archive and publish high-quality data and metadata, providing accountability and transparency to data providers, while opening the door to answering new questions through Big Data analyses. EDI is actively engaged in the national and international community of data curators to promote data management best practices and stewardship. Programs served include, but are not limited to, Long Term Research in Environmental Biology (LTREB), Organization of Biological Field Stations (OBFS), MacroSystems Biology (MSB), and Long Term Ecological Research (LTER) within the NSF Division of Environmental Biology.

EDI is a collaborative effort among data practitioners, software developers, and research scientists at the University of New Mexico and the University of Wisconsin-Madison to provide a comprehensive data archive and publication service for ecological researchers. To achieve the overall mission of EDI, the project focuses on (1) curation and training services tailored to the needs of the environmental sciences community and (2) the support and maintenance of a state-of-the-art data repository. EDI services include direct management of data documentation by practitioners experienced in environmental and ecological data science, community training in data management practices and data archive workflows for contributing data to the repository, and software development for the creation of scientific metadata. Team members are experienced in data science best practices and software frameworks, including RStudio and Jupyter Notebooks, and are certified Data Carpentry and Software Carpentry instructors. The PASTA+ software behind the EDI data repository is based on a Service Oriented Architecture (SOA). The EDI data repository is designed to be simple, durable, extensible, and congruent with the FAIR guiding principles. EDI has created a feature-rich, public-facing Data Portal that provides a user-friendly web-browser interface to the repository, allowing users to evaluate and upload data, and to discover and view data and associated metadata. The Data Portal also serves as a complete reference implementation in Java for guiding other software developers on the use of the PASTA+ application programming interface, which enables developers to interact directly with the repository at its lowest level. These technologies and services are designed to promote the persistence and use of scientific data and to ensure that all environmental scientists have access to high-quality data curation and publication services. The EDI repository may be found at https://environmentaldatainitiative.org/.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Vanderbilt, Kristin and Ide, Jon and Gries, Corinna and Grossman-Clarke, Susanne and Hanson, Paul and O'Brien, Margaret and Servilla, Mark and Smith, Colin and Waide, Robert and Zollo-Venecek, Kyle "Publishing Ecological Data in a Repository: An Easy Workflow for Everyone" The Bulletin of the Ecological Society of America, v.103, 2022 https://doi.org/10.1002/bes2.2018
Gries, Corinna and Hanson, Paul C. and O'Brien, Margaret and Servilla, Mark and Vanderbilt, Kristin and Waide, Robert "The Environmental Data Initiative: Connecting the past to the future through data reuse" Ecology and Evolution, v.13, 2023 https://doi.org/10.1002/ece3.9592

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Transparency in science is the most critical factor in generating trust between the research community performing scientific investigations and the general public, who validate the efficacy of such research by measuring the impact of funded outcomes on their livelihoods. The Environmental Data Initiative (EDI) contributes to scientific transparency by providing an open and accessible data repository to host and maintain data collected through publicly funded research. EDI was established in July 2016 with the transition of information management that began under the National Science Foundation’s (NSF) Long Term Ecological Research (LTER) Network program, specifically the LTER Network Information System. With its second award from the NSF, EDI continued its mission to ensure that environmental and ecological research data collected under grants from the Directorate for Biological Sciences had a secure digital repository in which to be archived and made available to any interested party, whether academic, governmental, or general public.

This most recent performance period, between August 2019 and July 2023, allowed EDI to refine its focus on two primary goals. The first goal was to maintain the stability and resilience of EDI’s data repository operations, while the second goal was to significantly improve the data documentation and curation process applied to the archive and publication of research data. EDI’s data repository infrastructure is operated at the University of New Mexico and managed at the Center for Advanced Research Computing. The data repository consists of multiple virtual servers configured to communicate with one another in a service-oriented architecture (SOA), including a specific server that stores and maintains the digital data. Each server runs specific software written in Java or Python and provides a pivotal service to the repository. The collective group of servers represents the entirety of the repository. The hardware that provides the repository services and storage capacity consists of two high-performance blade servers and 120 Terabytes of high-speed storage. A small number of servers that require high availability operate in the AWS cloud environment; these servers provide operational oversight of the EDI data repository and report any deviation in the repository’s state of health. The EDI data repository has provided continuous service to the science community with negligible interruptions. The repository is home to just over 85,000 data packages (each an aggregate of one science metadata document and one or more science data files) and 258,000 individual data files totaling nearly 15 Terabytes of volume. A total of 15,239 data packages were added to the repository, along with 3.7 Terabytes of data, during the performance period of this grant. Almost all data are publicly accessible through the EDI Data Portal website (https://portal.edirepository.org). The small exception is data that may expose the location of, or other critical information about, endangered species, and data that are under review for manuscript publication.
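
As a rough illustration of the figures quoted above, the short Python sketch below works out what share of the repository's holdings was added during this performance period. The report gives the totals only approximately ("just over 85,000" packages, "nearly 15 Terabytes"), so the results are estimates; the constant and function names are hypothetical and chosen only for this example.

```python
# Figures as quoted in the report. The two totals are rounded in the
# source text, so every derived value below is an approximation.
TOTAL_PACKAGES = 85_000    # "just over 85,000 data packages"
TOTAL_FILES = 258_000      # "258,000 individual data files"
TOTAL_VOLUME_TB = 15.0     # "nearly 15 Terabytes"
ADDED_PACKAGES = 15_239    # packages added during the performance period
ADDED_VOLUME_TB = 3.7      # Terabytes added during the performance period


def share(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


package_share = share(ADDED_PACKAGES, TOTAL_PACKAGES)
volume_share = share(ADDED_VOLUME_TB, TOTAL_VOLUME_TB)
files_per_package = round(TOTAL_FILES / TOTAL_PACKAGES, 1)

print(f"Packages added this period: about {package_share}% of the total")
print(f"Volume added this period:   about {volume_share}% of the total")
print(f"Average data files per package: about {files_per_package}")
```

Under these rounded inputs, roughly one package in six and about a quarter of the stored volume date from this award period.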

EDI has dramatically impacted the process of documenting scientific data through the services provided by a team of experienced data curators at the University of Wisconsin-Madison. These curators, two regular staff members and three graduate students, offer case-by-case assistance with creating science metadata, which describes the data and its detailed collection history using a very expressive structured metadata language called the Ecological Metadata Language (EML). Assistance varies from general counsel on documentation best practices to reviewing and editing specific content, including content concerning collection methodologies and standard scientific units. EDI has also championed the development of curation software that helps automate the generation of EML metadata. Two software packages have been developed during the performance period: EMLassemblyline (EAL), an R software library that creates EML from information stored in a Microsoft Word document, and ezEML (https://ezeml.edirepository.org), a web browser-based application that guides a user to enter the correct information required by EML. Between the support of curators and the use of the EDI curation software packages, EDI provided direct assistance to customers by helping to archive 1,512 data packages during this performance period. Moreover, EDI scientists continued the investigation of best practices for data documentation through improved metadata to make science data Findable, Accessible, Interoperable, and Reusable (FAIR), a strong recommendation of all science communities worldwide.

Last Modified: 11/27/2023
Modified by: Paul C Hanson

Please report errors in award information by writing to: awardsearch@nsf.gov.