William Miller OAC Office of Advanced Cyberinfrastructure (OAC)
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2018
End Date: September 30, 2021 (Estimated)
Awarded Amount to Date: $299,973.00
Investigator(s): Jose Fortes fortes@ufl.edu (Principal Investigator)
Sponsor: University of Florida, 1 University of Florida, Gainesville, FL 32611-2002, (352)392-3516
NSF Program(s): NSF Public Access Initiative
Program Reference Code(s): 7916
Program Element Code(s): 7414
ABSTRACT
Biodiversity research investigates the variety and variability of life on Earth. This field of science crosses many research disciplines such as genetics, studies of organisms, plants and animals, habitats and ecosystems, and their interactions. A long-standing challenge for biodiversity researchers is to find, access, "mine", and integrate complex and diverse information from those disciplines. New approaches have now become possible with the increasing availability of "big data" techniques and infrastructure. This project will explore and employ such advanced techniques for retrieval and mining of a wide range of available open biodiversity data sources, with the aim of generating an improved holistic picture or "knowledge graph" of Earth's biodiversity. The project will also identify the data practices and discovered relationships that were needed to accomplish this graph-building task, with the aim of informing the development of future data systems and training on these techniques.
Many attempts have been made to link together biodiversity knowledge using linked identifiers coupled with data standards and taxonomies, but satisfactory results with such "exact matching" approaches have been elusive. This project aims to develop new methods of relating records across datasets that do not rely on matching identifiers but instead employ inferred rather than explicit relationships between data records. This is an experimental approach that has not yet been attempted at scale. Linkages between publicly available biodiversity, genetic, literature, and other data will be explored; and software infrastructure will be developed to combine and link multiple biodiversity datasets. Another goal is to quantify the relationship between identifier practices and the ability to construct links between available biodiversity, genetic, literature, and other data. This project will draw on and complement other large ongoing collaborative efforts that contribute to broad integration of biodiversity knowledge, data science, and infrastructure such as the Encyclopedia of Life (EOL) and the NSF-supported iDigBio project. The ultimate aim is to understand which data practices provide the most value to the biodiversity community and thereby inform policy, standards, and training on identifiers. This, in turn, can enable the exploration of new fundamental and cross-disciplinary research questions, and potentially improve practices of a wide range of US and international data aggregators and data producers.
This project is supported by the National Science Foundation's Public Access Initiative which is managed by the NSF Office of Advanced Cyberinfrastructure on behalf of the Foundation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Poelen, Jorrit H. and Schulz, Kayja and Trei, Kelli J. and Rees, Jonathan A. "Finding Identification of Keys in the Biodiversity Heritage Library," Biodiversity Heritage Library (BHL) and Global Names (GN) Workshop. Champaign, Illinois, 2019.
Poelen, J. H. "Global Biotic Interactions: Benefits of Pragmatic Reuse of Species Interaction Datasets," Slides of Seminar at Leibniz Institute for Zoo and Wildlife Research Berlin, Germany on 9 January 2020, 2020. doi:10.17605/OSF.IO/9JT24
Poelen, J. H. "To connect is to preserve: on frugal data integration and preservation solutions," Society for Preservation of Natural History Collections (SPNHC) Annual Meeting. Chicago, 2019. doi:10.17605/OSF.IO/A2V8G
Poelen, J. H. "Reliable Data Use In R," 4th Annual Digital Data in Biodiversity Research, 1-3 June 2020, 2020. doi:10.17605/OSF.IO/VKJ9Q
Elliott, M. and Poelen, J.H. and Fortes, J.A.B. "Reliable Dataset Identifiers Are Essential Building Blocks For Reproducible Research," 4th Annual Digital Data in Biodiversity Research, 1-3 June 2020, 2020.
Elliott, Michael J. and Poelen, Jorrit H. and Fortes, José A.B. "Toward reliable biodiversity dataset references," Ecological Informatics, v.59, 2020.