Award Abstract # 1931297
Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW)

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Awardee: UTAH STATE UNIVERSITY
Initial Amendment Date: September 9, 2019
Latest Amendment Date: April 3, 2020
Award Number: 1931297
Award Instrument: Standard Grant
Program Manager: Alan Sussman
alasussm@nsf.gov
(703)292-7563
OAC Office of Advanced Cyberinfrastructure (OAC)
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2019
End Date: September 30, 2022 (Estimated)
Total Intended Award Amount: $568,496.00
Total Awarded Amount to Date: $568,496.00
Funds Obligated to Date: FY 2019 = $568,496.00
History of Investigator:
  • Jeffery Horsburgh (Principal Investigator)
    jeff.horsburgh@usu.edu
  • Tianfang Xu (Co-Principal Investigator)
  • Brian Crookston (Co-Principal Investigator)
  • Alfonso Torres-Rua (Co-Principal Investigator)
Awardee Sponsored Research Office: Utah State University
1000 OLD MAIN HILL
LOGAN
UT  US  84322-1000
(435)797-1226
Sponsor Congressional District: 01
Primary Place of Performance: Utah State University
1415 Old Main Hill
Logan
UT  US  84322-1415
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): SPE2YDWHDYU4
Parent UEI: SPE2YDWHDYU4
NSF Program(s): Hydrologic Sciences,
Special Initiatives,
EnvS-Environmtl Sustainability,
Software Institutes,
EarthCube
Primary Program Source: 040100 NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 077Z, 7923, 8004
Program Element Code(s): 1579, 1642, 7643, 8004, 8074
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Scientific challenges in hydrology and water resources, such as understanding the impacts of variable climate, the sustainability of water supply under population growth and land use change, and the impacts of hydrologic change on ecosystems and humans, are increasingly data intensive. The volume of data produced by environmental scientists to study hydrologic systems requires advanced software tools for effective data visualization, analysis, and modeling. Scientists spend much of their time accessing, organizing, and preparing datasets for analysis, which is a barrier to efficient analysis and hinders scientific inquiry and advances. This project will develop new software that will enhance scientists' ability to apply advanced data visualization and analysis methods (collectively referred to as "data science" methods) in the hydrology and water resources domain. The project will promote standardized software tools and data formats to help scientists enhance the consistency, shareability, and reproducibility of the analyses they perform, all of which are important in building trust in scientific results. The software developed in the project will make data loading and organization for analysis easier, reducing the time scientists spend choosing appropriate data structures and writing computer code to read and parse data. It will enable users to automatically retrieve data from the HydroShare system, which is a hydrology domain data repository, as well as from important national water data sources such as the United States Geological Survey's National Water Information System. The software will automatically load data from these sources into standardized, high-performance data structures that are targeted to specific scientific data types and that integrate with the visualization, analysis, and other data science capabilities commonly used by scientists in the hydrology and water resources domains. 
The project will also reduce the technical burden on water scientists of creating a computational environment in which to execute their analyses by installing and maintaining the Python packages it develops within the HydroShare-linked JupyterHub environment of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI). Finally, the project will demonstrate the functionality and use of the software by producing a set of educational modules based on real water-data science applications. These modules will provide a specific mechanism for delivering the software to the community and promoting its use in classroom and research environments.

Scientific and related management challenges in the water domain are inherently multi-disciplinary, requiring synthesis of data of multiple types from multiple domains. Many data manipulation, visualization, and analysis tasks performed by water scientists are difficult because (1) datasets are becoming larger and more complex; (2) standard data formats for common data types are not always agreed upon, and, when they are, they are not always mapped to an efficient structure for visualization and/or analysis within an analytical environment; and (3) water scientists generally lack training in data-intensive scientific methods that would enable them to use new and existing tools to efficiently tackle large and complex datasets. This project will advance Data Science and Analytics for Water (DSAW) by developing (1) an advanced object data model that maps common water-related data types to high-performance data structures within the object-oriented Python language and analytical environment, based upon standard file, data, and content types established by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) HydroShare system; and (2) two new Python packages that enable users to write Python code for automating retrieval of desired water data, loading it into high-performance in-memory objects specified by the object data model designed in the project, and performing analysis in a reproducible way that can be shared, used collaboratively, and formally published for reuse. The project will use domain-specific data science applications to demonstrate how the new Python packages can be paired with the powerful data science capabilities of existing Python packages such as pandas, NumPy, and scikit-learn to develop advanced analytical workflows within cloud and desktop environments. 
The project aims to extend the data access, collaboration, and archival capabilities of the HydroShare data and model repository and promote its use as a platform for reproducible water-data science. The project also aims to overcome barriers associated with accessing, organizing, and preparing datasets for data-science-intensive analyses. Overcoming these barriers will help transform scientific inquiry and advance the application of data science methods in the hydrology and water resources domains.
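
The idea of mapping a water data type to an in-memory structure can be illustrated with a minimal sketch. It is not the project's object data model or either of its packages: the class `TimeSeriesRecord`, the function `load_time_series`, the column names, and the sample values are all hypothetical, chosen only to show how a time series with its descriptive metadata might be held in a pandas-backed object and then analyzed with standard pandas calls.

```python
from dataclasses import dataclass
from io import StringIO

import pandas as pd


@dataclass
class TimeSeriesRecord:
    """Hypothetical container pairing observations with descriptive metadata."""

    site_code: str
    variable: str
    units: str
    values: pd.Series  # observations indexed by timestamp

    def daily_mean(self) -> pd.Series:
        """Aggregate the observations to daily mean values."""
        return self.values.resample("D").mean()


def load_time_series(csv_text: str, site_code: str, variable: str, units: str) -> TimeSeriesRecord:
    """Parse CSV text with 'datetime' and 'value' columns (assumed layout)."""
    frame = pd.read_csv(
        StringIO(csv_text),
        parse_dates=["datetime"],
        index_col="datetime",
    )
    return TimeSeriesRecord(site_code, variable, units, frame["value"].astype(float))


# Made-up sample data standing in for a retrieved dataset.
SAMPLE_CSV = """datetime,value
2020-01-01 00:00,10.0
2020-01-01 12:00,14.0
2020-01-02 00:00,20.0
"""

record = load_time_series(SAMPLE_CSV, "SITE_A", "discharge", "cfs")
daily = record.daily_mean()
print(daily)
```

Keeping the metadata (site, variable, units) alongside the indexed values is one way an analysis could stay self-describing when it is shared; a real data model would likely carry far richer metadata than this sketch does.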

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Xu, Tianfang and Liang, Feng. "Machine learning for hydrologic sciences: An introductory overview." WIREs Water, v.8, 2021. https://doi.org/10.1002/wat2.1533

