Committee on Strategy and Budget (CSB)
Task Force on Data Policies (DP)
The increasing ease of gathering large amounts of varied data (including digital data, research specimens, artifacts, etc.), together with the funding of large-scale collaborative projects, has made the broad policy issues surrounding the management of scientific and engineering research data critically important. How data collected with National Science Foundation (NSF) funding are shared and managed to ensure broad, timely, and long-term availability and accessibility to the entire research community is an important issue. A determination of what, if any, NSF policies related to data sharing and management would be in the best interests of the Nation's scientific and engineering enterprise warrants careful examination by the National Science Board (NSB).
Significant policy debate on this broad set of issues is ongoing at both national and international levels, with many stakeholders and organizations involved. Past and ongoing efforts by the Board, NSF as a whole, and other organizations could inform the current effort. In addition to reports from the National Science and Technology Council (NSTC) and the National Research Council (NRC),¹ especially relevant to this effort is the NSB report Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (NSB-05-40, September 2005).
Given that sharing and managing research data pose challenges for the entire international research community, the NSB, in taking up this topic, has a real opportunity to contribute productively to a significant and ongoing policy discussion. The policy issues surrounding data are critically important at both the national and international levels and for NSF as it carries out its mission to promote the progress of science.
The issues surrounding data sharing and management are numerous and complex. They include broad and timely access to data, sustainability of data (particularly digital data), the cost burdens associated with data management, and openness of data generated with taxpayer dollars, to name a few.
Charge to the NSB CSB Task Force on Data Policies
The NSB CSB Task Force on Data Policies was established at the February 3-4, 2010 NSB meeting with the charge of further defining the issues and outlining possible options to make the use of data more effective in meeting NSF's mission.
Membership on the NSB CSB Task Force on Data Policies: Dr. José-Marie Griffiths, chairman, and Drs. Mark Abbott, Camilla Benbow, John Bruer, Bud Peterson, Diane Souvaine, Thomas Taylor, and Mr. Arthur Reilly, members, with Executive Secretary Dr. Philip Bogden, NSF. NSF Liaison members on the Task Force are Drs. Myron Gutmann (Assistant Director, SBE) and Ed Seidel (Assistant Director, MPS).
Process and Strategies
This work plan describes the process and strategies for gaining input from stakeholders regarding their understanding of the NSF data policies along with current data sharing and management practices. The stakeholder groups are both internal and external to NSF and mainly include research communities and their institutions (external) and NSF program officers (internal). The input gained from this study will inform the task force on how best to proceed with follow-up action, which includes detailing the findings, deliberating recommendations, discussing recommendations with NSF leadership, and working together to find the best solutions.
The first step for the Task Force is to hear from the NSF Data Working Group. Then it will work with the Board and NSF senior staff to further define the issues and outline possible options to make the use of data more effective in meeting NSF’s mission. During this period, the Task Force will solicit input widely from the research and stakeholder communities and may solicit special studies as appropriate.
The Task Force's strategy for developing data policies is multi-phased:
- NSF's updated implementation of its long-standing data policy – the Data Management Plan requirement – is scheduled to go into effect in January 2011 and will serve as a starting point for the Task Force. The Task Force will monitor the impact of this implementation change in order to inform a review of NSF policy.
- Considering issues of data policy, Open Data movements, and related developments, the Task Force will then develop a "Statement of Principles."
- The Task Force will provide guidance to subsequent Board efforts to develop specific, actionable policy recommendations focused initially on NSF, but that could potentially be extended to other Federal agencies in a national and international context.
This effort requires significant background material on current NSF data policies; data policies at other Federal agencies; data policies at international counterparts to NSF; and the views of NSF awardees on the value of data policies and their impact on administrative burden. A survey of researchers/PIs may also be considered.
The steps in the process are as follows:
- Receive update from Dr. Edward Seidel on NSF's plans to enhance the enforcement of existing data policy.
- Determine how the current data policies, and the instructions that accompany them, are interpreted and applied by both proposers and NSF program staff. Solicit input from Program Directors.
- Conduct interviews with key stakeholders, led by Task Force leads.
- Prepare a Statement of Principles.
- Assess further need for NSB study.
Attached are a Proposed Timeline and an appendix of possible Data Policy Issues.
Data Policies Task Force Timeline
April – May 2010: Task Force members consider the questions they want answered, the information necessary to obtain the answers, and the means by which to gather the information.

May 4-5, 2010: Task Force meeting at Board meeting to discuss next steps in proceeding with internal and external research.

May – August 2010: Develop a Statement of Principles.

August 25-26, 2010: Task Force meeting at Board meeting to approve the charge, review and revise the plan, review the draft Statement of Principles, and discuss plans for a workshop of key stakeholders to be held in winter.

August – Sept. 2010: Review and compile findings. Offsite Board meeting/informal discussion of progress.

Sept. – Dec. 2010: Proceed with internal and external research and begin to formulate recommendations.

Dec. 1-2, 2010: Task Force meeting at Board meeting to review and discuss results of research.

Dec. 2010 – Feb. 2011: 1- or 2-day workshop of key stakeholders.

Feb. – May 2011: Draft final report with findings and recommendations for data policies.
¹ NSTC Interagency Working Group on Digital Data, Harnessing the Power of Digital Data for Science and Society (January 2009); and NRC, Ensuring the Utility and Integrity of Research Data in a Digital Age (2009).
Appendix: Possible Data Policy Issues
- Internal policies that could be addressed include:
- Defining what constitutes the release of "complete" data. Would complete data release include the original, "raw" data; cleaned-up, publication-ready data, along with the methods used for clean-up; publication-ready data with the meta-data necessary to reproduce any interpretations of the data; raw data with the software needed to make it usable to others; data organized in a way that is interoperable to some standard; etc.?
- Defining what types of "data" are to be shared – should specimens, samples, etc., be included?
- Defining what "sharing" entails - what is expected of principal investigators and awardee institutions? Who is responsible
for ensuring persistent access?
- Defining good data management/curation practices.
- Timeline for release of data (e.g., a certain time period after collection, after publication of results, etc.).
- Timeframe for continued availability of data – forever?
- Balance between acknowledging variations in the expectations of different disciplines and research communities regarding
the proprietary nature of data and setting agency-wide data policies.
- Potential NSF guidelines to awardees relating to the management of data that could, for example, require awardees to develop a data management plan with certain components that is peer-reviewed and considered part of the terms and conditions of the award.
- Particularly significant impact of the data policies of NSF-funded large facilities and centers on whole research communities. Merit, if any, of including data policies as part of the site visits and design reviews of large centers and facilities.
- NSF role, if any, in setting standards for meta-data requirements. If processed data are made available, determining what the requirements should be for making available the work processes performed on the data so that its provenance can be established.
- NSF role, if any, in setting standards for data formats for sharing and exchange, as well as for long-term curation.
- NSF role, if any, in setting requirements for data "publishing" or deposit.
- NSF role, if any, in offsetting or funding the administrative burden placed on awardee institutions and principal investigators by any required data management policies.
- Technical considerations in archiving and ensuring the accessibility of many types of data that are becoming increasingly complex. Just as "publications" are often no longer exclusively printed pieces of paper and often involve supplemental material provided in a variety of electronic media, "data" may not be simply original data or measurements, but raw data in the context of its associated meta-data.
- What proprietary rights, if any, are appropriate for a principal investigator relating to data retention and usage?
- Accessibility of data for evidence-based policy development.
- Identification of the appropriate party or parties who should be responsible for ensuring the long-term archiving and curation of
data, both for the cost burden and implementation. Possibilities include NSF, awardee institutions, principal investigators, a
combination of the above, etc.
- Merit, if any, of a national repository (or multiple repositories) for data, and the appropriateness of NSF's assisting in funding such repositories, helping set standards for such an effort, and/or requiring awardees to deposit data in such repositories.
- Impact of the NSF DataNet program on data management.
- International complexities, particularly for large facilities with international partnerships.
- Legal complexities.
- Potential overlap of policy issues between the curatorship of physical specimens and the management of large, and often digital, datasets.