Afterword: Data Gaps and Needs

Science and Engineering Indicators leaves many questions about the state of the S&E enterprise unanswered. Nationally representative or internationally comparable information is lacking about significant factual aspects of the S&T community in the United States and abroad. Following are some examples.

Chapter 1. Elementary and Secondary Education

  • Informal learning experiences in K–12 education, including advanced courses taken in local colleges or via distance learning; participation in research, science or technology competitions, or internships; advanced coursetaking in engineering; and involvement in informal S&E learning through museums, science centers, zoos, planetariums, aquariums, and similar community-based institutions
  • Teacher preparation and quality, including elementary teacher qualifications in science, technology, engineering, and mathematics (STEM) disciplines and STEM teacher test scores on subject matter knowledge
  • STEM teacher career paths, including better data on teacher mobility across different kinds of schools and districts, reentry into teaching, and teachers on temporary visas or other noncitizen teachers
  • Teacher involvement in informal learning

Chapter 2. Higher Education in Science and Engineering

  • Emergence of multidisciplinary degree programs, new fields, and new institutional forms
  • Student involvement in research experiences or in cooperative learning programs
  • Undergraduate involvement in R&D work
  • Quality indicators for postsecondary STEM teaching

Chapters 1 and 2

  • Internationally comparable indicators of curriculum content or rigor
  • Indicators of achievement or interest in STEM for gifted students at all education levels

Chapter 3. Science and Engineering Labor Force

  • Internationally comparable data on S&T workforce characteristics
  • Worldwide data, including industry breakdowns, on international flows of workers with S&T training, in S&T-related occupations, and/or performing R&D
  • S&T-related skills used in the workforce and non-S&T skills that S&E workers use in their jobs
  • Data on the role of postdoctorates in the nonacademic S&E workforce
  • Employer-provided training and other forms of lifelong learning for S&E workers
  • S&E workforce location relative to employer location

Chapter 4. Research and Development: National Trends and International Linkages, and Chapter 6. Industry, Technology, and the Global Marketplace

  • R&D by line of business (For companies with more than one line of business, current industry R&D data attribute R&D to the company as a whole and not necessarily to the part of the company for which the work is done.)
  • R&D in relation to firm or line-of-business characteristics, including profitability, productivity, growth, etc.
  • R&D performance data on very small companies (fewer than five employees), state and local governments, nonprofit organizations, and individuals performing R&D independent of a corporation, university, or other organization
  • Non-S&E R&D outside academic institutions (Other countries collect these data and include them in their national statistics.)
  • R&D in international commerce, including R&D performed in the United States that is financed from foreign sources, characteristics (e.g., basic, applied, or development work; location) of R&D expenditures by U.S. affiliates of foreign multinational corporations, characteristics of R&D expenditures by foreign affiliates of U.S. multinational corporations, and trade in knowledge-intensive service industries
  • Innovation indicators, including technology licensing; numbers, characteristics, R&D activities, and other operations data for business technology alliances; and technology parks, clusters, and incubators
  • Outsourcing and offshoring of S&E jobs

Chapter 5. Academic Research and Development

  • R&D funded from institutional or departmental resources and not separately budgeted, including use of funds for infrastructure, equipment, student support, and other purposes, and ultimate source of institutional or departmental funds
  • R&D expenditures by U.S. corporations at foreign universities and by foreign corporations at U.S. universities
  • Individuals who author S&E articles (Current data attribute articles to institutions or departments and do not include information about the characteristics of individual authors [e.g., employer, employment sector, disciplinary background, national origins, collaborative patterns, career stage, main work activities].)
  • Indicators of multidisciplinary S&E research
  • Accessibility, use, and other characteristics of large, curated academic databases

Chapters 4 and 5

  • Indicators of the spread, development, and use of R&D-related cyberinfrastructure
  • Worldwide centers of R&D excellence by discipline and industry

These gaps are descriptive and could be addressed with new data. In many cases, however, the gaps are as much gaps in analysis as gaps in data. Understanding the global flow of S&E workers, for example, will require not only better, more internationally comparable data about credentials, skills, and migration patterns but also developing models and testing hypotheses based on data that already exist (Regets 2007). Similarly, understanding the determinants of technological innovation involves building theories of innovation, testing them against existing data, and identifying and collecting the new data needed to elaborate and test promising theoretical models (Nelson 1993). Accordingly, as part of a recent White House Office of Science and Technology Policy initiative, the National Science Foundation (NSF) has begun a program to support fundamental research aimed at developing a Science of Science and Innovation Policy. The initial emphases of the program are on analytic tools and model building.

Many other questions relevant to science policy involve a similar interplay among theory, analysis, and data. In addition, compelling answers to the "why" and "what if" questions that policymakers often ask can remain uncertain even when data bearing on these questions are available.

The federal government and its statistical agencies continuously engage in efforts to address significant data gaps or enhance the quality of the data generated from ongoing collections. Current examples include:

  • Redesign of NSF's Survey of Industrial Research and Development to collect data on the line of business to which R&D is attributable in diversified firms, foreign R&D activities of companies that do R&D in the United States, technology licensing activities, and demographic and educational characteristics of the U.S. R&D workforce.
  • A project of NSF's Division of Science Resources Statistics (SRS) to count nonacademic postdoctorates and collect data on the work roles and demographic, career, and educational characteristics of postdoctorates.
  • Collaboration between the Department of Homeland Security and SRS to examine whether immigration records can be made available for use as a basis for collecting more timely and complete data on foreign-educated scientists and engineers.
  • A Department of Commerce advisory committee effort to identify "holes" in the national data collection system that limit the nation's ability to measure innovation.

Collecting high-quality data can be exceedingly expensive, and governments cannot afford to collect all the data they could use productively. Beyond cost, however, there are numerous other persistent obstacles to remedying data gaps:

  • Many concepts in the list of data gaps are difficult to measure. Informal learning experiences, teaching quality, S&E-related workplace training, multidisciplinary research, and innovation are less readily classified and quantified than many of the S&E indicators reported in this volume.
  • For difficult-to-measure concepts, a succession of small-scale studies is usually necessary to refine measures and test them in a variety of situations before national or international data collection is possible. This kind of development work takes time.
  • For S&T data to be meaningful, organizations and individuals must be willing and able to supply reasonably accurate information. In some cases, the burden on survey respondents of supplying such information makes it impossible to secure the necessary cooperation and collect good data.
  • As S&T becomes increasingly globalized, internationally comparable data become increasingly important for mapping personnel and resource flows. Successful efforts under the auspices of the Organisation for Economic Co-operation and Development to coordinate the collection of R&D data across numerous national statistical systems indicate that coordination is feasible, but also that it is difficult and resource intensive.
  • Data are most valuable when they extend back in time as well as outward across national boundaries. New data will not be able to address many questions until several data collection cycles have been completed.
  • Legal and technical obstacles limit opportunities for merging data from different sources and making merged data widely available for analysis. Obstacles associated with merging datasets from different countries are especially daunting.