nsf.gov - NCSES Design Options for SESTAT for the Current Decade: Statistical Issues - US National Science Foundation (NSF)
National Science Foundation National Center for Science and Engineering Statistics
Design Options for SESTAT for the Current Decade: Statistical Issues



The Scientists and Engineers Statistical Data System (SESTAT) is a database of the employment, education, and demographic characteristics of a sample of scientists and engineers in the United States. SESTAT is maintained by the National Science Foundation to provide data for policy analysis and general research. In the 1990s the database was compiled biennially in 1993, 1995, 1997, and 1999. At each round, the SESTAT integrated database was constructed from data collected in three separate surveys: the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR).

Under the SESTAT definition, the United States had approximately 12 million scientists and engineers in 1995, compared with a total resident population of 197 million people age 18 and older. Sampling this rare population presents a challenge. In the 1990s the starting point for the SESTAT database was a sample of college graduates identified in the 1990 census. This sample of college graduates was surveyed in the NSCG in 1993; individuals identified as scientists and engineers became part of a panel that was surveyed in subsequent rounds of SESTAT. To represent the flow of new scientists and engineers after 1990, the NSRCG, a survey of bachelor's and master's degree recipients who obtained their degrees in the previous 2 academic years, was conducted at the same time as the NSCG.[1] Subsamples of the NSRCG respondents from each round were added to the panel for subsequent rounds of the SESTAT surveys. For each round, the panel of the census-based sample and the NSRCG samples from previous rounds is known as the NSCG. The third survey in the SESTAT system is the SDR, which is a panel survey that represents individuals who earned doctorates in the United States.
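The scale of the rare-population problem can be illustrated with a little arithmetic. The sketch below uses only the two 1995 figures quoted above; the target of 1,000 respondents and the function names are hypothetical, and the calculation assumes a simple random sample of adults with full response, which is not the SESTAT design (SESTAT started from college graduates identified in the census precisely to avoid this kind of screening).

```python
import math

# 1995 figures quoted in the text.
SCIENTISTS_AND_ENGINEERS = 12_000_000
ADULT_POPULATION = 197_000_000


def prevalence(target_group: int, population: int) -> float:
    """Share of the population that belongs to the target group."""
    return target_group / population


def expected_screening_size(target_respondents: int, rate: float) -> int:
    """Adults to draw at random so that the expected number of
    target-group members reaches target_respondents.

    Assumption: simple random sampling and 100 percent response.
    """
    return math.ceil(target_respondents / rate)


rate = prevalence(SCIENTISTS_AND_ENGINEERS, ADULT_POPULATION)
needed = expected_screening_size(1_000, rate)  # 1,000 is a hypothetical target

print(f"Prevalence among adults: {rate:.1%}")
print(f"Adults to screen for 1,000 scientists and engineers: {needed:,}")
```

Under these assumptions roughly 6 percent of adults qualify, so about 16 adults would have to be screened for every scientist or engineer found. Restricting the frame to college graduates raises that hit rate considerably.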

As the NSCG surveys progressed during the 1990s, the NSCG nonresponse and undercoverage rates increased. To address this problem, the sampling plan called for a "refreshing" when data from the next census became available. The Division of Science Resources Statistics (SRS) investigated several alternatives to the 1990s plan to guide the design for the current decade. This report describes and evaluates the proposed designs.[2]

With the 2000 census, different sample design options became available for the NSCG in the following decade. An earlier study of potential sampling frames found that the only functional sampling frames for the SESTAT surveys in the 2000s are the census and a continuation of the existing panels. The most basic approach is to continue with the existing panels and supplements for new entrants to the population of scientists and engineers. An alternative approach is to start afresh with a sample for the NSCG from the 2000 census, repeating the design used in the 1990s. Combinations and refinements of these two approaches are also considered. This report addresses some of the statistical issues related to the four design options considered by the National Science Foundation (NSF) for the SESTAT surveys for the current decade.

The report is organized as follows. "Overview of SESTAT Design of the 1990s" provides an overview of the 1990s SESTAT system and identifies some of the limitations of that system, particularly with respect to undercoverage and nonresponse. "SESTAT Redesign Options" describes the four alternative designs considered and assesses their advantages and limitations. "Implications for Variance Estimation" discusses issues related to variance estimation, and "Use of Web-based Data Collection" discusses the use of Web-based data collection for the SESTAT surveys. "Summary" contains the main findings. The appendix to the report contains a memo discussing response issues for the SESTAT redesign options. This memo was revised to include comments from NSF.



[1] The first NSRCG of the decade included some scientists and engineers who graduated immediately after the census as well as those who graduated in the 2 following academic years.

[2] Although SRS's small technical staff often proposes technical changes to surveys (in this case, SRS Chief Statistician Ronald S. Fecso proposed the design options forming the core of this report), it is SRS's practice to obtain the advice of other highly regarded methodological professionals before implementing major changes. Such reviews, generally done through contracts (as is the case with this report), help ensure that methodological proposals represent best practices as viewed by a range of methodologists. This report documents design change proposals and the review of that material during the period in which decisions were being made for the NSCG design for the 2000s. Consequently, data for 1999 were not available and are not included in this report.

Working Paper | SRS 07-201 | June 2007