National Science Foundation Division of Science Resources Statistics

Sampling Errors for SESTAT

 

Understanding Sampling Errors

Types of Survey Errors

Estimates derived from sample surveys are subject to two types of errors--sampling errors and nonsampling errors. Nonsampling errors [1] can be attributed to many sources, such as response differences, definitional difficulties, differing respondent interpretations, and respondent inability to recall information.

Sampling errors (the focus of this presentation) occur when estimates are derived from a sample rather than a census of the population. The sample used for a particular survey is only one of a large number of possible samples of the same size and design that could have been selected. Even if the same questionnaire and instructions were used, the estimates from each sample would differ from the others. This difference, termed sampling error, occurs by chance, and its variability is measured by the standard error associated with a particular survey estimate.
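The idea of repeated sampling can be illustrated with a small simulation. The sketch below is not based on SESTAT data or on the SESTAT sample design: it assumes a hypothetical population of 10,000 made-up values and simple random sampling without replacement, and the function name sample_means is illustrative only. It draws many samples of the same size, records the mean of each, and compares the spread of those means with the textbook standard error for this simple design.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population of 10,000 values (an assumption, not survey data).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50.0, 10.0) for _ in range(10_000)]

n = 400
means = sample_means(population, sample_size=n, num_samples=1_000)

true_mean = statistics.fmean(population)

# Spread of the estimates across the repeated samples.
empirical_se = statistics.stdev(means)

# Textbook standard error of a mean under simple random sampling without
# replacement, including the finite population correction.
N = len(population)
analytic_se = statistics.stdev(population) / n ** 0.5 * (1 - n / N) ** 0.5

print(f"true population mean     : {true_mean:.3f}")
print(f"smallest / largest sample mean: {min(means):.3f} / {max(means):.3f}")
print(f"empirical standard error : {empirical_se:.3f}")
print(f"analytic standard error  : {analytic_se:.3f}")
```

Each sample gives a slightly different estimate, and the standard deviation of those estimates should be close to the analytic standard error. Surveys with complex designs generally need design-specific variance estimators rather than this simple formula.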

Estimates of the characteristics of scientists and engineers obtained using SESTAT are based on sample surveys and are thus subject to sampling errors. A related term is the variance, which is the square of the standard error and is sometimes used in standard error calculations.

Assessing the Accuracy of Estimates

Having estimated a population quantity such as a mean or total, it is desirable to assess the accuracy of the estimate. The customary approach is to construct a confidence interval within which one is sufficiently sure the true population value lies. The standard error of a survey estimate measures the precision with which an estimate from one sample approximates the true population value, and thus can be used to construct a confidence interval for a survey parameter to assess the accuracy of the estimate. Let $\hat{\theta}$ be an estimator of a parameter of interest $\theta$, with standard error $\mathrm{se}(\hat{\theta})$. If the sample size is large, then an approximate $(1-\alpha) \times 100$ percent confidence interval for $\theta$ is

$$\hat{\theta} \pm z_{\alpha/2} \, \mathrm{se}(\hat{\theta}),$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the normal distribution with mean zero and variance one.
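As a minimal sketch of this large-sample interval, the function below computes the estimate plus or minus the normal percentage point times the standard error. The function name normal_confidence_interval and the numbers in the usage example are hypothetical, chosen only for illustration; the percentage point comes from Python's standard-library statistics.NormalDist.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, standard_error, alpha=0.05):
    """Return (lower, upper) bounds of an approximate (1 - alpha) * 100
    percent confidence interval: estimate +/- z_{alpha/2} * standard_error."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    # Upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical figures (not SESTAT results): an estimated total of 12,500
# with a standard error of 400, at the 90 percent confidence level.
lower, upper = normal_confidence_interval(12_500, 400, alpha=0.10)
print(f"90% confidence interval: {lower:.1f} to {upper:.1f}")
```

The approximation relies on the sample being large enough for the estimator to be roughly normally distributed; for small samples or subgroups, other methods may be more appropriate.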

If the process of selecting a sample from the population were repeated many times and an estimate and its standard error calculated for each sample, then:

  • Approximately 90 percent ($\alpha = 0.10$) of the intervals from 1.645 ($= z_{0.05}$) standard errors below the estimate to 1.645 standard errors above the estimate will include the true population value.
  • Approximately 95 percent ($\alpha = 0.05$) of the intervals from 1.96 ($= z_{0.025}$) standard errors below the estimate to 1.96 standard errors above the estimate will include the true population value.
  • Approximately 99 percent ($\alpha = 0.01$) of the intervals from 2.575 ($= z_{0.005}$) standard errors below the estimate to 2.575 standard errors above the estimate will include the true population value.
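The coverage statements above can be checked empirically. The following sketch makes the same simplifying assumptions as any textbook illustration: a hypothetical population of made-up values and simple random sampling without replacement, not the SESTAT design. The function name coverage_rate is illustrative. It repeats the sampling process many times and counts how often the interval built from each sample contains the true population mean. The factors are computed with statistics.NormalDist, which gives about 1.645, 1.960, and 2.576; the last differs marginally from the rounded 2.575 quoted above.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(population, sample_size, alpha, num_samples=2_000, seed=1):
    """Fraction of repeated-sample confidence intervals that contain
    the true population mean."""
    rng = random.Random(seed)
    true_mean = fmean(population)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    N = len(population)
    hits = 0
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)
        estimate = fmean(sample)
        # Standard error estimated from the sample itself, with the
        # finite population correction for sampling without replacement.
        se = stdev(sample) / sample_size ** 0.5 * (1 - sample_size / N) ** 0.5
        if estimate - z * se <= true_mean <= estimate + z * se:
            hits += 1
    return hits / num_samples


# Hypothetical population (an assumption for illustration only).
pop_rng = random.Random(7)
population = [pop_rng.gauss(100.0, 15.0) for _ in range(10_000)]

rates = {
    alpha: coverage_rate(population, sample_size=200, alpha=alpha)
    for alpha in (0.10, 0.05, 0.01)
}
for alpha, rate in rates.items():
    print(f"nominal {1 - alpha:.0%} interval: observed coverage {rate:.1%}")
```

The observed coverage should land near the nominal 90, 95, and 99 percent levels, with small deviations due to the finite number of simulated samples.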

With an estimate of the standard error and the factors above (1.645, 1.96, or 2.575), a data user may construct a confidence interval, or range of values, that includes the true population value with confidence level $1-\alpha$ (= 0.90, 0.95, or 0.99, corresponding to $\alpha$ = 0.10, 0.05, or 0.01).


[1] For a general discussion of nonsampling errors, see Nonsampling Errors in Surveys by Judith T. Lessler and William D. Kalsbeek (New York: John Wiley & Sons, 1992).

While the full extent of nonsampling errors is usually unknown, a variety of related research has been conducted for the SESTAT surveys. Some of the information from this research has been summarized in the technical notes associated with the SESTAT data elements, accessible through the SESTAT Home Page.

 

National Science Foundation Division of Science Resources Statistics (SRS)
The National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA
Tel: (703) 292-8780, FIRS: (800) 877-8339 | TDD: (800) 281-8749