National Science Foundation Division of Science Resources Statistics

SESTAT Survey Design and Methodology

 

Weighting Strategy

Unbiased survey estimates depend on estimation procedures that incorporate the selection probabilities for each sampling unit. Selection probabilities for the SESTAT surveys vary greatly from unit to unit because of the extensive oversampling used to facilitate analyses of smaller populations and less common fields of study. Nonresponse and undercoverage can also bias estimates with respect to the population of interest. In the SESTAT data, some of these idiosyncrasies of survey data analysis were removed by constructing, for each survey, sampling weights that reflect differential selection probabilities and by adjusting these weights to compensate for nonresponse and undercoverage.

Sampling weights were defined as the reciprocal of the probability of selection for each sampled unit, and the weights were adjusted by using weighting class or poststratification adjustment procedures. The final adjusted sampling weights became the analysis weights, which have been added to each individual's record in the survey database (as "Z_WEIGHTING_FACTOR_SURVEY"). These weights should be used only in making estimates for the individual surveys.
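
The two steps described above can be illustrated with a minimal Python sketch. It is not the SESTAT production procedure: the record keys ('cls', 'prob', 'responded'), the function names, and the sample values are assumptions made up for the example. The sketch computes each unit's sampling weight as the reciprocal of its selection probability, then applies a simple weighting class adjustment in which respondents' weights are inflated so that they also represent the nonrespondents in the same class.

```python
from collections import defaultdict


def base_weight(selection_probability):
    """Sampling weight: reciprocal of the unit's selection probability."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def weighting_class_adjust(units):
    """Return nonresponse-adjusted weights, one per unit.

    `units` is a list of dicts with hypothetical keys:
      'cls'       - weighting class label
      'prob'      - selection probability
      'responded' - True if the unit responded
    Nonrespondents receive a weight of 0.0.
    """
    total = defaultdict(float)      # sum of base weights, all sampled units
    responding = defaultdict(float) # sum of base weights, respondents only
    for u in units:
        w = base_weight(u["prob"])
        total[u["cls"]] += w
        if u["responded"]:
            responding[u["cls"]] += w

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cls"]] / responding[u["cls"]]
            adjusted.append(base_weight(u["prob"]) * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Made-up sample: class A has one respondent and one nonrespondent.
sample = [
    {"cls": "A", "prob": 0.5, "responded": True},
    {"cls": "A", "prob": 0.5, "responded": False},
    {"cls": "B", "prob": 0.25, "responded": True},
]
weights = weighting_class_adjust(sample)
```

In this made-up sample, the class A respondent ends up carrying the weight of the class A nonrespondent as well, so the adjusted weights in each class still sum to the class's original weighted total.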

In the 1993 National Survey of College Graduates (NSCG), poststratification adjustment was used to force the sampling weights for survey respondents to match the 1990 Decennial Census Long Form sample estimates. In the 1993 National Survey of Recent College Graduates (NSRCG), the sampling weights were adjusted for nonresponse within weighting classes; a ratio adjustment was also made to reflect known proportions in the population. In the 1993 Survey of Doctorate Recipients (SDR), the sampling weights were adjusted for nonresponse within weighting classes. Similar procedures were followed in developing analysis weights for 1995, 1997, 1999, and 2001.
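
As a rough illustration of poststratification, the Python sketch below scales respondents' weights within each poststratum so that they sum to an external control total. The function name, the stratum labels, and the control totals are hypothetical values chosen for the example; they are not the Census-based totals or the poststrata actually used for the NSCG.

```python
def poststratify(weights, strata, control_totals):
    """Scale weights so each poststratum sums to its control total.

    weights        - list of respondent weights
    strata         - list of poststratum labels, parallel to `weights`
    control_totals - dict mapping each label to its known population total
    """
    # Current weighted total in each poststratum.
    sums = {}
    for w, s in zip(weights, strata):
        sums[s] = sums.get(s, 0.0) + w

    # Multiply each weight by (control total / current weighted total).
    return [w * control_totals[s] / sums[s] for w, s in zip(weights, strata)]


# Made-up respondent weights, poststrata, and control totals.
resp_weights = [10.0, 30.0, 20.0]
resp_strata = ["eng", "eng", "bio"]
controls = {"eng": 100.0, "bio": 50.0}

adjusted = poststratify(resp_weights, resp_strata, controls)
```

After the adjustment, the weights in each poststratum add up to the corresponding control total, while the relative sizes of the weights within a poststratum are unchanged.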

The analysis weights varied substantially across and within the component surveys, ranging from 1 to 436 for SESTAT as a whole in 1993, 1 to 734 in 1995, 1 to 884 in 1997, and 1 to 878 in 1999. The median weights were 59, 71, 79, and 85 for 1993, 1995, 1997, and 1999, respectively. The larger weight variation in the later years resulted from the subsampling of mail nonrespondents for CATI/CAPI follow-up.

Each survey database was designed to be combined with the other two to capture the advantages of a larger sample size and greater coverage of the target population. However, combining the three databases meant addressing the issue of cross-survey multiplicity. Scientists and engineers in SESTAT could belong to the surveyed population of more than one component survey, depending upon their degrees and when they received them. For instance, a person who held a bachelor's degree at the time of the 1990 Census and went on to complete a master's degree in 1991 could be selected in both the 1993 NSCG and the 1993 NSRCG.

The following unique-linkage rule was devised to remove these multiple selection opportunities: each member of SESTAT's target population is uniquely linked to one and only one component survey, and that individual is included in SESTAT only when he or she is selected for the linked survey.

As a result, each person had only one chance of being selected into the combined SESTAT database. Cases with multiple selection opportunities were linked first to the SDR; a case that could not be linked to the SDR was linked to the NSRCG. Sampled individuals for each component survey were examined to determine which other component surveys (if any) they could have been selected for. In the NSCG, sampled individuals who also had a chance of being selected for the NSRCG or the SDR in that year were assigned zero as their SESTAT analysis weight. Similarly, sampled individuals in the NSRCG who also had a chance of being selected for the SDR in that year were assigned zero as their SESTAT analysis weight. For all other cases, the component survey's analysis weight was carried over as the SESTAT analysis weight. The SESTAT weight on the database (called "Z_WEIGHTING_FACTOR") should be used when analyzing SESTAT data derived from the three component surveys.
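
The unique-linkage rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build SESTAT: the function names and the way eligibility is represented (a set of survey names per sampled person) are hypothetical. The priority order SDR, then NSRCG, then NSCG follows the description above.

```python
# Linkage priority as described in the text: SDR first, then NSRCG, then NSCG.
PRIORITY = ("SDR", "NSRCG", "NSCG")


def linked_survey(eligible):
    """Return the single survey a person is linked to.

    `eligible` is the set of component surveys the person could have
    been selected for in the given year.
    """
    for survey in PRIORITY:
        if survey in eligible:
            return survey
    raise ValueError("person is not eligible for any component survey")


def sestat_weight(sampled_in, eligible, survey_weight):
    """SESTAT analysis weight for one sampled case.

    The component survey's analysis weight is carried over only when the
    case was sampled in its linked survey; otherwise the weight is zero.
    """
    if sampled_in not in eligible:
        raise ValueError("sampled survey must be among the eligible surveys")
    return survey_weight if linked_survey(eligible) == sampled_in else 0.0


# Made-up cases: the same dual eligibility seen from each survey's sample.
nscg_case = sestat_weight("NSCG", {"NSCG", "NSRCG"}, 59.0)    # zeroed out
nsrcg_case = sestat_weight("NSRCG", {"NSCG", "NSRCG"}, 12.0)  # weight kept
nscg_only = sestat_weight("NSCG", {"NSCG"}, 59.0)             # weight kept
```

Because a person eligible for both the NSCG and the NSRCG contributes weight only through the NSRCG sample, summing the SESTAT weights across the three files does not count that person's selection opportunity twice.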

National Science Foundation Division of Science Resources Statistics (SRS)
The National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA
Tel: (703) 292-8780, FIRS: (800) 877-8339 | TDD: (800) 281-8749