1995 Survey Methodology
The data on doctoral scientists and engineers contained in this report come from the 1995 Survey of Doctorate Recipients (SDR). The National Research Council (NRC) has conducted the SDR biennially since 1973 for the National Science Foundation (NSF). Additional education and demographic data come from the National Research Council's Doctorate Records File (DRF). The DRF contains data from an ongoing census of research doctorates earned in the United States since 1920. This appendix contains an overview of the survey methodology; a more detailed description is available under separate cover.
The sampling frame for the SDR is compiled from the DRF. For the 1995 survey the sampling frame comprised individuals who:
- had earned a doctoral degree from a U.S. college or university in a science or engineering field;
- were U.S. citizens or, if non-U.S. citizens, indicated that they had plans to remain in the United States after degree award; and
- were under 76 years of age.
To develop the frame, graduates who had earned their degrees since 1993 and met the conditions listed above were added; those carried over from 1993 who had reached the age of 76 (or had died) were deleted. A sample of the incoming graduates was drawn and added to the panel sample carried forward from survey to survey. A maintenance cut was then made to keep the sample size roughly the same as it was in 1993. In 1995, the SDR sample size was 49,829.
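The frame rules above can be read as an eligibility filter followed by an update step. The following is a minimal sketch, not the NRC's procedure: the record fields (`birth_year`, `us_citizen`, `plans_to_stay`, and so on) are hypothetical, since the DRF record layout is not described in this appendix, and age is approximated by a difference of years.

```python
from dataclasses import dataclass

SURVEY_YEAR = 1995
MAX_AGE = 76  # frame members must be under 76


@dataclass(frozen=True)
class Doctorate:
    # Hypothetical fields; the real DRF record layout is not given here.
    person_id: int
    birth_year: int
    us_degree_in_se_field: bool
    us_citizen: bool
    plans_to_stay: bool
    deceased: bool = False


def is_eligible(d: Doctorate, survey_year: int = SURVEY_YEAR) -> bool:
    """Apply the three frame conditions and exclude deceased cases."""
    age = survey_year - d.birth_year  # approximation: ignores birth month
    return (
        d.us_degree_in_se_field
        and (d.us_citizen or d.plans_to_stay)
        and age < MAX_AGE
        and not d.deceased
    )


def update_frame(carried_over, new_graduates):
    """Add eligible new graduates; drop carried-over cases that aged out or died."""
    return [d for d in list(carried_over) + list(new_graduates) if is_eligible(d)]


old = [
    Doctorate(1, 1918, True, True, True),        # age 77: aged out
    Doctorate(2, 1950, True, True, True),        # stays in the frame
    Doctorate(3, 1945, True, True, True, True),  # deceased: removed
]
new = [
    Doctorate(4, 1965, True, False, True),       # non-citizen planning to stay
    Doctorate(5, 1966, True, False, False),      # non-citizen, no plans to stay
]
frame_ids = [d.person_id for d in update_frame(old, new)]
```

In this toy example only cases 2 and 4 remain in the frame.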
The basic sample design was a stratified random sample with the goal of proportional sampling across strata. The variables used for stratification were 15 broad fields of degree, 2 genders, and an 8-category "group" variable combining race/ethnicity, handicap status, and citizenship status.
In determining sampling rates the goal was to achieve as much homogeneity as possible while allowing for oversampling of certain small populations (e.g., minority women). In practice, however, the goal of proportional sampling was not consistently achieved. A number of sample size adjustments over the years, in combination with changes to the stratification, led to highly variable sampling rates, sometimes within the same sampling cell. The overall sampling rate was about 1 in 12 (8 percent), applied to a population of 594,300. Across strata, however, the rates ranged from 4 to 67 percent. The range in sampling rates serves to increase the variance of the survey estimates.
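The arithmetic behind the overall rate, and the reason a wide range of sampling rates inflates variance, can be illustrated briefly. The sketch below uses Kish's approximate design effect for unequal weighting, a common rule of thumb that this appendix does not name and that may not be the method used for the SDR. The weight vectors are invented for illustration; only the sample size, population, and the 4 and 67 percent extremes come from the text.

```python
SAMPLE_SIZE = 49_829
POPULATION = 594_300

overall_rate = SAMPLE_SIZE / POPULATION  # about 0.084
one_in_n = POPULATION / SAMPLE_SIZE      # about 11.9, i.e. roughly 1 in 12


def kish_design_effect(weights):
    """Kish's approximation: deff = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Equal weights give a design effect of 1 (no variance inflation).
equal = [12.0] * 100

# Illustrative only: half the cases sampled at 4 percent and half at
# 67 percent, the two extremes reported across strata.
unequal = [1 / 0.04] * 50 + [1 / 0.67] * 50

deff_equal = kish_design_effect(equal)
deff_unequal = kish_design_effect(unequal)
```

A design effect above 1 for the unequal case means the same number of interviews yields less precise estimates than an equal-probability sample would.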
In 1995, there were two phases of data collection: a mail survey and a telephone follow-up interview for nonrespondents to the mail survey. Phase 1 consisted of two mailings of the survey questionnaire with a reminder postcard between the mailings. The first mailing was in May 1995 and the second (using Priority Mail) in July 1995. To encourage participation, all survey materials were personalized with the respondent's name and address. The mail survey achieved a response rate of about 62 percent.
Phase 2 consisted of conducting computer-assisted telephone interviewing (CATI) on a 60-percent sample of nonrespondents to the mail survey (the CATI subsample). Telephone numbers were located for about 90 percent of the subsample and interviews were completed with 63 percent. Telephone interviewing was conducted between November 1995 and February 1996.
As completed mail questionnaires were received, they were logged into a receipt control system that kept track of the status of all cases. Coding staff then carried out a variety of checks and prepared the questionnaires for data entry. Specifically, they resolved incomplete or contradictory answers, reviewed "other specify" responses for possible backcoding to a listed response, and assigned numeric codes to open-ended questions (e.g., employer name). A coding supervisor validated the coders' work.
Once cases were coded, they were sent to data entry. The data entry program contained a full complement of range and consistency checks for entry errors and inconsistent answers. The range and consistency checks were also applied to the CATI data via batch processing. Further computer checks were performed to test for inconsistent values; these were corrected and the process repeated until no inconsistencies remained.
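Range and consistency checks of the kind described here might look like the following sketch. The field names, bounds, and the single consistency rule are hypothetical; the actual edit specifications for the SDR are not given in this appendix.

```python
def check_record(rec):
    """Return a list of problems found in one questionnaire record."""
    problems = []

    # Range checks: each value must fall inside a plausible interval.
    ranges = {
        "birth_year": (1919, 1975),  # hypothetical bounds
        "salary": (0, 999_999),      # hypothetical bounds
    }
    for field, (low, high) in ranges.items():
        value = rec.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field} out of range: {value}")

    # Consistency check: answers must not contradict each other.
    if rec.get("employed") is False and rec.get("salary") not in (None, 0):
        problems.append("salary reported for a case coded as not employed")

    return problems


records = [
    {"id": 1, "birth_year": 1950, "employed": True, "salary": 60_000},
    {"id": 2, "birth_year": 1850, "employed": True, "salary": 55_000},
    {"id": 3, "birth_year": 1960, "employed": False, "salary": 40_000},
]

# Keep only the records that fail at least one check.
flagged = {}
for r in records:
    found = check_record(r)
    if found:
        flagged[r["id"]] = found
```

Flagged cases would be corrected and the checks rerun until the list of problems is empty, mirroring the repeat-until-clean process described above.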
At this point, the survey data file was ready for imputation of missing data. As a first step, basic frequency distributions were produced to show nonresponse rates to each question; these were generally less than 3 percent, with the exception of salary, which was 6 percent. Two methods for imputation were adopted. The first, cold decking, was used mainly for demographic variables that are static, i.e., not subject to change. Using this method, historical data provided by respondents in previous years were used to fill a missing response. In cases where no historical data were available, and for non-demographic variables (such as employment status, primary work activity, and salary), hot decking was used. Hot decking involved creating cells of cases with common characteristics (through the cross-classification of auxiliary variables) and then selecting a donor at random for the case with the missing value. As a general rule, no data value was imputed from a donor in one cell to a recipient in another cell.
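The two imputation methods can be sketched as follows. This is an illustration under stated assumptions, not the production procedure: the variable names, the choice of cell variables, and the data are hypothetical, and the rule that a recipient never borrows from another cell is applied strictly here even though the text describes it only as a general rule.

```python
import random
from collections import defaultdict


def cold_deck(record, history, field):
    """Fill a missing value from the same person's earlier response, if any."""
    if record.get(field) is None:
        prior = history.get(record["id"], {})
        if prior.get(field) is not None:
            record[field] = prior[field]
    return record


def hot_deck(records, field, cell_vars, rng):
    """Fill missing values with a random donor's value from the same cell."""
    # Build donor pools from reported values only, before any imputation.
    donors = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            donors[tuple(r[v] for v in cell_vars)].append(r[field])

    for r in records:
        if r.get(field) is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # never borrow from another cell
                r[field] = rng.choice(pool)
    return records


# Cold deck: a static demographic item recovered from a prior survey year.
history = {5: {"birth_year": 1948}}
rec = cold_deck({"id": 5, "birth_year": None}, history, "birth_year")

# Hot deck: salary imputed within cells defined by field and sector.
records = [
    {"id": 1, "field": "physics", "sector": "academe", "salary": 60_000},
    {"id": 2, "field": "physics", "sector": "academe", "salary": 64_000},
    {"id": 3, "field": "physics", "sector": "academe", "salary": None},
    {"id": 4, "field": "biology", "sector": "industry", "salary": None},
]
hot_deck(records, "salary", ("field", "sector"), random.Random(0))
```

Case 3 receives one of the two reported physics salaries; case 4 stays missing because its cell has no donor, which in practice would call for collapsing cells or another fallback.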
Weighting and Estimation
The next phase of the survey process involved weighting the survey data to compensate for unequal probabilities of selection to the sample and to adjust for the effects of unit nonresponse. The first step was the construction of sample weights, which were calculated as the inverse of the probability of selection, taking into account all stages of the sample selection process over time. Sample weights varied within cells because different sampling rates were used depending on the year of selection and the stratification in effect at that time.
The second step was to construct a combined weight, which took into account the subsampling of nonrespondents at the CATI phase. All respondents received a combined weight, which for mail respondents was equal to the sample weight and for CATI respondents was a combination of their sample weight and their CATI subsample weight.
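The first two weighting steps can be sketched as below. One assumption is made loudly: the CATI subsample weight is taken to be the inverse of a uniform 60-percent subsampling rate, and the "combination" is taken to be a product. The appendix does not spell out either detail, and the selection probability of 0.08 is illustrative.

```python
CATI_SUBSAMPLE_RATE = 0.60  # the 60-percent subsample of mail nonrespondents


def sample_weight(selection_probability):
    """Inverse of the overall probability of selection."""
    return 1.0 / selection_probability


def combined_weight(selection_probability, mode):
    """Mail respondents keep the sample weight; CATI respondents are
    weighted up again for the subsampling of mail nonrespondents
    (assumed here to be a simple product)."""
    w = sample_weight(selection_probability)
    if mode == "mail":
        return w
    if mode == "cati":
        return w * (1.0 / CATI_SUBSAMPLE_RATE)
    raise ValueError(f"unknown mode: {mode}")


mail_w = combined_weight(0.08, "mail")  # 12.5
cati_w = combined_weight(0.08, "cati")  # about 20.8
```

Under these assumptions a CATI respondent selected at the same rate as a mail respondent carries a combined weight about 1.67 times larger.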
The third step was to adjust the combined weights for unit nonresponse. (Unit nonresponse occurs when the sample member refuses to participate or cannot be located.) Nonresponse adjustment cells were created using poststratification. Within each nonresponse adjustment cell, a weighted response rate was calculated that took into account both mail and CATI nonresponse. The nonresponse adjustment factor was the inverse of this weighted response rate. The initial set of nonresponse adjustment factors was examined and, under certain conditions, some of the cells were collapsed if use of the adjustment factor would have created excessive variance.
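A cell-level nonresponse adjustment of this kind can be sketched as follows. The sketch assumes the weighted response rate in a cell is the sum of respondents' combined weights divided by the sum of sample weights of all sample cases in that cell; the cell labels and weights are invented, and the collapsing of cells is not shown.

```python
from collections import defaultdict


def nonresponse_factors(cases):
    """Per cell: factor = 1 / weighted response rate, where the rate is
    (sum of respondents' combined weights) / (sum of all sample weights)."""
    responded = defaultdict(float)
    sampled = defaultdict(float)
    for c in cases:
        sampled[c["cell"]] += c["sample_weight"]
        if c["respondent"]:
            responded[c["cell"]] += c["combined_weight"]
    # Cells with no respondents are skipped; they would need collapsing.
    return {
        cell: sampled[cell] / responded[cell]
        for cell in sampled
        if responded[cell] > 0
    }


cases = [
    # Cell A: two mail respondents, one CATI respondent, one nonrespondent.
    {"cell": "A", "sample_weight": 10.0, "combined_weight": 10.0, "respondent": True},
    {"cell": "A", "sample_weight": 10.0, "combined_weight": 10.0, "respondent": True},
    {"cell": "A", "sample_weight": 10.0, "combined_weight": 10.0 / 0.6, "respondent": True},
    {"cell": "A", "sample_weight": 10.0, "combined_weight": 10.0 / 0.6, "respondent": False},
    # Cell B: everyone responded by mail, so no adjustment is needed.
    {"cell": "B", "sample_weight": 8.0, "combined_weight": 8.0, "respondent": True},
    {"cell": "B", "sample_weight": 8.0, "combined_weight": 8.0, "respondent": True},
]
factors = nonresponse_factors(cases)
```

Cell A gets a factor slightly above 1 and cell B a factor of exactly 1, so respondents in A are weighted up to represent the nonrespondent.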
The final weights for respondents were calculated by multiplying their respective combined weights by the nonresponse adjustment factor. Estimates in this report were developed by summing the final weights of the respondents selected for each analysis.
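As a small illustration of this last step, the sketch below multiplies combined weights by adjustment factors and sums the resulting final weights over a subgroup. The respondent records, the `employed` flag, and the factor values are hypothetical.

```python
def final_weight(combined_weight, adjustment_factor):
    """Final weight = combined weight x nonresponse adjustment factor."""
    return combined_weight * adjustment_factor


def estimate_total(respondents, predicate):
    """Estimated population count: sum of final weights over the
    respondents selected for the analysis."""
    return sum(r["final_weight"] for r in respondents if predicate(r))


respondents = [
    {"employed": True, "combined_weight": 10.0, "factor": 1.1},
    {"employed": True, "combined_weight": 20.0, "factor": 1.1},
    {"employed": False, "combined_weight": 12.0, "factor": 1.0},
]
for r in respondents:
    r["final_weight"] = final_weight(r["combined_weight"], r["factor"])

employed_total = estimate_total(respondents, lambda r: r["employed"])
all_total = estimate_total(respondents, lambda r: True)
```

Here the three respondents represent an estimated 45 people in the population, of whom about 33 are employed.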
The unweighted response rate, calculated as total returns divided by total sample, was 76 percent. The weighted response rate takes into account the different probabilities of selection to the sample and to the CATI subsample; it is calculated as the sum of the combined weights of all returns divided by the sum of the sample weights of all sample cases. The weighted response rate was 85 percent. The unweighted response rate measures how well the data collection methodology worked in obtaining responses, while the weighted response rate indicates the potential for nonresponse bias and as such is a somewhat better indicator of data quality.
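The two reported rates can be roughly reproduced from the phase-level figures given earlier in this appendix. The sketch assumes equal sample weights for all cases and reads the 63 percent completion figure as a share of the whole CATI subsample; both are simplifying assumptions, so the results are approximations rather than the published calculation.

```python
MAIL_RATE = 0.62        # mail response rate
CATI_SUBSAMPLE = 0.60   # share of mail nonrespondents sent to CATI
CATI_COMPLETION = 0.63  # interviews completed within the CATI subsample

mail_nonresponse = 1 - MAIL_RATE

# Unweighted: each CATI complete is counted once.
unweighted = MAIL_RATE + mail_nonresponse * CATI_SUBSAMPLE * CATI_COMPLETION

# Weighted: each CATI complete also stands in for the nonrespondents who
# were not subsampled (weight 1 / CATI_SUBSAMPLE), so the subsampling
# fraction cancels out of the product.
weighted = MAIL_RATE + mail_nonresponse * CATI_COMPLETION
```

The unweighted figure comes out at about 76 percent, matching the reported rate. The weighted figure comes out near 86 percent, close to the reported 85 percent; the gap is unsurprising given that the inputs are rounded and actual sample weights vary across cases.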
The statistics in this report are subject to both sampling and nonsampling error. For a detailed discussion of both sources of error in the SDR, see the methodological report referenced in footnote 1 of this appendix. In this methodological report, tables are provided that allow the reader to approximate the standard error associated with various estimates from the survey.
1. Brown, Prudence, 1997, Methodological Report of the 1995 Survey of Doctorate Recipients, National Research Council, Washington, DC.