The Survey of Doctorate Recipients (SDR) is a longitudinal study of individuals who received a doctoral degree from a U.S. institution in a science, engineering, or health (SEH) field. The goal of the SDR is to provide policymakers and researchers with high-quality data for making informed decisions related to the educational and occupational achievements and career movement of the nation's doctoral scientists and engineers. This group is of special interest to many decision makers because it represents the highest-educated individuals in the U.S. workforce.
The SDR has been conducted every 2 to 3 years since 1973 for the National Science Foundation's National Center for Science and Engineering Statistics in conjunction with the National Institutes of Health (NIH). The survey follows a sample of individuals with SEH doctorates throughout their careers from the year of their degree award until age 76. The panel is refreshed in each survey cycle with a sample of new SEH doctoral degree earners.
The data from this survey are combined with data from two other NSF surveys of scientists and engineers, the National Survey of College Graduates (NSCG) and the National Survey of Recent College Graduates (NSRCG). The three surveys are closely coordinated and share the same reference date and nearly identical instruments. The database developed from the three surveys, the Scientists and Engineers Statistical Data System (SESTAT), provides a comprehensive picture of the number and characteristics of individuals with training or employment in science, engineering, or related fields in the United States.
SDR respondents included in SESTAT have the following characteristics:
Respondents living outside of the United States during the survey reference week are included in the SDR but are excluded from SESTAT.
The 2010 SDR includes a sample redesign that integrates the International SDR (ISDR) into the main or National SDR (NSDR). Since its inception in 2003 as a methodological study through the 2008 cycle, the ISDR has operated as a totally separate survey from the NSDR using a sampling frame that was non-overlapping with, but complementary to, the NSDR sampling frame. The NSDR and ISDR surveys both target U.S.-trained SEH doctorates younger than 76 years on the survey reference date, but the NSDR target population was restricted to those residing in the United States, and the eligible ISDR target population was restricted to those residing outside of the United States.
Prior to the 2010 survey cycle, the NSDR and ISDR defined eligible doctorates for their sampling frame based upon (1) citizenship status (where only non-U.S. citizens were eligible for ISDR) and (2) predicted residency location (where non-U.S. residents were eligible for ISDR). However, residency location was not always accurately predicted. The ISDR sample contained individuals living in the United States eligible for the NSDR target population; likewise, the NSDR sample contained individuals living outside of the United States who were eligible for the ISDR target population. As a result, the potential of both samples was diminished in terms of coverage and sample sizes.
To solve this problem, an integrated sample design was developed that better aligned the NSDR and ISDR membership criteria with the doctorate recipients' actual residency locations. More importantly, integrating the two separate samples increased the sample size for each sample component and improved population coverage.
The target population of the 2010 SDR consisted of all individuals who were younger than 76 years of age as of the survey reference date (i.e., born on or after 1 October 1934), who had received a research doctorate in an SEH field from a U.S. academic institution, and who were not institutionalized. Coverage is complete for individuals living in the United States or a U.S. territory during the survey reference week of 1 October 2010. Coverage is complete for two subsets of individuals living outside of the United States during the survey reference week: (1) U.S. citizens at birth and (2) individuals who received their SEH doctorate from a U.S. institution after 1 July 2000, regardless of citizenship.
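The eligibility and coverage rules above can be written as a short Python sketch. The function and parameter names are illustrative assumptions, not part of any SDR processing system; only the dates and conditions come from the text.

```python
from datetime import date

# Cutoffs taken from the text above.
EARLIEST_BIRTH = date(1934, 10, 1)         # "born on or after 1 October 1934"
ABROAD_DEGREE_CUTOFF = date(2000, 7, 1)    # doctorate received "after 1 July 2000"


def in_target_population(birth_date, institutionalized):
    """Age and institutionalization criteria (hypothetical helper).

    Assumes the case already holds a U.S. research doctorate in an SEH field.
    """
    return birth_date >= EARLIEST_BIRTH and not institutionalized


def has_complete_coverage(lives_in_us, us_citizen_at_birth, degree_date):
    """Whether a target-population member falls in a fully covered group.

    `lives_in_us` is meant to include U.S. territories, as in the text.
    """
    if lives_in_us:
        return True
    # Living abroad: covered if a U.S. citizen at birth or a post-cutoff graduate.
    return us_citizen_at_birth or degree_date > ABROAD_DEGREE_CUTOFF
```

For example, a non-U.S.-born graduate living abroad during the reference week with a 1995 doctorate would be in the target population but outside the fully covered groups.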
The sample frame used both to identify the initial panel of respondents and to refresh the panel over time with new SEH doctorate recipients is the Doctorate Records File, maintained by the NSF. The primary source of information for the Doctorate Records File is the Survey of Earned Doctorates (SED).
The 2010 SDR sampling frame included individuals who
The 2010 SDR frame was constructed as two separate cohort frames, an existing 2008 SDR cohort frame and a new cohort frame. The cohorts are defined by the year of receipt of their first U.S.-granted doctoral degree. The existing cohort frame represents individuals who had received their SEH doctorate before 1 July 2007; the new cohort frame represents individuals who had received an SEH doctorate between 1 July 2007 and 30 June 2009. The new cohort frame was a "primary frame" that included all known newly eligible cases; the existing cohort frame was a "secondary frame" that carried forward the SDR cohort from the previous survey cycle and each member's sampling weight from the previous cycle.
Existing Frame. The SDR existing (or old cohort) frame was constructed from the final operational NSDR and ISDR sample files used for data collection in the previous survey cycle less cases determined to be permanently ineligible in that prior cycle (e.g., sample members determined to be deceased or over age 75 during 2008 survey operations). Existing frame cases were originally selected into the SDR as new cohort members who were sampled from the SED.
New Cohort Frame. The data source for constructing the SDR new cohort sampling frame for 2010 was the two most recent doctoral cohorts included in the SED. The most recent SED cohort always lags one year behind the current SDR reference year; the two most recent cohorts for the 2010 SDR were thus the cohorts who received a doctorate degree in academic year (AY) 2008 or AY 2009.
Total Frame. The cases within all of the frame sources were analyzed individually for SDR eligibility requirements. Persons who did not meet the age criteria or who were known to be deceased, terminally ill or incapacitated, or permanently institutionalized in a correctional or health care facility were dropped from the 2010 sampling frames. After ineligible cases were removed from consideration, the remaining cases from the two sources were combined to create the 2010 SDR sampling frame. In total, the 2010 SDR frame included 42,064 existing cohort cases (i.e., a sample of doctorate holders who earned their degrees prior to AY 2008) and 70,573 new cohort cases (i.e., a census of doctorate recipients from AY 2008 and AY 2009).
The goal of the 2010 SDR sample stratification design was to create strata that conformed as closely as possible to the analytic domains and for which the associated subpopulations were large enough to be suitable for separate estimation and reporting.
The revised 2010 sample design integrated the cases that were eligible for either the NSDR or ISDR target populations. The cases were then stratified based upon the cases' last known location. This design reduces undercoverage and provides better control of the operational procedures, as it groups together cases expected to require a similar level of effort to locate and cases with similar employment and earning outcomes.
The 2010 SDR had a stratified probability sampling design that was similar to the 2008 SDR design. The total number of cases selected for the 2010 SDR sample was 45,697. The sample design included 194 strata: 150 strata associated with the NSDR sample component, and 44 strata associated with the ISDR sample component. Regardless of citizenship status, all 2008 ISDR existing cases and any 2008 NSDR existing cases whose last known residence was outside the United States were classified into the 44 ISDR strata together with new cohort cases reporting plans to emigrate in the SED. NSDR existing cases predicted to be U.S. residents and new cohort cases not reporting emigration plans after graduation were assigned to the 150 NSDR strata, regardless of their citizenship status.
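The assignment of frame cases to the NSDR or ISDR strata described above can be sketched as a small decision function. This is a minimal illustration in Python; the function name, argument names, and string labels are assumptions made for the example, and the further breakdown into the 150 and 44 individual strata is not shown because the text does not give the stratification variables.

```python
def assign_component(cohort, prior_component=None,
                     last_known_in_us=None, plans_to_emigrate=None):
    """Return 'ISDR' or 'NSDR' for a frame case (hypothetical helper).

    Citizenship is deliberately not an input: the 2010 design assigns
    cases regardless of citizenship status.
    """
    if cohort == "existing":
        # All 2008 ISDR cases, plus 2008 NSDR cases last known to live abroad.
        if prior_component == "ISDR" or last_known_in_us is False:
            return "ISDR"
        return "NSDR"
    if cohort == "new":
        # New cohort cases are split on emigration plans reported in the SED.
        return "ISDR" if plans_to_emigrate else "NSDR"
    raise ValueError(f"unknown cohort: {cohort!r}")
```

Under this rule an existing NSDR case whose last known residence was outside the United States moves to the ISDR strata, which is the realignment the integrated design was meant to achieve.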
The 2010 SDR sample allocation strategy consisted of three main components: (1) ensuring a minimum sample size for the smallest strata through a supplemental stratum allocation, (2) allotting extra sample for specific demographic group-by-sex domains through a supplemental domain allocation, and (3) allocating the remaining sample proportionately across all strata. The final sample allocation was therefore based on the sum of a proportional allocation across all strata, a domain-specific supplement allocated proportionately across strata in that domain, and a stratum-specific supplement added to obtain the minimum stratum size.
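The three-part allocation can be illustrated with a simplified Python sketch. It assumes the proportional budget and the domain supplements are given as inputs, leaves allocations fractional, and does not cap a stratum at its frame size; the actual SDR procedure, its domain definitions, and its minimum stratum size are not stated in the text, so every name and number in the example is hypothetical.

```python
def allocate_sample(frame_counts, domain_of, proportional_budget,
                    domain_supplements, min_stratum_size):
    """Sum of proportional, domain-supplement, and stratum-supplement parts.

    frame_counts:       {stratum: number of frame cases}
    domain_of:          {stratum: domain label}
    domain_supplements: {domain: extra sample for that domain}
    """
    total_frame = sum(frame_counts.values())

    # Frame size of each domain, used to spread its supplement proportionately.
    domain_frame = {}
    for stratum, count in frame_counts.items():
        domain = domain_of[stratum]
        domain_frame[domain] = domain_frame.get(domain, 0) + count

    allocation = {}
    for stratum, count in frame_counts.items():
        domain = domain_of[stratum]
        proportional = proportional_budget * count / total_frame
        domain_part = domain_supplements.get(domain, 0) * count / domain_frame[domain]
        subtotal = proportional + domain_part
        # Stratum supplement: top up only if still below the minimum size.
        stratum_part = max(0.0, min_stratum_size - subtotal)
        allocation[stratum] = subtotal + stratum_part
    return allocation


# Hypothetical three-stratum example.
example = allocate_sample(
    frame_counts={"A": 6000, "B": 3000, "C": 1000},
    domain_of={"A": "d1", "B": "d2", "C": "d2"},
    proportional_budget=500,
    domain_supplements={"d2": 40},
    min_stratum_size=100,
)
```

In the example, stratum C receives 50 cases proportionally and 10 from the domain supplement, so a stratum supplement of 40 brings it up to the assumed minimum of 100.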
The 2010 SDR sample of 45,697 consisted of 40,000 NSDR cases (with 36,543 cases from the existing cohort frame and 3,457 cases from the new cohort frame) and 5,697 ISDR cases (with 4,797 cases from the existing cohort frame and 900 cases from the new cohort frame).
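As a consistency check, the sample counts above can be tabulated against the frame counts reported earlier (42,064 existing cohort cases and 70,573 new cohort cases). The Python sketch below only rearranges figures stated in the text; the variable names are illustrative, and the resulting cohort-level sampling fractions are a derived quantity that ignores the stratum-level variation in selection rates.

```python
# Sample counts by (component, cohort), as reported in the text.
sample = {
    ("NSDR", "existing"): 36543,
    ("NSDR", "new"): 3457,
    ("ISDR", "existing"): 4797,
    ("ISDR", "new"): 900,
}
# Frame counts by cohort, as reported in the text.
frame = {"existing": 42064, "new": 70573}

component_totals = {}
cohort_totals = {}
for (component, cohort), n in sample.items():
    component_totals[component] = component_totals.get(component, 0) + n
    cohort_totals[cohort] = cohort_totals.get(cohort, 0) + n

total_sample = sum(sample.values())

# Overall share of each cohort frame that was selected into the sample.
sampling_fraction = {c: cohort_totals[c] / frame[c] for c in frame}
```

The fractions show that nearly all existing cohort cases were carried into the 2010 sample, while the new cohort was subsampled from a census of recent graduates.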
The 2010 SDR was conducted by NORC at the University of Chicago (Chicago, IL), a survey contractor.
Since 2003, the SDR has used a tri-mode data collection approach: self-administered paper questionnaire (via mail), Web survey, and computer-assisted telephone interview (CATI). Sample members are started in one mode depending on their past preference and their available contact information. At any time during data collection, sample members can choose to complete the survey using any of the three modes.
The SDR is based on a complex sampling design and uses sampling weights that are attached to each responding sample member's record to produce accurate population estimates. The primary purpose of the weights is to adjust for unequal sampling probabilities and nonresponse. The final analysis weights were calculated to account for sampling, adjust for unknown location or unknown eligibility, adjust for nonresponse, and align with poststratification control totals.
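The sequence of weighting steps can be sketched as follows. This Python example is a simplified illustration that treats the whole sample as a single adjustment cell with one control total; in practice such adjustments are typically made within cells, and the text does not give the cells, factors, or control totals used for the SDR. The status labels and function name are assumptions for the example.

```python
def final_weights(cases, control_total):
    """Four-step weighting sketch for one adjustment cell.

    cases: list of dicts with 'prob' (selection probability) and 'status',
           one of 'respondent', 'nonrespondent', 'ineligible', 'unknown'.
    Returns one final weight per case (0.0 for non-respondents).
    """
    # 1. Base weight: inverse of the selection probability.
    base = [1.0 / c["prob"] for c in cases]

    # 2. Unknown location/eligibility: move that weight to cases with known status.
    known = [c["status"] != "unknown" for c in cases]
    f_unknown = sum(base) / sum(w for w, k in zip(base, known) if k)
    w2 = [w * f_unknown if k else 0.0 for w, k in zip(base, known)]

    # 3. Nonresponse: move eligible nonrespondents' weight to respondents.
    eligible = [c["status"] in ("respondent", "nonrespondent") for c in cases]
    responded = [c["status"] == "respondent" for c in cases]
    f_nr = (sum(w for w, e in zip(w2, eligible) if e)
            / sum(w for w, r in zip(w2, responded) if r))
    w3 = [w * f_nr if r else 0.0 for w, r in zip(w2, responded)]

    # 4. Poststratification: scale so the weights sum to the control total.
    f_ps = control_total / sum(w3)
    return [w * f_ps for w in w3]


# Hypothetical cell: five cases, each selected with probability 0.1.
cases = [
    {"prob": 0.1, "status": "respondent"},
    {"prob": 0.1, "status": "respondent"},
    {"prob": 0.1, "status": "nonrespondent"},
    {"prob": 0.1, "status": "unknown"},
    {"prob": 0.1, "status": "ineligible"},
]
weights = final_weights(cases, control_total=45)
```

Only respondents carry a nonzero final weight, and within the cell those weights sum to the control total.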
Estimates based on the total sample have relatively small sampling errors. However, sampling error increases and can be quite substantial when estimating characteristics of small subpopulations. Estimates of the sampling errors associated with various measures will be included in the forthcoming methodology report for the 2010 survey and in the forthcoming publication Characteristics of Doctoral Scientists and Engineers in the United States: 2010.
The SDR has minimal coverage error given the minimal coverage error in the SED, which is the frame for the SDR sample.
Unit Nonresponse. The unweighted response rate for this survey in 2010 was 79.8%. Adjustment for unit nonresponse was based on statistical weighting techniques. The weighted response rate was 79.9%.
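The difference between the two rates can be shown with a short Python sketch. The text does not state the exact formulas used for the SDR, so this assumes a common definition: respondents divided by eligible sample cases, computed once as a count and once using sampling weights. The function name and the example data are hypothetical.

```python
def response_rates(cases):
    """Return (unweighted, weighted) response rates.

    cases: list of (weight, responded) pairs for eligible sample members,
           where `weight` is the case's sampling weight.
    """
    unweighted = sum(1 for _, responded in cases if responded) / len(cases)
    weighted = (sum(w for w, responded in cases if responded)
                / sum(w for w, _ in cases))
    return unweighted, weighted


# Hypothetical data: the nonrespondent carries a fairly large weight.
unweighted, weighted = response_rates(
    [(10, True), (10, True), (30, False), (50, True)]
)
```

The two rates diverge when response propensity is related to the sampling weight; rates as close as the 79.8% and 79.9% reported above suggest little such relationship.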
Item Nonresponse. In 2010, the item nonresponse rates for key items (employment status, sector of employment, field of occupation, and primary work activity) ranged from 0.0% to 3.0%. Particularly sensitive variables, such as salary and earned income, had item nonresponse rates of 8.9% and 11.4%, respectively. Personal demographic data, such as marital status, citizenship, and race and ethnicity, had item nonresponse rates ranging from 0.0% to 8.6%. Cases missing the primary critical items were classified as survey nonresponse. Primary critical items included working for pay or profit, looking for work, last job, principal job, and living in the United States. All missing data were imputed, except for the primary critical items, verbatim text items, and some coded variables based on verbatim text items.
Imputation. The 2010 SDR used a combination of logical and hot-deck imputation for missing data.
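A minimal Python sketch of hot-deck imputation is given below: a missing item is filled with the reported value of a randomly chosen donor from the same imputation class. The text does not describe the imputation classes, donor selection rules, or logical edits used for the SDR, so the field names, the class definition, and the random donor choice are all assumptions for illustration, and logical imputation is not shown.

```python
import random


def hot_deck_impute(records, item, class_keys, seed=0):
    """Fill missing `item` values from donors in the same imputation class.

    records:    list of dicts; a missing value is represented as None.
    class_keys: fields that define the imputation class (hypothetical).
    Returns new records; the input list is left unchanged.
    """
    rng = random.Random(seed)

    # Collect reported values (donors) by imputation class.
    donors = {}
    for rec in records:
        if rec.get(item) is not None:
            key = tuple(rec[k] for k in class_keys)
            donors.setdefault(key, []).append(rec[item])

    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec.get(item) is None:
            pool = donors.get(tuple(rec[k] for k in class_keys))
            if pool:  # leave the value missing if the class has no donor
                new_rec[item] = rng.choice(pool)
                new_rec[item + "_imputed"] = True
        imputed.append(new_rec)
    return imputed


# Hypothetical records with salary missing for two cases.
records = [
    {"sector": "academia", "salary": 80000},
    {"sector": "academia", "salary": 90000},
    {"sector": "academia", "salary": None},
    {"sector": "industry", "salary": None},
]
result = hot_deck_impute(records, "salary", ["sector"])
```

Flagging imputed values, as the sketch does with the `_imputed` key, lets analysts separate reported from imputed data.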
Some of the key variables in this survey can be difficult to measure. For example, individuals do not always know the precise definitions of occupations that are used by experts in the field and, thus, may select occupational fields that are technically incorrect. In order to reduce measurement error, the SDR survey instruments for 2006 were pretested, using cognitive interviews and a mail pretest. The SDR instrument also benefited from extensive pretesting of the NSCG and NSRCG instruments, because most SDR questions also appear on the NSCG and the NSRCG. The 2010 SDR instrument was consistent with the 2006 SDR.
As is true for any multimode survey, it is likely that the measurement errors associated with the different modalities are somewhat different. This possible source of measurement error is especially troublesome, because the proclivity to respond by one mode or another may be associated with variables of interest in the survey. To the extent that certain types of individuals may be relatively likely to respond by one mode compared to another, the multimodal approach may introduce some systematic biases into the data. However, a study of differences across modes was conducted after the 2003 survey and showed that all three modes yielded comparable data for the most critical data items. Furthermore, data captured in the Web mode had lower item nonresponse for contacting variables and more complete verbatim responses for the occupation questions than did the data captured in the self-administered paper questionnaire (mail mode).
The definition of the population surveyed has changed several times. For example, prior to 1991, the survey included some individuals who had received doctoral degrees in fields outside of SEH or had received their degrees from non-U.S. universities. Because coverage of these individuals had declined over time, the decision was made to remove this group from the survey population beginning with the 1991 survey. Because the survey improvements made in 1993 were substantial, NCSES staff suggest that trend analyses comparing data from the surveys after 1991 with data from the surveys in prior years be performed very cautiously, if at all. Individuals who wish to explore such analyses are encouraged to discuss this issue further with the survey project officer listed below.
The data from this survey are published biennially in detailed statistical tables in the series Characteristics of Doctoral Scientists and Engineers in the United States, as well as in several InfoBriefs and Special Reports. Information from this survey is also included in Science and Engineering Indicators; Women, Minorities, and Persons with Disabilities in Science and Engineering; and Science and Engineering State Profiles.
Results from this survey are available on the NCSES website. Data can be accessed from the SESTAT website using the SESTAT Data Tool or downloadable public use data files. Access to restricted data for researchers interested in analyzing microdata can be arranged through a licensing agreement.
Additional information about this survey may be obtained by contacting:
Human Resources Statistics Program
National Center for Science and Engineering Statistics
National Science Foundation
4201 Wilson Boulevard, Suite 965
Arlington, VA 22230
Phone: (703) 292-4434
Notes: SEH fields include biological, agricultural, and environmental life sciences; computer and information sciences; mathematics and statistics; the physical sciences; psychology; the social sciences; engineering; and health fields.