Characteristics of Scientists and Engineers in the United States: 2003 (SESTAT)
During the production of this report, the America COMPETES Reauthorization Act of 2010 was signed into law. Section 505 of the act renames the Division of Science Resources Statistics as the National Center for Science and Engineering Statistics (NCSES). The Center retains its reporting line to the Directorate for Social, Behavioral and Economic Sciences within the National Science Foundation. The new name signals the central role of NCSES in the collection, interpretation, analysis, and dissemination of objective data on the science and engineering enterprise.
The 2003 Scientists and Engineers Statistical Data System (SESTAT) is an integrated system of information about employment, education, and demographic characteristics of scientists and engineers in the United States. SESTAT was developed by the National Science Foundation (NSF) in 1993 to provide a comprehensive workforce database for policy analysis and general research.
SESTAT comprises three NSF-sponsored demographic surveys of scientists and engineers: the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR). The three component surveys have been conducted every 2 to 3 years for three decades. In 1993 they began using similar questionnaires, survey reference dates, data collection periods, and data-processing procedures to facilitate integration for SESTAT. The three surveys were designed to provide maximum coverage of the target population, namely scientists and engineers, with special emphasis on relatively rare populations (e.g., doctorate recipients, recent graduates, and minorities). Overall, SESTAT provides a comprehensive picture of the number and characteristics of individuals in the United States with a bachelor's or higher degree and of their employment, with a focus on those having science and engineering (S&E) degrees or working in S&E occupations.
Target Population Coverage
The 2003 SESTAT target population includes individuals who had the following characteristics as of the component surveys' reference week of 1 October 2003:
Individuals who received their first bachelor's degree in a non-SEH field between 1 April 2000 and 1 October 2003 but who worked in an S&E or S&E-related occupation during the survey reference week are not covered by the target population. Neither are immigrants with non–U.S. earned bachelor's or higher degrees who arrived in the United States after 1 April 2000.
Because the 2003 SESTAT was created from three component surveys, a person identified in one component survey might also be eligible for another. For example, a U.S. resident who received a bachelor's degree before 1 April 2000, completed a master's degree in an SEH field in August 2000, and then earned an SEH doctoral degree in June 2002 had a chance of selection in each of the three component surveys in 2003. Consequently, SESTAT applies a unique linkage rule when integrating the component sample surveys. First, each survey sample member is weighted according to the frame developed for that survey. Next, a series of overlap variables is calculated and assessed to identify cases that are eligible for more than one survey. To remove these multiple selection opportunities, each case within the SESTAT target population is uniquely linked to one and only one component survey; that individual is included in the SESTAT integrated file only when he or she was selected for that linked survey.
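The linkage step can be sketched in Python as a minimal illustration, not NSF's production procedure. The priority order in `LINK_PRIORITY` (SDR, then NSRCG, then NSCG) and the `Case` fields are hypothetical assumptions, because the report does not state how a multiply eligible person is assigned to a survey.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: hypothetical priority order for linking a multiply eligible
# person to one survey; the actual SESTAT ordering is not given in this report.
LINK_PRIORITY = ("SDR", "NSRCG", "NSCG")


@dataclass(frozen=True)
class Case:
    person_id: str
    eligible_for: frozenset[str]  # component surveys whose frames cover this person
    selected_in: str              # component survey this record was sampled from
    weight: float                 # weight based on that survey's own frame


def linked_survey(case: Case) -> str | None:
    """Return the single component survey this person is linked to."""
    for survey in LINK_PRIORITY:
        if survey in case.eligible_for:
            return survey
    return None


def integrate(cases: list[Case]) -> list[Case]:
    """Keep a record only if it came from the person's linked survey, so each
    person has exactly one route into the integrated file."""
    return [case for case in cases if case.selected_in == linked_survey(case)]


# The doctorate holder from the example, sampled by both the NSCG and the SDR,
# plus a person covered only by the NSCG frame.
all_three = frozenset({"NSCG", "NSRCG", "SDR"})
records = [
    Case("p1", all_three, "NSCG", 120.0),
    Case("p1", all_three, "SDR", 15.0),
    Case("p2", frozenset({"NSCG"}), "NSCG", 300.0),
]
for kept in integrate(records):
    print(kept.person_id, kept.selected_in)
```

Under this assumed ordering, the NSCG record for `p1` is dropped and only the SDR record remains, so `p1` is counted once.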
The 2003 National Survey of College Graduates
The NSCG has been conducted by the U.S. Bureau of the Census on behalf of the NSF since 1993 and is the largest of the three component surveys, representing approximately 90% of the SESTAT target population. The NSCG is used to study the occupations and career paths of U.S. residents with a bachelor's degree or higher (particularly in an SEH field). The NSCG is designed as a decade-long panel study of college graduates based on a sample of respondents from each decennial census long-form sample. It is conducted every 2 to 3 years throughout the decade.
The 2003 NSCG was the first cycle (i.e., the baseline) of data collection for the decade-long panel study that used the 2000 decennial census long form as the sample frame. It was designed to follow a nationally representative panel of bachelor's or higher S&E degree holders, including foreign-educated Ph.D.-level scientists and engineers who were in the United States on 1 April 2000. The NSCG panel is updated over the decade with new U.S. bachelor's and master's graduates with SEH degrees through supplemental samples drawn from the National Survey of Recent College Graduates, described below.
The 2003 NSCG covered individuals with a bachelor's or higher degree in any field earned before 1 April 2000 who were younger than 76 years of age and resided in the United States during the reference week of 1 October 2003. The baseline NSCG also covered individuals who received a new SEH bachelor's or master's degree between 1 April 2000 and 30 June 2000 based on a supplemental sample from the 2001 NSRCG. Subsequent cycles of the NSCG panel study followed only those who had an S&E degree or had a non-S&E degree but worked in an S&E occupation on 1 October 2003. Stratified probability sampling was used in selecting individuals based on sex, race/ethnicity, disability status, U.S. citizenship, highest degree (bachelor's, master's, doctorate), and occupation. The total sample size for the 2003 NSCG was 170,797. The weighted response rate for the 2003 NSCG was 73.1%.
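A weighted response rate counts each sample case by its base (design) weight, so that cases from heavily sampled strata do not dominate the rate. The sketch below uses a simplified definition: weighted respondents divided by weighted eligible cases, ignoring cases of unknown eligibility. The tuple layout is a hypothetical assumption, and the exact formula behind the 73.1% figure may differ.

```python
def weighted_response_rate(cases):
    """Simplified weighted response rate.

    cases: iterable of (base_weight, is_eligible, responded) tuples
    (hypothetical layout). Ineligible cases are excluded from both the
    numerator and the denominator.
    """
    cases = list(cases)  # allow any iterable to be scanned twice
    eligible = sum(w for w, is_eligible, _ in cases if is_eligible)
    responding = sum(w for w, is_eligible, responded in cases if is_eligible and responded)
    if eligible == 0:
        raise ValueError("no eligible cases")
    return responding / eligible


# Made-up cases: three eligible (two respond), one out of scope.
sample = [(100.0, True, True), (50.0, True, False), (200.0, False, False), (50.0, True, True)]
print(f"{weighted_response_rate(sample):.1%}")  # prints 75.0%
```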
The 2003 NSCG questionnaire is available at http://www.nsf.gov/statistics/srvygrads/survey2003/grads_2003.pdf. Additional information on the NSCG is available at http://www.nsf.gov/statistics/srvygrads/.
The 2003 National Survey of Recent College Graduates
The NSRCG has been conducted every 2 to 3 years by a survey contractor for the NSF since 1974. The NSRCG is a cross-sectional survey that provides data on continuing education enrollment (e.g., doctoral training) and the early employment experiences of recent U.S. SEH graduates, including whether they were able to find employment (particularly in their field of study) and the attributes of that employment. As noted above, the NSRCG also provides a sampling frame from which to replenish the NSCG decade-long panel study.
The 2003 NSRCG target population consisted of individuals who received a bachelor's or master's degree in an SEH field from a U.S. college or university during the two academic years preceding the survey reference date, that is, between July 2000 and June 2002. The 2003 NSRCG used a two-stage sample design. In the first stage, a stratified, nationally representative sample of 300 colleges and universities was selected with probability proportional to size from a universe of approximately 1,800 U.S. academic institutions. Each sampled institution was asked to provide lists of its graduates. In the second stage, graduates with bachelor's or master's degrees in SEH fields were identified on these lists and included in the 2003 NSRCG sampling frame. Stratified probability sampling was used in selecting individuals based on sex, race/ethnicity, highest degree (bachelor's or master's), and major field of study. The total sample size for the 2003 NSRCG was 18,000 graduates (9,000 from each academic year): 13,061 bachelor's and 4,939 master's degree recipients. Of the 300 institutions sampled in the first stage, 296 provided lists of graduates, a weighted response rate of 98.7%. Data collection in the second stage resulted in a weighted response rate of 63.3%.
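Selecting institutions with probability proportional to size (PPS) can be illustrated with systematic PPS sampling, which is one common method. The report does not say which PPS method the 2003 NSRCG used, and the size measure (for example, an institution's count of SEH graduates) is an assumption. The sketch also shows how the two stages combine. A graduate's overall selection probability is the institution's first-stage inclusion probability times the within-institution sampling rate.

```python
from __future__ import annotations

import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes: list[float], n: int, seed: int | None = None) -> list[int]:
    """Draw n unit indices by systematic sampling with probability
    proportional to size. Units larger than the sampling interval can be hit
    more than once; real designs usually take such units with certainty."""
    total = sum(sizes)
    interval = total / n
    start = random.Random(seed).uniform(0, interval)
    cumulative = list(accumulate(sizes))
    picks = []
    for k in range(n):
        point = start + k * interval
        # Unit i covers the half-open range [cumulative[i-1], cumulative[i]).
        index = bisect_right(cumulative, point)
        picks.append(min(index, len(sizes) - 1))  # guard against float round-off
    return picks


def overall_probability(size: float, total_size: float, n: int,
                        sampled_in_unit: int, unit_count: int) -> float:
    """Two-stage selection probability: first-stage PPS inclusion probability
    (capped at 1) times the second-stage sampling rate within the unit."""
    first_stage = min(1.0, n * size / total_size)
    return first_stage * sampled_in_unit / unit_count
```

With hypothetical numbers, an institution holding 2,000 of 1,200,000 graduates in a 300-institution sample has a first-stage probability of 0.5. If 20 of its 400 eligible graduates are then sampled, each of those graduates has an overall probability of 0.025 and a base weight of 40.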
The 2003 NSRCG questionnaire is available at http://www.nsf.gov/statistics/srvyrecentgrads/survey2003/recentgrads_2003.pdf. Additional information on the NSRCG is available at http://www.nsf.gov/statistics/recentgrads/.
The Survey of Doctorate Recipients
The SDR has been conducted every 2 to 3 years by a survey contractor for the NSF since 1973. The SDR is a panel study based on a nationally representative cohort of SEH doctorate recipients from U.S. institutions. Its purpose is to study the career paths of this highly trained cohort of scientists and engineers. Recipients of professional degrees, such as those awarded in medicine, law, or education, are not included in the SDR. The 2003 SDR covered the portion of the SESTAT target population that received doctoral degrees in an SEH field from U.S. academic institutions. Baseline data on education and demographic characteristics for SDR sample members come from the Survey of Earned Doctorates (SED), an annual census of research doctorates earned in the United States that began with the 1957–58 academic year (http://www.nsf.gov/statistics/srvydoctorates/). The annual SED provides a sampling frame for updating the SDR panel over time, with a supplemental sample of new U.S. SEH doctorate recipients added in each survey cycle.
The 2003 SDR target population consisted of individuals who earned an SEH research doctoral degree from a U.S. college or university, were less than 76 years of age, and were residing in the United States as of the reference date of 1 October 2003. Stratified probability sampling was used in selecting individuals based on sex, race/ethnicity, disability status, U.S. citizenship, and major field of degree. The 2003 SDR sample consisted of 39,957 cases, including 36,548 from the existing cohort cases carried over from the 2001 SDR and 3,409 new cohort cases who earned a Ph.D. in the United States between 1 July 2000 and 30 June 2002. The overall weighted response rate was 79.5%.
The 2003 SDR questionnaire is available at http://www.nsf.gov/statistics/srvydoctoratework/survey2003/sdr_2003.pdf. Additional information on the SDR is available at http://www.nsf.gov/statistics/srvydoctoratework/.
Because the three SESTAT component surveys typically are conducted by different survey data collection contractors, NSF uses standardized guidelines for quality assurance in data editing and data processing. In addition, several questionnaire items, such as employment status and type of occupation if employed, are deemed critical data elements and must be completed by the respondent for the questionnaire to be considered an acceptable unit response.
Acceptable unit responses undergo general pre-editing and computer-assisted data entry in accordance with standardized guidelines. Processing a unit response involves multiple coding procedures: occupation, education, "other, specify" responses, geography, and educational institution.
A completed interview must include full reporting on a minimum set of critical data items, such as U.S. residency and occupation. If necessary, telephone follow-ups are used to obtain answers to critical data items as well as other noncritical but important items, such as degree information and employer location. Except for items with verbatim responses, missing data for noncritical items are imputed.
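The two processing rules described above can be sketched together. The first rejects a unit response that lacks critical items, and the second imputes missing noncritical items. The field names are hypothetical. Hot-deck imputation within imputation classes is shown only as an example of a common technique, because the report does not describe the imputation method each SESTAT survey used.

```python
import random


def is_acceptable(record: dict) -> bool:
    """Critical-item check (hypothetical field names): U.S. residence and
    employment status are required, and occupation is required if employed."""
    if record.get("us_residence") is None or record.get("employment_status") is None:
        return False
    if record["employment_status"] == "employed" and record.get("occupation") is None:
        return False
    return True


def hot_deck_impute(records: list[dict], item: str, class_vars: list[str], seed: int = 0) -> None:
    """Fill a missing noncritical item in place with the value reported by a
    randomly chosen donor in the same imputation class, and flag the fill."""
    rng = random.Random(seed)
    donors: dict[tuple, list] = {}
    for r in records:
        if r.get(item) is not None:
            donors.setdefault(tuple(r.get(v) for v in class_vars), []).append(r[item])
    for r in records:
        if r.get(item) is None:
            pool = donors.get(tuple(r.get(v) for v in class_vars))
            if pool:  # leave the item missing if the class has no donors
                r[item] = rng.choice(pool)
                r[item + "_imputed"] = True


# Made-up responses; the third lacks a critical item and is rejected.
responses = [
    {"us_residence": True, "employment_status": "employed", "occupation": "chemist",
     "degree_field": "chemistry", "employer_state": "VA"},
    {"us_residence": True, "employment_status": "employed", "occupation": "chemist",
     "degree_field": "chemistry", "employer_state": None},
    {"us_residence": True, "employment_status": "employed", "occupation": None,
     "degree_field": "physics", "employer_state": "MD"},
]
accepted = [r for r in responses if is_acceptable(r)]
hot_deck_impute(accepted, "employer_state", ["degree_field"])
print(len(accepted), accepted[1]["employer_state"])  # prints 2 VA
```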
Sample selection probabilities for the SESTAT component surveys vary substantially. Each survey samples its strata at different rates to obtain sample sizes large enough to estimate domains of interest in its target population reliably. For the SESTAT data, sampling weights are developed for respondents in each component survey and for the combined, integrated SESTAT file.
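A respondent's base weight is the inverse of his or her selection probability, so members of oversampled strata carry smaller weights. The sketch below adds a weighting-class nonresponse adjustment, one common way to shift the weight of nonrespondents onto similar respondents. Whether and how each SESTAT survey forms such adjustment classes is an assumption here, and the dictionary keys (`cls`, `weight`, `responded`) are hypothetical.

```python
from collections import defaultdict


def base_weight(selection_probability: float) -> float:
    """Design weight: the number of population members a sample case represents."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def adjust_for_nonresponse(cases: list[dict]) -> list[float]:
    """Within each adjustment class, inflate respondent weights so they sum to
    the class's total base weight; nonrespondents get weight 0."""
    class_total: dict = defaultdict(float)
    class_respondents: dict = defaultdict(float)
    for c in cases:
        class_total[c["cls"]] += c["weight"]
        if c["responded"]:
            class_respondents[c["cls"]] += c["weight"]
    return [
        c["weight"] * class_total[c["cls"]] / class_respondents[c["cls"]] if c["responded"] else 0.0
        for c in cases
    ]


# Made-up cases: class A was sampled at 1%, class B (oversampled) at 50%.
cases = [
    {"cls": "A", "weight": base_weight(0.01), "responded": True},
    {"cls": "A", "weight": base_weight(0.01), "responded": False},
    {"cls": "B", "weight": base_weight(0.5), "responded": True},
    {"cls": "B", "weight": base_weight(0.5), "responded": True},
]
print(adjust_for_nonresponse(cases))
```

The adjusted weights preserve each class's total weight, so weighted totals still represent the full sampled population rather than only the respondents.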
Reliability of Estimates
Because SESTAT comprises three sample surveys, estimates are subject to sampling errors.
Standard Error Tables
To measure the precision of the SESTAT estimates, standard errors were calculated for each estimate. Each statistical data table in this report has a corresponding standard error table in this appendix; for example, table A-1 is the standard error table that corresponds to the estimates presented in table 1. The standard errors can be used to construct confidence intervals for the estimates. To construct a 95% confidence interval about an estimate, multiply the standard error of the estimate by a z-score of 1.96. Add the result to the estimate to establish the upper bound of the confidence interval, and subtract it from the estimate to establish the lower bound.
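The confidence interval rule above reduces to the estimate plus or minus 1.96 standard errors. The values in this sketch are hypothetical and do not come from any table in this report.

```python
Z_95 = 1.96  # standard normal critical value for a two-sided 95% interval


def confidence_interval_95(estimate: float, standard_error: float) -> tuple[float, float]:
    """Return (lower, upper) bounds: estimate minus/plus 1.96 standard errors."""
    margin = Z_95 * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate and standard error.
lower, upper = confidence_interval_95(250_000, 5_000)
print(f"{lower:,.0f} to {upper:,.0f}")
```

For an estimate of 250,000 with a standard error of 5,000, the margin is 9,800, so the interval runs from 240,200 to 259,800.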
Quality assurance procedures are included throughout the various stages of data collection and data processing to reduce possibilities for nonsampling error. Sources of nonsampling error include (1) nonresponse error, which arises when the characteristics of respondents differ systematically from nonrespondents; (2) measurement error, which arises when the variables of interest cannot be precisely measured; (3) coverage error, which arises when some members of the target population are excluded from the frame and thus do not have a chance to be selected for the sample; (4) respondent error, which occurs when respondents provide incorrect data; and (5) processing error, which can arise at the point of data editing, coding, or data entry. The analyst should be aware of potential nonsampling errors, but these errors are more difficult to detect and quantify than sampling errors.
Definitions and Explanations
Education data. These data were derived from responses to several questions on type of degree and field of study earned by the respondent. The education categories of respondents in each of the three SESTAT component surveys were based on respondents' field of study for the highest degree held in each survey's reference week. The following is a link to the list of major groups of SESTAT education categories for 2003 or later: http://ncsesdata.nsf.gov/docs/ed03maj.html.
Geographic regions. The nine geographic regions listed in tables 15 and 24 cover the 50 states, the District of Columbia, and Puerto Rico. Details on the states included in each region can be found at http://ncsesdata.nsf.gov/docs/location.html.
Occupation data. The occupational classification of the respondent was based on the codified job category for the respondent's principal job held during the reference week or, if the respondent was previously employed but not employed in the reference week, on the codified job category for the last job held. The codified job category was derived through a post-data collection coding process in which responses to several employment-related questions were evaluated. These questions covered the job title, job description, respondent-selected job category, and primary work activities. The following is a link to the list of major groups of SESTAT occupation categories for 2003 or later: http://ncsesdata.nsf.gov/docs/occ03maj.html.
Race/ethnicity. All graduates, both U.S. citizens and non-U.S. citizens, are included in the race/ethnicity data presented in this report. American Indian/Alaska Native, Asian, black, Native Hawaiian/Other Pacific Islander, white, and persons reporting more than one race refer to non-Hispanic individuals only.
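The race/ethnicity rule above can be sketched as a small classifier. Hispanic respondents form their own group, and every race category, including "more than one race," is restricted to non-Hispanic individuals. The category labels and input format here are hypothetical.

```python
def race_ethnicity(hispanic: bool, races: set[str]) -> str:
    """Report category (hypothetical labels): Hispanic respondents form one
    group regardless of race; all race groups refer to non-Hispanics only."""
    if hispanic:
        return "Hispanic"
    if len(races) > 1:
        return "more than one race"
    if len(races) == 1:
        return next(iter(races))
    raise ValueError("race not reported")


print(race_ethnicity(False, {"Asian", "white"}))  # prints more than one race
```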
Salary. Median annual salaries are reported for the principal job and are rounded to the nearest $1,000. For individuals employed by educational institutions, no accommodation was made to convert academic-year salaries to calendar-year salaries.
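A weighted median salary, rounded to the nearest $1,000, can be sketched as follows. This version returns the smallest salary at which the cumulative weight reaches half the total and does not interpolate. That definition, and half-up rounding, are assumptions; the report does not specify either. Python's built-in `round()` rounds halves to even, so the sketch rounds explicitly.

```python
import math


def weighted_median(values: list[float], weights: list[float]) -> float:
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def round_to_thousand(amount: float) -> int:
    """Round half up to the nearest $1,000."""
    return int(math.floor(amount / 1000 + 0.5)) * 1000


# Hypothetical salaries and survey weights.
salaries = [48_500, 61_400, 90_000]
weights = [10.0, 12.0, 8.0]
print(round_to_thousand(weighted_median(salaries, weights)))  # prints 61000
```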
Employment sector. Sector of employment is a derived variable based on responses to multiple survey questions. In the detailed tables, the category 4-year educational institution includes 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutions. Other educational institution includes 2-year colleges, community colleges, technical institutes, and other educational institutions.
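The grouping of educational employers into the two table categories can be sketched as a lookup. The detailed employer-type strings are hypothetical. The actual sector variable is derived from several survey questions and uses its own codes.

```python
from __future__ import annotations

# Hypothetical detailed employer types grouped into the two table categories.
FOUR_YEAR = {
    "4-year college or university",
    "medical school",
    "university-affiliated hospital or medical center",
    "university-affiliated research institution",
}
OTHER_EDUCATIONAL = {
    "2-year college",
    "community college",
    "technical institute",
    "other educational institution",
}


def education_sector(employer_type: str) -> str | None:
    """Map a detailed employer type to a table category, or None if the
    employer is not an educational institution."""
    if employer_type in FOUR_YEAR:
        return "4-year educational institution"
    if employer_type in OTHER_EDUCATIONAL:
        return "Other educational institution"
    return None
```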
Disability. The SESTAT component surveys ask the degree of difficulty (none, slight, moderate, severe, or unable to do) an individual has in seeing (with glasses/contact lenses), hearing (with hearing aid), walking without assistance, or lifting 10 pounds. Respondents who answered "moderate," "severe," or "unable to do" for any activity were classified as having a disability.
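The disability classification rule is a simple "any activity at moderate or worse" test. In this sketch the activity keys are hypothetical, while the response scale follows the wording above.

```python
ACTIVITIES = ("seeing", "hearing", "walking", "lifting")  # hypothetical keys
SCALE = ("none", "slight", "moderate", "severe", "unable to do")
DISABLING = {"moderate", "severe", "unable to do"}


def has_disability(difficulty: dict[str, str]) -> bool:
    """True if any activity is rated moderate, severe, or unable to do."""
    for activity, level in difficulty.items():
        if activity not in ACTIVITIES or level not in SCALE:
            raise ValueError(f"unexpected response: {activity}={level}")
    return any(level in DISABLING for level in difficulty.values())


print(has_disability({"seeing": "slight", "hearing": "none",
                      "walking": "none", "lifting": "moderate"}))  # prints True
```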