Characteristics of Scientists and Engineers in the United States: 2008
Appendix. Technical Notes
The Scientists and Engineers Statistical Data System (SESTAT) comprises three demographic surveys of scientists and engineers sponsored by the National Science Foundation (NSF): the National Survey of College Graduates (NSCG), the National Survey of Recent College Graduates (NSRCG), and the Survey of Doctorate Recipients (SDR). The three component surveys are conducted every 2–3 years and use similar questionnaires, survey reference dates, data collection periods, and data-processing procedures to facilitate integration for SESTAT. The three surveys are designed to provide maximum coverage of the target population—namely, scientists and engineers—with special emphasis given to relatively rare populations (e.g., doctorate recipients, recent graduates, and minorities). Overall, SESTAT provides a comprehensive picture of the number and characteristics of individuals in the United States with a bachelor's-level or higher degree and their employment, with a focus on those having science and engineering (S&E) degrees or working in S&E occupations. In the 2000s, this definition was expanded to include S&E-related degrees and occupations.
Target Population and Coverage
The 2008 SESTAT target population includes individuals who had the following characteristics as of the component surveys' reference week of 1 October 2008:
Individuals not covered:
Because the 2008 SESTAT was created from three component surveys, cases identified in one component survey might also be eligible for another survey. This frame characteristic is referred to as multiplicity. For example, a U.S. resident who received a bachelor's degree before 1 April 2000, completed a master's degree in an S&E field in July 2003, and then earned an S&E doctoral degree in June 2007 had a probability of selection for each of the three component surveys in 2008. Consequently, SESTAT uses a unique linkage rule when integrating the component sample surveys in which each survey sample member is weighted according to the frame developed for that survey. Next, a series of overlap variables is calculated and assessed to identify cases that are eligible for more than one survey. To remove these multiple selection opportunities, each case within the SESTAT target population is uniquely linked to one and only one component survey; that individual is included in the SESTAT integrated file only when he or she is selected for that linked survey.
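The unique linkage rule can be illustrated with a minimal sketch. The priority order below (SDR, then NSRCG, then NSCG) and the function names are assumptions made only for illustration; this appendix does not state the actual rule SESTAT applies.

```python
# Hypothetical priority order. The appendix says each case is linked to
# exactly one component survey but does not give the actual order.
LINK_PRIORITY = ("SDR", "NSRCG", "NSCG")


def linked_survey(eligible_surveys):
    """Return the one survey a case is linked to, given the set of
    component surveys for which the case is eligible."""
    for survey in LINK_PRIORITY:
        if survey in eligible_surveys:
            return survey
    return None  # not eligible for any component survey


def keep_in_integrated_file(selected_survey, eligible_surveys):
    """A case enters the integrated file only when it was selected
    for the survey it is uniquely linked to."""
    return selected_survey == linked_survey(eligible_surveys)
```

Under this assumed order, a person eligible for all three surveys who was sampled only through NSCG would be dropped from the integrated file, because that person is linked to SDR.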
The 2008 National Survey of College Graduates
NSCG has been conducted by the U.S. Bureau of the Census on behalf of NSF in its current form since 1993 and is the largest of the three component surveys, representing approximately 90% of the SESTAT target population. NSCG is used to study the occupations and career paths of U.S. residents with a bachelor's-level or higher degree (particularly in an S&E field). NSCG is designed as a decade-long panel study of college graduates based on a sample of respondents from each decennial census long-form sample. It is conducted every 2–3 years.
The 2003 NSCG was the first cycle (i.e., a baseline) of data collection for the decade-long panel study that used the 2000 decennial census long form. It was designed to follow a nationally representative panel of bachelor's degree, master's degree, and foreign-educated scientists and engineers who participated in the 2000 decennial census long form. Subsequent cycles of the NSCG panel study are follow-ups to the 2003 baseline NSCG and include supplemental samples of new U.S. bachelor's and master's graduates with S&E degrees (including the social sciences and health fields) drawn from NSRCG, another SESTAT component survey described below. The 2001 NSRCG was used to supplement the 2003 NSCG new cohort to provide coverage of individuals who earned U.S. degrees between April and July of 2000.
The 2008 NSCG was a follow-up of the 2006 NSCG cohort members who held S&E or S&E-related degrees, or who held non-S&E degrees but were working in an S&E occupation on 1 October 2003. The 2008 NSCG also restricted follow-up to those who were younger than 76 years of age and resided in the United States (including the District of Columbia and Puerto Rico or another U.S. territory) during the reference week of 1 October 2008. In addition to the 2003 and 2006 NSCG cohort members who continued to meet criteria for follow-up, the 2008 NSCG also covered individuals who received a new S&E bachelor's or master's degree between 1 April 2000 and 30 June 2007, based on supplemental samples from the 2003, 2006, and 2008 NSRCG. Stratified probability sampling was used in selecting individuals, based on sex, race and ethnicity, disability status, U.S. citizenship, highest degree (bachelor's, master's, doctorate), and occupation. The total sample size for the 2008 NSCG was 68,000, and the weighted response rate was 88.74%.
The 2008 NSCG questionnaire is available at http://www.nsf.gov/statistics/srvygrads/surveys/srvygrads_2008.pdf. Additional information on the NSCG is available at http://www.nsf.gov/statistics/srvygrads/.
The 2008 National Survey of Recent College Graduates
NSRCG has been conducted every 2–3 years by a survey contractor for NSF since 1974. NSRCG is a cross-sectional survey that provides data on continuing education enrollment (e.g., master's degree and doctoral training) and/or the early employment experiences of recent U.S. S&E graduates, including whether they were able to find employment (particularly in their field of study) and the attributes of that employment.
The 2008 NSRCG target population consisted of individuals who received bachelor's or master's degrees in science, engineering, and health (SEH) fields from a U.S. college or university in the two academic years (defined as July 2005–June 2006 and July 2006–June 2007) preceding the survey reference date of 1 October 2008. The 2008 NSRCG used a two-stage sample design. In the first stage, a stratified, nationally representative sample of 300 colleges and universities was selected from a universe of approximately 1,800 U.S. academic institutions, with probability proportional to size. Each sampled institution was asked to provide lists of graduates for sampling. In the second stage, the graduates with bachelor's or master's degrees in S&E fields were identified and included in the 2008 NSRCG sampling frame. Stratified probability sampling was used in selecting individuals based on sex, race and ethnicity, highest degree (bachelor's or master's), and major field of study. Of the 302 sampled institutions in the first stage, 288 provided lists of graduates for sampling respondents for the 2008 NSRCG, representing a weighted response rate of 94.2%. Data collection in the second stage resulted in a weighted response rate of 69.7%.
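First-stage selection with probability proportional to size can be sketched as follows. The appendix does not say which selection method was used; systematic selection is substituted here as one common approach, and the function name, institution labels, and size measures are hypothetical.

```python
import random


def pps_systematic_sample(sizes, n, rng=None):
    """Select n units with probability proportional to size, using
    systematic selection along the cumulated size measures.

    sizes maps each unit to a positive measure of size. The sketch
    assumes no single size exceeds total / n; a larger unit could
    otherwise be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes.values())
    step = total / n
    start = rng.random() * step          # random start in [0, step)
    targets = iter(start + k * step for k in range(n))

    selected = []
    cumulative = 0.0
    target = next(targets, None)
    for unit, size in sizes.items():
        cumulative += size
        # A unit is selected when a target falls inside its size interval.
        while target is not None and target < cumulative:
            selected.append(unit)
            target = next(targets, None)
    return selected
```

With this method each unit's chance of selection is n times its share of the total size measure, so larger institutions are more likely to enter the sample.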
The 2008 NSRCG questionnaire is available at http://www.nsf.gov/statistics/srvyrecentgrads/surveys/srvyrecentgrads_2008.pdf. Additional information on NSRCG is available at http://www.nsf.gov/statistics/recentgrads/.
The 2008 Survey of Doctorate Recipients
SDR has been conducted every 2–3 years by a survey contractor for NSF since 1973. SDR is a panel study based on a nationally representative cohort of S&E doctorate recipients from U.S. institutions. The purpose of SDR is to study the career paths of this highly trained cohort of scientists and engineers. Recipients of professional degrees—such as those awarded in medicine, law, or education—are not included in SDR. The 2008 SDR covered the portion of SESTAT's target population that received doctoral degrees in an SEH field from U.S. academic institutions between 1 January 1948 and 30 June 2007. Baseline data on education and demographic characteristics among SDR sampled members come from the Survey of Earned Doctorates (SED), an annual census of research doctorates earned in the United States that began with the 1957–58 academic year (http://www.nsf.gov/statistics/srvydoctorates/). The annual SED provides a sampling frame for updating the SDR panel over time with a supplemental sample of new U.S. SEH doctorate recipients added into each survey cycle.
The 2008 SDR target population consisted of individuals who earned an S&E research doctoral degree from a U.S. college or university by 30 June 2007, were less than 76 years of age, and were residing in the United States as of the reference date of 1 October 2008. Stratified probability sampling was used in selecting individuals based on sex, race and ethnicity, disability status, U.S. citizenship, and major field of degree. The 2008 SDR sample consisted of 40,093 cases selected systematically across strata, including 36,644 from the existing cohort cases and 3,449 from the new cohort cases. The overall weighted response rate was 80.5%. The 2008 international SDR sample consisted of 2,832 cases, including 1,500 from the existing panel, 384 transferred from the national SDR sample, and 948 from the new cohort. The overall weighted response rate was 68.0%.
The 2008 SDR questionnaire is available at http://www.nsf.gov/statistics/srvydoctoratework/surveys/srvydoctoratework_2008.pdf. Additional information on the SDR is available at http://www.nsf.gov/statistics/srvydoctoratework/.
Editing Guidelines and Procedures
Because the three SESTAT component surveys typically are conducted by different survey data collection contractors, NSF uses standardized guidelines for quality assurance in data editing and data processing. In addition, several questionnaire items, such as residence information, employment status, and type of occupation if employed, are deemed critical data elements; the respondent must complete them for the questionnaire to be considered an acceptable unit response.
Acceptable unit responses undergo general pre-editing and computer-assisted data entry in accordance with standardized guidelines. Multiple coding procedures (occupation, education, other specify, geographic coding, and educational institution coding) are involved in processing a unit response. The editing rules include (1) sample person verification edits, (2) U.S. residency edits, (3) age and critical item edits, (4) new degree edits, (5) range edits, (6) back coding "other specify" information and skip error edits, (7) mark-one edits for questions with more than one response marked, (8) cross-item consistency and cross-year edits, and (9) skip/blanking edits. Guidelines also address editing rules for "refused," "don't know," or "blank" responses and for missing data on questions with a series of "yes/no" responses; rounding rules for decimals or fractions and for number of employees; and coding rules for primary and secondary work activities, for most important and second-most important reason for working outside field of highest degree, and for most important reason for attending training.
A completed interview must include full reporting on a minimum set of critical data items, such as U.S. residency and occupation. If necessary, telephone follow-ups are used to obtain answers to critical data items as well as other noncritical but important items, such as degree information and employer location. Except for items with verbatim responses, missing data for noncritical items are imputed. Imputation does not begin until after all logical editing is complete. Sequential hot deck imputation is used to replace missing data. Before imputation, serpentine sorting is used to ensure that adjacent data records are as similar as possible. After imputation of the data, postimputation edit checks are used to ensure that imputed values remain consistent with nonmissing data and adhere to the editing guidelines and procedures described above.
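The sorting and imputation steps can be illustrated with a simplified sketch. It assumes two sort variables and a single item to impute; the field names are hypothetical, and the production procedures (for example, any imputation classes) are not described in this appendix.

```python
def serpentine_sort(records, key1, key2):
    """Sort by key1 ascending; within successive key1 groups, alternate
    the direction of key2 so neighboring records stay similar across
    group boundaries."""
    ordered = []
    groups = sorted({r[key1] for r in records})
    for i, group in enumerate(groups):
        block = sorted(
            (r for r in records if r[key1] == group),
            key=lambda r: r[key2],
            reverse=(i % 2 == 1),   # flip direction in every other group
        )
        ordered.extend(block)
    return ordered


def sequential_hot_deck(records, item):
    """Replace missing (None) values of item with the most recently
    reported value in file order. A missing value that precedes any
    reported value is left missing in this sketch."""
    donor = None
    result = []
    for record in records:
        record = dict(record)       # copy so the input is not modified
        if record[item] is None:
            if donor is not None:
                record[item] = donor
                record[item + "_imputed"] = True
        else:
            donor = record[item]
        result.append(record)
    return result
```

Sorting first and imputing second means the donor for each missing value is the nearest preceding record in the sorted file, which is the reason the sort is designed to keep adjacent records alike.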
Sample selection probabilities for SESTAT component surveys vary substantially and reflect the differential rates of stratified sampling in each survey for creating sufficient sample sizes to provide reliable estimates of domains of interest in each survey's target population. For SESTAT data, sampling weights are developed for respondents in each component survey and for the combined and integrated SESTAT. For each component survey, sampling weights adjust for the differential selection probabilities and also for nonresponse and undercoverage. The fully adjusted sampling weights become the analysis weights and are added to each respondent record in SESTAT (variable name: Z_WEIGHTING_FACTOR_SURVEY). These weights should be used only when making estimates from each component survey in SESTAT. For the three combined component surveys, sampling weights further adjust for cross-survey multiplicity when making estimates based on SESTAT. The integrated SESTAT weight (variable name: Z_WEIGHTING_FACTOR) should be used when making estimates for the overall target population.
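As a minimal sketch of how the analysis weights enter an estimate, the functions below sum weights over respondent records. The weight variable names come from the text above; the record layout, the "employed" field, and the function names are assumptions for illustration.

```python
def weighted_total(records, weight_var="Z_WEIGHTING_FACTOR",
                   condition=lambda record: True):
    """Estimate a population count by summing the weights of the
    respondent records that satisfy the condition."""
    return sum(r[weight_var] for r in records if condition(r))


def weighted_proportion(records, condition,
                        weight_var="Z_WEIGHTING_FACTOR"):
    """Estimate the share of the population that satisfies the
    condition, as a ratio of two weighted totals."""
    return (weighted_total(records, weight_var, condition)
            / weighted_total(records, weight_var))
```

For estimates from a single component survey, the same functions could be called with weight_var="Z_WEIGHTING_FACTOR_SURVEY" on that survey's records only.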
Reliability of Estimates
Because SESTAT comprises three sample surveys, estimates are subject to sampling errors.
Standard Error Tables
To measure the precision of the SESTAT estimates, standard errors were calculated for each estimate. Thus, each statistical data table in this report has a corresponding standard error table in this appendix. For example, table A-1 is the standard error table that corresponds to the estimates presented in table 1. The standard errors can be used to construct confidence intervals for the estimates. To construct a 95% confidence interval about an estimate, multiply the standard error of the estimate by a z-score of 1.96. Add the result to the estimate to obtain the upper bound of the confidence interval and subtract it from the estimate to obtain the lower bound.
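The confidence interval calculation described above can be written as a short function. The function name and the example figures are hypothetical and are not taken from the report's tables.

```python
def confidence_interval_95(estimate, standard_error, z=1.96):
    """Return the (lower, upper) bounds of a 95% confidence interval:
    the estimate minus and plus z times its standard error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical estimate of 1,000 with a standard error of 50.
lower, upper = confidence_interval_95(1000, 50)
```

Here the margin is 1.96 × 50 = 98, so the interval runs from about 902 to about 1,098.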
Quality assurance procedures are included throughout the various stages of data collection and data processing to reduce possibilities for nonsampling error. Sources of nonsampling error include (1) nonresponse error, which arises when the characteristics of respondents differ systematically from nonrespondents; (2) measurement error, which arises when the variables of interest cannot be precisely measured; (3) coverage error, which arises when some members of the target population are excluded from the frame and thus do not have a chance to be selected for the sample; (4) respondent error, which occurs when respondents provide incorrect data; and (5) processing error, which can arise at the point of data editing, coding, or data entry. The analyst should be aware of potential nonsampling errors, but these errors are more difficult to detect and quantify than sampling errors.
Definitions and Explanations
Disability. The SESTAT component surveys ask the degree of difficulty—none, slight, moderate, severe, unable to do—an individual has in seeing (with glasses/contact lenses), hearing (with hearing aid), walking without assistance, or lifting 10 pounds. Those respondents who answered "moderate," "severe," or "unable to do" for any activity were classified as having a disability.
Reporting of education data. These data were derived from responses to several questions on type of degree and field of study earned by the respondent. The education categories of respondents in the SESTAT detailed statistical tables were based on respondents' field of study for the highest degree held in each survey's reference week. Please note that this differs somewhat from the component surveys' individual detailed statistical tables. While NSCG and NSRCG both use the respondent's highest degree for reporting education data, SDR reports using the respondent's first U.S. PhD, which is obtained from SED (i.e., the SDR sample degree). Degrees received after the sample degree, including second PhDs, are not reported in the SDR detailed statistical tables. However, in the SESTAT tables, a second PhD will generally be considered "higher" than the first PhD and will be the degree reported in the tables. The SESTAT surveys do collect the respondent's full degree inventory. Additional information on the degrees can be found at: http://ncsesdata.nsf.gov/docs/inventory.html.
The following is a link to the list of major groups of SESTAT education categories for 2003 or later: http://ncsesdata.nsf.gov/docs/ed03maj.html.
Employment sector. Sector of employment is a derived variable based on responses to multiple survey questions. In the detailed statistical tables, the category 4-year educational institution includes 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutions. Other educational institution includes 2-year colleges, community colleges, or technical institutes and other educational institutions. Business/industry includes self-employed individuals, nonprofit organizations and other unspecified types of employers. Within business/industry, private-for-profit includes those self-employed in an incorporated business. Self-employed includes those self-employed or business owners in a nonincorporated business.
Not in labor force. Includes individuals who were not working during the survey reference week and had not been seeking work in the prior four weeks because of retirement, family responsibilities, chronic illness, or other reasons.
Occupation data. These data were derived from responses to several questions on the kind of work performed by the respondent in their principal job. The occupational classification of the respondent was based on his or her principal job (including job title) held during the reference week or on last job held, if previously employed but not employed in the reference week. Also used in the occupational classification was a respondent-selected job code. The following is a link to the list of major groups of SESTAT occupation categories for 2003 or later: http://ncsesdata.nsf.gov/docs/occ03maj.html.
Race and ethnicity. All graduates, both U.S. citizens and non-U.S. citizens, are included in the race and ethnicity data presented in this report. The categories American Indian or Alaska Native, Asian, black or African American, Native Hawaiian or Other Pacific Islander, white, and more than one race refer only to individuals who are not Hispanic or Latino.
Salary. Median annual salaries are reported for the principal job and are rounded to the nearest $1,000. All respondents are asked to report annual salaries, even if their annual salary is provided for less than 12 months.
Unemployed. Includes individuals who were not working during the survey reference week but had been seeking work in the prior four weeks.
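The Disability, Not in labor force, and Unemployed definitions above amount to simple classification rules, sketched below. The function names, dictionary keys, and the "employed" label are assumptions; the sketch treats anyone who worked in the reference week as employed, which is a simplification of how the surveys derive employment status.

```python
# Difficulty levels that count toward a disability, per the definition above.
DISABLING_LEVELS = {"moderate", "severe", "unable to do"}


def has_disability(difficulty_by_activity):
    """True if any activity (seeing, hearing, walking, lifting) was
    reported at a moderate or greater degree of difficulty."""
    return any(level in DISABLING_LEVELS
               for level in difficulty_by_activity.values())


def labor_force_status(worked_in_reference_week, sought_work_prior_four_weeks):
    """Classify a respondent from two yes/no facts about the reference
    week and the four weeks before it."""
    if worked_in_reference_week:
        return "employed"
    if sought_work_prior_four_weeks:
        return "unemployed"
    return "not in labor force"
```

For example, a respondent reporting "slight" difficulty hearing and "none" for the other activities would not be classified as having a disability, while one who did not work but looked for work in the prior four weeks would be classified as unemployed.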
Changes in the Detailed Statistical Tables
The number of detailed statistical tables published in this edition of the series has been reduced. The complete list of tables produced for the 2008 SESTAT is shown in exhibit 1. The published tables are designated by table number in the first column. The remaining tabulations, designated as "supplemental," are available on request from the SESTAT Project Officer. The National Center for Science and Engineering Statistics (NCSES) is developing a new system of delivering data from its surveys. When fully implemented, select data tables will continue to be published together with the survey's technical documentation. The larger set of detailed data tables associated with the series will be available through the NCSES website and will provide greater opportunity for customization.
SESTAT = Scientists and Engineers Statistical Data System.
NOTES: Prior-year numbers for tables published in the report are in boldface. Tables designated by "S" are available on request from the Project Officer.
Standard Error Tables