Comparison of the National Science Foundation's Scientists and Engineers Statistical Data System (SESTAT) with the Bureau of Labor Statistics' Current Population Survey (CPS)
This section examines the areas of coverage and estimation. "SESTAT Coverage" discusses the coverage issues for SESTAT and provides SESTAT estimates of individuals in S&E occupations. "CPS Coverage" discusses the corresponding CPS coverage issues and how CPS can be used to provide estimates of individuals in S&E occupations that are comparable to SESTAT estimates. The CPS coverage also supports estimation of individuals in non-S&E occupations and individuals who do not have a bachelor's or higher degree (those who have an associate's degree or high school diploma as their highest degree level). Although SESTAT does not cover individuals in S&E occupations who do not have a bachelor's or higher degree, there has been interest in examining the number and characteristics of this group. "Comparison of Estimates" examines the compatibility of estimates between SESTAT and CPS and provides CPS estimates of individuals in S&E occupations that are not available in SESTAT. The estimates use data from the 1997 SESTAT and the April 1997 CPS; April 1997 was the reference month for both surveys.
Appendix B documents the SESTAT and CPS variables used in this report, the initial frequencies before recoding, and the method used to create derived variables for comparisons. Appendix B includes the following:
SESTAT and CPS are both sample survey systems and are thus subject to sampling error. In this report, survey estimates are presented with the approximate standard error (SE) to indicate the precision of the estimates. The section "Sampling Errors" and appendix C discuss the methods used to compute standard errors of SESTAT and CPS estimates presented in this report. For example, the 1997 SESTAT estimate shows a total of 3,369,400 individuals working in S&E occupations. The SE of this estimate is 26,600 (appendix C, tables C-1 and C-3); the corresponding 95% confidence limits are obtained by subtracting 1.96 times the SE from the survey estimate and adding 1.96 times the SE to it. This means that with 95% confidence, the "true" population total is expected to lie between 3,317,300 and 3,421,500. The corresponding CPS estimate of persons with bachelor's or higher degrees employed in S&E occupations is 3,542,100. The SE of this estimate is 101,700 (appendix C, tables C-9 and C-11), and the 95% confidence interval is between 3,342,700 and 3,741,600. The difference between these overall estimates is not statistically significant. However, the SESTAT and CPS estimates are significantly different for some groups; the differences that are significant at the 95% confidence level are indicated in appendix C, tables C-29 and C-31.
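The interval arithmetic described above can be reproduced in a few lines. The sketch below is only an illustration and is not the method documented in appendix C: the function names are hypothetical, and the difference test assumes that the two surveys are independent samples, so that the variance of the difference is the sum of the two variances.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence level


def confidence_interval(estimate, se, z=Z_95):
    """Return (lower, upper) limits computed as estimate -/+ z * SE."""
    margin = z * se
    return estimate - margin, estimate + margin


def difference_z(est_a, se_a, est_b, se_b):
    """z statistic for est_a - est_b, ASSUMING independent samples."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / se_diff


# Figures quoted in the text (1997 SESTAT and April 1997 CPS).
sestat_est, sestat_se = 3_369_400, 26_600
cps_est, cps_se = 3_542_100, 101_700

lower, upper = confidence_interval(sestat_est, sestat_se)
z = difference_z(cps_est, cps_se, sestat_est, sestat_se)
significant = abs(z) > Z_95

print(f"SESTAT 95% limits: {lower:,.0f} to {upper:,.0f}")
print(f"z for CPS minus SESTAT: {z:.2f}; significant at 95%: {significant}")
```

Under the independence assumption the z statistic falls below 1.96, which is consistent with the statement that the overall difference is not statistically significant.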
The SESTAT target population (see "Overview of SESTAT Design") includes people who meet all of the following conditions as of the survey reference period:
Table 1 summarizes estimates of the number of individuals in the SESTAT population by subset and year. The table also illustrates the magnitude of changes in the SESTAT population over time as well as coverage problems associated with certain subsets of the SESTAT population. As shown in the first row of the table, the population represented in the total SESTAT data system increased from 11,615,200 in 1993 to 12,530,700 in 1997. However, not all of the 12,530,700 individuals in the 1997 data system are scientists and engineers according to the SESTAT definition. About 276,600 individuals with no S&E degree in the 1997 SESTAT data system (subsets N, H, and J in table 1) were no longer working in S&E occupations in 1997 (but had been doing so in 1993). Figure 1 shows various subsets of the SESTAT population that correspond to the subsets in table 1.
Under the current design used for the 1993, 1995, and 1997 SESTAT, certain groups that were originally intended to be in the target population are subject to undercoverage in SESTAT over the decade. One major group that is subject to undercoverage in SESTAT is a special group of immigrants. Immigrants who earned S&E degrees outside of the United States and who were residing in the country in April 1990 were included in the sampling frame for the 1993 NSCG. However, immigrants who earned S&E degrees outside of the United States and then entered the country after April 1990 are not covered in the SESTAT integrated data system unless they later earned an S&E degree from a U.S. institution. Immigrants who earned S&E degrees at a U.S. institution either before or after April 1990 are covered in the SESTAT integrated data system. Since it is not possible with the current SESTAT sampling frames to include foreign-trained individuals in SESTAT after April 1990 (unless they earn an S&E degree from a U.S. institution after entering the U.S.), the estimated population counts shown in table 1 understate the true numbers. Until provisions are made to supplement the SESTAT sample with foreign-trained scientists and engineers with no U.S. degree, the undercoverage of this subset will obviously increase over time. A rough estimate of the size of this omitted group is provided in the section "CPS Coverage."
The other major group subject to undercoverage in SESTAT includes individuals with a non-S&E degree who are employed in S&E occupations. This group would ordinarily be included in subset M of table 1. Starting with an estimated 593,600 individuals in the 1993 SESTAT, the numbers in subset M have decreased to 334,100 in 1995 and to 292,000 in 1997. This subset has diminished over time because the current SESTAT sampling frames do not allow any additions to the group. Specifically, the sample frames do not allow the identification of (1) individuals who earned non-S&E degrees after April 1990 and then obtained S&E jobs, (2) individuals who earned non-S&E degrees before April 1990 but then moved to S&E jobs after 1993, and (3) immigrants after April 1990 with only foreign-earned non-S&E degrees who entered S&E jobs in the United States. Even the estimated number of individuals with non-S&E degrees for 1993 is an understatement because individuals receiving non-S&E bachelor's or higher degrees between 1990 and 1993 who were working in S&E occupations in 1993 are not included in the 1993 SESTAT estimate.
Although it is not possible to estimate the extent of the undercoverage of the population with non-S&E degrees using the SESTAT data, differences in the number of individuals in subset M of table 1 over the three survey cycles may provide some indication. For example, assuming conservatively that the actual number of individuals with non-S&E degrees who are working in S&E remains at roughly the 1993 level (i.e., 593,600 individuals), at least 259,500 would be excluded from the 1995 SESTAT (the difference between the numbers in 1993 and 1995), and 301,600 would be excluded from the 1997 SESTAT (the difference between the numbers in 1993 and 1997). Although these numbers of excluded individuals are relatively small in comparison to the total S&E population in SESTAT, they represent a significant portion of the subset of individuals without S&E degrees who work in S&E occupations.
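The lower bounds quoted above follow from simple subtraction. The short sketch below restates that arithmetic; the variable names are illustrative, and the constant-population baseline is the conservative assumption made in the text, not an observed figure.

```python
# Subset M of table 1: non-S&E degree holders employed in S&E occupations.
subset_m = {1993: 593_600, 1995: 334_100, 1997: 292_000}

# ASSUMPTION (from the text): the true population stays at the 1993 level.
baseline = subset_m[1993]

# Minimum number excluded in each later cycle under that assumption.
shortfall = {
    year: baseline - count for year, count in subset_m.items() if year != 1993
}
# The same shortfall expressed as a share of the assumed population.
share_missing = {year: missing / baseline for year, missing in shortfall.items()}

for year in sorted(shortfall):
    print(
        f"{year}: at least {shortfall[year]:,} excluded "
        f"({share_missing[year]:.0%} of the 1993 level)"
    )
```

On this assumption roughly half of the subset would be missing by 1997, which illustrates why the shortfall matters for analyses of individuals without S&E degrees who work in S&E occupations.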
To summarize, the groups that are conceptually part of the target population but are subject to undercoverage after 1993 in SESTAT include the following:
Analysts using the SESTAT data system may decide to restrict the S&E population to individuals with S&E degrees or, alternatively, to individuals currently working in S&E occupations. In doing so, the implications of the undercoverage will be different. For example, if the S&E population is restricted to include only individuals with S&E degrees, the undercoverage in SESTAT of individuals with non-S&E degrees who are working in S&E occupations is no longer a concern.
The S&E population in SESTAT includes active duty military personnel living in the United States. (Military personnel living outside the United States during the survey reference week are excluded from SESTAT, as are any individuals not residing in the United States.) Table 2 shows counts of the SESTAT population for civilians and military personnel in the United States. The total number of military personnel is 94,500, less than 1% of the 1997 SESTAT population.
The target population for CPS is the civilian noninstitutionalized population of the United States. Individuals residing in group quarters (e.g., college dormitories, retirement homes, and communes) are included in CPS if the group quarters are classified as civilian and noninstitutional. CPS includes only the "civilian" labor force. Active duty military personnel are generally excluded from CPS regardless of whether they are stationed in the United States or overseas. (An exception is the March CPS Supplement, for which military personnel residing in households with another adult civilian are eligible.)
See U.S. Census Bureau (2000) for a discussion of CPS coverage issues. Although the goal of the CPS sample design is to give all U.S. residents a nonzero probability of selection for the survey, coverage of 100% is rarely achieved. Noncoverage results from errors in almost every phase of data collection from listing, sampling, and enumerating households to locating and interviewing respondents. Historically, the effect of these errors in CPS (like many other national surveys using area probability sampling designs) has been to understate the number of people in the United States.
An indication of the amount of undercoverage for a specific subgroup of the population is given by the "coverage ratio." The coverage ratio is defined as the ratio of the estimated number of individuals (as estimated from the sample) to the corresponding "known" population total derived from independent sources. For the 1996 CPS, the overall coverage ratio is estimated to be 93%. This can be interpreted to mean that about 7% of the U.S. population is not covered in the CPS data collection. The CPS coverage ratios also vary by race/ethnicity, age, and sex (see U.S. Census Bureau 2000, table 16-1). Coverage ratios tend to be lower for blacks (84%) and Hispanics (83%) than for whites (94%) and are generally lower for males (92%) than for females (96%). The younger age groups (particularly individuals in their 20s and 30s and younger black males) have much lower coverage ratios than older individuals.
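The definition of the coverage ratio can be written out directly. In the sketch below the function names are hypothetical, the ratios are those quoted in the text for the 1996 CPS, and the counts passed to `coverage_ratio` are invented purely to illustrate the definition.

```python
def coverage_ratio(survey_estimate, independent_total):
    """Ratio of the sample-based estimate to the independent population total."""
    return survey_estimate / independent_total


def implied_undercoverage(ratio):
    """Share of the population not covered, given a coverage ratio."""
    return 1.0 - ratio


# Coverage ratios quoted in the text for the 1996 CPS.
ratios_1996 = {
    "overall": 0.93,
    "blacks": 0.84,
    "Hispanics": 0.83,
    "whites": 0.94,
    "males": 0.92,
    "females": 0.96,
}
undercoverage_pct = {
    group: round(implied_undercoverage(ratio) * 100)
    for group, ratio in ratios_1996.items()
}

# HYPOTHETICAL counts, chosen only to illustrate the definition.
example_ratio = coverage_ratio(survey_estimate=930.0, independent_total=1000.0)

for group, pct in undercoverage_pct.items():
    print(f"{group}: about {pct}% not covered")
```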
CPS uses weighting adjustment and poststratification to independent population totals to compensate for undercoverage (see the section "Weighting and Estimation"). This procedure forces the aggregate count of individuals in the sample to agree with the independent totals. However, it does not guarantee that biases resulting from the undercoverage are adequately eliminated. For example, it is not known how the differential undercoverage will affect estimates of the number of individuals with S&E degrees or the number who are employed in S&E occupations.
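A simplified sketch may help show why forcing agreement with independent totals does not by itself remove bias. The example below is a one-dimensional poststratification on invented data; it is not the CPS procedure, which is described in "Weighting and Estimation." The cell labels, weights, and control totals are all hypothetical.

```python
from collections import defaultdict


def poststratify(records, control_totals):
    """Scale weights so each cell's weighted count equals its control total."""
    weighted = defaultdict(float)
    for rec in records:
        weighted[rec["cell"]] += rec["weight"]
    adjusted = []
    for rec in records:
        factor = control_totals[rec["cell"]] / weighted[rec["cell"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted


def se_share(records, cell):
    """Weighted share of a cell's respondents who hold an S&E occupation."""
    in_cell = [r for r in records if r["cell"] == cell]
    total = sum(r["weight"] for r in in_cell)
    return sum(r["weight"] for r in in_cell if r["se_occupation"]) / total


# HYPOTHETICAL toy sample and control totals.
records = [
    {"cell": "age 20-29", "weight": 1000.0, "se_occupation": True},
    {"cell": "age 20-29", "weight": 1000.0, "se_occupation": False},
    {"cell": "age 50-59", "weight": 1500.0, "se_occupation": False},
    {"cell": "age 50-59", "weight": 1500.0, "se_occupation": True},
]
control_totals = {"age 20-29": 2500.0, "age 50-59": 3150.0}

adjusted = poststratify(records, control_totals)
totals = defaultdict(float)
for rec in adjusted:
    totals[rec["cell"]] += rec["weight"]

share_before = se_share(records, "age 20-29")
share_after = se_share(adjusted, "age 20-29")
print(dict(totals), share_before, share_after)
```

In this toy case the cell totals match the controls after adjustment, but the S&E share within each cell is unchanged. If the people missed by the survey differ from respondents in the same cell, the adjusted estimate inherits that difference.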
A limitation of CPS relative to SESTAT is that CPS does not collect data about S&E degrees. Comparisons of SESTAT and CPS estimates are therefore restricted to individuals in S&E occupations; comparisons by S&E degree are not possible. For example, CPS cannot provide separate estimates of individuals with an associate's degree in S&E who are not working in S&E occupations. However, CPS can provide estimates of the number of individuals in S&E and non-S&E occupations and of the number of individuals with an associate's degree who are employed in S&E occupations.
Table 3 shows CPS estimates of the number and percentage of individuals age 75 or younger in S&E and non-S&E occupations by highest degree attained, including degree levels above and below a bachelor's degree. An estimated 17% of individuals in S&E occupations have an associate's degree as their highest education level, and 5% have a high school diploma. The percentages of individuals in non-S&E occupations at these education levels are 33% and 39%, respectively. By employment status, 93% of individuals in S&E occupations are full-time workers, compared with 81% of individuals in non-S&E occupations.
Table 4 compares weighted counts of individuals in the April 1997 CPS who were age 75 or younger, had a bachelor's or higher degree, and were employed in an S&E occupation with the corresponding counts from the 1997 SESTAT. The SESTAT estimate of the number of civilian workers employed in S&E occupations is 3,346,200 (SE = 26,600). The corresponding CPS estimate is 3,542,100 (SE = 101,700). The difference between the two estimates is not statistically significant at the 95% confidence level.
Table 5 shows April 1997 CPS estimates of the number of bachelor's or higher degree recipients employed in S&E occupations by educational attainment and whether they entered the United States before 1990. Table 6 shows the corresponding estimates by S&E occupational group. Among the 3.54 million individuals in S&E occupations, about 210,200 (6%) entered the United States during or after 1990. This estimate of 210,200 (SE = 25,000) provides a crude upper boundary on the immigrant portion of the foreign-trained group that is excluded from SESTAT. The actual number of excluded immigrants is likely to be lower than 210,200 because individuals who received an S&E degree from a U.S. institution after April 1990 would be covered in SESTAT through the NSRCG or SDR. Those who entered the United States between January and April 1990 are also covered in SESTAT. The percentage of recent immigrants varies by educational level and by occupational groups. This percentage was higher among people with postgraduate degrees than among bachelor's degree recipients and higher among life scientists than other S&E occupations.
Table 7 shows the CPS estimates of the number of individuals in S&E occupations by highest degree attained (including associate's degree and high school diploma) and occupational groups. The total count, including individuals whose highest educational level is an associate's degree or high school diploma, is about 4.54 million, which is 28% more than the total of 3.54 million individuals with bachelor's or higher degrees. About 752,500 individuals (SE = 47,300) in S&E occupations had an associate's degree, and another 245,100 (SE = 27,000) had a high school diploma (or equivalent). The inclusion of individuals without a college degree has the largest effect on estimates for computer and mathematical scientists and engineers, increasing the total counts by 37% and 32%, respectively. Appendix D lists the number of individuals in S&E occupations by educational level, S&E occupational group, and detailed occupational code.
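The totals and percentages in this paragraph can be checked against one another. The sketch below uses only the CPS figures quoted in the text; the variable names are illustrative.

```python
# April 1997 CPS estimates of individuals in S&E occupations, by highest degree.
bachelors_or_higher = 3_542_100
associates = 752_500
high_school = 245_100

total = bachelors_or_higher + associates + high_school

# Relative increase from adding individuals without a bachelor's degree.
increase = (total - bachelors_or_higher) / bachelors_or_higher

# Shares of the combined total at each sub-bachelor's level.
share_associates = associates / total
share_high_school = high_school / total

print(f"Total: {total:,} (about {total / 1e6:.2f} million)")
print(f"Increase over bachelor's or higher: {increase:.0%}")
print(f"Associate's: {share_associates:.0%}; high school: {share_high_school:.0%}")
```

The computed shares agree with the rounded 17% and 5% reported from table 3.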
Comparison of Estimates
This section provides comparisons of SESTAT and CPS estimates of individuals employed in S&E occupations with and without coverage adjustments. Three coverage differences are noted. First, SESTAT included individuals employed in military service, and CPS did not include this group. Second, SESTAT did not include new immigrants who entered the United States after April 1990 (and who did not receive a bachelor's or higher S&E degree from a U.S. institution after entry into the country), and CPS did include this group. Third, SESTAT did not include individuals in S&E occupations who did not have a bachelor's or higher degree, whereas CPS did.
Table 8 shows SESTAT and CPS estimates of the numbers of bachelor's or higher degree recipients in S&E occupations by highest degree attained, employment status, and S&E occupational group. As mentioned earlier, the SESTAT estimate shows a total of 3.37 million graduates (civilian and military) in S&E occupations (SE = 26,600). The CPS estimate shows a total of 3.54 million civilians (SE = 101,700). The difference between these two overall estimates is not statistically significant at the 95% confidence level. By occupational group and degree attainment, some differences exist that are statistically significant at the 95% confidence level. By occupational group, the SESTAT estimate is significantly higher than the CPS estimate for the number of life scientists (126% of CPS estimate) and significantly lower for engineers (89% of CPS estimate). By degree attainment, the SESTAT estimate is significantly lower than the CPS estimate for bachelor's degree recipients (92% of CPS estimate) and significantly higher for doctorate recipients (118% of CPS estimate). One factor that could explain the differences in the doctorate estimates is better coverage of U.S. doctorate recipients, who are included in SESTAT from the SDR component.
Table 9 shows SESTAT and CPS estimates after "adjusting" for known differences between them. In this table, military personnel are excluded from SESTAT estimates and recent immigrants (individuals entering the United States in 1990 or later) are excluded from CPS estimates. These adjustments are imperfect because some of the excluded immigrants in CPS could have received an S&E bachelor's or higher degree in the United States after 1990 or could have entered the United States in 1990 by 15 April and would thus be included in SESTAT. In other words, to be comparable, the same set of recent immigrants should also be excluded from the SESTAT estimates; however, this is not possible with the collected data. Nonetheless, the results are interesting because they indicate the effect recent immigrants may have on the S&E population. For example, the CPS estimate for number of individuals with a bachelor's or higher degree employed in S&E occupations is 3,542,100 (SE = 101,700) overall and 3,332,000 (SE = 98,700) if those who entered the United States during or after 1990 are excluded.
The adjusted estimates in table 9 show that the number of individuals working in S&E occupations is roughly the same in both surveys. The SESTAT estimate is 3.35 million (SE = 26,600) and the CPS estimate is 3.33 million (SE = 98,700). The differences between the SESTAT and CPS estimates by occupational group and educational level are generally consistent with and without these coverage adjustments. However, two groups affected by these adjustments are life scientists and doctorate recipients. In both cases, the difference between the SESTAT estimate and CPS estimate increased.
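The effect of the two coverage adjustments on the overall comparison can be sketched from figures quoted elsewhere in this section. The helper name `percent_of` is hypothetical. Because the published components are rounded, the subtraction below gives 3,331,900 where the report shows 3,332,000.

```python
def percent_of(estimate, reference):
    """Express one estimate as a percentage of another."""
    return 100.0 * estimate / reference


# Figures quoted in the text.
sestat_all = 3_369_400           # civilian and military
sestat_civilian = 3_346_200      # military personnel excluded
cps_all = 3_542_100              # civilians, bachelor's or higher degree
cps_recent_immigrants = 210_200  # entered the United States in 1990 or later

# CPS estimate after removing recent immigrants (rounded inputs).
cps_adjusted = cps_all - cps_recent_immigrants

unadjusted_pct = percent_of(sestat_all, cps_all)
adjusted_pct = percent_of(sestat_civilian, cps_adjusted)

print(f"Unadjusted: SESTAT is {unadjusted_pct:.1f}% of the CPS estimate")
print(f"Adjusted:   SESTAT is {adjusted_pct:.1f}% of the CPS estimate")
```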
Table 10 shows SESTAT and CPS estimates of the number of bachelor's or higher degree recipients in S&E occupations by sex, race/ethnicity, and age. The differences by sex and race/ethnicity are mostly small and not statistically significant. The differences in estimates for blacks, although large, are not statistically significant at the 95% confidence level. The differences by age, however, show significantly fewer individuals who are 29 years old and younger in SESTAT (61% of CPS estimate) but more individuals in the 50–59 and 60–75 age groups (142% and 130% of CPS estimates, respectively). A factor that may have contributed to this difference is the population weighting adjustment in CPS. As discussed in the section "CPS Coverage," the CPS estimates include an overall adjustment to reflect known population totals by race/ethnicity, age, and sex. These adjustments tend to be greatest for individuals in their 20s and 30s and are applied equally to individuals with degrees and without degrees. Moreover, although the NSCG component of SESTAT included an initial poststratification adjustment to 1990 census counts, there has been no comparable adjustment in SESTAT.
Table 11 shows estimates from the two data sources after excluding from SESTAT persons in military service and excluding from CPS immigrants who entered the United States during or after 1990. The differences in estimates by age group are consistent with the differences without the adjustments. After coverage adjustments, the SESTAT estimates by sex are not significantly different from the CPS estimates. By race/ethnicity, the exclusion of new immigrants from CPS had a large effect on the estimated number of Asians (decreasing from 381,200 to 263,700).
One of the main coverage differences between SESTAT and CPS is that CPS includes individuals in S&E occupations who do not have a bachelor's or higher degree and SESTAT does not include this group. Table 12 shows the CPS estimates of this subgroup by sex, age, and race/ethnicity. Of the estimated 997,600 individuals in this subgroup (SE = 54,400), the majority have an associate's degree (see table 7) and most are employed in computer and mathematical sciences or engineering occupations. Compared with individuals who have a bachelor's or higher degree, somewhat greater proportions of individuals in S&E occupations who do not have a bachelor's or higher degree are female and age 40 or older. Additional detailed breakouts by occupation are given in appendix E, which shows that more than 78% of the 394,100 individuals who do not have a bachelor's degree but who are employed in the broad computer and mathematical science category are computer systems analysts and scientists and the majority of the 487,600 engineers are employed either as electrical/electronic engineers or mechanical engineers.
Using CPS data, table 13 compares the characteristics of individuals in S&E occupations who have an associate's degree or high school diploma and the characteristics of individuals with the same degree attainment in non-S&E occupations. By sex and race/ethnicity, 73% of individuals in S&E occupations are male, 4% are Hispanic, and 7% are black. By employment status, a greater percentage of individuals in S&E occupations work full time than individuals in non-S&E occupations. This trend is consistent for all individuals in S&E occupations, including those both below and above the bachelor's degree level (see table 3).
In conclusion, it is possible to adjust for some coverage differences when comparing SESTAT and CPS estimates. Three coverage differences are noted. First, SESTAT includes individuals employed in military service and CPS does not include this group. Second, SESTAT omits new immigrants who entered the United States after April 1990 (and who did not receive a bachelor's or higher S&E degree from a U.S. institution after entry into the country) and CPS does not omit this group. Third, SESTAT does not include individuals in S&E occupations who do not have a bachelor's or higher degree, whereas CPS does. The adjusted estimate of the total number of individuals with a bachelor's or higher degree working in S&E occupations is roughly the same in both surveys when those in military service are excluded from SESTAT and immigrants who entered the United States during or after 1990 are excluded from CPS. The main differences in estimates from the two data sources among this adjusted group are by age and race/ethnicity and for life scientists and doctorate recipients.
CPS can provide estimates of the number of individuals in S&E occupations without a bachelor's or higher degree that are not available in SESTAT. In the April 1997 CPS sample, 431 individuals without a bachelor's or higher degree were working in S&E occupations (appendix C, table C-18). On a weighted basis, these 431 individuals from the sample represent about 997,600 individuals without bachelor's degrees who are working in S&E occupations (SE = 54,400; appendix C, table C-17). Almost 90% of these people were employed as computer/math scientists or engineers.
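The relationship between sample size and precision can be made concrete with the figures quoted in this section. The sketch below computes the average weight implied by the 431 sample cases and the relative standard error (SE divided by the estimate) of three estimates. The function name and dictionary labels are illustrative, and the average weight is only a summary, since individual CPS weights vary.

```python
def relative_se(estimate, se):
    """Standard error expressed as a fraction of the estimate."""
    return se / estimate


# April 1997 CPS: S&E workers without a bachelor's or higher degree.
sample_cases = 431
weighted_total = 997_600

# Average number of people represented by each sample case.
average_weight = weighted_total / sample_cases

rse = {
    "SESTAT, S&E occupations": relative_se(3_369_400, 26_600),
    "CPS, bachelor's or higher": relative_se(3_542_100, 101_700),
    "CPS, below bachelor's": relative_se(997_600, 54_400),
}

print(f"Average weight per sample case: {average_weight:,.0f}")
for label, value in rse.items():
    print(f"{label}: relative SE {value:.1%}")
```

The CPS relative standard errors are several times larger than the SESTAT figure, which reflects the relatively small number of CPS sample cases in S&E occupations.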
 1999 SESTAT data were not available when this report was being written. The earlier cycles of SESTAT were not used because of time and resource limitations.
 As shown in appendix C, table C-32 and figure C-1, the number of people in S&E occupations as reported in CPS fluctuates by month. These appendix tables are intended simply to illustrate the month-to-month variation in the CPS S&E occupation numbers; they include all people age 16 or older in the civilian labor force who have a college degree and who are working or previously worked in an S&E occupation.
 The standard errors for CPS estimates tend to be higher than the corresponding standard errors for SESTAT estimates because the CPS sample includes a relatively small number of people in S&E occupations.
 Throughout this section, "degree" refers to bachelor's or higher degree.
 The sampling frame for the SDR is the Doctorate Records File, which is maintained by NSF and uses NSF's Survey of Earned Doctorates (SED) as its primary source. The SED is a census of all individuals receiving a research doctorate from a U.S. institution in each academic year. Institutional coordinators in graduate schools distribute survey forms to and collect them from individuals receiving research doctorates. Because of the high visibility and participation of doctorate-granting institutions, there is expected to be little, if any, coverage error in the first stage (inclusion of doctorate-granting graduate schools). Because the graduate schools collect the questionnaires from degree recipients at the time of doctoral completion, the second stage is also considered quite accurate. Comparisons of the number of research doctorates covered by the SED with the total number of doctorates (including nonresearch doctorates) reported by institutions to the National Center for Education Statistics confirm that coverage of research doctorates in the SED is excellent (http://www.nsf.gov/statistics/srvydoctorates/).