Comparison of the National Science Foundation's Scientists and Engineers Statistical Data System (SESTAT) with the Bureau of Labor Statistics' Current Population Survey (CPS)
Both SESTAT and CPS are based on sample surveys that use complex probability sample designs. As such, they are subject to various limitations. As described in the section "SESTAT Coverage," the main limitation of the SESTAT design is that over time it excludes an increasingly large part of the target S&E population. In addition, SESTAT currently excludes individuals who do not have a bachelor's or higher degree. CPS, on the other hand, by definition excludes individuals in the military (a group covered in SESTAT). Because CPS is based on an area probability sample, it is also subject to undercoverage of certain subgroups of the civilian population (see the section "CPS Coverage"). Thus, an important distinction between the designs of the two studies relates to coverage. As long as these differences are recognized, the results from CPS and SESTAT can be analyzed and compared even though the two sample designs differ. Some additional factors that have a bearing on the ability to make comparisons across the studies follow. These factors include nonresponse and imputation, weighting and estimation, and sampling errors.
Nonresponse and Imputation
Nonresponse (both unit and item nonresponse) is a concern in both SESTAT and CPS because it can introduce biases of unknown magnitude in the survey estimates. Although weighting adjustments are made in both SESTAT and CPS to compensate for nonresponse, it is unlikely that the biases are completely eliminated. However, even if the characteristics of the nonrespondents are different from those of the respondents, the effect of nonresponse on the survey estimates will be minimal if nonresponse rates are relatively low and are not highly variable among demographic groups. In other words, the higher the nonresponse rate, the greater the potential for serious biases in the survey estimates.
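The relationship between the nonresponse rate and the potential bias can be illustrated with a standard identity for an unweighted mean: the bias of the respondent mean equals the nonresponse rate multiplied by the difference between the respondent and nonrespondent means. The following is a minimal sketch only. The function name and all input values are hypothetical, chosen for illustration, and are not taken from SESTAT or CPS.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the full-sample mean.

    Uses the identity: bias = (1 - response_rate) * (resp mean - nonresp mean).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical example: the same respondent/nonrespondent difference
# under a high and a lower response rate.
bias_high_response = nonresponse_bias(0.93, 0.30, 0.20)
bias_low_response = nonresponse_bias(0.71, 0.30, 0.20)
```

With the difference between respondents and nonrespondents held fixed, the bias grows in direct proportion to the nonresponse rate. In practice the nonrespondent mean is unobserved, which is why the bias is of unknown magnitude.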
Table 14 shows the unweighted unit response rate for SESTAT components in 1993, 1995, and 1997. For the NSCG, the response rates in 1995 and 1997 were conditional on prior respondent status in 1993. The 1993 NSCG response rate was 80%; only respondents of the 1993 NSCG were eligible for subsequent cycles. That is, there was no follow-up of nonrespondents from one cycle to the next. The conditional NSCG response rates for 1995 and 1997 were 95% and 94%, respectively. The unconditional response rate for the 1997 NSCG (i.e., the cumulative response rate for all three cycles) was approximately 71%. The response rates for the NSRCG and the SDR are unconditional response rates computed independently at each cycle. The unconditional response rates for the 1997 NSRCG and SDR were 82% and 84%, respectively.
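The cumulative (unconditional) 1997 NSCG rate follows from multiplying the 1993 rate by the two conditional rates quoted above. A short sketch of that arithmetic is given below. The function name is an illustrative assumption, not part of any SESTAT tool.

```python
def cumulative_response_rate(rates):
    """Multiply a chain of conditional response rates into one overall rate."""
    product = 1.0
    for rate in rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each rate must be between 0 and 1")
        product *= rate
    return product


# Rates quoted in the text: 1993 (80%), then 1995 and 1997 conditional
# on prior response (95% and 94%).
nscg_1997_unconditional = cumulative_response_rate([0.80, 0.95, 0.94])
```

The product is 0.7144, which matches the figure of approximately 71% reported in the text.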
The CPS unit response rates are generally higher than the SESTAT response rates, ranging from 91% to 94% per month (e.g., see U.S. Census Bureau 2000 and the CPS website). The lower response rates in SESTAT raise the concern that the potential for bias resulting from unit nonresponse is greater for SESTAT than for CPS. However, SESTAT has fairly complete and rich data on demographics and degrees that are used for nonresponse adjustment to attenuate nonresponse biases. Because CPS nonresponse and poststratification adjustments are made without regard to degree status, the effect of the CPS adjustments for subsets of individuals with degrees is unknown.
Item nonresponse can also have adverse effects on survey data. The extent of item nonresponse is relatively minor for both SESTAT and CPS. In CPS, item nonresponse is generally low for demographic and labor force items (about 1% or less). In SESTAT, only those questionnaires that provided complete data for all "critical" items relating to degrees and occupation were considered to be completed questionnaires (i.e., respondents). Thus, by definition, there was no item nonresponse among respondents for the critical items. Any nonresponse to critical items in SESTAT is therefore counted in the unit response rates discussed above. For the noncritical items, item nonresponse rates are generally low. For example, the item nonresponse rates for variables included in this evaluation were approximately 1% or less. Both SESTAT and CPS use hot-deck methods to impute missing data items. SESTAT applies hot-deck imputation after some logical edit imputation is completed.
Weighting and Estimation
Both SESTAT and CPS require the use of weights to inflate the sample results to population levels. The purpose of the weights is to compensate for variable probabilities of selection, differential response rates, and undercoverage. All of the population estimates presented in this report are weighted estimates using person-level weights available in public-use files.
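As an illustration of how person-level weights inflate sample results to population levels, an estimated total is the sum of the weights of the sample members who have the characteristic of interest. The sketch below uses made-up records. The field names `weight` and `se_occupation` are hypothetical and do not correspond to variables in the SESTAT or CPS public-use files.

```python
def weighted_total(records, has_characteristic):
    """Estimate a population total as the sum of weights of matching records."""
    return sum(rec["weight"] for rec in records if has_characteristic(rec))


# Hypothetical sample records; each weight is the number of people in the
# population that the sampled person represents.
sample = [
    {"weight": 150.0, "se_occupation": True},
    {"weight": 220.5, "se_occupation": False},
    {"weight": 98.0, "se_occupation": True},
    {"weight": 310.0, "se_occupation": True},
]

estimated_se_workers = weighted_total(sample, lambda rec: rec["se_occupation"])
```

Here three sampled persons represent an estimated 558 people working in S&E occupations, whereas an unweighted count would report only 3.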
Although some aspects of weighting have varied from year to year, the main features of the weighting procedures used in SESTAT can be summarized as follows:
For analyses of the CPS data, weights have been developed that reflect probabilities of selection and include both nonresponse adjustments and poststratification adjustments to current population counts. As described in detail in U.S. Census Bureau (2000), chapter 10, the weights derived for analysis of CPS data include the following components:
In summary, CPS adjusts weights by geographic area and demographic characteristics such as sex, age, race, and ethnicity. SESTAT adjusts weights by degree level and field as well as demographic characteristics such as sex, race/ethnicity, disability status, and citizenship.
Sampling Errors
All of the estimates cited in this report are based on sample data and are thus subject to sampling errors. Both SESTAT and CPS publish generalized variance functions (GVFs) that can be used to estimate the standard error of an estimated total. These GVFs have been used to obtain the standard errors of the estimates presented in this report. For example, table 15 shows the standard errors and the coefficients of variation (CVs) of estimates of individuals working in S&E occupations by highest degree attained for SESTAT and CPS. The CV is the standard error divided by the estimated total, expressed as a percentage. As shown in the table, the standard errors for SESTAT estimates are considerably smaller than those for the corresponding CPS estimates. This reflects the difference in sample sizes between the two studies (see table 16). Thus, although CPS can provide useful information about the S&E population, detailed analyses are severely limited by the comparatively large sampling errors. In particular, analysis by subgroups, such as detailed occupation (e.g., economists) or demographic groups (e.g., women and minorities), is limited, even if several months of CPS data are accumulated. For additional information about the standard errors of the SESTAT and CPS estimates and corresponding subgroup sample sizes, see appendix C.
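The CV calculation described above, together with one way a GVF might be applied, can be sketched as follows. The two-parameter form used here, with the standard error equal to the square root of a·x² + b·x, is a commonly used GVF form and is an assumption for this illustration. The parameter values are hypothetical and are not the published SESTAT or CPS parameters.

```python
import math


def gvf_standard_error(estimate, a, b):
    """Standard error of an estimated total x under the assumed GVF form
    se = sqrt(a * x**2 + b * x)."""
    variance = a * estimate ** 2 + b * estimate
    if variance < 0:
        raise ValueError("GVF parameters give a negative variance")
    return math.sqrt(variance)


def coefficient_of_variation(standard_error, estimate):
    """CV: standard error divided by the estimated total, as a percentage."""
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    return 100.0 * standard_error / estimate


# Hypothetical GVF parameters and estimated total, for illustration only.
example_total = 1_000_000
example_se = gvf_standard_error(example_total, a=-0.00001, b=2500.0)
example_cv = coefficient_of_variation(example_se, example_total)
```

With these made-up inputs the standard error is a little under 50,000 and the CV a little under 5%. The actual parameters should be taken from the documentation of each survey.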
The National Center for Education Statistics has established the Integrated Postsecondary Education Data System (IPEDS) as its core postsecondary education data collection program. It is a single, comprehensive system that encompasses all identified institutions whose primary purpose is to provide postsecondary education. IPEDS is built around a series of interrelated surveys that collect institution-level data in areas such as enrollments, program completions, faculty, staff, and finances. The NSRCG poststratification adjustments used IPEDS data on the number of bachelor's and master's degrees awarded, by degree level and major field.