Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates
NSCG Sample Design
In 1962, NSF and other agencies sponsored the Postcensal Manpower Survey in order to obtain information on S&E personnel resources. This was a single, cross-sectional survey with the sample derived from the long form of the 1960 decennial Census (hence, "postcensal"). In 1972, NSF sponsored another postcensal survey (the Professional, Technical and Scientific Manpower Survey) for similar purposes, with smaller follow-ups through 1978. In the 1980s, NSF again conducted a postcensal survey, with follow-ups through 1989. After a major redesign following the 1990 Census, the NSCG continued the model of a large postcensal (baseline) survey followed by smaller follow-up surveys during the remainder of the decade. The decennial long form contained no information on educational background beyond the level of educational attainment, so it could not by itself identify individuals with S&E or S&E-related degrees. Therefore, the baseline NSCG served two purposes: (1) to provide a once-in-a-decade view of all college graduates in the United States, and (2) to act as a screening device (through the detailed educational histories collected in the NSCG) for obtaining a sample of scientists and engineers for the SESTAT integrated file. The NSCG follow-up surveys, generally conducted every 2 to 3 years, were limited to those meeting the SESTAT target population definition of a scientist or engineer.
At the beginning of each decade, the U.S. Census Bureau created a sampling frame for the NSCG based on the decennial Census, which was used to draw the baseline NSCG sample. All long-form respondents with a bachelor's degree or higher at the time of the Census had a chance of selection into the postcensal NSCG sample. To capture the stock of scientists and engineers from the NSCG, it was necessary to sample all occupations from the decennial long form because a high proportion of individuals with S&E or S&E-related degrees do not work in S&E or S&E-related occupations (57% of this population in the 2003 NSCG either worked in a non-S&E occupation or were not working). Additionally, the NSCG is the only source of information for the SESTAT integrated database on individuals with non-S&E degrees working in S&E or S&E-related occupations. In 2003, the size of this population was 1,510,100 (see table 1).
Using data from the educational history section of the 2003 NSCG, it was possible to identify individuals sampled in non-S&E occupations who had S&E or S&E-related degrees, as well as individuals with non-S&E degrees working in S&E or S&E-related occupations. Especially at the bachelor's level (which accounts for the large majority of those with a bachelor's degree or higher), people with S&E degrees are likely to work in a non-S&E occupation. Individuals in S&E occupations, and those in non-S&E occupations that are more likely to be held by someone with an S&E degree, are sampled at higher rates than other cases.
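Sampling some strata at higher rates than others means each sampled case must carry a base weight equal to the inverse of its selection probability, so that oversampled groups are not overrepresented in estimates. The sketch below illustrates that idea only. The stratum labels, the rates in `SAMPLING_RATES`, and the function `sample_with_base_weights` are hypothetical, and independent (Bernoulli) selection is used for brevity in place of the PPS systematic selection the NSCG actually used within strata.

```python
import random

# Hypothetical strata and sampling rates, for illustration only; these are
# not the NSCG's actual strata or rates.
SAMPLING_RATES = {
    "S&E occupation": 0.20,
    "non-S&E occupation often held by S&E degree holders": 0.10,
    "other non-S&E occupation": 0.02,
}


def sample_with_base_weights(frame, rates, seed=0):
    """Select each record with its stratum's rate and attach a base weight.

    Bernoulli (independent coin-flip) selection keeps the sketch short; it is
    not the PPS systematic selection the NSCG used within strata. The base
    weight is the inverse of the selection probability, so oversampled
    strata get proportionally smaller weights.
    """
    rng = random.Random(seed)
    sample = []
    for record in frame:
        rate = rates[record["stratum"]]
        if rng.random() < rate:
            sample.append((record, 1.0 / rate))
    return sample


if __name__ == "__main__":
    # Toy frame: 100,000 records in each hypothetical stratum.
    frame = [{"id": i, "stratum": s} for s in SAMPLING_RATES for i in range(100_000)]
    sample = sample_with_base_weights(frame, SAMPLING_RATES)
    for stratum in SAMPLING_RATES:
        n = sum(1 for r, _ in sample if r["stratum"] == stratum)
        est = sum(w for r, w in sample if r["stratum"] == stratum)
        print(f"{stratum}: {n} sampled, weighted count ~ {est:,.0f}")
```

Although the "S&E occupation" stratum yields about ten times as many sampled cases as the lowest-rate stratum in this toy frame, the weighted counts for all three strata come out near the true 100,000 each.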
In 1993, NSF requested that the Census Bureau select about 215,000 individuals for the NSCG sample from the decennial long-form sample frame. The 1993 sample netted about 75,000 cases that met NSF's definition of a scientist or engineer and therefore were eligible for the SESTAT integrated database and the NSCG follow-up surveys. In 2003, about 171,000 individuals were selected from the 2000 Census long-form frame, which yielded approximately 67,000 cases with S&E or S&E-related degrees or occupations. Table 2 shows the yield of cases that were obtained from the NSCG in 1993 and 2003.
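As a rough check on efficiency, the rounded figures above can be turned into yield rates. This back-of-envelope sketch assumes the denominator is all selected cases, including nonrespondents; whether table 2 uses the same basis is not stated here, so the percentages are approximate. The names `NSCG_YIELD` and `yield_rate` are illustrative.

```python
# Rounded figures from the text: cases selected for the NSCG and the
# approximate number that met the S&E definition (SESTAT-eligible).
NSCG_YIELD = {
    1993: {"selected": 215_000, "eligible": 75_000},
    2003: {"selected": 171_000, "eligible": 67_000},
}


def yield_rate(selected: int, eligible: int) -> float:
    """Share of selected cases that turned out to be SESTAT-eligible."""
    return eligible / selected


for year, counts in NSCG_YIELD.items():
    rate = yield_rate(counts["selected"], counts["eligible"])
    print(f"{year}: about {rate:.1%} of selected cases were eligible")
# 1993: about 34.9% of selected cases were eligible
# 2003: about 39.2% of selected cases were eligible
```

On these figures, well under half of each selected sample ended up in the S&E population, which is the cost of screening from a frame that lacks field-of-degree information.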
The Census Bureau conducts the NSCG for NSF because the NSCG sample is a subsample of the decennial Census long-form sample. Any record derived from decennial Census records is protected by Title 13 of the U.S. Code and must be used only under Census Bureau supervision; thus the NSCG sample must be drawn, and the survey conducted, by the Census Bureau. The postcensal NSCG has generally been fielded about 3 years after the decennial Census because of the time needed to process the decennial Census and make the data available for NSCG sampling.
The postcensal NSCG sample design has been a two-phase, stratified random sample of individuals with at least a bachelor's degree at the time of the Census. Phase 1 consisted of sampling households for the Census long-form sample. That procedure used a stratified systematic sample, with differing sampling rates for administrative areas of different sizes (about a 1-in-12 to 1-in-16 sampling rate). Phase 2, the NSCG postcensal sample, consisted of subsampling persons with at least a bachelor's degree whose age, as reported on the long form, meant they would be 75 years of age or younger on the reference date of the postcensal NSCG. In 2003, the major sampling variables used to create the strata for the frame were the following: educational attainment (bachelor's degree or higher), by highest degree level achieved; occupation; demographic group (which combines citizenship, race/ethnicity, and disability status); and sex. Within each stratum, individuals were selected using probability-proportional-to-size (PPS) systematic sampling. The long-form sampling weight was used as the size measure to compensate as much as possible for the differing long-form sampling rates and, hence, to come as close as possible to an overall self-weighting sample within each Phase 2 stratum.
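Why the long-form weight works as a size measure follows from the two selection probabilities. If person i has long-form weight w_i (roughly the inverse of the Phase 1 probability) and n people are drawn by PPS from a stratum with total weight W, the Phase 2 probability is n·w_i/W, so the overall probability is (1/w_i)(n·w_i/W) = n/W, the same for everyone in the stratum. The minimal sketch below implements textbook PPS systematic selection for a single stratum; the function name `pps_systematic`, the example weights, and the assumption that no weight exceeds W/n are illustrative, and the frame is taken in whatever order it is given (production designs often sort it first).

```python
import random


def pps_systematic(units, sizes, n, rng):
    """Draw n units by systematic probability-proportional-to-size sampling.

    Units are laid end to end on a line, each occupying a length equal to its
    size. A random start in [0, interval) and n equally spaced points then
    pick the units whose segments contain a point. Unit i is selected with
    probability n * sizes[i] / sum(sizes), assuming no size exceeds
    sum(sizes) / n (larger units would need to be taken with certainty).
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_point = 0
    for unit, size in zip(units, sizes):
        upper = cumulative + size
        # Take this unit for every selection point that falls in its segment.
        while next_point < n and points[next_point] < upper:
            selected.append(unit)
            next_point += 1
        cumulative = upper
    return selected


if __name__ == "__main__":
    # Hypothetical long-form weights (inverse Phase 1 probabilities) for
    # six persons in one Phase 2 stratum.
    people = ["p1", "p2", "p3", "p4", "p5", "p6"]
    weights = [12, 16, 12, 16, 14, 12]
    rng = random.Random(2024)
    print(pps_systematic(people, weights, n=2, rng=rng))
```

Repeating the draw many times shows a person with weight 16 selected more often than one with weight 12, in proportion to the weights, which is exactly what offsets their lower Phase 1 probability.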
Responses to the postcensal NSCG baseline survey determined eligibility for the follow-up NSCG surveys, depending on whether a respondent met the SESTAT definition of a scientist or engineer. Because the baseline survey collects information on educational background, the sampling for the follow-up surveys uses the original sampling strata as well as field of highest S&E degree.
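The screening step can be pictured as a filter followed by a regrouping. In this sketch the record layout (`BaselineResponse` and its fields) and the helper functions are hypothetical, and the eligibility rule is simplified to "an S&E or S&E-related degree, or an S&E or S&E-related occupation," following the description of eligible cases in this section rather than the full SESTAT definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaselineResponse:
    """Hypothetical baseline record; field names are illustrative only."""
    respondent_id: int
    original_stratum: str
    has_se_degree: bool      # S&E or S&E-related degree
    in_se_occupation: bool   # S&E or S&E-related occupation
    highest_se_degree_field: Optional[str] = None


def is_follow_up_eligible(r: BaselineResponse) -> bool:
    # Simplified rule: eligible with an S&E degree OR an S&E occupation.
    return r.has_se_degree or r.in_se_occupation


def follow_up_strata(responses):
    """Group eligible respondents by (original stratum, highest S&E degree field)."""
    strata = defaultdict(list)
    for r in responses:
        if is_follow_up_eligible(r):
            field = r.highest_se_degree_field or "no S&E degree"
            strata[(r.original_stratum, field)].append(r.respondent_id)
    return dict(strata)


if __name__ == "__main__":
    responses = [
        BaselineResponse(1, "A", True, False, "physics"),
        BaselineResponse(2, "A", False, True),    # non-S&E degree, S&E job
        BaselineResponse(3, "B", False, False),   # screened out
    ]
    print(follow_up_strata(responses))
```

Field of degree can only enter the follow-up stratification because the baseline questionnaire collected it; the long-form frame alone could not supply it.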
A review of the NSCG sample design found that the long-form frame approach had sample selection and coverage problems. Three significant problems were the following:
The NSCG sample from the 1980 decennial Census that remained at the end of the 1980s was discarded at the recommendation of a previous CNSTAT panel because of coverage, bias, and other problems. NSF considered doing the same with the sample remaining from the 1990 postcensal survey, primarily because of its low unconditional response rate (63%), as well as concerns about panel attrition, fatigue, and possible bias. However, discarding the sample would mean the complete loss of longitudinal continuity and a lack of information about how nonresponse adjustments during the decade might cause a shift in the time series. NSF addressed these issues by embedding an experiment in the design of the 2003 NSCG. In addition to drawing a new sample from the 2000 decennial long-form sample, NSF also fielded the 2003 survey to the remaining 1999 NSCG respondent population (which included cases originally sampled in the 1993 NSCG, as well as cases from the 1995–99 RCG surveys). The evaluation of this experiment found some large differences in estimates of the scope of coverage, across various nonresponse adjustment cells, between the newly drawn 2000 postcensal sample and the longitudinal sample retained from the 1999 NSCG (Finamore, Hall, and Fecso 2006). Further research is needed to determine all the factors that may have contributed to these differences.
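This section does not describe the NSCG's actual nonresponse adjustment, but a common approach is the weighting-class adjustment, sketched below as an assumption: within each adjustment cell, respondents' weights are scaled up so that they also represent the cell's nonrespondents. The function name and the toy cells ("new" and "retained") are hypothetical. The point is that the adjusted estimate depends on how response behaves within each cell, so samples with different response histories can produce different estimates.

```python
from collections import defaultdict


def weighting_class_adjustment(cases):
    """Weighting-class nonresponse adjustment (generic textbook method).

    cases: iterable of (cell, base_weight, responded) tuples.
    Returns (cell, adjusted_weight) for respondents only. Within each cell,
    respondents' weights are multiplied by
    (total base weight) / (respondents' base weight), so the adjusted
    respondent weights sum to the cell's full base-weight total.
    Cells with no respondents lose their weight here; in practice such
    cells are usually collapsed with neighbors first.
    """
    cases = list(cases)
    total = defaultdict(float)
    responding = defaultdict(float)
    for cell, weight, responded in cases:
        total[cell] += weight
        if responded:
            responding[cell] += weight

    adjusted = []
    for cell, weight, responded in cases:
        if responded:
            factor = total[cell] / responding[cell]
            adjusted.append((cell, weight * factor))
    return adjusted


if __name__ == "__main__":
    toy_cases = [
        ("new", 10.0, True),
        ("new", 10.0, False),
        ("retained", 5.0, True),
        ("retained", 5.0, True),
        ("retained", 5.0, False),
    ]
    print(weighting_class_adjustment(toy_cases))
```

In the toy data, the "new" cell's single respondent absorbs the weight of its nonrespondent (factor 2.0), while the "retained" cell's respondents are scaled by 1.5.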
NSF's potential sampling frames and designs have been reviewed several times by scientific bodies over the decades. Each time, the design based on the Census long form for the NSCG was found to be the best available strategy for the time. More recently, in preparation for the NSCG surveys in the 2000s, NSF explored alternative sampling frames for the S&E workforce data system. NCSES looked for a frame that could provide a more complete representation of the universe of scientists and engineers than the long-form approach (Fecso, Baskin, et al. 2007). No suitable alternative to the long-form frame for the NSCG was identified, primarily because no other survey had a sample large enough to include the number of scientists and engineers, a relatively rare population, required by the NSCG and SESTAT. However, the ACS was identified as a potential future alternative. At the time, the ACS was still in development. The ACS is now fully operational, and NCSES has been given the authority to use the ACS as a sampling frame for the NSCG in the future.
 Detailed design information about the 2003 NSCG can be found in the 2003 NSCG sample design report and the 2003 NSCG methodology report, available from NCSES on request.
 Although referred to as "respondents," in actuality the response for the individual may be provided by another household member.
 See the 2003 NSCG questionnaire (http://www.nsf.gov/statistics/srvygrads/survey2003/grads_2003.pdf). The educational history grid can be found in Section A of the questionnaire.
 This number excludes those who earned non-S&E degrees after April 1, 2000, and were working in S&E or S&E-related occupations in 2003, as well as those with only foreign degrees who were not in the United States at the time of the decennial Census but were working here in an S&E or S&E-related occupation at the time of the 2003 NSCG.