nsf.gov - NCSES Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates - US National Science Foundation (NSF)
National Science Foundation National Center for Science and Engineering Statistics
Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates

NSCG Sample Design[6]


In 1962, NSF and other agencies sponsored the Postcensal Manpower Survey to obtain information on S&E personnel resources. This was a single, cross-sectional survey with the sample derived from the long form of the 1960 decennial Census (hence, "postcensal"). In 1972, NSF sponsored another postcensal survey (the Professional, Technical and Scientific Manpower Survey) for similar purposes, with smaller follow-ups through 1978. In the 1980s, NSF again conducted a postcensal survey with follow-ups through 1989. After a major redesign following the 1990 Census, the NSCG continued the pattern of a large postcensal (baseline) survey, with smaller follow-up surveys during the remainder of the decade.[7] Aside from the level of educational attainment, the decennial long form contained no information on educational background that would allow the identification of individuals with S&E or S&E-related degrees. Therefore, the baseline NSCG served two purposes: (1) to provide a once-in-a-decade view of all college graduates in the United States, and (2) to act as a screening device (through detailed educational histories collected in the NSCG) for obtaining a sample of scientists and engineers for the SESTAT integrated file. The NSCG follow-up surveys, generally conducted every 2 to 3 years, were limited to those meeting the SESTAT target population definition of a scientist or engineer.

At the beginning of each decade, the U.S. Census Bureau created a sampling frame for the NSCG based on the decennial Census, which was used to draw the baseline NSCG sample. All long-form respondents with a bachelor's degree or higher at the time of the Census had a chance of selection into the postcensal NSCG sample.[8] To capture the stock of scientists and engineers from the NSCG, it was necessary to sample all occupations from the decennial long form because a high proportion of individuals with S&E or S&E-related degrees do not work in S&E or S&E-related occupations (57% of this population in the 2003 NSCG either worked in a non-S&E occupation or were not working). Additionally, the NSCG is the only source of information for the SESTAT integrated database on individuals with non-S&E degrees working in S&E or S&E-related occupations. In 2003, the size of this population was 1,510,100 (see table 1).

Using data from the educational history section of the 2003 NSCG,[9] it was possible to identify individuals who were sampled in non-S&E occupations but had S&E or S&E-related degrees, as well as individuals with non-S&E degrees working in S&E or S&E-related occupations. Especially at the bachelor's level (which accounts for the large majority of all those with a bachelor's degree or higher), people with S&E degrees are likely to be in a non-S&E occupation. Individuals in S&E occupations, and those in non-S&E occupations that are more likely to be held by someone with an S&E degree, are sampled at higher rates than other cases.

In 1993, NSF requested that the Census Bureau select about 215,000 individuals for the NSCG sample from the decennial long-form sample frame. The 1993 sample netted about 75,000 cases that met NSF's definition of a scientist or engineer and therefore were eligible for the SESTAT integrated database and the NSCG follow-up surveys. In 2003, about 171,000 individuals were selected from the 2000 Census long-form frame, which yielded approximately 67,000 cases with S&E and S&E-related degrees or occupations. Table 2 shows the yield of cases that were obtained from the NSCG in 1993 and 2003.
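The yield described above can be expressed as the number of sample cases drawn per SESTAT-eligible case obtained. The following is a minimal sketch that uses only the rounded counts quoted in this paragraph, so the ratios are approximate; the variable and function names are illustrative and do not come from the NSCG documentation.

```python
# Rounded counts quoted in the text: (cases selected, SESTAT-eligible cases).
nscg_yield = {
    1993: (215_000, 75_000),
    2003: (171_000, 67_000),
}


def cases_per_eligible(selected, eligible):
    """Sample cases drawn for each SESTAT-eligible case obtained."""
    return selected / eligible


for year, (selected, eligible) in nscg_yield.items():
    ratio = cases_per_eligible(selected, eligible)
    share = eligible / selected
    print(f"{year}: {ratio:.2f} cases per eligible case "
          f"({share:.0%} of the sample was eligible)")
```

Because the inputs are rounded, these ratios should be read as rough indicators of screening efficiency rather than as published estimates.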

The Census Bureau conducts the NSCG for NSF because the NSCG sample is a subsample of the decennial Census long-form sample. Any record derived from decennial Census records is protected by Title 13 and must be used only under Census Bureau supervision; thus the NSCG sample must be drawn, and the survey conducted, by the Census Bureau. The postcensal NSCG survey has generally been fielded about 3 years after the decennial Census because of the time needed to process the decennial Census and make the data available for NSCG sampling.

The postcensal NSCG sample design has been a two-phase, stratified random sample of individuals with at least a bachelor's degree at the time of the Census. Phase 1 consisted of sampling households for the Census long-form sample. That procedure used a stratified systematic sample, with differing sampling rates for administrative areas of different sizes (about a 1-in-12 to 1-in-16 sampling rate). Phase 2, the NSCG postcensal sample, consisted of subsampling persons with at least a bachelor's degree whose age reported on the long form would make them 75 years of age or younger as of the reference date for the postcensal NSCG. In 2003, the major sampling variables used to create the strata for the frame were the following: educational attainment (bachelor's degree or higher), by highest degree level achieved; occupation; demographic group (which combines citizenship, race/ethnicity, and disability status); and sex. Within each stratum, individuals were selected using probability-proportional-to-size (PPS) systematic sampling. The long-form sampling weight was used as the size measure for selection to compensate as much as possible for the differing long-form sampling rates and, hence, to come as close as possible to an overall self-weighting sample within each Phase 2 stratum.
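The Phase 2 selection step can be illustrated with a short sketch of PPS systematic sampling within a single stratum. This is a generic textbook version of the technique, not the Census Bureau's production procedure; the function name, the example weights, and the sample size are all hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n, rng=None):
    """Return indices of n selections made with probability proportional to size.

    Units are taken in list order. A unit whose size exceeds the sampling
    interval can be selected more than once.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval  # random start in [0, interval)
    selected = []
    i = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative-size range contains this point.
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(i)
    return selected


# Hypothetical stratum: ten persons with made-up long-form sampling weights.
long_form_weights = [6, 6, 12, 16, 12, 8, 16, 6, 12, 6]
sample = pps_systematic_sample(long_form_weights, n=4, rng=random.Random(2003))
print(sample)
```

In this sketch a person with weight w is selected in Phase 2 with probability n × w / W, where W is the sum of the weights in the stratum. If the long-form weight is roughly the inverse of the Phase 1 selection probability, the product of the two probabilities is about n / W for everyone in the stratum, which is the sense in which the design approaches a self-weighting sample.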

Survey responses to the postcensal NSCG baseline survey determined eligibility for the follow‑up NSCG surveys, depending on whether a respondent met the definition of a scientist or engineer for SESTAT. Because the NSCG baseline survey collects information on educational background, the sampling for the follow-up survey includes the original sampling strata, as well as field of highest S&E degree.

A review of the NSCG sample design found that the long-form frame approach had sample selection and coverage problems.[10] Three significant problems were the following:

  1. Responses to the Census long-form questionnaire were not an efficient means of identifying those with S&E degrees because there was only information about the highest level of degree attained and not of degree fields. It did not provide the means to identify those with SEH degrees.

    The past sample design using the postcensal survey as a screening mechanism made possible valuable comparisons of scientists and engineers with nonscientists and nonengineers once a decade. The 1993 survey created a large database about college-educated individuals that could be utilized for analysis by those interested in fields other than S&E. However, the postcensal sample included many people who were not SESTAT eligible. On average, almost 3 cases were surveyed to find one SESTAT-eligible respondent. Using the 1993 NSCG, it was possible to improve the efficiency of the 2003 NSCG sample design, to approximately 2 sample cases to yield one SESTAT-eligible respondent, while still maintaining a sufficient level of accuracy to make comparisons between S&E, S&E-related and non-S&E domains.

  2. The approach of using a decennial Census to identify the stock of engineers and scientists to be interviewed over the decade, together with new graduates of U.S. institutions in S&E fields from the RCG and the SDR, meant that some population groups were missed. Some were missing from the beginning in the postcensal NSCG, while for other groups the coverage problems grew worse over the decade.

    1. Those with non-S&E degrees who entered S&E or S&E-related jobs after the postcensal NSCG were not covered in any of the surveys later in the decade. The computer occupations, for example, included a significant number of workers not educated in a science, engineering, or related discipline.

    2. The NSCG is the only SESTAT survey that includes scientists and engineers whose degrees were all earned abroad. However, this population was captured in the sample only once a decade in the baseline survey. Foreign-educated scientists and engineers entering the United States after the decennial Census and receiving no further degrees in the United States were not included in any SESTAT survey, and so the undercoverage of this group grew throughout the decade.

    3. A substantial number of scientists and engineers fall into one or both of the populations described above: non-S&E graduates in S&E and S&E-related occupations and foreign-educated scientists and engineers. Data from the 2003 NSCG and the 2003 SESTAT integrated file show that there were 720,632 individuals in S&E occupations and 789,428 individuals in S&E-related occupations with non-S&E degrees only.[11] Additionally, there were estimated to be 1,470,729 individuals in the SESTAT population who had only foreign degrees. Taking into account some overlap between these two populations, approximately 2.6 million individuals in 2003 in the SESTAT population (a) had an S&E or S&E-related occupation, but no S&E degree; or (b) had only foreign degrees. Such individuals represent approximately 12% of the 2003 SESTAT population of 21.6 million persons.

  3. Another problem was increasing cumulative nonresponse through the decade. The postcensal surveys have had a response rate that historically has been near 80%; the 2003 postcensal NSCG had an unweighted response rate of 63%. Follow-on surveys later in the decade generally have had very good response rates (in the high 90s), but past nonrespondents are removed from these surveys. Thus, there is an increasing unconditional nonresponse rate as the decade progresses (see table 3 for the NSCG sample sizes, respondents, and unweighted response rates).
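Two pieces of arithmetic in the list above can be checked in a few lines. The first part reproduces the coverage figures in item 2 from the counts quoted in the text; the implied overlap is derived here from the rounded 2.6 million total, so it is only approximate. The second part shows how an unconditional response rate compounds across survey waves when past nonrespondents are dropped; the per-wave rates are illustrative values in the ranges the text describes ("near 80%" and "high 90s"), not actual NSCG rates.

```python
from math import prod

# Part 1: coverage arithmetic, using 2003 figures quoted in the text.
non_se_degree_in_se_occ = 720_632
non_se_degree_in_se_related_occ = 789_428
foreign_degrees_only = 1_470_729
combined_approx = 2_600_000       # "approximately 2.6 million"
sestat_population = 21_600_000    # "21.6 million persons"

non_se_degree_total = non_se_degree_in_se_occ + non_se_degree_in_se_related_occ
# Overlap implied by the rounded combined total (an approximation).
implied_overlap = non_se_degree_total + foreign_degrees_only - combined_approx
share_of_sestat = combined_approx / sestat_population

print(f"Non-S&E degree holders in S&E or S&E-related jobs: {non_se_degree_total:,}")
print(f"Implied overlap with foreign-degree-only group: about {implied_overlap:,}")
print(f"Share of SESTAT population: {share_of_sestat:.1%}")


# Part 2: compounding of response rates across waves.
def unconditional_response_rate(conditional_rates):
    """Product of per-wave rates, each conditional on response to earlier waves."""
    return prod(conditional_rates)


# Illustrative rates only: a baseline near 80%, then three follow-ups at 97%.
wave_rates = [0.80, 0.97, 0.97, 0.97]
for wave in range(1, len(wave_rates) + 1):
    rate = unconditional_response_rate(wave_rates[:wave])
    print(f"After wave {wave}: unconditional response rate {rate:.1%}")
```

Even with high follow-up rates, the unconditional rate can only fall from wave to wave, which is the pattern item 3 describes.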

The NSCG sample from the 1980 decennial Census remaining at the end of the 1980s was discarded at the recommendation of a previous CNSTAT panel due to coverage, bias, and other problems. NSF considered doing the same with the 1990 postcensal remaining sample, primarily due to the low unconditional response rate (63%), as well as concerns about panel attrition, fatigue, and possible bias. However, problems with this approach include the complete loss of longitudinal continuity and a lack of information about how nonresponse adjustments during the decade might cause a shift in the time series. NSF addressed these issues by embedding an experiment in the design of the 2003 NSCG. In addition to drawing a new sample from the 2000 decennial long-form sample, NSF also sent the 2003 survey to the remaining 1999 NSCG respondent population (which included cases originally sampled in the 1993 NSCG, as well as the 1995–99 RCG surveys). The evaluation of this experiment found some large differences, across various nonresponse adjustment cells, between estimates of the scope of coverage made from the newly drawn 2000 postcensal sample and those made from the longitudinal sample retained from the 1999 NSCG (Finamore, Hall, and Fecso 2006). Further research is needed to determine all the factors that may have contributed to the differences.

NSF's potential sampling frames and designs have been reviewed several times by scientific bodies over the decades. Each time, the design based on the Census long form for the NSCG was found to be the best available strategy for the time.[12] More recently, in preparation for the NSCG surveys in the 2000s, NSF explored alternative sampling frames for the S&E workforce data system. NCSES looked for a frame that could provide a more complete representation of the universe of scientists and engineers than the long-form approach (Fecso, Baskin, et al. 2007). No suitable alternative to the long-form frame for the NSCG was identified, primarily because no other survey had a sample large enough to include the number of scientists and engineers, a relatively rare population, needed to meet the requirements of the NSCG and SESTAT. However, the ACS was identified as a potential future alternative. At the time, the ACS was still in development. The ACS is now fully operational, and NCSES has been given the authority to use the ACS as a sample frame for the NSCG in the future.



[6] Detailed design information about the 2003 NSCG can be found in the 2003 NSCG sample design report and the 2003 NSCG methodology report, available from NCSES on request.

[7] The redesign was largely based on a Committee on National Statistics (CNSTAT) report (NRC 1989).

[8] Although referred to as "respondents," in actuality the response for the individual may be provided by another household member.

[9] See the 2003 NSCG questionnaire (http://www.nsf.gov/statistics/srvygrads/survey2003/grads_2003.pdf). The educational history grid can be found in Section A of the questionnaire.

[10] This section draws heavily on a review of potential sampling frames for SESTAT done prior to the design of the 2003 NSCG. See Fecso, Choudry, et al. (2007).

[11] This number excludes those who graduated in non-S&E fields after April 1, 2000, who were working in S&E or S&E-related occupations in 2003 as well as those with only foreign degrees who were not in the United States at the time of the decennial Census but were here working in an S&E or S&E-related occupation at the time of the 2003 NSCG.

[12] For example, see the National Research Council (1989, 2003).

Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates
Working Paper | NCSES 12-201 | August 2012