Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates
Adding a Field of Degree Question to the ACS
While the ACS offers opportunities to improve coverage of the SESTAT target population, in one respect it is no different from the decennial Census: it does not collect any information on the field of an individual's college degree(s). The SESTAT target population has two components (S&E or S&E-related degree OR S&E or S&E-related occupation), but only the latter could be used as a sampling characteristic for the NSCG drawn from the long form, which resulted in a sample with many non-SESTAT-eligible cases. Adding an item to the ACS on the field of a person's college degree would greatly increase the efficiency of the ACS as a sample frame for the NSCG: the first part of the SESTAT population definition could then be used as a sampling characteristic, and the ACS would contain data on both dimensions that determine whether an individual meets the SESTAT criteria for being classified as a scientist or engineer.
NCSES began discussing the addition of a field of degree (FOD) item to the ACS several years ago and has worked with other agencies, the Office of Management and Budget (OMB), the U.S. Census Bureau, and congressional staff on this process. A variety of question formats and the content of an FOD item were discussed, investigated, and tested. A new FOD item would immediately follow the educational attainment question on the ACS and would be asked only of those whose highest level of educational attainment was a bachelor's degree or higher. Such an item could ask about FOD for one or more degrees. The test for the SESTAT definition on educational background is conducted by looking at the entire educational history (all degrees held) to determine whether one or more of the degrees are in S&E or S&E-related fields. In 2003, approximately 40% of the NSCG reported having one or more degrees at the bachelor's level or higher. Ideally, for the purpose of NSCG sampling, the ACS would collect information about the FOD for all degrees.
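The two-part eligibility test described above can be written out as a minimal sketch. The coarse group labels ("S&E", "S&E-related", "non-S&E") and the names `Graduate` and `is_sestat_eligible` are illustrative assumptions, not the actual SESTAT taxonomy or any NCSES code.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical coarse labels; the real SESTAT field and occupation
# taxonomies are far more detailed.
SE_GROUPS = {"S&E", "S&E-related"}


@dataclass(frozen=True)
class Graduate:
    degree_groups: Tuple[str, ...]  # one label per degree held, at any level
    occupation_group: str


def is_sestat_eligible(person: Graduate) -> bool:
    """True if any degree held OR the occupation is S&E or S&E-related."""
    has_se_degree = any(g in SE_GROUPS for g in person.degree_groups)
    has_se_occupation = person.occupation_group in SE_GROUPS
    return has_se_degree or has_se_occupation
```

Because the degree test scans every degree held, a person with a non-S&E bachelor's degree and an S&E master's degree qualifies even in a non-S&E occupation, which is why FOD for all degrees would be the ideal sampling information.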
After initial discussions with the U.S. Census Bureau, it became evident that there would only be room on the ACS questionnaire for a question on a single degree. It was therefore necessary to identify a single degree for which FOD could be collected, and NSF recommended that the item focus on the FOD for an individual's bachelor's degree. This was not the most obvious approach, given that the ACS educational attainment question requests information on each individual's highest level of educational attainment, and an FOD question about a person's highest degree might have been easier to implement. However, asking about the highest degree would cause significant coverage problems for the NSCG. In the 2003 NSCG population, 17% of individuals whose first bachelor's degree was in an S&E or S&E-related field reported that their highest degree was in a non-S&E field. Only a small proportion (4%) of non-S&E bachelor's-degree holders reported that their highest degree was in an S&E or S&E-related field. Asking for the field of the bachelor's degree instead of the field of the highest degree was therefore likely to result in fewer NSCG (and SESTAT) coverage problems. Thus, for stratification in sampling, asking about a person's bachelor's degree is the best choice if FOD can be collected for only one degree. The sampling efficiency for SESTAT-eligible cases would also be greater (i.e., fewer sampled cases would turn out not to be SESTAT eligible) when asking about FOD of the bachelor's degree rather than of the highest degree. Finally, for analysis of degree holders in any field, not just in S&E or related fields, data about the same degree level for all sample cases would likely be useful.
NSF recommended that the content of the FOD item collect information specifically about all degree fields for the bachelor's degree and not just those in S&E or related fields. Gathering information about all degree fields would make the information much richer for analytical purposes for a wide variety of users and would improve its value for NSF purposes as well. Such data can be used to compare patterns for those in S&E and S&E-related fields with those in non-S&E fields. Respondent accuracy in reporting degrees in S&E and S&E-related fields is likely to be better if there are specific categories for non-S&E fields, with examples, rather than simply a list of S&E or S&E-related categories and then a residual category labeled as "other" or "non-S&E."
Beginning in 2006, NCSES worked with the U.S. Census Bureau as well as two groups of academic researchers to develop and test alternative formats of an FOD question. Based on the preliminary research, two alternative formats of the question were developed and tested in the 2007 ACS Methods Test, completed in fall 2007. The outcome of the 2007 ACS Methods Test was the decision to use an open-ended question for FOD, with responses to be coded by the U.S. Census Bureau. Details on the two question formats are provided later in this paper.
The evaluation of the 2007 ACS Methods Test for FOD did not reveal major problems with the FOD items. OMB approved adding an open-ended question to the ACS in 2009.
NSCG Sampling with an ACS FOD Item
The addition of an FOD question on the ACS would affect the cost and efficiency of any of the NSCG design options outlined earlier for using the ACS as a sample frame. For example, regardless of the option, because it would be possible with the FOD item to mimic more closely the SESTAT target population with respect to educational background, the oversampling needed to find sufficient SESTAT-eligible cases in non-S&E occupational groups could be reduced substantially.
The potential for cost reductions and efficiency improvements throughout the decade will depend on how often and how extensively the ACS frame is used for drawing samples for the NSCG, on the format used to collect the FOD data, and on the accuracy of the FOD data. The sample size needed to obtain efficiency similar to the postcensal surveys in the past was not determined precisely until decisions were made and testing provided information on the quality of the FOD information collected on the ACS. However, it was reasonable to presume that the sample size of a once-a-decade sample could be cut substantially. Alternatively, the sample size could be maintained (or reduced somewhat less) to yield a larger in-scope sample, allowing better coverage and the ability to report for rare populations or small domains, such as race/ethnicity or sex in S&E occupations.
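The trade-off between frame quality and sample size can be illustrated with a back-of-the-envelope calculation. This is only a sketch: the function name `required_sample_size` is hypothetical, and the calculation ignores nonresponse, design effects, and stratum-level targets, all of which a real NSCG design would have to address.

```python
import math


def required_sample_size(target_eligible: int, eligibility_rate: float) -> int:
    """Cases to draw so the expected number of eligible cases meets the target.

    eligibility_rate is the assumed share of sampled cases that turn out
    to be SESTAT eligible (a hypothetical input, not a published figure).
    """
    if not 0 < eligibility_rate <= 1:
        raise ValueError("eligibility_rate must be in (0, 1]")
    return math.ceil(target_eligible / eligibility_rate)
```

Under this simplification, doubling the share of sampled cases that prove eligible halves the sample needed for a fixed in-scope target; equivalently, holding the sample size fixed doubles the expected in-scope yield, which is what makes reporting for small domains feasible.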
Major improvements that could result from having an FOD question to use for sampling for the NSCG, regardless of which option(s) for sampling were chosen, included the following:
Even with an FOD item, some level of screening of cases drawn from the ACS is necessary because the combination of the FOD question and occupation does not fully identify all SESTAT-eligible cases. For example, the FOD for the bachelor's degree does not allow for the identification of non-S&E bachelor's-degree holders with non-S&E occupations who have an S&E or S&E-related degree at the master's-degree level. Additionally, there may be some inaccuracy in the reporting of the degree field or occupation (either type 1 or type 2 errors) on the ACS that could be checked with the NSCG follow-up survey. Furthermore, there is value to NSF in periodically collecting some data on those who are not scientists and engineers for comparison purposes.
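The screening gap described above can be sketched as a three-way classification. The labels and the function name `classify_case` are hypothetical, and the sketch assumes error-free reporting, so it shows only the structural gap (degrees above the bachelor's level are invisible on the frame), not the type 1 and type 2 reporting errors.

```python
SE_GROUPS = {"S&E", "S&E-related"}  # hypothetical coarse labels


def classify_case(bachelors_group, higher_degree_groups, occupation_group):
    """Return 'frame-eligible', 'found-by-screening', or 'not-eligible'.

    bachelors_group and occupation_group stand for what the ACS frame records;
    higher_degree_groups is known only from the NSCG follow-up interview.
    """
    visible_on_frame = (bachelors_group in SE_GROUPS
                        or occupation_group in SE_GROUPS)
    if visible_on_frame:
        return "frame-eligible"
    if any(g in SE_GROUPS for g in higher_degree_groups):
        return "found-by-screening"
    return "not-eligible"
```

Cases in the middle category look ineligible on the frame, so they can enter the NSCG only if some apparently non-S&E cases are sampled and screened.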
The use of an ACS sampling frame provides ample opportunities for variation in the NSCG survey design. Having an FOD question would enhance all four options for utilizing the ACS as a frame for the NSCG. The impact on each of the four options is discussed below.
After the four options were presented, and based on the recommendation by CNSTAT, NCSES chose option 3.
Technical Issues for Sampling from ACS
There are several issues that impact the use of the ACS for the NSCG sample design: the form of the FOD question and how swapped and imputed data for educational attainment and FOD are assigned in the ACS. Each is discussed below.
There were issues to be resolved regardless of the version of the question that was chosen. Some of these issues affect the version that was chosen; others affect the use of the data for sampling or analysis. Table 4 lists some of the issues of concern.
A sampling design for the NSCG using the ACS with FOD could be crafted to produce little or no undercoverage in an initial sample drawn from the ACS. The form of the FOD question and the accuracy of the information provided affect the gains in efficiency. For example, how accurate will the reports on the FOD item be for those reporting for others in the household (proxy reports) compared to those reporting for themselves? If the FOD and occupation items can be used to accurately distinguish scientists and engineers from other college graduates, substantial gains in efficiency are possible. For NSCG sampling purposes, the most important concern was whether a degree is accurately reported as falling into an S&E, an S&E-related, or a non-S&E category.
The accuracy of the FOD reporting will be evaluated after the first NSCG is conducted using the ACS. The information from the detailed education history collected as part of the NSCG from the individual (where there are no proxy reporters) can be compared to the information reported on FOD (and educational attainment) in the ACS. Analysis of the reinterviews in the 2007 Methods Panel testing of the two FOD items showed that responses to both versions of the FOD item were reliable and valid. However, validity on the full ACS sample is an open question.
Some number of cases apparently not meeting the criteria of being a scientist or engineer (a non-S&E bachelor's degree and a non-S&E occupation) would be drawn in the NSCG sample from the ACS frame, both to provide a comparison group and to account for those in non-S&E occupations with a non-S&E bachelor's degree but an S&E or S&E-related degree at a higher level. It was advisable in drawing the first NSCG sample from the ACS to allocate part of the sample to test the efficiency of the FOD item for sampling purposes, either by drawing a larger number of apparently non-S&E cases than might be done otherwise or by drawing a portion of the sample using the long-form procedures without taking the FOD information into account.
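A disproportionate stratified draw of the kind described above might look like the following sketch. The stratum names, sampling fractions, and the function `draw_stratified_sample` are illustrative assumptions, not the actual NSCG design; the base weight is simply the stratum frame count divided by the stratum sample count.

```python
import random


def draw_stratified_sample(frame, rates, seed=0):
    """Draw a simple random sample within each stratum.

    frame: list of (record_id, stratum) pairs.
    rates: dict mapping stratum -> sampling fraction (hypothetical values).
    Returns a list of (record_id, stratum, base_weight) tuples.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for record_id, stratum in frame:
        by_stratum.setdefault(stratum, []).append(record_id)

    sample = []
    for stratum, ids in by_stratum.items():
        n = max(1, round(len(ids) * rates[stratum]))  # at least one case
        base_weight = len(ids) / n                    # N_h / n_h
        for record_id in rng.sample(ids, n):
            sample.append((record_id, stratum, base_weight))
    return sample
```

Sampling the apparently non-S&E stratum at a low but nonzero rate keeps the comparison group and the hidden-eligible cases in the sample, while the base weights restore each stratum to its frame total.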
The U.S. Census Bureau regularly uses a technique called swapping data to create public-use data sets (a decision based on the Bureau's overall disclosure policies). Swapping is done during the survey data processing. NCSES requested that the edited ACS file, before swapping, be used for weighting and creation of the NSCG sampling frame. Using swapped data would greatly reduce the stratification efficiency, especially when disproportionate stratified sampling is used to target precision levels for selected domains. The U.S. Census Bureau allowed use of unswapped data from the ACS for sampling for the 2010 NSCG.
Another technical concern was the use of imputed data from the ACS. Imputed educational attainment level data (the U.S. Census Bureau calls them allocated data) should not be used for sampling. Imputed data create an unacceptable amount of undercoverage of those with a bachelor's degree (estimated at 3% to 7%; see Finamore, Hall, and Fecso 2006) as well as sampling inefficiency (when those with an imputed education level of a bachelor's degree turn out not to have a bachelor's degree). Records that have imputed educational attainment level data were put aside prior to sampling, and a small sample of these ACS cases could be subsequently sampled to measure bias.
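The handling of allocated records described above can be sketched as a simple partition of the frame. The record layout (a dict with an `attainment_allocated` flag) and the function name `split_frame` are assumptions for illustration; the actual ACS allocation flags are named differently.

```python
import random


def split_frame(records, bias_sample_size, seed=0):
    """Separate reported from allocated (imputed) attainment records.

    records: list of dicts, each with a boolean 'attainment_allocated' key
             (a hypothetical field name).
    Returns (main_frame, bias_check_sample).
    """
    main_frame = [r for r in records if not r["attainment_allocated"]]
    set_aside = [r for r in records if r["attainment_allocated"]]

    rng = random.Random(seed)
    k = min(bias_sample_size, len(set_aside))
    bias_check_sample = rng.sample(set_aside, k)
    return main_frame, bias_check_sample
```

The main NSCG sample would be drawn only from `main_frame`, while the small `bias_check_sample` could be followed up separately to estimate how many set-aside cases actually hold a bachelor's degree.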
Adding an FOD question to the ACS could create an entirely new issue related to imputation. Given the relatively poor performance of the imputation methods for education level (the imputation performs much like the full-file, missing-at-random model), it is unclear how imputation should be done for missing FOD. For individuals with an S&E or S&E-related occupation, FOD imputation might perform well. For other occupations, it is not obvious that an acceptable imputation model can be developed. It may be that such cases will need to be treated as missing and reweighted. A program of research on imputation and nonresponse weighting for missing FOD is desirable.
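One way to treat missing-FOD cases as missing and reweight is a weighting-class adjustment, sketched below. This is a generic technique rather than a method the source specifies; the cell definition, field names, and the function `reweight_for_missing_fod` are hypothetical, and the approach assumes FOD is missing at random within each cell, which is exactly the assumption the paragraph above questions for non-S&E occupations.

```python
from collections import defaultdict


def reweight_for_missing_fod(cases):
    """Drop cases with missing FOD and inflate the rest within each cell.

    cases: list of dicts with keys 'cell', 'weight', and 'fod'
           (fod is None when missing).
    Within each cell, reported cases are scaled by
    (total weight in cell) / (weight of cases with reported FOD).
    """
    total = defaultdict(float)
    reported = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["fod"] is not None:
            reported[c["cell"]] += c["weight"]

    adjusted = []
    for c in cases:
        if c["fod"] is None:
            continue  # treated as missing; its weight is redistributed
        factor = total[c["cell"]] / reported[c["cell"]]
        adjusted.append({**c, "weight": c["weight"] * factor})
    return adjusted
```

A cell in which every case is missing FOD contributes nothing to the output and its weight is lost, so cells would need to be collapsed before applying an adjustment like this.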
With a full year of data available from the ACS (2005), NCSES can begin to work with the U.S. Census Bureau to explore the use of the ACS without the FOD item for the NSCG (and for analysis). NCSES needed to use the ACS whether or not there was an FOD question.
In the categorical version of the FOD question tested, only one set of S&E-related fields (health) can be captured accurately. In order to identify samples in other S&E-related fields, NSF had to sample some of the non-S&E FOD categories and some non-S&E occupations. For example, in order to find individuals with degrees in science or math teacher education (an S&E-related field), it was necessary to sample some individuals with bachelor's degrees in "Education or education administration" and some secondary teachers.