Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates
Replacing the Decennial Long Form With the ACS as the NSCG Frame
The NSCG has used a design consisting of a baseline postcensal survey followed by panel follow-up surveys for a variety of reasons. Identifying and then locating the stock of scientists and engineers of interest is both difficult and expensive. Once they had been identified through the initial baseline NSCG, it was most efficient to keep them in the NSCG throughout the decade. This design also provided some stability to the estimates. The other SESTAT surveys do provide some of the new flows of U.S.-educated scientists and engineers into the overall population (e.g., new bachelor's- and master's-degree SEH graduates from the RCG, and new SEH doctorates from the SDR). The alternative to maintaining the NSCG postcensal sample was to draw a new sample every 2 to 3 years, but doing so would have brought considerable cost and no benefit. Additional screening surveys with large samples would be very expensive, and they would not improve coverage of the population because the sampling frame (the decennial long form) would not change.
Some populations covered by the NSCG are also covered by the other two SESTAT surveys. For example, the stock and flow of SEH doctorates are well covered by the SDR. Earlier sampling-frame research conducted by NCSES recommended that the SDR be maintained, given the small number of U.S. SEH doctorates in the NSCG and the great policy interest in SEH doctorates.
The SDR also has great value as a stand-alone survey, enabling longitudinal analysis of the careers of U.S. SEH doctorate holders. Using the ACS as a frame for the NSCG does not change the value of, or the need for, the SDR. The ACS does not provide a sufficient sample of doctorate recipients unless multiple ACS years are combined, and combining multiple years would nullify the quality-enhancing features of using the ACS. The desire for small-domain estimates for these individuals (e.g., doctoral field by race/ethnicity by sex) and the readily available SED (a census of all U.S.-earned SEH doctorates) as a sampling frame make continued use of a separate SDR a very efficient approach for the SEH doctorate domain.
As was the case with the long-form records, the ACS records can be stratified by households or persons with specific characteristics. Thus, the ACS can provide an efficient frame for follow-on surveys. The ACS provides a means to include in the NSCG frame scientists and engineers earning all their degrees abroad who then come to the United States and enter the labor force. Similarly, it can provide better coverage throughout the decade of non-S&E graduates working in S&E or S&E-related occupations, a shortcoming of the long-form sample design.
With no change in its survey content, the ACS can be used as a frame for the NSCG in several different ways. The ACS offers more flexibility in NSCG design than the long-form frame did, particularly because current data will be available throughout the decade. The continuous survey approach of the ACS makes the following options (or some combination of them) possible for an NSCG sampling frame. Cost considerations are also important because the costs of pursuing the various options are likely to vary considerably.
Drawing new samples more frequently would also reduce (or eliminate) the longitudinal feature of the NSCG. If the sample were redrawn every survey cycle, the NSCG would become a series of cross-sectional surveys. One result would be considerably more variation in the estimates from cycle to cycle than with the current longitudinal design. This phenomenon would be especially noticeable in important small-domain estimates, such as field by race/ethnicity estimates.
NSF sought advice about the strengths and weaknesses of each of these options, combinations of these options, the frequency of utilizing such options, and suggestions and reviews of any other options for using the ACS as a frame for the NSCG, both during a potential transition period as use of the ACS was being phased in, and on a longer-term basis. NSF welcomed suggestions or recommendations about any changes in the current design of using the coordinated SESTAT surveys to achieve coverage of the population of scientists and engineers that might be possible with the availability of the ACS as well as advice about any potential pitfalls or problems in using the ACS as a frame.
Given that the ACS surveys 250,000 addresses a month, most uses of the ACS as an NSCG frame will require aggregating multiple months of ACS data. The largest sample that would need to be drawn from the ACS at any one time would arise when the entire NSCG sample was redrawn at once (Options 1 and 2 above), comparable to the postcensal NSCG samples of 215,000 and 171,000 drawn in 1993 and 2003, respectively. Much smaller samples from the ACS might be needed at any one time under Options 3 and 4.
The ACS annual sample size is approximately 3 million housing units that include 7.8 million people. In 2005, the ACS had a completion rate of 66%, which means that ACS data are collected for some 2 million housing units, including 5.2 million persons, annually. NSF estimated that approximately 18.8% of this population have a bachelor's degree or higher and are aged 75 or younger (SESTAT target population definitions), so approximately 978,640 cases would be eligible for the NSCG. By comparison, 6.4 million cases were eligible from the long form for the 2003 NSCG.
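The arithmetic behind these counts can be checked with a short calculation. The Python sketch below chains the approximate inputs quoted in the text (3 million housing units, 2.6 persons per household, a 66% completion rate, and an 18.8% eligible share). Applying the housing-unit completion rate directly to persons is a simplifying assumption, and the function name is illustrative only.

```python
"""Back-of-envelope count of NSCG-eligible persons in one year of ACS data.

Inputs are the approximate figures quoted in the text. Applying the
housing-unit completion rate directly to persons is a simplifying assumption.
"""

HOUSING_UNITS_PER_YEAR = 3_000_000  # roughly 250,000 addresses a month x 12
AVG_HOUSEHOLD_SIZE = 2.6            # 2005 ACS average household size
COMPLETION_RATE = 0.66              # 2005 ACS completion rate
ELIGIBLE_SHARE = 0.188              # bachelor's or higher and aged 75 or younger


def eligible_persons(units=HOUSING_UNITS_PER_YEAR,
                     household_size=AVG_HOUSEHOLD_SIZE,
                     completion=COMPLETION_RATE,
                     eligible_share=ELIGIBLE_SHARE):
    """Return each intermediate count, rounded to whole persons or units."""
    persons_sampled = units * household_size
    completed_units = units * completion
    persons_completed = persons_sampled * completion
    eligible = persons_completed * eligible_share
    return {
        "persons_sampled": round(persons_sampled),
        "completed_units": round(completed_units),
        "persons_completed": round(persons_completed),
        "eligible": round(eligible),
    }


if __name__ == "__main__":
    for name, value in eligible_persons().items():
        print(f"{name:>17}: {value:,}")
```

With these inputs the sketch gives 7.8 million persons sampled, 5,148,000 persons in completed interviews, and 967,824 eligible persons, within about 1% of the approximately 978,640 cited above; the gap presumably reflects rounding in the published inputs.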
Based on analysis of the full-year 2005 ACS data, NCSES determined that 1 year of ACS samples (January to December) would contain enough cases to equal or surpass the size of past NSCG postcensal samples for some populations, but it was unlikely to contain enough cases to match the previous NSCG cell sizes for rarer populations (e.g., minority groups). At least 2 years of monthly samples might be necessary to provide sufficient coverage of many of these small-population groups. Because the U.S. Census Bureau currently processes the ACS monthly samples on a calendar-year basis (12 months of sample are processed together after data collection has closed), sampling for the NSCG could require 2 full calendar years of ACS data if a completely new sample is drawn. If NCSES phased in the use of the ACS (e.g., by continuing to use some of the current 2000 decennial sample until the ACS provides sufficient sample for NSCG sampling), it would be possible to use 1 year of ACS samples initially.
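The reasoning about pooling can be sketched as a per-cell check: how many calendar years of ACS data are needed before the eligible cases in a sampling cell at least equal the sample size a past NSCG design allotted to that cell? The Python sketch below is a minimal illustration under stated assumptions; the cell names and counts are hypothetical, not NCSES figures, and the optional cap on the sampling fraction (`max_fraction`) is an assumed design parameter.

```python
import math

# Hypothetical sampling cells: (eligible ACS cases per calendar year,
# sample size allotted to the cell in a past NSCG design).
CELLS = {
    "engineers, all groups": (40_000, 3_000),
    "life scientists, smaller minority group": (900, 1_200),
    "physical scientists, smaller minority group": (400, 1_000),
}


def years_needed(eligible_per_year, target_sample, max_fraction=1.0):
    """Calendar years of ACS data to pool so that the cell's target sample is
    at most max_fraction of its eligible frame cases."""
    if eligible_per_year <= 0:
        raise ValueError("cell has no eligible cases in the ACS")
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    required_frame = target_sample / max_fraction
    return max(1, math.ceil(required_frame / eligible_per_year))


if __name__ == "__main__":
    for cell, (per_year, target) in CELLS.items():
        print(f"{cell}: {years_needed(per_year, target)} year(s)")
```

Under these made-up numbers, the large cell is covered by 1 year of data while the two small cells need 2 and 3 years. Lowering `max_fraction` (for example, to avoid taking nearly every eligible case in a cell) raises the number of years required.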
The schedule for processing ACS data has implications for the reference date for the NSCG and, thus, for the other two SESTAT surveys. A full calendar year (or years) of ACS data needs to be available sufficiently in advance of the NSCG reference date to allow the Census Bureau time to clean and weight the ACS data so that they are usable by the Census Bureau unit that does the sampling for the NSCG, and to allow sufficient time to select and prepare the NSCG sample for the field. To have ACS frame data that are as fresh as possible when the NSCG goes into the field, the ACS collection year must end about 8–10 months before the NSCG survey reference date. A fall NSCG reference date accomplishes this, and the reference date for the 2008 and 2010 SESTAT surveys was October 1. According to the U.S. Census Bureau, the 12-month calendar-year ACS data are ready for use in sampling before the end of June of the following year. An October SESTAT reference date therefore allows several months to process the files, stratify the frame, select the sample, and create the mailing records.
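The timing constraint can be checked with simple date arithmetic. The Python sketch below assumes, for illustration, a 2009 ACS calendar year feeding an NSCG with an October 1, 2010, reference date, and treats June 30 as the date the ACS file becomes usable. It converts day counts to months using an average month length, which is an approximation.

```python
from datetime import date

AVG_DAYS_PER_MONTH = 365.25 / 12  # approximation for converting days to months


def months_between(start, end):
    """Approximate months from start to end, based on elapsed days."""
    return (end - start).days / AVG_DAYS_PER_MONTH


def schedule(acs_year, ref_month=10, ref_day=1):
    """Return (months from end of ACS collection to the NSCG reference date,
    months available between ACS file readiness and the reference date)."""
    collection_end = date(acs_year, 12, 31)
    file_ready = date(acs_year + 1, 6, 30)  # assumed: "before the end of June"
    reference = date(acs_year + 1, ref_month, ref_day)
    return (months_between(collection_end, reference),
            months_between(file_ready, reference))


if __name__ == "__main__":
    lag, window = schedule(2009)
    print(f"ACS collection ends {lag:.1f} months before the reference date")
    print(f"{window:.1f} months remain to stratify, sample, and prepare mailings")
```

With these assumed dates, the lag is about 9 months, inside the 8–10 month range described above, and about 3 months remain between file readiness and the reference date.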
Such a time schedule has advantages in terms of the age of the data. Typically, there have been about 3 years between the reference dates for the long form and the NSCG postcensal survey. With an October reference date and a sample based on ACS monthly samples for the previous calendar year, some of the contact data would be less than 12 months old, and none would be older than 22 months. (If 2 years of ACS sample were used, only the oldest data would be similar in age to the long-form data.) Some sample cases will have moved between the time they were sampled in the ACS and the NSCG data collection, but there will be many fewer than in the postcensal surveys.
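The stated range of data ages follows from counting months. The Python sketch below measures the age of each pooled ACS sample month at an October reference date, using the first day of each ACS month as a simplification (interviews conducted later within a month would be slightly more recent). The years used are illustrative.

```python
from datetime import date


def age_in_months(sample_month, reference):
    """Whole calendar months from the start of an ACS sample month to the reference date."""
    return (reference.year - sample_month.year) * 12 + (reference.month - sample_month.month)


def data_ages(last_acs_year, n_years=1, ref_month=10):
    """Ages at the NSCG reference date of every ACS month pooled into the frame,
    assuming the pooled years end the calendar year before the reference date."""
    reference = date(last_acs_year + 1, ref_month, 1)
    first_year = last_acs_year - n_years + 1
    months = [date(y, m, 1)
              for y in range(first_year, last_acs_year + 1)
              for m in range(1, 13)]
    return [age_in_months(d, reference) for d in months]


if __name__ == "__main__":
    for n in (1, 2):
        ages = data_ages(2009, n_years=n)
        print(f"{n} year(s) pooled: ages {min(ages)} to {max(ages)} months")
```

With 1 year pooled, ages run from 10 to 21 months; with 2 years pooled, the oldest month reaches 33 months, close to the roughly 3-year gap typical of the long form.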
Pooling monthly ACS samples across multiple months creates some issues in estimation and determining NSCG/SESTAT eligibility. In the past, the postcensal NSCG eligibility was based on a sample with a single reference date (the date of the decennial Census). In the ACS, each monthly sample has a different reference date. This will require the NSCG to use a different strategy for determining eligibility. For example, degrees are conferred at many points during the year. For those newly earning a bachelor's degree during a particular ACS calendar year, their eligibility for the NSCG could depend on which month they were in the ACS sample, which could be before or after receiving their degree. This issue can be addressed using domain estimation techniques. The target population can be defined as those earning a bachelor's degree before the first month of the sequence of ACS sample months pooled to create the frame. A similar approach might be considered for immigrants where the target population could be defined as those in the United States at a defined cutoff date.
Using such a procedure, only a very small proportion of sample members would be screened out as ineligible during the NSCG. With a calendar year of ACS data and a cutoff at the end of the preceding December, the only cases screened out would be the small number who received their first bachelor's degree after that December but before the month in which they were sampled in the ACS.
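The domain definition can be expressed as a simple filter. In the Python sketch below, the frame is assumed to contain only people who reported a bachelor's degree when they were interviewed in the ACS; the target population is people whose first bachelor's degree predates the first pooled ACS month, and any frame case outside that domain would be screened out in the NSCG. The record layout (`FrameCase`, `first_ba_date`) and the three example cases are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FrameCase:
    case_id: str
    acs_month: date       # first day of the month the person was in the ACS sample
    first_ba_date: date   # date the first bachelor's degree was conferred (hypothetical field)


def in_target_domain(case, cutoff):
    """Target population: first bachelor's degree earned before the cutoff,
    i.e., before the first month of the pooled ACS sample months."""
    return case.first_ba_date < cutoff


def split_frame(cases, cutoff):
    """Separate frame cases into the target domain and expected NSCG screen-outs."""
    in_domain = [c for c in cases if in_target_domain(c, cutoff)]
    screened_out = [c for c in cases if not in_target_domain(c, cutoff)]
    return in_domain, screened_out


if __name__ == "__main__":
    cutoff = date(2009, 1, 1)  # pooled ACS months: January-December 2009
    frame = [
        FrameCase("A", date(2009, 3, 1), date(2001, 6, 15)),
        FrameCase("B", date(2009, 9, 1), date(2009, 5, 20)),  # degree earned mid-year
        FrameCase("C", date(2009, 11, 1), date(2008, 12, 19)),
    ]
    kept, dropped = split_frame(frame, cutoff)
    print("in domain:", [c.case_id for c in kept])
    print("screened out:", [c.case_id for c in dropped])
```

Case B is an example of the small group described above: its May 2009 degree put it in the frame at its September interview, but the degree falls after the cutoff. Because eligibility depends only on the cutoff, the outcome for a May 2009 graduate no longer depends on which ACS month the person happened to be sampled in.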
Using the ACS as the sampling frame for the NSCG also provides quality-enhancing opportunities over the previous long-form approach to improve the survey in several dimensions—timeliness, accuracy, relevance, and cost. Such opportunities reflect in part the availability of the frame—and, therefore, the fielding of the NSCG—much sooner after the frame data were collected than for the long form.
Being able to draw a sample and field the NSCG closer to the time the frame data were collected can reduce costs in several ways. A shorter period between the frame and NSCG data collection reduces the likelihood of changes in eligibility status between the two dates, such as moving abroad or earning another degree, and should improve the ability to locate individuals for participation. With a shorter gap between the ACS frame data and the NSCG reference date for all or most of the sample, a smaller fraction of NSCG sample cases should have moved from where they were living at the time of the ACS than under the long-form frame. Additionally, individuals who have moved within the United States should be easier to locate when they have been gone from their previous address for a shorter time. These factors should reduce locating costs, cutting overall survey costs and possibly reducing time in the field.
As mentioned before, the NSCG historically has provided the stock of scientists and engineers near the beginning of the decade, while the RCG and SDR have captured the new flows of those receiving SEH degrees during the decade after the postcensal NSCG. A person who was sampled in the NSCG (or RCG) but subsequently earned another degree (bachelor's, master's, or doctorate) in an SEH field is eligible for inclusion in the RCG or SDR by virtue of that additional degree. To keep the frames for the three surveys mutually exclusive and to eliminate the possibility of double-counting these populations, all NSCG/RCG cases involving individuals who earned another eligible degree after they were originally sampled in one of the surveys are treated as out of scope for the integrated SESTAT data set. Because a shorter interval between the ACS frame data and the NSCG leaves less time for sample members to earn another degree, fewer cases should be excluded on these grounds. Reducing the number of sample cases excluded from the integrated database increases the effective sample size and thus reduces variance slightly, improving accuracy. Alternatively, the nominal sample size could be reduced while maintaining the effective sample size, which could reduce survey costs somewhat.
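The trade-off between excluded cases, effective sample size, and cost can be illustrated numerically. The Python sketch below assumes simple random sampling, so that variance scales as 1/n and design effects are ignored, and uses hypothetical out-of-scope rates (4% for a long-form-length gap, 1% for an ACS-length gap) chosen only for illustration.

```python
import math


def effective_n(nominal_n, out_of_scope_rate):
    """Cases that remain in the integrated database after exclusions."""
    if not 0 <= out_of_scope_rate < 1:
        raise ValueError("out_of_scope_rate must be in [0, 1)")
    return nominal_n * (1.0 - out_of_scope_rate)


def nominal_n_for(target_effective_n, out_of_scope_rate):
    """Nominal sample needed to keep a given effective sample size."""
    if not 0 <= out_of_scope_rate < 1:
        raise ValueError("out_of_scope_rate must be in [0, 1)")
    return math.ceil(target_effective_n / (1.0 - out_of_scope_rate))


if __name__ == "__main__":
    nominal = 100_000
    for label, rate in (("long-form gap", 0.04), ("ACS gap", 0.01)):  # hypothetical rates
        n_eff = effective_n(nominal, rate)
        rel_var = nominal / n_eff  # variance relative to no exclusions, assuming Var ~ 1/n
        print(f"{label}: effective n = {n_eff:,.0f}, relative variance = {rel_var:.3f}, "
              f"nominal n for 100,000 effective = {nominal_n_for(100_000, rate):,}")
```

Under these assumed rates, cutting exclusions from 4% to 1% shrinks the variance penalty from about 4% to about 1%, or equivalently saves roughly 3,000 cases of nominal sample while holding the effective sample at 100,000.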
The annual availability of the ACS as a frame provides the opportunity to update the frame throughout the decade, which can reduce or eliminate coverage problems associated with the 10 years between updates (long-form frame, or Option 1 using the ACS). In addition to the possibilities of redrawing the entire sample (or a substantial portion of the sample) periodically (Options 2 and 3), there is the potential to refresh the sample for particular groups during the decade (Option 4). For example, additional samples could be drawn for groups that become of heightened interest during the decade (e.g., post-9/11 issues, the end of the dot-com era). The frame also could be refreshed during the decade to capture aspects of the flow of scientists and engineers that are not captured by the RCG and SDR, such as foreign citizens entering the United States after the time period of the ACS frame (and who did not subsequently earn a U.S. SEH degree) or new graduates with non-S&E degrees who enter S&E or S&E-related occupations. Such an approach would improve coverage of the target population, increasing both the accuracy and the relevance of the SESTAT data.
 Currently, those with U.S. doctorates contained in the NSCG are not included in the SESTAT integrated database. U.S. doctorates are drawn from the SDR survey. However, sample cases in the NSCG who have doctorates from institutions outside the U.S. are included in the SESTAT integrated database.
 This is based on an average household size of 2.6, multiplied by the 3 million housing units surveyed in the ACS. Average household size was determined from the Census Bureau's American FactFinder, using data from the 2005 ACS.
 It is unlikely that 12 months of ACS data would be sufficient for approximately one-third of the aggregate sampling cells that NCSES tested based on analysis of the full year of 2005 ACS data. These aggregate cells combined minority groups and used fewer occupational categories than have been used in the past. Using the 2003 sampling cells, several more years of ACS sample would be required to produce sample sizes similar to those achieved with the 2003 NSCG design. The aggregate cells that NCSES tested were important because they form the basis for many of the domains for which estimates have been produced in the past, and adequate sample sizes for them are possible with 2 years of ACS samples.
 It is desirable to work with the ACS data for NSCG sampling much earlier than June, but later availability of the data may be less of an obstacle if more than 1 year of ACS sample is needed. If, for example, 2 years are needed, it may be possible to sample and field the NSCG in two waves: one based on the first of the 2 ACS years, which could be processed well in advance of the survey date, and a second, fielded slightly later, based on the second ACS year. In 2006, both the RCG and SDR were fielded in two waves for a similar reason: the late availability of the frame for part of the sample.