Design Options for SESTAT for the Current Decade: Statistical Issues
Appendix: Discussion of Response Issues for SESTAT Redesign Options
The purpose of this document is to discuss some of the issues related to the four possible SESTAT redesign options that Westat was asked to investigate for the Division of Science Resources Studies of the National Science Foundation (NSF/SRS). It is not intended to provide a complete description of all considerations for each option, but instead focuses on operational issues related to locating and contacting sample members and obtaining sufficient response rates.
Redesign Option 1
Option 1 replicates the design used in the decade of the 1990s. It involves drawing a new sample of college graduates from the 2000 Census for the National Survey of College Graduates (NSCG). This would be supplemented throughout the decade with samples from the National Survey of Recent College Graduates (NSRCG). Response patterns and problems in the 2000 decade are expected to be similar to those experienced in the 1990 decade. Although many surveys have experienced declining response rates in recent years, the positive image of the NSF and Census Bureau is expected to help offset this trend. Because this option is expected to be very similar to the design of the past decade, it will not be discussed further here, except to note that SRS could modify the sampling strategy based on the 1993 experience to target the population of interest more carefully, and may change the target population definition to reduce the required sample.
Redesign Options 2 and 3
Option 2 involves continuing with the current panels, with the NSRCG continuing to contribute the new domestic S&E bachelor's and master's degree population each cycle. The SDR will provide new domestic S&E doctorate earners, and the decennial census will contribute individuals with foreign-earned S&E degrees at all levels. Option 3 is a combination design, with half the sample following the option 1 design and half following the option 2 design. For both options 2 and 3, NSF is considering whether to include members of the original samples that were dropped due to nonresponse during the 1990s. Including these cases would be a way to reduce potential nonresponse bias. Some of the issues involved in including nonrespondents from previous cycles are discussed below.
First, it is helpful to review the design of the studies involved. The original design for the 1990s involved adding a sample of respondents from the previous NSRCG New Graduate survey to the NSCG to represent that cohort in the followup survey. For example, in the 1993 cycle, the NSRCG consisted of a sample of individuals who earned new S&E bachelor's and master's degrees in the spring 1990, 1991, and 1992 academic years. In the 1995 survey cycle, these 1993 sample cases were moved into the NSCG sample frame; the 1995 NSRCG includes only recent U.S. S&E bachelor's and master's degree earners from the 1993 and 1994 academic years. This procedure of first contacting individuals with the NSRCG, and then later moving them to the panel sample frame was followed throughout the 1990s. These cases that were originally part of the NSRCG and later moved to the panel are collectively referred to as the NSRCG Panel.
During the 1990s, the NSCG was generally conducted only with sample members who responded to previous survey cycles. This procedure was not always followed for the cases that were part of the NSRCG Panel. Individuals in the NSRCG Panel had to respond to the NSRCG baseline survey to be included in the Panel frame the next year, but did not need to respond to the Panel followup survey to continue to remain in the frame. That is, once a sample member responded to the baseline, he/she was included in both Panel followup cycles. After two NSRCG followup cycles, the sample member became part of the NSCG and followed the rules for that survey.
When considering the tracing of nonrespondents from previous survey cycles, it is helpful to look at the cases that were sampled from the 1990 Census separately from those added from the NSRCG survey. There are three main differences between the two groups that have a significant effect on tracing. First, there are population differences. Most of those sampled from the Census are older and a greater fraction of these individuals are likely to have completed their education than those sampled as new graduates. As such, they may be easier to locate than recent graduates who have not yet established a permanent or semi-permanent address. Second, for those sampled from the Census, we started with a confirmed address that we know is where the sample member lived in 1990. For those sampled as new graduates, we started with information provided by the sampled colleges and universities. This information varies widely in terms of completeness and timeliness and may only include where the sample member lived while attending college. Third, all cases sampled from the Census were living in the U.S. in 1990, while those sampled as new graduates include non-resident aliens who may have left the country after graduation. Therefore, different decisions may be made for handling nonrespondents for the two groups.
Cases sampled from the 1990 Census. For this group, we should first consider nonrespondents to the baseline (1993) survey. Since this baseline survey was used to identify the eligible sample members, it would be very expensive to go back to nonrespondents from the 1993 survey. This would involve contacting a large number of sample members with unknown eligibility status where no contact attempts have been made since 1993. Instead, we expect that NSF will consider contacting those who responded to the baseline but did not respond to the followup surveys. But it should be pointed out that most of the nonresponse occurred during the baseline survey, with unweighted response rates of 78 percent for the baseline and 90-94 percent for each followup cycle.
Another consideration in contacting previous nonrespondents is the type of nonresponse. We can group nonrespondents into the three broad categories of refusal, non-locatable, and other (including those who were ill or temporarily absent; wrong sample persons; unable to contact despite repeated attempts; or those who were contacted, but for which critical data items were missing). Among the baseline NSCG sample members who were in the 1997 NSCG, there were 2,630 nonrespondents, which was 6 percent of this portion of the sample. Among these nonrespondents, 61 percent were refusals, 18 percent non-locatable, and 21 percent other. While this might seem to indicate that refusal conversion is a greater challenge than tracing, tracing generally becomes more difficult as time elapses, while refusal conversion may become easier. Cases that refused in the early survey cycles may become tracing problems in the 2003 cycle. Therefore, both refusal conversion and tracing are important considerations for contacting nonrespondents to previous cycles.
The 1993 NSCG obtained a weighted response rate of 80 percent with a mail survey followed up by CATI and a personal interview. In 1995, a decision was made to follow up on only the respondents to the 1993 survey, as the cost of locating and contacting nonrespondents would have been prohibitive. Additionally, at the time, it was not expected that the baseline NSCG sample would be contacted after the 1990s decade, so the diminishing cumulative response rate and potential nonresponse bias that resulted from following only respondents were not expected to have a significant impact on the SESTAT system as a whole.
As previously stated, both refusal conversion and tracing are important to consider for contacting nonrespondents. No tests were conducted to determine the estimated success rate or cost of additional refusal conversion activities if refusals are added back into the NSCG sample. However, a tracing test of nonrespondents to the 1995 NSCG was conducted by the Census Bureau under NSF's direction, as explained below.
A simple random sample of 25 cases that responded to the 1993 NSCG baseline but did not respond to the 1995 NSCG was included in the tracing test. The sample was selected from a frame of people who were un-locatable or could not be contacted for the 1995 NSCG. The test used only non-invasive searches, so the new listings were not contacted to determine if the correct individual had been found; that is, sample person verification was not conducted. Therefore, all the address and phone number listings found are unconfirmed. Most searches were done using FastData, an address source based on the United States Postal Service (USPS) National Change of Address database and other information, which provides address, phone number, date of birth or age range, persons living at that residence, and length of time at that address. Other searches were done using PowerFinder on CD-ROM and the Internet. A summary of the results for address and telephone number searches is shown in table 1 below.
As table 1 shows, of the 25 cases in the test, unconfirmed address listings were found for 18 (72 percent). Only 7 of these (28 percent of the total) could be matched on birth date. Unconfirmed phone number listings were found for about 40 percent of the test cases, and about one-third (32 percent) of the cases had non-published phone numbers. While finding address listings for 72 percent of the test cases is encouraging, it should be noted that a number of these are expected to be outdated addresses where the sample member no longer lives, and some will be listings for someone else with the same name as the sample member.
Cases sampled from the NSRCG Baseline and included in the Panel sample. For this group, we should consider nonrespondents separately by whether they were nonrespondents in the baseline survey (1993, 1995, 1997 or 1999 NSRCG baseline survey) or did not respond to the Panel followup survey. We can also classify them by type of nonresponse. Each of the baseline surveys has resulted in a similar distribution of nonrespondents by type. In the 1997 NSRCG baseline, there were 2,573 nonrespondents, 18 percent of the sample. Among these nonrespondents, 37 percent were refusals, 50 percent non-locatable, and 13 percent were other. If NSF decides to include nonrespondents to past NSRCG baseline surveys in the 2003 sample, both tracing and refusal conversion activities are important, with tracing issues predominating. Since no followup survey has been conducted that included nonrespondents to the NSRCG baseline surveys, a tracing test was conducted by the Census Bureau, as described below.
This test included 40 cases that did not respond to the 1995 NSRCG. Since every NSRCG is a baseline survey, these tracing test cases never responded to any survey cycle. The sample was selected from a frame of 1,762 people who were either un-locatable or their household could never be contacted to confirm the graduate lived there in the 1995 cycle. The sample of 40 includes 10 people with foreign addresses provided by the school, 10 with no address provided by the school, and 20 with one or more U.S. addresses at the time of sampling. The sample was drawn using three sampling categories based on the type of address provided on the school sampling list in the 1995 cycle (no address, foreign address, or U.S. address provided). Since different sampling rates were used for the different categories, the tables below include weighted totals and weighted percents. Please note that these weights reflect the sampling for the tracing test only. That is, the weighted total of 1,762 is the number of 1995 NSRCG sample members (nonrespondents) eligible to be included in the tracing test. The weight for the "No address provided" category is 38.4 (384 in frame/10 in sample), for the "Foreign address provided" category the weight is 17.1 (171 in frame/10 in sample), and for the "U.S. address provided" category the weight is 60.35 (1,207 in frame/20 in sample).
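The weight arithmetic described above can be sketched in a few lines. The category labels, frame counts, and sample sizes are taken directly from the text; the code itself is only an illustration of the computation, not part of the survey system.

```python
# Tracing-test weights for the 1995 NSRCG nonrespondent sample.
# Each weight is (cases in frame) / (cases in tracing-test sample), so
# the weighted sample totals reproduce the frame of 1,762 eligible
# nonrespondents described in the text.

categories = {
    # category: (frame count, tracing-test sample size)
    "No address provided":      (384, 10),
    "Foreign address provided": (171, 10),
    "U.S. address provided":    (1207, 20),
}

weights = {name: frame / n for name, (frame, n) in categories.items()}
# weights are 38.4, 17.1, and 60.35 respectively

# Weighting each sampled case back up recovers the frame size.
weighted_total = round(sum(weights[name] * n
                           for name, (_, n) in categories.items()))
print(weighted_total)  # → 1762
```

This is why the weighted totals in the tables that follow sum to 1,762 rather than to the 40 cases actually searched.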
The same tracing procedures were followed for this group that were followed for the NSCG tracing test. Both tests used only non-invasive searches, so the new listings were not contacted to determine if the correct individual had been found. A summary of the results for addresses and phone numbers is listed below, with table 2 showing the results of address searches and table 3 showing the results of telephone number searches.
Looking at the weighted percents for address searches in table 2, we can see that unconfirmed address listings were found for about half (51 percent) of the cases. However, only 10 percent of the total could be matched on birth date. As expected, results varied by type of address provided on the initial sampling list. For cases with no address or a foreign address provided, none of the cases in the test sample could be matched on birth date, and only 2 cases in each category had an address listing found.
Table 3 shows that unconfirmed phone number listings were found for 23 percent of the cases (with 3 percent matched on birth date) and 27 percent of the cases had non-published phone numbers. None of the cases in the "no address" and "foreign address" categories could be matched by birth date, and only 1 unconfirmed telephone number listing was found in each category. It should also be noted that for both addresses and telephone numbers, a number of these listings are expected to be outdated addresses/telephone numbers where the sample member no longer lives, and some will be listings for someone else with the same name as the sample member.
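As a rough sketch of how a weighted percent such as the 51 percent in table 2 is produced: the per-category weights and the counts of 2 found address listings in the "no address" and "foreign address" categories come from the text, but the count of 13 for the "U.S. address" category is an assumed value chosen only to be consistent with the reported overall figure; the source does not state it.

```python
# Illustrative weighted-percent computation for the NSRCG tracing test.
# Weights are frame count / sample size per category (from the text).
# Found-listing counts of 2 for "no address" and "foreign address" are
# stated in the text; 13 for "U.S. address" is an ASSUMED value chosen
# to be consistent with the reported weighted percent of about 51.

weights = {"no_address": 38.4, "foreign": 17.1, "us": 60.35}
found   = {"no_address": 2,    "foreign": 2,    "us": 13}  # "us" assumed

weighted_found = sum(weights[c] * found[c] for c in weights)
weighted_total = 1762  # weighted frame size from the text

pct = 100 * weighted_found / weighted_total
print(round(pct))  # → 51
```

The point of the sketch is that each raw count is multiplied by its category weight before the percentage is taken, so the heavily weighted "U.S. address" category dominates the overall figure.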
It is interesting to compare the results of this tracing test (for nonrespondents to the NSRCG baseline) with the tracing test conducted for cases that responded to the 1993 NSCG baseline but did not respond to the 1995 NSCG, as discussed in the previous section. The NSCG tracing test involved searching for sample members for the fourth time (first for the 1990 Census, second for the 1993 NSCG, third for the 1995 NSCG, and fourth for the tracing test). The first two contacts resulted in successfully locating the individual and obtaining his/her cooperation. Both of these contacts provided a confirmed address and confirmation of the sample member's identity. The third contact attempt (for the 1995 NSCG) resulted in a nonresponse of either un-locatable or unable to contact. In contrast, the NSRCG tracing test conducted searches for sample members who had never been contacted. For some of these cases, no good address was ever obtained from the sampled college/university. These differences in test samples are highlighted by the different tracing test results, with no address listings found for 28 percent of the NSCG tracing cases and 49 percent of the NSRCG cases. In addition, 28 percent of the NSCG cases had an address listing matched by date of birth, while only 10 percent of the NSRCG cases could be matched by birth date.
While the NSRCG tracing test gives us information about tracing nonrespondents from the baseline NSRCG survey, we also need to look at the problems associated with tracing nonrespondents to NSRCG Panel followup surveys. In considering this group, we can use the example of cases that responded to the 1995 NSRCG baseline survey, were sampled for the 1997 Panel followup, but did not respond in 1997. These cases were included in the 1999 Panel followup survey, since they responded to the baseline survey. There were a total of 1,191 cases in this group (8 percent of the 1999 followup sample), which can be broken down by the type of 1997 nonresponse as follows: 46 percent refused, 20 percent were not located, 23 percent were other nonresponse, and 11 percent were temporary ineligibles. As would be expected, the response rates in the 1999 survey cycle vary by the type of nonresponse in the 1997 followup survey. Table 4 shows the distribution of cases by 1997 and 1999 survey response category and the 1999 cycle response rates by 1997 response category.
Among the three types of 1997 cycle nonrespondents (refused, not located, and other nonresponse), the refusals are the biggest group and have the lowest response rate (27 percent). For the un-locatable and other nonresponse categories, less than half responded. The cases that were temporarily ineligible in 1997 (mostly living out of the U.S.) had a high response rate in 1999, as most of them continued to be ineligible.
Redesign Option 4
For this option, the NSRCG Panel samples selected during the 1990s would be supplemented by new samples drawn from the original sampling frames. Since there is no viable sampling frame available for the 1993 NSRCG,* these new sample selections would begin with the 1995 NSRCG. There are two main areas of consideration. First, there is the issue of college/university cooperation and confidentiality. The second is the set of operational issues involved in using sampling lists up to eight years old. Each of these areas is discussed below.
Colleges and universities were asked to provide sampling lists for each NSRCG survey cycle. All materials sent to the colleges implied that the lists were to be used for that one survey cycle, as was intended at the time each list was collected. Although no specific promises were made to colleges about the use of their sampling lists for later survey cycles, most colleges would not expect that their lists would be kept and used years later. The NSRCG confidentiality plan states, "At the close of each study, survey materials are placed in secure storage for a period of 3 years…After this period has lapsed, the materials are disposed of." If this option is chosen, we suggest that NSF consider contacting the sampled colleges to ask their permission to use the old sampling lists they provided. While we expect that most colleges will give permission, some may be concerned that their sampling lists were kept for such a long time period.
The second issue involves the operational steps necessary to restore the old sampling lists and select the new samples. Lists were provided by colleges either as computer files or on paper. For lists provided on paper, the sampling information was keyed for all eligible graduates and was included in the sampling frames along with the computer lists. Therefore, it is possible to draw a new sample from both paper and computer lists once the sampling frame files and documentation have been restored. However, all identifying and locating information from paper lists was keyed after sampling, only for the sampled graduates. This means that if a new sample is drawn, it must be matched back to the paper list by ID number, and the information for the new sample must be keyed from these old paper lists. As would be expected, the percent of sampling lists provided on paper varies by cycle, with 26 percent for 1995, 17 percent for 1997, 15 percent for 1999, and 9 percent for 2001. Time and money must be allowed for processing the information from these paper lists if this option is chosen.
In addition to the sampling issues, there are considerations about locating and contacting graduates who have never been contacted in earlier cycles. These issues are similar to those discussed for options 2 and 3 for contacting nonrespondents. Once the new samples are selected, the alumni offices at the sampled schools will be asked to provide updated locating information. We expect that substantial additional tracing will be needed to locate these graduates. In the 1997 NSRCG baseline survey, only 28 percent of the sample were interviewed at the address provided by the schools on the sampling lists or provided by the alumni offices.
* The contractor that drew the 1993 NSRCG sample is no longer involved with the SESTAT studies and is not expected to have maintained the sampling frames for this length of time (since they were not project deliverables). In particular, the lists sent on paper are not expected to be available.