Section A. Technical Notes


Overview

The 1993 National Survey of Recent College Graduates (NSRCG:93) is sponsored by the National Science Foundation (NSF), Division of Science Resources Studies (SRS). The NSRCG is one of three data collections covering personnel and graduates in science and engineering. The other two surveys are the National Survey of College Graduates (NSCG) and the Survey of Doctoral Recipients (SDR). Together, they constitute the NSF's Scientists and Engineers Statistical Data System (SESTAT). These surveys serve as the basis for developing estimates and characteristics of the total population of scientists and engineers in the United States.

The first NSF-sponsored NSRCG (then known as New Entrants) was conducted in 1974. Subsequent surveys were conducted in 1976, 1978, 1979, 1980, 1982, 1984, 1986, 1988, 1990, and 1993. In the initial survey, data were collected only on bachelor's degree recipients, but all ensuing surveys included both bachelor's and master's degree recipients.

For the NSRCG:93, the school and graduate sampling was done by the Institute for Survey Research (ISR) at Temple University, and the survey collection, processing, weighting, and table production were conducted by Westat, Inc. A sample of 275 colleges and universities was asked to provide lists of eligible bachelor's and master's degree recipients. From these lists, a sample of 25,785 graduates (16,585 bachelor's and 9,200 master's) was selected. These graduates were interviewed between May and November of 1993. Computer-assisted telephone interviewing (CATI) served as the primary means of data collection. Mail data collection was used only for those who could not be reached by telephone. The unweighted response rate for institutions was 99 percent, and the unweighted response rate for graduates was 86 percent. The weighted response rates were 99 and 84 percent, respectively.

The NSRCG questionnaire was expanded and revised substantially by NSF for the 1993 survey. This revision was done in coordination with similar revisions to the other SESTAT surveys. Topics covered in the survey include:

Sample Design

The NSRCG used a two-stage sample design. In the first stage, a stratified nationally representative sample of 275 institutions was selected with probability proportional to size. There were 196 self-representing institutions, also known as certainty units. Measures of size were devised to account for the relative rareness of certain specialty and nonspecialty major fields of study. Universities with a high proportion of Hispanic, black, and foreign students were oversampled by doubling their measure of size. The 79 noncertainty institutions were implicitly stratified by sorting the list by ethnic status, region, public/private status, and presence of agriculture as a field of study. Institutions were then selected by systematic sampling from the ordered list.

Graduate Sample

The second stage of the sampling process involved selecting graduates within the sampled institutions by cohort. As a first step, each participating institution was asked to send lists of graduates to ISR. Within graduation year (cohort), each eligible graduate was then classified into one of 42 strata based on the graduate's major field of study and degree status. Table A-1 is a list of the major fields and the corresponding sampling rates by cohort and degree. These rates are overall sampling rates for the major field, so they include the institution's probability of selection and the within-institution sampling rates. To achieve the within-institution sampling rate, the overall rate was divided by the institution's probability of selection.
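As an illustration of the rate arithmetic described above, the sketch below shows how a within-institution sampling rate would be derived from an overall major-field rate; the numbers are hypothetical, not values from Table A-1.

```python
# Hypothetical illustration of the relationship described above:
# within-institution rate = overall major-field rate / institution selection probability.
overall_rate = 0.02        # assumed overall sampling rate for a major-field stratum
institution_prob = 0.25    # assumed probability that the institution was selected
within_institution_rate = overall_rate / institution_prob
print(within_institution_rate)   # -> 0.08, i.e., 8 percent of this institution's graduates in the stratum
```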

Graduate Eligibility

To be included in the sample, the graduates had to meet all of the following criteria:

Data Collection and Response

Before graduate data collection could begin, it was necessary to obtain the cooperation of the sampled institutions, which provided the lists of graduates. Because the sample included graduates from three time frames between 1990 and 1992, lists were collected from the institutions in three waves. The response rate for the institution collection was 99.4 percent.

Graduate data collection took place between May and November of 1993, with computer-assisted telephone interviewing as the primary means of data collection. Flyers were sent to all graduates announcing the study and asking for phone numbers at which they could be reached during the survey period. Extensive tracing of graduates was required to obtain the desired response rate. Tracing activities included computerized telephone number searches, national change-of-address (NCOA) searches, school alumni office contacts, school major field department contacts, directory assistance, military locators, post office records, personal referrals from parents or others who knew the graduate, and the use of professional tracing organizations.

Table A-2 gives the response rates by cohort, degree, major, sex, and type of address. The overall unweighted graduate response rate is 86 percent. The weighted response rate is 84 percent. The overall weighted response rate is calculated as the school (first-stage) response rate times the graduate (second-stage) response rate (.994 x .841 = .836). As can be seen from Table A-2, response rates varied somewhat by major field of study and by sex. Rates were lowest for those with foreign addresses.

Weight Calculations

To produce national estimates, the data were weighted. Weighting the data adjusted for unequal selection probabilities and for nonresponse at the institution and graduate levels. In addition, a ratio adjustment was made at the institution level using the number of graduates reported in specified IPEDS categories of major and degree. The final adjustment to the graduate weights accounted for responding graduates who could have been sampled twice. For example, a person who obtained an eligible bachelor's degree in 1990 could have obtained an eligible master's degree in 1992 and could have been sampled for either degree. To make the estimates from the survey essentially unbiased, the weights of all responding graduates who could have been sampled twice were divided by 2.
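A minimal sketch of the sequence of adjustments described above is given below. The function and factor names are illustrative assumptions; the actual NSRCG:93 weighting was carried out within nonresponse and IPEDS ratio-adjustment cells that are not reproduced here.

```python
# Illustrative sketch of the weight adjustments described above
# (not the actual NSRCG:93 weighting program).
def final_graduate_weight(selection_prob, nonresponse_factor,
                          ipeds_ratio_factor, dual_eligibility):
    """Apply the adjustments described in the text to a single graduate record."""
    weight = 1.0 / selection_prob           # base weight: inverse selection probability
    weight *= nonresponse_factor            # institution- and graduate-level nonresponse
    weight *= ipeds_ratio_factor            # ratio adjustment to IPEDS degree counts
    if dual_eligibility:                    # graduate could have been sampled for either degree
        weight /= 2.0
    return weight

# Example: selection probability 1/500, modest nonresponse and ratio adjustments,
# and eligibility for sampling under both a bachelor's and a master's degree.
print(final_graduate_weight(1 / 500, 1.05, 0.98, True))   # -> 257.25
```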

The weights developed for the NSRCG:93 comprise both full-sample weights for use in computing survey estimates and replicate weights for use in variance estimation with a jackknife replication procedure.

Data Editing

Most editing checks were included within the CATI system, including range checks, skip pattern rules, and logical consistency checks. Skip patterns were controlled by the CATI system so that inappropriate items were avoided. For logical consistency check violations, CATI screens appeared that explained the discrepancy and asked the respondent for corrections. Some additional logical consistency checks were added during data preparation, and all edit checks were rerun after item nonresponse imputation.
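The kinds of checks described above can be sketched as follows; the item names, ranges, and rules are hypothetical examples, not the NSRCG:93 CATI specifications.

```python
# Hypothetical edit checks of the three kinds described above.
def edit_checks(record):
    """Return a list of edit-check violations for one interview record."""
    problems = []
    # Range check: salary must fall within a plausible interval.
    if not 0 <= record.get("annual_salary", 0) <= 500_000:
        problems.append("annual_salary out of range")
    # Skip pattern: hours worked applies only to respondents who reported working.
    if record.get("working") == "no" and record.get("hours_per_week") is not None:
        problems.append("hours_per_week answered by a nonworker")
    # Logical consistency: degree year cannot precede year of birth plus 15.
    if record.get("degree_year", 9999) < record.get("birth_year", 0) + 15:
        problems.append("degree_year inconsistent with birth_year")
    return problems

print(edit_checks({"annual_salary": 42_000, "working": "yes",
                   "degree_year": 1991, "birth_year": 1969}))   # -> []
```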

Imputation of Missing Data

Missing data occur if the respondent cooperated with the survey but did not answer one or more individual questions. The item nonresponse for this study was very low (typically about 1 percent) as a result of using CATI for data collection and data retrieval techniques for missing key items. However, imputation for item nonresponse was performed for each survey item to make the study results simpler to present and to allow consistent totals to be obtained when analyzing different questionnaire items. "Not applicable" responses were not imputed since these represented respondents who were not eligible to answer the relevant item.

Imputation was performed using a hot-deck method. Hot-deck methods estimate the missing value of an item by using values of the same item from other record(s) in the same file. Using the hot-deck procedure, each missing questionnaire item was imputed separately. First, respondent records were sorted by items thought to be related to the missing item. Next, a value was imputed for each recipient (a record missing the item) from a donor (a responding record within the same subgroup). The results of the imputation procedure were reviewed to ensure that the plan had been followed correctly. In addition, all edit checks were run on the imputed file to be sure that no data inconsistencies were created by imputation.
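A simplified version of a sorted, sequential hot-deck is sketched below using pandas; the class and sort variables are hypothetical, and the actual NSRCG:93 procedure included donor-selection and review steps not shown here.

```python
# Simplified sequential hot-deck: within each imputation class, records are
# sorted by related variables and each missing value is copied from the most
# recent preceding donor. Variable names are illustrative only.
import pandas as pd

def hot_deck_impute(df, item, class_vars, sort_vars):
    out = df.sort_values(class_vars + sort_vars).copy()
    out[item] = out.groupby(class_vars)[item].ffill()   # donor from the same subgroup
    return out.sort_index()                             # restore original record order

# Usage (illustrative): impute missing salaries within degree level and major field,
# sorting by sex and age so donors resemble recipients.
# df = hot_deck_impute(df, "salary", ["degree_level", "major_field"], ["sex", "age"])
```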

For a more detailed discussion of survey methodology, readers are referred to the NSRCG:93 data file User's Manual.

Accuracy of Estimates

The survey estimates provided in these tables are subject to two sources of error: sampling and nonsampling errors. Sampling errors occur because the estimates are based on a sample of individuals in the population rather than on the entire population and hence are subject to sampling variability. If the interviews had been conducted with a different sample, the responses would not have been identical; some figures might have been higher, whereas others might have been lower.[1]

The standard error is a measure of the variability of the estimates arising from sampling. It indicates the variability of a sample estimate that would be obtained from all possible samples of a given design and size. Standard errors can be used as a measure of the precision expected from a particular sample. Tables A-3, A-4, A-5, and A-6 contain standard errors for key statistics included in the detailed tables.

If all possible samples were surveyed under similar conditions, intervals within plus or minus 1.96 standard errors of a particular statistic would include the true population parameter being estimated in about 95 percent of the samples. This is the 95-percent confidence interval. For example, the estimated total number of 1991 bachelor's degree recipients majoring in engineering is 60,600, and the estimated standard error is 2,900. The 95-percent confidence interval for the statistic extends from 60,600 - (1.96 x 2,900) = 54,916 to 60,600 + (1.96 x 2,900) = 66,284.

This means that one can be confident that intervals constructed in this way contain the true population parameter 95 percent of the time.

Estimates of standard errors were computed using a technique known as jackknife replication. As with any replication method, jackknife replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistics of interest for each replicate. The mean square error of the replicate estimates around their corresponding full-sample estimate provides an estimate of the sampling variance of the statistic of interest. To construct the replications, 50 stratified subsamples of the full sample were created. Fifty jackknife replicates were then formed by deleting one subsample at a time from the full sample. WesVarPC, a public-use computer program developed at Westat, was used to calculate direct estimates of standard errors for a number of statistics from the survey.
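The sketch below illustrates the delete-one-group jackknife idea for a weighted total, assuming 50 randomly formed groups; the actual NSRCG:93 replicates were built from stratified subsamples with precomputed replicate weights (as used by WesVarPC), which this illustration does not reproduce.

```python
# Delete-one-group jackknife sketch for the standard error of a weighted total.
import numpy as np

def jackknife_se_total(values, weights, n_groups=50, seed=0):
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    groups = rng.integers(0, n_groups, size=len(values))   # assign each case to a group
    full_estimate = np.sum(weights * values)
    replicates = []
    for g in range(n_groups):
        keep = groups != g
        # Scale up the retained weights so each replicate still estimates the total.
        rep_weights = weights[keep] * n_groups / (n_groups - 1)
        replicates.append(np.sum(rep_weights * values[keep]))
    replicates = np.asarray(replicates)
    variance = (n_groups - 1) / n_groups * np.sum((replicates - full_estimate) ** 2)
    return np.sqrt(variance)
```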

Generalized Variance Functions

Computing and printing standard errors for each estimate from the survey is a time-consuming and costly effort. For this survey, a different approach was taken for estimating the standard errors of the estimates reported in this report. First, the standard errors for a large number of different estimates were directly computed using the jackknife replication procedures described above. Next, models were fitted to the estimates and standard errors and the parameters of these models were estimated from the direct estimates. These models and their estimated parameters can now be used to approximate the standard error of an estimate from the survey. This process is called the development of generalized variance functions. Models were fitted for the two types of estimates of primary interest: estimated totals and estimated percentages.

It should be noted that the models used to estimate the generalized variance functions may not be completely appropriate for all estimates. When it is feasible, direct estimates of the standard errors should be computed using the replication method. This process is relatively simple since replicate weights and software such as WesVarPC are available.

Estimated Totals

For estimated totals, the generalized variance function applied assumes that the relative variance of the estimate (the square of the standard error divided by the square of the estimate) is a linear function of the inverse of the estimate. Using this model, the standard error of an estimate can be computed as

       se(y) = sqrt(a × y² + b × y)        (1)

where se(y) is the standard error of the estimate y, and a and b are estimated parameters of the model. The parameters of the models were computed separately for 1991 bachelor's and master's recipients and for 1992 bachelor's and master's recipients, as well as for other important domains of interest. The estimates of the parameters are given in Table A-7.

The following steps should be followed to approximate the standard error of an estimated total:

  1. obtain the estimated total from the survey,

  2. determine the most appropriate domain for the estimate from Table A-7,

  3. refer to Table A-7 to get the estimates of a and b for this domain, and

  4. compute the generalized variance using equation (1) above.
For example, suppose that the number of 1991 bachelor's degree recipients in engineering who were currently working in an engineering-related job was 40,000 (y = 40,000). The most appropriate domain from Table A-7 is engineering majors with bachelor's degrees from 1991, and the parameters are a = 0.000818 and b = 80.969. Approximate the standard error using equation (1) as se(40,000) = sqrt(0.000818 × 40,000² + 80.969 × 40,000) ≈ 2,133.
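Expressed as a short function, equation (1) and the worked example above look like the following sketch; the parameter values are those cited for the 1991 engineering bachelor's domain.

```python
# Generalized variance function for estimated totals, equation (1).
from math import sqrt

def gvf_se_total(y, a, b):
    """Approximate standard error of an estimated total: sqrt(a*y**2 + b*y)."""
    return sqrt(a * y**2 + b * y)

print(round(gvf_se_total(40_000, a=0.000818, b=80.969)))   # -> 2133, matching the example
```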

Estimated Percentages

The model used to approximate the standard errors for estimates of percentages was somewhat less complex than the model to estimate totals. The generalized variance for estimated percentages assumed that the ratio of the variance of an estimate to the variance of the same estimate from a simple random sample of the same size was a constant. This ratio is called the design effect and is often labeled the DEFF. Since the variance for an estimated percentage, p, from a simple random sample is p(100-p) divided by the sample size, the standard error of an estimated percentage can be written as

       se(p) = sqrt(DEFF × p × (100 − p) / n)        (2)

where n is the sample size or denominator of the estimated percentage. DEFFs were computed separately for 1991 bachelor's and master's recipients and for 1992 bachelor's and master's recipients, as well as for other important domains of interest. The median or average value of the DEFFs from these computations is given in Table A-7.

The following steps should be followed to approximate the standard error of an estimated percentage:

  1. obtain the estimated percentage and sample size from the survey,

  2. determine the most appropriate domain for the estimate from Table A-7,

  3. refer to Table A-7 to get the estimates of the DEFF for this domain, and

  4. compute the generalized variance using equation (2) above.
For example, suppose that the percentage of 1991 bachelor's degree recipients in engineering who were currently working in an engineering-related job was 60 percent (p = 60) and the number of engineering majors from the survey was 1,907. The most appropriate domain from Table A-7 is engineering majors with bachelor's degrees from 1991, and the DEFF for this domain is 1.6. Approximate the standard error using equation (2) as se(60) = sqrt(1.6 × 60 × (100 − 60) / 1,907) ≈ 1.4 percentage points.
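The corresponding sketch for equation (2), using the DEFF cited in the example above:

```python
# Generalized variance function for estimated percentages, equation (2).
from math import sqrt

def gvf_se_percent(p, n, deff):
    """Approximate standard error of an estimated percentage: sqrt(DEFF * p*(100-p)/n)."""
    return sqrt(deff * p * (100 - p) / n)

print(round(gvf_se_percent(60, 1_907, deff=1.6), 1))   # -> 1.4 percentage points
```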

Nonsampling Errors

In addition to sampling errors, the survey estimates are subject to nonsampling errors that can arise because of nonobservation (nonresponse or noncoverage), reporting errors, and errors made in the collection and processing of the data. These errors can sometimes bias the data. The NSRCG:93 included procedures for both minimizing and measuring nonsampling errors.

Procedures to minimize nonsampling errors were followed throughout the survey. Extensive questionnaire design work was done by Mathematica Policy Research (MPR), NSF, and Westat. This work included focus groups, expert panel reviews, and a mail and CATI pretest. The design work was done in conjunction with the other two SESTAT surveys.

Strict training and monitoring of interviewers and data processing staff were conducted to help ensure the consistency and accuracy of the data file. Data collection was done almost entirely by telephone to help reduce the amount of item nonresponse and item inconsistency. Mail questionnaires were used for cases difficult to complete by telephone. Nonresponse was handled in ways designed to minimize the impact on data quality (through weighting adjustments and imputation). In data preparation a special effort was made in the area of occupational coding. All respondent-chosen codes were verified by data preparation staff using a variety of information collected on the survey and applying coding rules developed by NSF for the SESTAT system.

Although general sampling theory can be used to estimate the sampling variability of a statistic, the measurement of nonsampling error is not easy and usually requires that an experiment be conducted as part of the data collection, or that data external to the study be used. For NSRCG:93, two data quality studies were completed: (1) an analysis of interviewer variance, and (2) a behavioral coding analysis of 100 recorded interviews. The interviewer variance study was designed to measure how interviewer effects might have affected the precision of the estimates. The results showed that interviewer effects for most items were minimal and thus had a very limited effect on the standard error of the estimates. Interviewer variance was highest for open-ended questions.

The behavioral coding study was done to observe the extent to which interviewers were following the structured interview and the extent to which it became necessary for them to give unstructured additional explanation or comment to respondents. As part of the study, 100 interviews were taped and then coded on a variety of behavioral dimensions. This analysis revealed that on the whole the interview proceeded in a very structured manner with 85 percent of all questions and answers being "asked and answered only." Additional unstructured interaction/discussion took place most frequently for those questions in which there was some ambiguity in the topic. In most cases this interaction was judged to have facilitated obtaining the correct response.

Both the recorded interview and the variance study were used to identify those questionnaire items that might need additional revision for the next (1995) study cycle. A debriefing session concerning the survey was held with interviewers, and this information was also used in revising the survey for the 1995 cycle. In addition, results from a reinterview conducted by the Census Bureau for the NSCG were reviewed in this regard.

Comparisons of Data with Previous Years' Results

A word of caution needs to be given concerning comparisons with previous NSRCG results. For 1993, the SESTAT system underwent considerable revision in all areas, including survey eligibility, data collection procedures, questionnaire content and wording, and data coding and editing procedures.

Among the important changes for 1993 that may affect comparisons with previous years' survey results are the following:

Comparisons with U.S. Department of Education Data

In weighting the NSRCG:93 data, ratio adjustments were made at the institution level to Integrated Postsecondary Education Data System (IPEDS) estimates. However, because of the special NSF eligibility requirements and use of differing summary classification systems, the estimates given in these sets of tables do not correspond directly to tables reported for IPEDS. There are two major reasons for these differences: (1) the exclusion from the NSRCG of certain groups, primarily those living outside of the United States on the reference date and those over 75 years of age; and (2) the exclusion from the NSRCG sample of certain majors. It should also be noted that IPEDS is based on administrative records and the NSRCG on respondent classification.

Other Explanatory Information

Coverage of tables. In this report's tables, information is presented for the 1991 and 1992 bachelor's and master's degree cohorts (academic years 1990-91 and 1991-92). Information for the 1990 cohort was collected primarily for inclusion in the SESTAT longitudinal studies and hence covers not an entire year but only that part of the cohort not represented in the 1990 decennial census (those graduating from April 1990 to June 1990).

The following definitions are provided to facilitate the reader's use of the data in this report.

Major field of study: Derived from the survey major field category most closely related to the respondent's degree field. Exhibit 1 is a listing of the detailed major field codes used in the survey. Exhibit 2 is a listing of the summary major field codes developed by NSF and used in the tables. A listing of the eligible and ineligible major fields within each summary category appears in the appendix.

Occupation: Derived from the survey job list category most closely related to the respondent's primary job. Exhibit 3 is a listing of the detailed job codes used in the survey, and Exhibit 4 is a summary of the occupation codes developed by NSF and used in the tables.

Labor force: The labor force includes individuals working full or part time as well as those not working but seeking work or on layoff. It is a sum of the employed and the unemployed.

Unemployed: The unemployed are those who were not working on April 15 and were seeking work or on layoff from a job.

Involuntarily out of field: Those respondents who are involuntarily out of field either: (1) have a job not related to degree field and have indicated they took a job because suitable work in a degree field was not available, or (2) are employed part time and took part-time work only because suitable full-time work was not available.

Type of employer: This is the sector of employment in which the respondent was working on his or her primary job on April 15, 1993. In this categorization, those working in 4-year colleges and universities or university-affiliated medical schools or research organizations were classified as employed in the "4-year college and university" sector. Those working in elementary, middle, secondary, or 2-year colleges or other educational institutions were categorized in the group "other educational." The other sectors are private, for profit, self-employed, nonprofit organizations, federal government, and state or local government. Those reporting that they were self-employed but in an incorporated business were classified in the private, for-profit sector.

Primary and secondary work activities: These refer to activities that occupied the most time and the second-most time on the respondent's job. In reporting the data, those who reported applied research, basic research, development, or design work were grouped together in "research and development (R&D)." Those who reported teaching were given the code "teaching." Those who reported accounting, finance or contracts, employee relations, quality or productivity management, sales and marketing, or management or administration were grouped into "management, sales, administration." Those who reported computer applications were placed in "computer applications." Those who reported production, operations, or maintenance; professional services; or other activities were given the code "other."

Full-time salary: This is the annual income for the full-time employed who were not self-employed and who were not full-time students on the reference date (April 15, 1993). To annualize salary, reported hourly salaries were multiplied by 2080, reported weekly salaries were multiplied by 52, and reported monthly salaries were multiplied by 12. Yearly and academic-yearly salaries were left as reported.
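The annualization rule above amounts to the following sketch; the reporting-period labels are assumptions about how the responses might be coded.

```python
# Salary annualization factors described above (period labels are hypothetical codes).
def annualize_salary(amount, period):
    factors = {"hourly": 2080, "weekly": 52, "monthly": 12,
               "yearly": 1, "academic_year": 1}
    return amount * factors[period]

print(annualize_salary(18.50, "hourly"))   # -> 38480.0
```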


Footnotes

1 A detailed discussion of nonsampling errors can be found later in this section under "Nonsampling Errors."