These technical notes on the 2003 National Survey of Recent College Graduates (NSRCG) include information on sampling and weighting, survey methodology, sampling and nonsampling errors, as well as discussions on data comparisons to previous cycles of the NSRCG and the Integrated Postsecondary Education Data System (IPEDS) data. For a more detailed discussion of survey methodology, refer to the 2003 NSRCG Methodology Report and to the National Science Foundation's (NSF's) Division of Science Resources Statistics (SRS) website at http://www.nsf.gov/statistics/.
NSRCG is sponsored by NSF's Division of Science Resources Statistics (SRS). NSRCG is one of three SRS data collections covering personnel and graduates in science, engineering, and health fields. The other two surveys are the National Survey of College Graduates (NSCG) and the Survey of Doctorate Recipients (SDR). Together, they constitute NSF's Scientists and Engineers Statistical Data System (SESTAT). These surveys serve as the basis for estimating the size and characteristics of the total population of scientists and engineers in the United States.
The first NSF-sponsored NSRCG (then known as the New Entrants Survey) was conducted in 1974. Subsequent surveys were conducted about every 2 years. The initial survey collected data on only bachelor's degree recipients, but all subsequent surveys included both bachelor's and master's degree recipients.
For the 2003 NSRCG, a sample of 300 colleges and universities was asked to provide lists of eligible bachelor's and master's degree recipients. From these lists, a sample of 18,000 graduates (13,061 bachelor's and 4,939 master's degree recipients) was selected. These graduates were interviewed between October 2003 and July 2004. Data for the survey were collected using three data collection modes: mail, Web, and computer-assisted telephone interviewing (CATI). The weighted response rates were 98.7 percent for institutions and 63.3 percent for graduates.
The NSRCG questionnaire underwent revisions for the 2003 survey. All revisions were done in coordination with revisions to the other SESTAT surveys. Following recommendations resulting from the pretest, NSF made several modifications to the 2003 data collection instrument. Although no questions from the 2001 NSRCG survey were deleted in the 2003 version, some were added to the sections on employment and work-related experiences. Other questions were modified to promote more accurate reporting or better understanding of the question.
Data on recent graduates with bachelor's and master's degrees in health fields were collected for the first time in the 2003 NSRCG data collection effort. Users of the data on health fields are urged to use caution when reviewing these data because graduates in health fields who did not respond to the mail questionnaire were not followed up during the Web and CATI stages of the data collection. Consequently, this population has lower response rates than graduates who received degrees in other fields of study.
Following is a list of questions that were added or modified between the 2001 and 2003 survey cycles.
B17 (Type of academic position(s) held in principal job). This question was added to gain a better understanding of the types of positions held by employees working in the academic sector.
B18 (Faculty rank). Formerly asked only in the SDR, this question was added to NSRCG and NSCG in 2003 to account for sample members who may hold faculty rank. The presence of this question in all three surveys will promote consistency among the three surveys.
B32 (Overall satisfaction with principal job). This question was intended as a follow-up to the previous question, which asks about satisfaction with specific aspects of the principal job.
B36/B37 (Federal support through grants and contracts). These questions were added to NSRCG in order to better understand the role of federal support in the work of all scientists and engineers, and to promote consistency with the other SESTAT surveys.
B38 (2002 income). This question was added to the 2003 NSRCG to be consistent with the other SESTAT surveys.
C1, C2, C3 (Publications/patents). These questions were added to the 2003 NSRCG because they provide one of the few measures of work productivity and they are also found in the other SESTAT surveys.
A17/A18 (Degree grid/financing for degrees). The financial support questions were removed from the degree grid; instead of asking about financing for each degree, data were collected for undergraduate and graduate degrees overall, consistent with data gathered for doctorates.
Field of study verbatim and self-code were separated into two questions so that the verbatim response was reported before the self-code.
Location of the degree grid in the questionnaire was changed to promote a more natural flow in the questionnaire.
A9–A16 (Enrollment during reference week) and A20–A23 (Enrollment between last degree earned and reference week). Previous versions of the questionnaire asked respondents to report on their educational activities between their most recent degree and the reference week, during the reference week, after the reference week, and in the future. In 2003, NSF dropped the questions about educational activities after the reference week and in the future. The order of the education questions was changed to promote better flow. The 2003 questionnaire first asked about educational activities during the reference week (A9–A16), then about all degrees earned as of the reference week (degree grid), and finally about educational activities between the most recent degree reported in the degree grid and October 1.
A19 (Money borrowed to finance degrees and money still owed). The question was changed from an open-ended response format to a "mark one answer" response format.
B1 (Working for pay or profit). Instructions were simplified and separated from the question stem.
B5, B20 (Job description verbatim). Extra lines were added to allow longer open-ended answers.
B11 (Employer's main business). Lines were added to report department/division and street address.
B12 (Employer size). The final two response categories were changed; the last was broken into two categories.
B14 (Employer type). The self-employed response category was moved to the top of the list.
B16 (Type of educational institution). The preschool, elementary, and middle school response category was combined with the secondary school system response category.
B24 (Relationship between principal job and highest degree). The wording of this question was simplified.
B27 (Primary work activities). The order of response categories was changed. Wording of response choices for employee relations, managing/supervising, and production was modified based on "other, specify" responses from previous survey cycles.
B35 (Salary). This question had been divided into two parts in previous NSRCG survey cycles to maintain consistency between the paper and CATI instruments. In 2003, the two parts were combined into one question to promote consistency with the NSCG and SDR paper questionnaires.
D1 (Marital status). An extra response category was added for "living in a marriage-like relationship."
D8–D11 (Citizenship/residency). Permanent residents of the United States were asked to report the year they attained permanent residency status.
NSRCG used a two-stage sample design. In the first stage, a stratified nationally representative sample of 300 institutions was selected. The first-stage sample was drawn in two steps. In the first step, certainty institutions were identified from the list of all institutions; all certainty institutions were included in the sample. In the second step, noncertainty institutions were sampled from the list that excluded the certainty institutions. For each institution, the measure of size was a composite based on counts of graduates by cohort (two groups: the 2000–2001 and 2001–2002 academic years), degree type (bachelor's and master's), major (21 fields of study), and race/ethnicity (non-Hispanic whites; non-Hispanic Asians and Pacific Islanders; and underrepresented minorities, i.e., blacks, Hispanics, and American Indians/Alaska Natives). Eighty-five self-representing, or certainty, institutions were identified and included in the sample in the first step. The remaining noncertainty institutions on the list were implicitly stratified by sorting the list by type of control (public, private), region (Northeast, Northwest, Southeast, Southwest), and the percentage of degrees awarded in science, engineering, or health fields of study. In the second step, 215 noncertainty institutions were selected by systematic sampling from the ordered list with probability proportional to size.
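The systematic probability-proportional-to-size (PPS) selection used for the noncertainty institutions can be sketched as follows. This is an illustrative implementation only; the institution names and sizes are hypothetical, not the NSRCG frame.

```python
import random

def pps_systematic_sample(units, sizes, n):
    """Select n units by systematic PPS sampling from an ordered list.

    units: unit identifiers, already sorted by the implicit strata
    sizes: measure of size for each unit (same order)
    n:     number of units to select
    """
    total = sum(sizes)
    interval = total / n                 # sampling interval on the size scale
    start = random.uniform(0, interval)  # random start in the first interval
    targets = [start + k * interval for k in range(n)]

    selected, cum, i = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cum += size                      # cumulate sizes down the ordered list
        while i < n and targets[i] <= cum:
            selected.append(unit)        # a target falls in this unit's range
            i += 1
    return selected

random.seed(2003)  # for a reproducible illustration
units = [f"inst{j}" for j in range(20)]           # hypothetical institutions
sizes = [random.randint(50, 500) for _ in units]  # hypothetical sizes
sample = pps_systematic_sample(units, sizes, 5)
print(len(sample))  # 5
```

Sorting the list before sampling (by control, region, and percentage of S&E degrees) is what makes the stratification implicit: the systematic pass spreads the sample across the sort order.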
The second stage of the sampling process selected science, engineering, and health graduates (within the sampled institutions). Each sampled institution was asked to provide lists of graduates for sampling. Within graduation year (cohort), each eligible graduate was then classified into one of 504 sampling strata based on the cross classification of the following variables:
Table A-1 lists the major fields and corresponding sampling rates. These rates are overall sampling rates for the major field, by cohort. To achieve the within-institution sampling rate, the overall rate was divided by the institution's probability of selection. The sampling rates by stratum were applied within each eligible, responding institution, and resulted in sampling 17,952 graduates. One academic institution insisted on selecting its own sample and returned a sample of 48 graduates. The 48 graduates from that school and the 17,952 graduates selected from the 295 participating schools provided the total sample of 18,000 graduates.
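The within-institution rate described above (the overall rate for a stratum divided by the institution's probability of selection) can be sketched as follows. The cap at 1.0 is an assumption for illustration, since a sampling rate cannot exceed taking every graduate on the list.

```python
def within_institution_rate(overall_rate, selection_prob):
    """Within-institution sampling rate for a stratum: the overall rate for
    the major-field stratum divided by the institution's probability of
    selection, capped at 1.0 (assumption: a rate cannot exceed 1)."""
    if not 0 < selection_prob <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return min(overall_rate / selection_prob, 1.0)

# Hypothetical numbers: an overall rate of 0.05 for a field, at an
# institution selected with probability 0.25, implies sampling 20 percent
# of that stratum's graduate list at the institution.
print(within_institution_rate(0.05, 0.25))  # 0.2
```

For a certainty institution (selection probability 1), the within-institution rate equals the overall rate.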
Table A-1 Source Data: Excel file
To be included in the sample, graduates had to meet all of the following criteria:
Before collecting data from graduates, it was first necessary to obtain the cooperation of the sampled institutions that provided lists of graduates. Of the 300 sampled institutions, 296 provided lists of graduates for sampling in the 2003 NSRCG and 4 did not provide graduate lists. The institutional list collection had a 98.7 percent unweighted response rate and a 97.2 percent weighted response rate.
Graduate data collection took place between October 2003 and July 2004; mail questionnaires were the initial mode of data collection, followed by CATI and an Internet-based Web instrument. Advance letters were sent to all selected graduates announcing the study and requesting phone numbers where they could be reached during the survey period. Before the data collection process could begin, extensive efforts to locate the graduates were required. Student contact information had to be obtained from educational institutions; once the information was collected, names, addresses, and telephone information were sent to an address review and updating service. Additional locating activities included use of computerized telephone number searches, national change of address searches, school alumni office contacts, school major field department contacts, Internet searches, directory assistance, military locators, post office records, personal referrals from parents or others who knew a graduate in question, and professional tracking organizations.
Table A-2 gives the response rates by cohort, degree, major, type of address, sex, and race/ethnicity. The overall unweighted graduate response rate was 68.1 percent; the weighted response rate was 67.1 percent. As can be seen from table A-2, response rates varied somewhat by graduate characteristics. Rates were lowest for graduates identified on the school sampling lists as non-U.S. residents. It is possible that many unlocated persons listed as non-U.S. residents were actually ineligible for the survey because they lived outside the United States during the survey reference week. However, a graduate was only classified as ineligible if his or her ineligibility status could be confirmed.
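The distinction between unweighted and weighted response rates can be sketched as follows. The toy data are hypothetical; the weighted rate weights each eligible case by its base sampling weight, so a nonrespondent carrying a large weight pulls the weighted rate down.

```python
def response_rates(responded, weights):
    """Unweighted and weighted response rates, in percent.

    responded: list of bools (True = eligible case responded)
    weights:   base sampling weight for each eligible case (same order)
    """
    unweighted = 100 * sum(responded) / len(responded)
    weighted = (100 * sum(w for r, w in zip(responded, weights) if r)
                / sum(weights))
    return unweighted, weighted

# Hypothetical toy data: 3 of 4 cases respond; the nonrespondent carries a
# large weight, so the weighted rate falls below the unweighted rate.
uw, w = response_rates([True, True, True, False], [1.0, 1.0, 1.0, 3.0])
print(uw, w)  # 75.0 50.0
```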
Table A-2 Source Data: Excel file
To produce national estimates, the data were weighted. The weighting procedures adjusted for unequal selection probabilities, for nonresponse at the institution and graduate level, and for duplication of graduates on the sampling file (graduates in both cohorts). In addition, a ratio adjustment was made at the institution level, using the number of degrees awarded as reported in IPEDS for specified categories of major and degree level. Because this adjustment was designed to reduce the variability associated with sampling institutions, it was not affected by the differences in target populations between NSRCG and IPEDS at the person level. These differences between NSRCG and IPEDS are discussed in a later section of these notes under the section "Comparisons With IPEDS Data." The final adjustment to the graduate weights adjusted for responding graduates who could have been sampled twice. For example, a person who obtained an eligible bachelor's degree in 2001 could have obtained an eligible master's degree in 2002 and could have been sampled for either degree. To make the estimates from the survey essentially unbiased, the weights of all responding graduates who could have been sampled twice were divided by 2. The weights of the graduates who were not eligible to be sampled twice were not adjusted.
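The final dual-eligibility adjustment described above, halving the weight of responding graduates who could have been sampled for either of two eligible degrees, can be sketched as follows (record layout is hypothetical):

```python
def adjust_for_dual_eligibility(records):
    """Final weight adjustment: graduates who could have been sampled twice
    (e.g., an eligible 2001 bachelor's and an eligible 2002 master's) have
    their weight divided by 2; all other weights are left unchanged.

    records: dicts with keys "weight" and "dual_eligible" (hypothetical layout)
    """
    for rec in records:
        if rec["dual_eligible"]:
            rec["weight"] /= 2
    return records

recs = adjust_for_dual_eligibility(
    [{"weight": 120.0, "dual_eligible": True},
     {"weight": 80.0, "dual_eligible": False}]
)
print([r["weight"] for r in recs])  # [60.0, 80.0]
```

Halving the weight offsets the doubled chance of selection, which is what keeps the resulting estimates essentially unbiased.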
Two weights were developed for the 2003 NSRCG: full NSRCG sample weights for use in computing survey estimates, and replicate weights for variance estimation using a jackknife replication variance estimation procedure.
Editing checks were built into the CATI and Web systems, including range checks, skip pattern rules, and logical consistency checks. Skip patterns were controlled by the CATI and Web systems so that inappropriate items were avoided and appropriate items were not missed. When a logical consistency check was violated, a CATI or Web screen appeared that explained the discrepancy and asked the respondent for a correction. All edit checks discussed previously were rerun after data collection and again when item nonresponse imputation was completed.
Post data collection editing was also conducted on the data collected in the NSRCG. Standard editing procedures were specified by NSF through the issuance of "SESTAT Editing Guidelines," which were distributed to all SESTAT contractors to ensure consistent application of editing rules across the three SESTAT surveys. The majority of editing at this stage involved correcting range, skip, and consistency errors, as well as other general violations, such as multiple responses to "Mark One" questions.
Item nonresponse occurred when a respondent cooperated with the survey but did not answer one or more individual questions. The level of item nonresponse in this study was generally low for most questions. However, imputation for item nonresponse was performed for each survey item to make the study results simpler to present and to allow consistent totals to be obtained when analyzing different questionnaire items. "Not applicable" responses were not imputed because they represented respondents who were not eligible to answer the given item.
Imputation was performed using a hot-deck method. Hot-deck methods estimate the missing value of an item by using values of the same item from other record(s) in the same file. Using the hot-deck procedure, each missing questionnaire item was imputed separately. First, respondent records were sorted by items thought to be related to the missing item. Next, a value was imputed for each item nonresponse recipient from a respondent donor within the same subgroup. The results of the imputation procedure were reviewed to ensure that the plan had been followed correctly. In addition, all edit checks were run on the imputed file to be sure that no data inconsistencies were created in the imputation process.
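A minimal sequential hot-deck sketch follows. It is illustrative only: the production procedure sorted records by covariates related to each missing item and imputed within subgroups, whereas this sketch simply takes the nearest preceding donor in sort order. The record layout and field names are hypothetical.

```python
def hot_deck_impute(records, item, sort_keys):
    """Impute missing values (None) of `item` from the nearest preceding
    donor after sorting records by covariates related to the item.
    If the first record in sort order is missing, it stays None."""
    ordered = sorted(records, key=lambda r: [r[k] for k in sort_keys])
    donor_value = None
    for rec in ordered:
        if rec[item] is None:
            rec[item] = donor_value      # take the last observed value
        else:
            donor_value = rec[item]      # this record becomes the donor
    return ordered

recs = [
    {"major": "eng", "degree": "BS", "salary": 52000},
    {"major": "eng", "degree": "BS", "salary": None},   # imputed from donor
    {"major": "bio", "degree": "MS", "salary": 41000},
]
out = hot_deck_impute(recs, "salary", ["major", "degree"])
print([r["salary"] for r in out])  # [41000, 52000, 52000]
```

After imputation, rerunning all edit checks on the imputed file (as the text describes) guards against inconsistencies introduced by the donors.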
The survey estimates provided in these tables are subject to two sources of error: sampling and nonsampling errors. Sampling errors occur because the estimates are based on a sample of individuals in the population rather than on the entire population and hence are subject to sampling variability. If the interviews had been conducted with a different sample, the responses would not have been identical; some figures might have been higher, while others might have been lower.
If all possible samples were surveyed under similar conditions, intervals within plus or minus 1.96 standard errors of a particular statistic would include the statistic computed from all members of the population in about 95 percent of the samples. This is the 95 percent confidence interval. For example, suppose the estimate of the total number of 2001 and 2002 bachelor's degree recipients majoring in engineering is 109,247 and the estimated standard error is 2,536. In this case, the 95 percent confidence interval for the statistic would extend from 109,247 − (1.96 × 2,536) to 109,247 + (1.96 × 2,536), or from 104,276 to 114,218.
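The interval arithmetic can be sketched as follows, using the estimate and standard error from the example above:

```python
def confidence_interval_95(estimate, se):
    """95 percent confidence interval: estimate plus or minus 1.96
    standard errors."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width

# The example from the text: estimate 109,247 with standard error 2,536.
lo, hi = confidence_interval_95(109_247, 2_536)
print(round(lo), round(hi))  # 104276 114218
```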
Estimates of standard errors were computed using a technique known as jackknife replication. As with any replication method, jackknife replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistics of interest for each replicate. The mean square error of the replicate estimates around their corresponding full sample estimate provides an estimate of the sampling variance of the statistic of interest. To construct the replicates, 108 stratified subsamples of the full sample were created. One hundred and eight jackknife replicates were then formed by deleting one subsample at a time from the full sample.
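A delete-one-group jackknife of the kind described above can be sketched as follows. This is an illustrative grouping of a toy sample, not the actual 108 stratified NSRCG replicates or their replicate weights.

```python
def jackknife_variance(values, weights, n_groups):
    """Delete-one-group jackknife variance of a weighted total.

    Each replicate drops one group and reweights the retained cases by
    G/(G-1); the variance is (G-1)/G times the sum of squared deviations
    of the replicate estimates from the full-sample estimate."""
    groups = [list(range(g, len(values), n_groups)) for g in range(n_groups)]
    full = sum(v * w for v, w in zip(values, weights))
    replicates = []
    for g in groups:
        drop = set(g)
        factor = n_groups / (n_groups - 1)   # reweight the retained groups
        rep = sum(v * w * factor
                  for i, (v, w) in enumerate(zip(values, weights))
                  if i not in drop)
        replicates.append(rep)
    return (n_groups - 1) / n_groups * sum((r - full) ** 2 for r in replicates)

# Constant data: every replicate equals the full estimate, so variance is 0.
var = jackknife_variance([1.0] * 12, [2.0] * 12, 4)
print(var)  # 0.0
```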
Generalized variance functions (GVFs), an alternative to direct variance estimation, provide users with a simple, fast tool for estimating variances. When users do not have access to the software required for direct variance estimation, they can predict the variance for the 2003 NSRCG estimates by using the GVF models. This method, however, is limited because it can be used only for estimates of totals and percentages of individuals with certain characteristics of interest. Several steps are involved in using GVFs to estimate the standard errors of the estimates. First, the standard errors for a large number of different estimates were computed directly by using the jackknife replication procedures described previously. Next, models were fitted to the estimates and standard errors, and the parameters of these models were estimated from the direct estimates. These models and their estimated parameters were used to approximate the standard error of an estimate from the survey. Models were fitted for the two general types of estimates of primary interest: estimated totals and estimated percentages. Domains were determined for which separate GVF models were needed. For the 2003 NSRCG, models were fitted separately for the entire graduate population and for S&E only (excluding health-related fields), by degree (bachelor's and master's). Within each group, parameters were estimated by sex, major field, occupation, and race/ethnicity. Tables A-3 and A-4 show the estimates of the parameters.
Table A-3 Source Data: Excel file
Table A-4 Source Data: Excel file
Let Ŷ denote an estimator of the population total Y. GVF models usually are created for the relative variance of an estimated total Ŷ, or

relvar(Ŷ) = Var(Ŷ)/Y²

Many empirical works have favored the relative variance model

relvar(Ŷ) = α + β/Y (1)

for which Wolter (1985) provides some justification. To determine the most promising model, an empirical investigation was conducted of the model

Var(Ŷ) = αŶ² + βŶ (2)

and its equivalent form relvar(Ŷ) = α + β/Ŷ. The fitting method that is chosen can result in different estimates for the model coefficients, and thus in differences in the resulting GVF variance predictions. After the residuals were examined, outliers were excluded and the GVF model was refit. Models (1) and (2) were evaluated, and the model in (2) was determined to be the better choice. The final model used for the 2003 NSRCG is (2) above. Thus, with Ŷ, and with a and b as the estimates of the model parameters α and β from GVF model (2), the predicted standard error (the square root of the predicted variance) can be calculated as

se(Ŷ) = √(aŶ² + bŶ) (3)

where se(Ŷ) is the predicted standard error of the estimated total Ŷ.
To use the NSRCG GVF, the following steps should be followed to approximate the standard error of an estimated total:
For example, suppose that the number of bachelor's or master's degree recipients in engineering is 130,759 (Ŷ = 130,759). The most appropriate domain from table A-3 is engineering majors. For this domain, the parameter estimates a and b are taken from table A-3 and substituted into equation (3) to obtain the predicted standard error.
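Under GVF model (2), the predicted standard error of an estimated total Ŷ is se(Ŷ) = √(aŶ² + bŶ). A sketch follows; the parameter values used are hypothetical illustrations, not the published table A-3 entries.

```python
import math

def gvf_se_total(y_hat, a, b):
    """Predicted standard error of an estimated total under GVF model (2):
    Var(Y-hat) = a * Y-hat**2 + b * Y-hat, so se = sqrt(a*Y**2 + b*Y)."""
    return math.sqrt(a * y_hat ** 2 + b * y_hat)

# Hypothetical parameters (NOT the published table A-3 values):
print(round(gvf_se_total(130_759, 0.0001, 30)))  # 2373
```

In practice a user would look up a and b for the appropriate domain in table A-3 and substitute them here.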
Two GVF methods were investigated for estimating generalized standard errors of percentage estimates. In Method 1, the parameter estimates from the GVF model for totals were used to predict the standard error of an estimated percentage. With model (2) used for totals, an approximate standard error for an estimated percentage p, based on a domain with estimated total Ŷ, is

se(p) = √(b × p(100 − p)/Ŷ) (4)
Unlike Method 1, which was based on regression estimation, Method 2 produces generalized standard errors directly for percentages. Method 2 assumes that the design effect (i.e., the ratio of the variance of an estimate to the variance of the same estimate from a simple random sample) is constant within each domain. Generalized standard errors were then computed by using a domain-specific average design effect associated with a range of statistics for each cell. Because the variance of an estimated percentage p from a simple random sample is p(100 − p)/n, the standard error of an estimated percentage can be predicted as

se(p) = √(ADEFF × p(100 − p)/n) (5)

where n is the sample size for the corresponding domain and ADEFF is the average design effect (Bieler and Williams 1990).
For the 2003 NSRCG, design effects were computed separately for each domain. The average values of the design effects from these computations are shown in tables A-3 and A-4. Although users can use either method to predict standard errors for percentage estimates, an empirical investigation of the two methods suggested that Method 2, as presented in equation (5) above, is the better choice.
The following steps should be followed to approximate the standard error of an estimated percentage using Method 2:
For example, suppose the percentage of unemployed graduates was 17 percent (p = 17) and the total number of S&E graduates in the survey sample was 10,831 (n = 10,831). The most appropriate domain from table A-3 is all graduates, and the ADEFF for this domain is 1.7. The standard error is approximated using equation (5) as se(p) = √(1.7 × 17 × (100 − 17)/10,831) ≈ 0.47 percentage points.
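The Method 2 calculation, se(p) = √(ADEFF × p(100 − p)/n), can be sketched and checked against the worked example (p = 17, n = 10,831, ADEFF = 1.7):

```python
import math

def gvf_se_percent(p, n, adeff):
    """Method 2 generalized standard error of an estimated percentage,
    equation (5): se(p) = sqrt(ADEFF * p * (100 - p) / n)."""
    return math.sqrt(adeff * p * (100 - p) / n)

# The worked example from the text: p = 17 percent, n = 10,831, ADEFF = 1.7.
se = gvf_se_percent(17, 10_831, 1.7)
print(round(se, 2))  # 0.47
```

With ADEFF = 1 the formula reduces to the simple-random-sample standard error, which is the sense in which the design effect inflates the variance.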
In addition to sampling errors, the survey estimates are subject to nonsampling errors that can arise because of nonobservation (nonresponse or noncoverage), reporting errors, and errors made while collecting and processing data. These errors can sometimes bias the data. The 2003 NSRCG included procedures specifically designed to minimize nonsampling errors. In addition, some special studies conducted during the previous cycles of NSRCG provided some measures of nonsampling errors that are useful in understanding the data from the current survey as well.
Procedures to minimize nonsampling errors were followed throughout the survey. Extensive questionnaire design work was done by Mathematica Policy Research. This work included focus groups, expert panel reviews, and mail and CATI pretests. This design work was done in conjunction with the other two SESTAT surveys.
Comprehensive training and monitoring of interviewers and data processing staff helped to ensure the consistency and accuracy of the data. Data collection was done almost entirely by telephone to help reduce the amount of item nonresponse and item inconsistency. Nonresponse was handled in ways designed to minimize the impact on data quality (through weighting adjustments and imputation). In data preparation, a special effort was made in the area of occupational coding. Respondent-chosen codes were verified by data preparation staff using a variety of information collected on the survey and applying coding rules developed by NSF for the SESTAT surveys.
Although general sampling theory can be used to estimate the sampling variability of a statistic, measuring a nonsampling error is not easy. Usually it requires conducting an experiment as part of the data collection, or using data external to the study. In the 1995 NSRCG, two quality analysis studies were conducted: (1) an analysis of occupational coding and (2) a CATI reinterview. As noted previously, these special studies can also inform analysts about the 2003 survey data.
The occupational coding report included an analysis of the 1995 CATI autocoding of occupation and the best coding operation. During CATI interviewing, each respondent's verbatim occupation description was autocoded by computer into a standard SESTAT code, whenever possible. Autocoding included both coding directly to a final category and coding to an intermediate code-selection screen. If the description could not be autocoded, the respondent was asked to select the appropriate occupation category during the interview. For the primary occupation, 22 percent of the responses were autocoded to a final category and 19 percent were autocoded to an intermediate screen. The results of the occupation autocoding were examined, and the process was found to be successful and efficient.
For the best coding operation, an occupational worksheet for each respondent was generated and reviewed by an experienced occupational coder. This review was based on the work-related information provided by the graduate. If the respondent's self-selected occupation code was inappropriate, a new or "best" code was assigned. A total of 17,894 responses was received to the three occupation questions in the 1995 survey cycle. Of these, 25 percent received updated codes during the best coding process: 16 percent were recoded from the "other" category and 9 percent were recoded from the "non-other" categories. This analysis indicated that the best coding activity was necessary to ensure that the most appropriate occupation codes were included on the final data file. As a result of this 1995 NSRCG quality study, the best coding procedure was implemented in the 1997, 1999, 2001, and 2003 surveys as well. In the 2003 survey, a total of 10,215 occupations were assigned an occupation best code following data collection. Of these, 66.5 percent of the cases had a best code that matched the self-code, and 33.5 percent were assigned a best code that differed from the self-code.
The second quality analysis study conducted in the 1995 NSRCG involved a reinterview of a sample of 800 respondents. For this study, sampled respondents were interviewed a second time, and responses to the two interviews were compared. This analysis found that the questionnaire items in which respondents were asked to provide reasons for certain events or behaviors had a relatively large index of inconsistency values. Examples include reasons for not working during the reference week and reasons for working part time. High response variability is typical for items that ask about reasons and beliefs rather than behaviors, and the results were not unusual for these types of items. Some of the other differences between the two interviews were attributed to the time lag between the original interview and reinterview.
For the 1993 NSRCG, two data quality studies were completed: (1) an analysis of interviewer variance and (2) a behavioral coding analysis of 100 recorded interviews. The interviewer variance study was designed to measure the impact of interviewer effects on the precision of the estimates. The results showed that interviewer effects for most items were minimal and thus had a very limited effect on the standard error of the estimates. Interviewer variance was highest for open-ended questions.
The behavioral coding study was done to observe the extent to which interviewers were following the structured interview and the extent to which it became necessary for them to give unstructured additional explanation or comments to respondents. As part of the study, 100 interviews were taped and then coded on a variety of behavioral dimensions. This analysis revealed that on the whole, the interview proceeded in a very structured manner, with 85 percent of all question and answer "dyads" being "asked and answered only." Additional unstructured interaction/discussion took place most frequently for questions in which there was some ambiguity in the topic. In most cases, this interaction was judged to have facilitated obtaining the correct response.
The results from the quality studies were used to identify questionnaire items that might need additional revision for the next study cycle. Debriefing sessions concerning the survey were held with interviewers, and the information obtained from these sessions was also used to revise the survey for the next cycle.
It is important to exercise caution when making comparisons with previous NSRCG results. During the 1993 cycle, the SESTAT system underwent considerable revision in several areas, including survey eligibility, data collection procedures, questionnaire content and wording, and data coding and editing procedures. The changes made for the 1995 through 2001 cycles were less significant but might affect some data trend analysis. Although the 1993 through 2003 survey data are fairly comparable, care must be taken when comparing results from the 1990s surveys to surveys from the 1980s, due to significant changes made in 1993. For a detailed discussion of these changes, refer to the 1993 through 2001 NSRCG methodology reports.
In the 2003 survey, data were collected on graduates with bachelor's and master's degrees in health fields. This additional information has altered the structure of the tabular presentations. All tables that present data on degree fields will include, for the first time, data on graduates with health degrees.
The reporting on graduates with health degrees has also caused a structural change in the tables that present data on employment status. In previous years, data on employed graduates were presented in two categories: by employment in an S&E occupation and by employment in a non-S&E occupation. In 2003, a third category was added: S&E related occupations. S&E related occupations include health-related occupations, S&E managers, S&E precollege teachers, and S&E technicians and technologists.
Estimates from the 2003 NSRCG cannot be directly compared to the 2001 NSRCG results unless the respondents with health degrees are excluded from the 2003 data.
The National Center for Education Statistics (NCES) conducts a survey of the nation's postsecondary institutions, called the Integrated Postsecondary Education Data System (IPEDS). The IPEDS Completions Survey reports the number of degrees awarded by all major fields of study, along with estimates by sex and race/ethnicity.
Although both NSRCG and IPEDS are surveys of postsecondary education and both report on completions from those institutions, important differences in the target populations for the two surveys directly affect estimates on the number of graduates. The reason for the different target populations is that the goals of the surveys are not the same. The IPEDS estimates of degrees awarded are intended to measure the output of the educational system. The NSRCG estimates are intended to measure the supply and utilization of a portion of graduates in the years after they completed their degree. These differing goals result in definitions of the target population that are not completely consistent for the two surveys. The main differences between the two surveys that affect comparisons of estimates overall and by race/ethnicity are as follows:
Despite the above-referenced factors, NSRCG and IPEDS estimates are consistent when appropriate adjustments for these differences are made. For example, the proportional distributions of graduates by field of study are nearly identical, and the numerical estimates are similar. More information on the comparison of NSRCG and IPEDS estimates is available in A Comparison of Estimates in the NSRCG and IPEDS. This report is available on the SESTAT website at http://sestat.nsf.gov in the Research Compendium section.
The following definitions are provided to facilitate the reader's use of the data in this report.
Major field of study: Derived from the survey major field category most closely related to the respondent's degree field.
Occupation: Derived from the survey job list category most closely related to the respondent's primary job.
Labor force: The labor force includes individuals working full or part time as well as those not working but seeking work or on layoff. It is the sum of the employed and the unemployed.
Unemployed: The unemployed are those who were not working on October 1 and were seeking work or on layoff from a job.
Type of employer: The sector of employment in which the respondent was working on his or her primary job held during the week of October 1, 2003. The following are definitions for each of these categories. Private industry and business includes all private for-profit and private not-for-profit companies, businesses, and organizations, except those reported as educational institutions. It also includes persons reporting they were self-employed. Educational institutions include elementary and secondary schools, 2-year and 4-year colleges and universities, medical schools, university-affiliated research organizations, and all other educational institutions. Government includes local, state, and federal government, military, and commissioned corps.
Primary work activity: Refers to the activity that occupied the most time on the respondent's job. In reporting the data, those who reported applied research, basic research, development, or design work were grouped together in "research and development (R&D)." Those who reported accounting, finance or contracts, employee relations, quality or productivity management, sales and marketing, or managing and supervising were grouped into "management, sales, administration." Those who reported production, operations, maintenance, professional services, or other activities were grouped into "other."
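The grouping of reported activities into the three summary categories can be sketched as a simple lookup. This is an illustration only; the response labels and function name below are hypothetical simplifications of the survey's actual response options, not part of the instrument.

```python
# Hypothetical mapping from reported primary work activity to the three
# summary categories used in the report's tables (labels are illustrative).
ACTIVITY_GROUPS = {
    "applied research": "research and development (R&D)",
    "basic research": "research and development (R&D)",
    "development": "research and development (R&D)",
    "design": "research and development (R&D)",
    "accounting, finance, or contracts": "management, sales, administration",
    "employee relations": "management, sales, administration",
    "quality or productivity management": "management, sales, administration",
    "sales and marketing": "management, sales, administration",
    "managing and supervising": "management, sales, administration",
    "production": "other",
    "operations": "other",
    "maintenance": "other",
    "professional services": "other",
    "other activities": "other",
}

def group_activity(activity):
    """Return the summary category for a reported activity.

    Responses not listed above fall into "other", matching the residual
    nature of that category.
    """
    return ACTIVITY_GROUPS.get(activity, "other")
```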
Full-time salary: The annual salary for the full-time employed, defined as those who were not self-employed (either incorporated or not incorporated), who worked at least 35 hours per week at their principal job, and who were not full-time students on the reference date (October 1, 2003). Graduates who did not receive salaries were asked to report earned income, excluding business expenses. To annualize salary, reported hourly salaries were multiplied by the reported number of hours paid per week, then multiplied by 52; reported weekly salaries were multiplied by 52; reported monthly salaries were multiplied by 12. Yearly and academic yearly salaries were left as reported.
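The annualization rules above can be expressed as a short function. This is a minimal sketch of the stated arithmetic; the function name and basis labels are illustrative assumptions, not taken from the survey processing system.

```python
def annualize_salary(amount, basis, hours_per_week=None):
    """Annualize a reported salary per the stated rules.

    basis: one of "hourly", "weekly", "monthly", "yearly",
    or "academic yearly" (labels are hypothetical).
    hours_per_week: reported hours paid per week; required for
    hourly salaries.
    """
    if basis == "hourly":
        if hours_per_week is None:
            raise ValueError("hourly salaries require hours paid per week")
        # hourly rate x hours paid per week x 52 weeks
        return amount * hours_per_week * 52
    if basis == "weekly":
        return amount * 52
    if basis == "monthly":
        return amount * 12
    # yearly and academic-yearly salaries are left as reported
    return amount
```

For example, a reported hourly salary of $20 with 40 hours paid per week annualizes to $41,600.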
Race/ethnicity: All graduates, both U.S. citizens and non-U.S. citizens, are included in the race/ethnicity data presented in this report. In tables with sufficient sample size, race/ethnicity data are presented by the specific categories of white, non-Hispanic; black, non-Hispanic; Hispanic; Asian; and American Indian or Alaska Native. The "other" race/ethnicity category includes Native Hawaiian and Other Pacific Islanders and individuals in multirace categories. In tables where the sample size is not sufficient to present data by specific category, the black, Hispanic, and American Indian or Alaska Native groups are combined into the underrepresented minority category.
The tables in this report present information for two groups of recent graduates. The first group consists of persons who earned bachelor's degrees in science, engineering, and health fields from U.S. institutions during academic years 2001 and 2002. The second group includes those who earned science, engineering, and health master's degrees during the same two years. Standard error tables are presented as a separate set and are included in appendix B.
Certainty institutions were selected by identifying the institutions with the largest number of graduates with science, engineering, and health bachelor's and master's degrees in the 2000–2001 and 2001–2002 academic years.
Prior to graduate sampling, the sampling frames (sampling lists received from the institutions) were unduplicated. Duplicate cases were generally due to double majors. For example, if a graduate received two eligible bachelor's degrees during the 2001 academic year, only one record was kept on the frame, recording one major as the first major and the other as the second major (according to a set protocol).
S&E occupations include the following broad groups: biological, agricultural, and environmental life scientists; computer and information scientists; mathematicians and statisticians; psychologists; social and related scientists; engineers; and postsecondary teachers in science and engineering fields.