Characteristics of Recent Science and Engineering Graduates: 2008
Appendix A. Technical Notes
The National Survey of Recent College Graduates (NSRCG) provides information on recent recipients of bachelor's and master's degrees in science, engineering, and health (SEH) fields. These technical notes include information on the target population, sample design, data collection, response rates, data editing, imputation, weighting, and variance estimation (reliability) for the 2008 NSRCG. Tables of standard errors are also included (appendix A, tables A1–A54). More detailed information is provided in the 2008 NSRCG Methodology Report (available on request).
The NSRCG is sponsored by the National Science Foundation's (NSF's) National Center for Science and Engineering Statistics (NCSES), which was previously the Division of Science Resources Statistics. Originally known as the New Entrants Survey, it has been conducted every 2 or 3 years since 1974. The purpose of the NSRCG is to provide high-quality data on the demographic, educational, and employment characteristics of recent recipients of bachelor's and master's degrees in SEH fields. The NSRCG is closely coordinated with the National Survey of College Graduates (NSCG) and the Survey of Doctorate Recipients (SDR). Results from the three surveys are integrated into the Scientists and Engineers Statistical Data System (SESTAT, http://www.nsf.gov/statistics/sestat/), which provides information about the employment, educational, and demographic characteristics of scientists and engineers in the United States.
Target Population and Sample Design
The target population for the 2008 NSRCG was all individuals who met both of the following criteria: they received a bachelor's or master's degree in an SEH field from an eligible U.S. institution during academic year 2006 or 2007, and they were living in the United States, under age 76, and not institutionalized during the survey reference week.
All postsecondary institutions in the United States that conferred at least one bachelor's or master's degree in an SEH field between 1 July 2005 and 30 June 2007 (academic years 2006 and 2007) were eligible to participate in the 2008 NSRCG survey.
The NSRCG sample is drawn from a two-stage process. In the first stage, a sample of institutions is selected; in the second stage, a sample of graduates is selected from lists provided by the sampled institutions. The sample frame of institutions for inclusion in the first stage is obtained from the Integrated Postsecondary Education Data System (IPEDS) database maintained by the National Center for Education Statistics (NCES). For the 2008 NSRCG, the first-stage institution sample frame consisted of 2,027 eligible U.S. postsecondary institutions.
The first-stage sample was selected with probability proportional to size (PPS). The composite size measure was based on the number of eligible graduates, controlling for sample-size domains defined by degree level, field of major, race/ethnicity, and sex. Institutions that produce relatively large numbers of bachelor's or master's degrees were selected with certainty. For the remaining institutions, the measure of size reflected the maximum percentage of graduates in each of the degree fields within the level-of-degree categories, and it was adjusted upward to increase the probability of selecting institutions with relatively high percentages of graduates in targeted minority groups. To maintain the efficiency of the institution sample, all 300 institutions selected for the NSRCG in 2003 and 2006 were retained for the 2008 sample; however, two of these institutions were ineligible for the 2008 NSRCG because they conferred no degrees in SEH fields during academic years 2006 and 2007. Using a PPS selection procedure, a supplemental sample of 4 institutions was drawn from the list of 295 newly eligible NSRCG institutions, bringing the total number of sampled institutions to 302.
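As a rough illustration of the selection logic described above, the sketch below promotes any institution whose size measure exceeds the sampling interval to certainty selection and then draws the remainder by systematic PPS sampling. All names are hypothetical, and the simple scalar size measure stands in for the NSRCG's composite, domain-controlled measure; this is not the production procedure.

```python
import random

def pps_select(frame, n):
    """Illustrative PPS selection. `frame` maps institution -> size measure.
    Institutions whose size is at least the sampling interval are taken with
    certainty; the rest are selected by systematic PPS sampling."""
    remaining = dict(frame)
    certainties = []
    # Repeat until no remaining institution exceeds the sampling interval,
    # since removing a certainty unit shrinks the interval for the rest.
    while remaining and len(certainties) < n:
        interval = sum(remaining.values()) / (n - len(certainties))
        big = [u for u, s in remaining.items() if s >= interval]
        if not big:
            break
        for u in big:
            certainties.append(u)
            del remaining[u]
    # Systematic PPS pass over the noncertainty institutions.
    m = n - len(certainties)
    selected = list(certainties)
    if m > 0 and remaining:
        units = list(remaining)
        sizes = [remaining[u] for u in units]
        interval = sum(sizes) / m
        start = random.uniform(0, interval)
        cum = 0.0
        hits = 0
        for u, s in zip(units, sizes):
            cum += s
            while hits < m and start + hits * interval < cum:
                selected.append(u)
                hits += 1
    return selected
```

Because every remaining size measure is strictly below the final interval, no institution can be hit twice in the systematic pass.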
Sampled institutions were asked to provide a list of all students who had graduated with a master's or bachelor's degree in selected SEH fields during the previous two academic years, 2006 and 2007. Using these graduate lists, the 2008 NSRCG graduate sampling frame was constructed in four steps: (1) processing each institution's list of SEH graduates and verifying eligibility; (2) merging the graduate lists from all institutions; (3) removing duplicate records arising from multiple degrees; and (4) imputing missing information for the sampling variables. At the end of this process, the sampling frame consisted of 799,206 unique graduate records from the 288 institutions that responded in the first stage.
The second stage entailed sampling 18,000 bachelor's or master's degree recipients with eligible degrees from the institutions that responded in the first stage. The 2008 NSRCG sample was designed to provide statistically reliable national estimates for domains defined by degree type, major field of study, race/ethnicity, and sex. NSF provided guidance on the required sample size for each NSRCG domain in the 2008 survey. A total of 222 domains was defined: 3 race/ethnicity groups by 2 sexes by 20 major fields for bachelor's degree recipients (120 domains), plus 3 race/ethnicity groups by 2 sexes by 17 major fields for master's degree recipients (102 domains), each with a specified minimum effective sample size of 40.
The sampling frame was stratified by the domain variables: degree type, major field of study, race/ethnicity, and sex. Missing values in these items were imputed before sample selection. Missing values for degree type and major field of study constituted no more than 0.01% of cases for each variable and were imputed using institution-level counts from IPEDS, where feasible. No more than 0.58% of cases had missing values imputed for sex, and no more than 16.31% of cases had missing values imputed for race/ethnicity. Missing values in sex and race/ethnicity were imputed from lists of more than 3 million name-race/ethnicity and name-sex combinations compiled since the 2003 NSRCG list collection. In cases where this approach was not suitable, values were imputed randomly, based on counts from IPEDS data.
The sample of 18,000 graduates was allocated in two steps. First, an iterative procedure assigned the minimum effective sample size to every domain. The surplus sample was then allocated to domains in proportion to population size, excluding the already assigned sample.
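The two-step allocation above can be sketched as follows. The function and domain names are hypothetical, and for simplicity the sketch allocates raw counts; the actual procedure works with effective (design-adjusted) sample sizes.

```python
def allocate_sample(total_n, domain_pops, min_n):
    """Two-step allocation sketch: give each domain the minimum sample
    size (capped at the domain population), then spread the surplus in
    proportion to the population remaining in each domain."""
    # Step 1: assign the minimum to every domain.
    alloc = {d: min(min_n, p) for d, p in domain_pops.items()}
    surplus = total_n - sum(alloc.values())
    # Step 2: allocate the surplus proportionally to what is left of
    # each domain's population after the minimum assignment.
    remaining = {d: p - alloc[d] for d, p in domain_pops.items()}
    rem_total = sum(remaining.values())
    for d in domain_pops:
        alloc[d] += round(surplus * remaining[d] / rem_total)
    return alloc
```

Note that rounding can leave the total a unit or two off `total_n` in general; a production allocation would reconcile the residual.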
PPS sampling was used to select the graduate sample. Institution-level, domain-specific sampling rates were calculated and used as the measure of size for graduate selection. The NSRCG sample-selection procedure was designed to achieve self-weighting graduate samples within each of 222 NSRCG analytic domains. The final sample consisted of 10,159 graduates with bachelor's degrees and 7,841 with master's degrees from 288 institutions.
Data Collection and Response Rates
The first-stage list data collection and the second-stage graduate survey data collection were conducted by Mathematica Policy Research, under contract with NCSES. The first-stage list collection began with contacting the 302 sampled institutions to obtain lists of their SEH graduates for academic years 2006 and 2007. Of the 302 sampled institutions, 288 provided lists of graduates, and 14 refused (a response rate of 95.4%); 282 of the 288 responding institutions subsequently provided contact information for sampled graduates. Of the 6 remaining institutions, 4 provided the names of graduates along with the sampling variables; these graduates were subject to immediate, intensive locating procedures. Working closely with Mathematica, the final 2 institutions conducted their own mailings, using Mathematica-provided materials and protocols. For graduates with missing or inaccurate address information, intensive searches were conducted using subscription-based databases, Internet search engines, social and professional networking sites, and computer-assisted telephone interviewing (CATI).
The second-stage graduate survey data collection used three data collection modes—paper, Web, and CATI. Paper and Web were the primary modes in the initial stage of data collection, followed by CATI. NSF provided the final printed mail questionnaire and the guidelines for programming the electronic survey instruments (2008 SESTAT editing guidelines and procedures are available at http://www.nsf.gov/statistics/sestat/editing.cfm). The guidelines, adapted from the paper instrument, specify question wording, routing, and edit checks to ensure that responses to the CATI interview and Web instruments are logically consistent and within range.
The 2008 NSRCG was designed to collect detailed information for the reference week of 1 October 2008 on four major topic areas: education, employment, other work-related experiences, and demographics. NSF identified several questions that the agency considered key for future analyses and classified them into two groups: (1) critical complete items and (2) critical callback items. For the former, a questionnaire could not be counted as complete if any of the questions covering working status, occupational title, occupational description, or residency status in the United States was left unanswered. For questions identified as critical callback items, missing or inconsistent information was followed up during a CATI callback designed to collect or correct it, including any additional degree information, additional information needed to classify the principal occupation or a second job if mentioned, weekly hours worked, or work activities for the principal job.
An important facet of the 2008 NSRCG was a randomized postpaid incentive experiment, developed by NSF in collaboration with Mathematica. This experiment, which included randomizing the entire 2008 NSRCG sample into treatment groups, was designed to examine the impact of incentives on response rates and to determine whether incentives and other data collection procedures could be used to increase the number of questionnaires completed online. Survey mailings were customized to meet the individual requirements of the treatment groups in the experiment.
The 2008 NSRCG included a pre-field mailing and two large-scale survey mailings as well as follow-up reminder mailings and e-mails prior to the initiation of CATI follow-up contacts. The pre-field mailing gave sample members advance notice of the upcoming survey and generated address updates from the U.S. Postal Service. The first mailing requested participation in the study. For some groups, the Web was the only response mode offered, and instructions for completing the survey on the Web were included. For other groups, the first mailing included a questionnaire and business-reply envelope as well as directions for completing the survey on the Web. Depending on the experimental group, incentives were also offered. A thank-you/reminder letter from NSF was mailed approximately 1 week after the first mailing. A second mailing to all nonrespondents was sent about 5 weeks later; it provided a paper questionnaire and business-reply envelope as well as instructions for completing the survey on the Web. Incentives were again offered when specified by the experimental design. Each mailing provided a list of frequently asked questions and a toll-free helpline number. Additional postcards and reminder e-mails to nonrespondents followed the second mailing, beginning 1 week after it, and continued on roughly a biweekly schedule until data collection ended. Cases with missing critical callback items or issues with sample-person verification were referred to the CATI team for data retrieval.
In the first stage of sampling, 288 of 302 sampled institutions agreed to participate in the survey, corresponding to an unweighted response rate of 95.4% and a weighted response rate of 94.2%. In the second stage, 15,581 of the 18,000 sampled graduates (86.6%) were located. Of the 15,581 located graduates, 76.9% (11,985 cases) responded and completed the survey, 6.3% (975 cases) were determined to be ineligible, and 16.8% (2,621 cases) did not respond, either because they refused (1,259) or because the data collection effort ended (1,362), leaving their eligibility status unknown. Of those determined to be ineligible, 642 (65.9%) were found to be living outside the United States during the reference period. Response rates, by degree level, are summarized in table 1.
The overall unweighted graduate response rate was 71.4%; the overall weighted graduate response rate was 69.7%. Considering both stages of sampling, the overall unweighted survey response rate for the 2008 NSRCG was 68.1%, and the corresponding weighted response rate was 65.7%.
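Because the two stages are sequential, the overall response rate is the product of the stage-level rates, which reproduces the figures above:

```python
# Overall response rate = institution-stage rate x graduate-stage rate.
inst_unweighted, inst_weighted = 0.954, 0.942   # institution list collection
grad_unweighted, grad_weighted = 0.714, 0.697   # graduate survey

overall_unweighted = inst_unweighted * grad_unweighted
overall_weighted = inst_weighted * grad_weighted

print(round(overall_unweighted, 3))  # 0.681
print(round(overall_weighted, 3))    # 0.657
```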
Data Editing and Coding
Returned questionnaires were opened by trained receipt staff. Trained clerks reviewed the questionnaires to identify incomplete cases and cases with missing critical callback items. A computer-assisted data entry instrument was used to convert information from returned mail questionnaires into electronic records. All data entered from mail questionnaires were subject to verification and quality control. Missing critical items from both Web and mail questionnaires were forwarded for telephone follow-up; 78% of the cases sent for follow-up completed the interview. Prior to computer data processing, data files with questionnaires completed in each of the three modes and the coding databases were reformatted and standardized into a single database.
Coding was conducted in several stages. First, autocoding programs developed by the U.S. Census Bureau were applied to education, occupation, and "other (specify)" verbatim responses. Second, geocoding was applied to identify the location of educational institutions and employers. Third, the U.S. Census Bureau conducted the IPEDS autocoding. Verbatim responses that could not be autocoded were manually coded. This process was subject to a quality-control procedure, and difficult cases were referred to expert coders. All variables were converted to standardized formats and subject to final checks, according to SESTAT guidelines.
Imputation of Missing Data
Missing values for some critical complete items, such as U.S. residency, could be deduced by logical imputation. If, however, a missing value for one of the critical complete items could not be deduced by logical imputation, the questionnaire was classified as a nonresponse. All other questions with missing responses were subject to imputation. Logical imputation was carried out at the editing stage. Statistical imputation techniques were implemented following machine editing to address remaining item nonresponse. To maintain consistency with previous years and other SESTAT surveys, hot-deck imputation was used as the primary statistical imputation method. Class and sorting variables were determined for each survey response item through multiple regression analysis. Cold-deck imputation was used for a few demographic variables, such as birth date, sex, and race/ethnicity. The order of imputation was as follows: demographic information, education background, employment situation, and other work-related experiences. All items with imputed values were subject to multiple quality checks.
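A minimal sketch of sequential hot-deck imputation within classes is shown below (using pandas). The column names are illustrative only, and the actual NSRCG procedure selects class and sorting variables per item via regression analysis rather than taking them as given.

```python
import pandas as pd

def hot_deck(df, target, class_vars, sort_vars):
    """Sequential hot-deck sketch: within imputation classes, sort the
    records and fill each missing value with the nearest preceding
    donor's value; fall back to the nearest following donor if a class
    begins with missing values."""
    df = df.sort_values(class_vars + sort_vars).copy()
    df[target] = df.groupby(class_vars)[target].ffill()
    df[target] = df.groupby(class_vars)[target].bfill()
    return df
```

For example, with hypothetical columns, a missing bachelor's-degree salary is filled from the closest responding bachelor's-degree record of similar age:

```python
df = pd.DataFrame({'degree': ['B', 'B', 'B', 'M', 'M'],
                   'age': [22, 23, 24, 25, 26],
                   'salary': [50000.0, None, 60000.0, None, 45000.0]})
filled = hot_deck(df, 'salary', ['degree'], ['age'])
```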
Item nonresponse for key employment items—such as employment status, sector of employment, and primary work activity—ranged from 0.0% to 1.8%. Employment situation items, such as reasons for not working or salary, had item nonresponse rates between 2.2% and 6.3%. Items regarding personal demographic data—such as marital status, citizenship, race/ethnicity, and physical ability—had item nonresponse rates ranging from 1.8% to 16.3%.
Weighting
To produce national estimates from the NSRCG, sampling units are weighted to account for unequal selection probabilities and nonresponse and also to align the sample with known population characteristics.
Each graduate was assigned an unconditional sampling weight by multiplying the nonresponse-adjusted institution-level sampling weight from the first stage of sampling with the graduate-level conditional sampling weight from the second stage of sampling. This weight was then adjusted for any additional duplicates, followed by an adjustment for graduate-level nonresponse. A multiplicity adjustment was then made to the nonresponse-adjusted weight to account for multiple chances of selection for graduates with multiple eligible degrees reported during data collection. The weights were raked by some key variables so that total count estimates calculated with the weights agreed with the known population totals of recent college graduates available from IPEDS. Any extreme weights were then trimmed, and a final raking adjustment was performed.
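The raking step can be illustrated with a simple iterative proportional fitting routine. The variables and margins below are hypothetical, and the production weighting also involves the multiplicity adjustment and trimming of extreme weights described above.

```python
import numpy as np

def rake(weights, margins, targets, n_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting) sketch: repeatedly rescale
    the weights so that weighted totals match each known margin in turn,
    stopping once all adjustment factors are essentially 1.
    `margins` maps variable -> per-case level array; `targets` maps
    variable -> {level: known population total}."""
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        worst = 0.0
        for var, levels in margins.items():
            for level, target_total in targets[var].items():
                mask = levels == level
                current = w[mask].sum()
                if current > 0:
                    factor = target_total / current
                    w[mask] *= factor
                    worst = max(worst, abs(factor - 1.0))
        if worst < tol:
            break
    return w
```

With consistent margins (each variable's targets summing to the same grand total) and no empty cells, the factors converge and the weighted totals reproduce the control totals.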
Reliability of Estimates
The survey estimates provided in these tables are subject to both sampling and nonsampling errors. Sampling error occurs because the estimates are based on a sample of individuals in the population rather than on the entire population; hence, estimates are subject to sampling variability.
Sampling error is measured by the variance, or standard error, of the survey estimate. Variance estimation must account, as much as possible, for the multistage complex sampling design, imputation, and weight-adjustment procedures. To address these complexities, both the direct method of jackknife replication and the indirect method of generalized variance functions (GVFs) can be used for variance estimation.
Using the jackknife method, replicate weights were constructed and made available to data users for calculating variance estimates for various statistics. Standard errors for the detailed data presented in tables 1–54 were calculated using this replication method. The jackknife method is a resampling technique that estimates the sampling variation of the estimates based on the variation of estimates calculated from subsamples of the data. Each subsample is subject to the same weighting procedures applied to the complete sample; this results in a set of replicate weights. For the 2008 NSRCG, 186 replicate weights were constructed and can be used to produce variance estimates. The variation of a weighted statistic across all 186 replicates can be used to estimate the variance of the statistic computed from the full sample.
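Given a full-sample estimate and the estimates computed under each set of replicate weights, the variance calculation reduces to a sum of squared deviations. The scaling factor below, (R-1)/R, is an illustrative assumption for a grouped jackknife; the correct factor depends on the replication design and accompanies the NSRCG replicate weights.

```python
import numpy as np

def jackknife_variance(full_estimate, replicate_estimates, factor=None):
    """Jackknife variance sketch: the variance of a statistic is a scaled
    sum of squared deviations of the replicate estimates from the
    full-sample estimate. The default factor (R-1)/R is illustrative,
    not the NSRCG production value."""
    reps = np.asarray(replicate_estimates, dtype=float)
    R = len(reps)
    if factor is None:
        factor = (R - 1) / R
    return factor * np.sum((reps - full_estimate) ** 2)
```

The standard error is the square root of this variance; for the 2008 NSRCG the sum would run over the 186 replicates.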
For limited types of statistics and domains of estimation, users may use GVF for quick and simple calculation of standard errors. Estimated parameters of the GVF (variance model) were provided for estimating variances of totals and percentages for a number of domains (available on request). However, because the variance estimates obtained from using GVF are model-based estimates, they may be subject to modeling error.
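A common GVF form models the relative variance of an estimated total T as a + b/T, so that Var(T) = aT² + bT. The sketch below uses this form; the parameter values in the example are purely illustrative, not fitted NSRCG parameters.

```python
import math

def gvf_standard_error(estimate, a, b):
    """GVF sketch: with relative variance modeled as a + b/T, the
    variance of an estimated total T is a*T**2 + b*T. Parameters a and b
    are fitted per domain; values passed here are illustrative only."""
    variance = a * estimate**2 + b * estimate
    return math.sqrt(variance)
```

For example, with illustrative parameters a = -0.0001 and b = 50, an estimated total of 10,000 has a standard error of about 700.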
In addition to sampling errors, survey estimates are subject to nonsampling errors that can result from survey nonresponse, coverage errors, reporting errors, and data processing errors. The 2008 NSRCG used procedures throughout its development and implementation that were specifically designed to minimize nonsampling error. Extensive questionnaire redesign work, completed in conjunction with the other two SESTAT surveys, helped reduce reporting errors through the use of cognitive interviews, expert panel reviews, and mail pretests.
Comprehensive training and monitoring of data processing staff and telephone interviewers helped to ensure the consistency and accuracy of the data. Nonresponse was handled in ways designed to minimize the impact on data quality (through weighting adjustments and imputation). In data preparation, a special effort was made in the area of occupational coding. Respondent-chosen codes were verified by specially trained coding staff using a variety of information collected on the survey, particularly verbatim responses, and by applying coding rules developed by NSF for the SESTAT surveys.
Quality-assurance procedures included throughout the various stages of data collection and data processing reduced the possibilities for nonsampling error. Sources of nonsampling error include (1) nonresponse error, which arises when the characteristics of respondents differ systematically from nonrespondents; (2) measurement error, which arises when the variables of interest cannot be precisely measured; (3) coverage error, which arises when some members of the target population are excluded from the frame and thus do not have a chance to be selected for the sample; (4) respondent error, which occurs when respondents provide incorrect data; and (5) processing error, which can arise at the point of data editing, coding, or data entry. The analyst should be aware of potential nonsampling errors, but these errors are more difficult to detect and quantify than sampling errors.
Changes in the Survey
It is important to exercise caution when making comparisons with previous NSRCG results. During the 1993 cycle, the SESTAT surveys, including the NSRCG, underwent considerable revision in several areas, including survey eligibility, data collection procedures, questionnaire content and wording, and data coding and editing procedures. The changes made for the 1995–2008 cycles were less significant but might affect some trend data analysis. Although the 1993–2008 survey data are fairly comparable, care must be taken when comparing results from the 1990s surveys to surveys from the 1980s due to significant changes made in 1993. The 1993 National Survey of Recent College Graduates Methodology Report (available on request from the NSRCG survey manager) contains a more detailed discussion of these changes.
In all survey cycles except 2006, data were collected on graduates with bachelor's and master's degrees earned in the preceding 2 academic years. However, in 2006, data were collected from graduates in 3 academic years—2003, 2004, and 2005—with a total sample of 27,000 graduates. In addition, beginning with the 2003 survey cycle, the scope of the NSRCG coverage was expanded to include graduates with bachelor's and master's degrees in health fields as well as in science and engineering (S&E) fields. Therefore, estimates from the 2003, 2006, and 2008 NSRCG cannot be compared directly to the 2001 or earlier NSRCG results unless respondents to the 2003, 2006, and 2008 NSRCG with health degrees are excluded from the data comparisons.
In years prior to 2003, data on employed recent graduates were presented in only two categories: employment in S&E occupations, and employment in non-S&E occupations. Beginning in 2003, to further break down those employed in non-S&E occupations, a third category of S&E-related occupations was added. S&E-related occupations include health occupations, S&E managers, S&E precollege teachers, S&E technicians and technologists, and other S&E-related occupations, such as architects and actuaries.
Changes in Survey Content
SESTAT questionnaires, of which the NSRCG questionnaire is one, share a large set of core data items that are retained from one survey round to the next and support trend comparisons (for the 2008 survey questionnaire, see appendix C). To further support trend comparisons, questionnaire changes tend to be minimal. The reference period for the 2008 survey was moved from 1 April to 1 October to maintain a common reference date across all SESTAT surveys. The following changes were made in the 2008 questionnaire.
Comparisons with IPEDS Data
NCES conducts a set of data collections of the nation's postsecondary institutions that are integrated in IPEDS. One of these data sets, IPEDS Completions, reports the number of degrees awarded by all major fields of study along with estimates by sex and race/ethnicity.
Although the first stages of both the NSRCG and IPEDS Completions collect similar degree-completion data from postsecondary institutions, their target populations differ in coverage. IPEDS estimates the number of degrees awarded as a measure of output from the postsecondary educational system and can count the same person once for each degree completed. In contrast, the NSRCG estimates the number of graduates with one or more SEH degrees in the years shortly after they completed their most recent SEH degree. These differences in coverage can affect comparisons between the two surveys' estimates.
NSRCG and IPEDS estimates are consistent, however, when appropriate adjustments for these differences are made. For example, the proportional distributions of graduates by field of study are nearly identical, and the numerical estimates are similar. More information on the comparison of NSRCG and IPEDS estimates is available in the document "A Comparison of Estimates in the NSRCG and IPEDS," available on request from the NSRCG survey manager.
Definitions and Explanations
Analytical domain. A combination of respondent characteristics defining a group for which estimates are calculated.
Relationship between occupation and degree fields. The relationship between field of occupation and major field of degree was examined at the broad level only. For example, an individual with a physics bachelor's degree working in chemistry is considered to have an occupation and degree in the same broad field; an individual with a computer sciences bachelor's degree working in an engineering occupation is considered to have an occupation in a broad field that differs from that of the degree.
Degree type. Domains are defined by degree type: bachelor's or master's.
Educational institutions. Includes elementary and secondary schools, 2-year and 4-year colleges and universities, medical schools, university-affiliated research organizations, and all other educational institutions.
Government. Includes local, state, and federal government; military and commissioned corps.
IPEDS. The Integrated Postsecondary Education Data System. An integrated system of surveys designed to collect information on the number and types of degrees awarded by U.S. postsecondary institutions and also characteristics of degree recipients.
Labor force. Includes individuals working full or part time as well as those not working but seeking work or on layoff. It is a sum of the employed and the unemployed.
Major field of study. Derived from the field of degree, as specified by the respondent and classified into the SESTAT education codes (see appendix B, table B–1).
Non-U.S. citizen. Includes permanent residents and those on a temporary visa.
Occupation. Derived from responses to several questions on the type of work primarily performed by the respondent. The occupational classification into the SESTAT occupation codes was based on the respondent's principal job held during the survey reference week or last job held, if not employed in the reference week (see appendix B, table B–2).
Primary work activity. The activity that occupied the most time on the respondent's job. In reporting the data, those who reported applied research, basic research, development, or design work were grouped together in "research and development." Those who reported accounting, finance or contracts, employee relations, quality or productivity management, sales and marketing, or managing and supervising were grouped into "management, sales, administration." Those who reported production, operations, maintenance, professional services, or other activities were grouped into "other."
Principal job status. Principal job status (full time or part time) is based on the number of hours usually worked on the principal job during a typical week. Employed graduates who worked 35 or more hours per week on their principal job are classified as full time, and all other employed graduates are classified as part time.
Private industry and business. Includes all private for-profit and private not-for-profit companies, businesses, and organizations, except those reported as educational institutions. It also includes persons reporting that they were self-employed.
Race/ethnicity. All graduates—U.S. citizens and non-U.S. citizens alike—are included in the race/ethnicity data presented in this report. The categories of American Indian or Alaska Native, Asian, black or African American, Native Hawaiian or Other Pacific Islander, white, and persons reporting more than one race refer to non-Hispanic individuals only.
Salary. Salary data reported in the detailed statistical tables are for the principal job only. Full-time employed are those who were not self-employed (either incorporated or not incorporated), who worked at least 35 hours per week on their principal job, and who were not full-time students during the survey reference week. Self-employed persons and full-time students are excluded from salary data.
S&E occupation. S&E occupations include S&E postsecondary teachers; S&E-related occupations include health-related occupations. For detail, see appendix B, table B–2.
SEH fields. Biological, agricultural, and environmental life sciences; computer and information sciences; mathematics and statistics; physical and related sciences; psychology; social and related sciences; engineering; health.
SESTAT. The Scientists and Engineers Statistical Data System. This system integrates data from the Survey of Doctorate Recipients, the National Survey of College Graduates, and the National Survey of Recent College Graduates (http://www.nsf.gov/statistics/sestat/).
Type of employer. The sector of employment in which the respondent was working on his or her primary job held during the survey reference week.
Unemployed. The unemployed are those who were not working during the survey reference week and were seeking work or were on layoff from a job.
Before raking, the following adjustments were carried out to account for discrepancies between NSRCG and IPEDS. First, the IPEDS reporting unit is "degrees awarded," whereas the NSRCG reporting unit is "graduates with degrees." To account for this difference, NSRCG data with "graduate" as the unit were converted to degree-level data, with multiple records for a case holding multiple degrees in eligible fields. Second, IPEDS reflects the number of degrees awarded to all graduates, whereas the NSRCG represents a subset of graduates that excludes those who were living outside the United States during the survey reference week or who were age 76 or older, deceased, institutionalized, or terminally ill on the survey reference date. Therefore, the NSRCG-eligible degrees were matched to the adjusted IPEDS total counts.
Standard Error Tables