Graduate Students and Postdoctorates in Science and Engineering: Fall 2009
Appendix A. Technical Notes
During the production of this report, the America COMPETES Reauthorization Act of 2010 was signed into law. Section 505 of the act renames the Division of Science Resources Statistics as the National Center for Science and Engineering Statistics (NCSES). The Center retains its reporting line to the Directorate for Social, Behavioral and Economic Sciences within the National Science Foundation. The new name signals the central role of NCSES in the collection, interpretation, analysis, and dissemination of objective data on the science and engineering enterprise.
The Survey of Graduate Students and Postdoctorates in Science and Engineering (GSS) is an annual census of all known academic institutions in the United States that grant master's degrees or research doctorates in science and engineering (S&E) fields and in selected health fields. The data collected in the 2009 GSS represent national estimates of graduate student enrollment and postdoctoral employment as of fall 2009.
In 2009 the survey universe consisted of 703 schools at 575 academic institutions: 493 schools at 366 doctorate-granting institutions, and 210 schools at 209 master's-granting institutions. Data collected included demographic and funding information for graduate students and postdocs, as well as counts of doctorate-holding nonfaculty researchers, by sex.
Table A-1 shows the number of institutions, schools, and organizational units (e.g., departments, degree-granting programs) covered by the GSS, by degree level, and the estimated total annual enrollment in GSS-eligible fields from 1966 through 2009. Changes in the survey that affect the comparability of these data are as follows:
Tables A-2 and A-3 show the number of units surveyed, by detailed field, in doctorate-granting and master's-granting institutions. Table A-4 shows the unit response rates from 1975 through 2009. Tables A-5 through A-12 show imputed data and/or imputation rates for different categories.
Survey Instrument and Procedures
A Web survey system was the primary mode of 2009 data submission. The survey cycle was launched in October 2009 and concluded in May 2010.
The 2009 GSS Web survey consisted of two parts. Part 1 required the identification of organizational units ("units") within the reporting school. Part 1 could be completed only in the Web survey system.
Part 2 collected counts of graduate students, postdocs, and other doctorate-holding nonfaculty researchers. A paper worksheet was provided for preparing figures to be entered later into Part 2 of the Web survey. To ease this transfer, the content and format of the data collection grid on the paper worksheet were identical to those in Part 2 of the Web survey. A small number of school coordinators chose to submit their Part 2 data on the paper worksheet itself.
Institutions select a coordinator for each school that grants a graduate degree in an eligible field. School coordinators for the GSS are responsible for the following:
Revisions Affecting Survey Universe
Units. The Web survey was redesigned in 2007 in an effort to include and appropriately classify all eligible units and to exclude ineligible units. See the Technical Notes section of the 2007 report for more detail.
Fields of study and degree-granting programs. In 2007 a comprehensive review of GSS-eligible fields led to several changes to the classification scheme. GSS-eligible degree-granting programs were updated from the 1990 to the 2000 Classification of Instructional Programs (CIP) taxonomy of the National Center for Education Statistics (NCES). Degree-granting programs that had previously been represented by a four-digit CIP code are now represented at the six-digit level of specificity. Three newly eligible fields were added to the survey, some programs became ineligible, and others were reclassified. See the Technical Notes section of the 2007 report for more detail.
Due to these adjustments to the taxonomy and other methodological changes introduced in 2007, data collected since that year are not directly comparable with data from previous years. For trend analyses, the detailed statistical tables (DSTs) provide estimates of the counts that would have been collected in 2007 had the 2006 methodology been used (see "Bridge-Year Data Calculation and Display," below).
Revisions to Instructions and Definitions
Due to the rise in online degree programs, NSF received a number of questions about how to treat students who were enrolled in an online degree program but were not U.S. citizens, permanent residents holding green cards, or foreign nationals holding temporary visas. A clarification was introduced in 2008 to exclude non–U.S. citizens residing outside the United States who are enrolled in an online degree program at a U.S. institution.
Students doing thesis or dissertation research away from a U.S. campus were included beginning with the 2008 survey. The instructions read, "Count all students enrolled in a U.S. institution for credit in a graduate degree program doing thesis or dissertation research work regardless of their location."
Bridge-Year Data Calculation and Display
Due to the methodological changes introduced in 2007, including modifications to the set of GSS-eligible fields, most DSTs provide data for 2007 in two ways: "2007old" and "2007new." Data shown under 2007old provide estimates of the counts that would have been collected in 2007 had the 2006 methodology been used. Counts reported under 2007new were collected using the methodology introduced in 2007.
To derive counts for 2007old, all units that were reported in the 2006 data collection and retained in 2007 were assigned the same GSS field as in 2006. This is consistent with the 2006 GSS coding because the Web survey system before 2007 did not have a direct mechanism for changing GSS codes, and very little recoding was done. Any unit newly added in 2007 retained the GSS field code assigned to it in that year, with the following exceptions:
The 2007old counts are based on a subset of the 2007 data due to the first exception listed above. A comparison of 2007old with 2007new data reflects differences due to the addition of the three newly added science fields and recoding of units from their 2006 fields to other fields.
The deadline for Part 1, the update of the unit list, was 30 November 2009. Schools that missed this Part 1 deadline received special attention from the survey contractor early in the survey cycle. The deadline for submitting data for Part 2 was 26 February 2010.
From 2004 through 2006 a unit was considered a complete respondent if it reported complete row and column totals in the data collection grids and a partial respondent if it reported only grand totals for these grids. Any unit that did not meet the requirements for complete or partial respondent status was considered a nonrespondent. Beginning in 2007, in order to receive complete response status, a unit needed complete row and column totals for all grids as well as all details summing to the totals. Units that had only complete row and column totals for all grids were counted as partial respondents. As in previous years, units that reported only grand totals for all tables were counted as partial respondents.
As in previous years, data-collection grids in the Web survey were prefilled with zeros. Prior to the 2007 survey cycle, prefilled zeros were considered legitimate responses if the grid was visited and left with all zeros in place. Beginning in 2007, a checkbox was placed above the grids on each of these screens. The respondent was required to check this box to acknowledge explicitly that the unit had no individuals to report for that particular grid, allowing true zeros to be distinguished from nonresponse for the grid. Grids with a marked checkbox, indicating no individuals to report, contributed to a complete response for the unit. Grids with unchanged, prefilled zeros and a blank checkbox disqualified the unit from complete response status.
Beginning in the 2007 survey cycle, an allowance was made for units that provided complete or partial data for at least one (but not all) of the grids. These units were counted as partial respondents.
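The response-status rules described in the three paragraphs above can be summarized in a short sketch. This is hypothetical Python, not the actual GSS processing code, and the grid attribute names are illustrative:

```python
def response_status(grids):
    """Classify a unit under the rules used since 2007.

    grids: one dict per data collection grid, with illustrative flags:
      'totals_complete'  - all row and column totals reported
      'details_sum'      - all detail cells sum to those totals
      'grand_total_only' - only the grand total was reported
      'zeros_confirmed'  - prefilled zeros left in place, checkbox marked
    """
    def complete(g):
        # A confirmed "no individuals to report" checkbox counts as complete,
        # as do full totals with details summing to them.
        return g["zeros_confirmed"] or (g["totals_complete"] and g["details_sum"])

    def partial(g):
        return g["totals_complete"] or g["grand_total_only"]

    if all(complete(g) for g in grids):
        return "complete"
    # Complete or partial data for at least one (but not all) grids
    # still earns partial-respondent status.
    if any(complete(g) or partial(g) for g in grids):
        return "partial"
    return "nonrespondent"
```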
These new response rate calculations adhere to American Association for Public Opinion Research standards for computing response rates.
In 2009 the GSS received complete responses from 11,709 (88.1%) of the 13,285 eligible units. An additional 1,478 units (11.1%) were partial respondents. The remaining 98 units (0.7%) were nonrespondents.
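As a quick check of the arithmetic (figures taken from the text; this is the simple percentage breakdown, not the weighted AAPOR calculation):

```python
# Reproduce the 2009 GSS unit response percentages quoted above.
complete, partial, nonrespondent = 11709, 1478, 98
eligible = complete + partial + nonrespondent  # 13,285 eligible units

def pct(n):
    return round(100 * n / eligible, 1)

print(pct(complete), pct(partial), pct(nonrespondent))  # 88.1 11.1 0.7
```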
New data collection procedures introduced in the 2007 survey cycle (see the Technical Notes section of the 2007 report) appear to have greatly improved inclusion of eligible units and exclusion of ineligible units. The number of unit additions increased more than threefold from 2006 to 2007 and leveled off in 2008 (table 1). School coordinators added fewer units in 2009, but still almost twice as many as in 2006. The number of units deleted more than doubled from 2006 to 2007. Although the number of units deleted in 2008 and 2009 declined from the 2007 figure, school coordinators still removed considerably more units than in the 2006 survey cycle. The dramatic increase in the number of units added and deleted in the 2007–09 data collections suggests that eligible units were underreported and ineligible units were overreported in previous survey years.
Retrieval and Editing
Data quality is ensured by interactive edit checks built into the Web survey and a comprehensive review after the data are submitted by the school coordinator. The Web survey checks that the counts provided are internally consistent and within an expected range based on the previous year's data. Unit respondents are asked to explain the discrepancy whenever counts are substantially different from the response provided in 2008.
Five types of postsubmission data quality checks were implemented in 2009 to identify questionable data for further review. These checks included changes to the unit list, changes to total counts, changes to the distribution of counts, identical counts, and counts inconsistent with the unit's status. Changes to the unit list included all unit additions and deletions and also changes to the highest degree granted status, GSS code, and unit name. Total count changes were reviewed if they were flagged by the survey instrument, were greater than five and went to/from zero, or were more than two standard deviations away from the mean change for that total. Significant changes to the distribution of counts by race/ethnicity, gender, or primary funding type were also reviewed, as were all cases where the responses provided in any given grid were unchanged from the previous survey cycle or identical to the data provided for a different grid or unit in the same school in the same survey cycle. Finally, data that were inconsistent with a unit's status were examined, such as when all full-time students were reported as first-time students for an extant unit or when graduate students were enumerated for a non-degree granting unit.
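For illustration, the total-count review rules (a change greater than five that goes to or from zero, or a change more than two standard deviations from the mean change) might be sketched as follows. This is hypothetical code, not the contractor's actual checks:

```python
from statistics import mean, stdev

def flag_total_change(prev, curr, all_changes):
    """Return True if a unit's total-count change warrants review.

    prev, curr: last year's and this year's total for one unit.
    all_changes: year-over-year changes for every unit reporting
    this total, used to compute the mean and standard deviation.
    """
    change = curr - prev
    # Rule: change greater than five that goes to or from zero.
    if (prev == 0 or curr == 0) and abs(change) > 5:
        return True
    # Rule: change more than two standard deviations from the mean change.
    mu, sigma = mean(all_changes), stdev(all_changes)
    return abs(change - mu) > 2 * sigma
```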
Data fluctuations that were not sufficiently explained by the comments provided by respondents during data collection were flagged for follow-up by telephone call to the school coordinator. Revisions were made directly in the Web survey by the school coordinator, unit respondents, or GSS contractor staff at the direction of the school coordinator. The data collected in the 2009 survey cycle were subject to the most rigorous review to date, resulting in one or more revisions to 4.7% of all reported units (629 of 13,285), spread across 26.0% of all schools (183 of 703). These figures are approximately triple the comparable 2008 figures (1.4% of units and 9.2% of schools) and demonstrate that the 2009 data review and retrieval process was effective in uncovering and correcting errors in the GSS data. As a proportion of overall counts before imputation for nonresponse, the number of part-time students saw the largest change (–3.4%), followed by counts of doctorate-holding nonfaculty researchers (1.8%); first-time, full-time graduate students (–1.6%); and full-time graduate students (–1.4%). The total count of postdocs was relatively unaffected by retrieval (0.2%). See "Known or Suspected Sources of Nonsampling Error" below for a discussion of the types of measurement error detected in the 2009 data review and retrieval process.
Item Nonresponse and Imputation
Across the 216 items collected in the four data collection grids of the 2009 GSS, the mean item nonresponse rate was 4.2%. Item nonresponse rates ranged from 1.0%, for the total numbers of full-time and part-time students, to 7.1%, for the number of male postdocs whose largest mechanism of support was a federal research grant. All missing data were imputed.
Different imputation techniques were used for extant units and new units. For units with at least 1 year of reported or imputed data, a carry-forward imputation method was used. Inflation factors were calculated for four key totals to account for year-to-year change. The previous year's key totals were then multiplied by these inflation factors to calculate the imputed values for the current year's key totals. Finally, all other variables were imputed by distributing the imputed key totals according to the previous year's proportions. The same procedure was used in the 2008 imputations. In 2007 the carry-forward method was used only if the unit reported data within the previous 5 years. This condition was lifted in 2008 because simulations using the 2007 data revealed that the carry-forward method performed better than other methods, even if the previous data were reported over 20 years ago.
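A minimal sketch of this two-step carry-forward procedure follows. This is illustrative Python assuming a single key total with its detail cells; the production imputation handles four key totals and rounding controls not shown here:

```python
def carry_forward(prev_total, prev_details, inflation):
    """Carry-forward imputation for an extant unit, per the description above.

    prev_total: the unit's key total from the prior year (reported or imputed).
    prev_details: detail cells that sum to prev_total.
    inflation: year-to-year inflation factor for this key total.
    """
    if prev_total == 0:
        return 0, {k: 0 for k in prev_details}
    # Step 1: multiply the prior year's key total by the inflation factor.
    imputed_total = round(prev_total * inflation)
    # Step 2: distribute the imputed total using the prior year's proportions.
    details = {k: round(imputed_total * v / prev_total)
               for k, v in prev_details.items()}
    return imputed_total, details
```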
When no reported or imputed data existed for a unit in a prior survey cycle, a different approach was needed. For new units with reported totals but no details in 2009, a nearest neighbor imputation method was used. In this method a donor unit that was "nearest" to the unit whose data were being imputed (imputee) was identified among all responding units having similar characteristics as the imputee (such as having the same GSS code and offering a PhD degree). When graduate student details were being imputed, the nearest neighbor selected had full-time and part-time graduate enrollments that were most similar to the imputee's enrollments. When postdoc and doctorate-holding nonfaculty researcher details were being imputed, the total number of postdocs was used to choose the nearest neighbor. The imputed values were calculated by adjusting the donor's values to account for the difference in full-time and part-time enrollment totals between the two units.
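The nearest-neighbor step for graduate student details can be sketched as follows. This is a hypothetical simplification: field names are invented, the pool of donors is assumed to be prefiltered to units with the imputee's characteristics, and donor enrollments are assumed nonzero:

```python
def nearest_neighbor_details(imputee, donors):
    """Impute detail cells for a new unit from its nearest donor.

    imputee: dict with 'ft' and 'pt' enrollment totals but no details.
    donors: responding units with characteristics similar to the imputee
    (e.g., same GSS code and degree level), each with enrollment totals
    and a 'details' dict.
    """
    # "Nearest" donor: most similar full-time and part-time enrollments.
    donor = min(donors, key=lambda d: abs(d["ft"] - imputee["ft"])
                                      + abs(d["pt"] - imputee["pt"]))
    # Adjust the donor's detail cells for the difference in enrollment totals.
    scale = (imputee["ft"] + imputee["pt"]) / (donor["ft"] + donor["pt"])
    return {k: round(v * scale) for k, v in donor["details"].items()}
```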
In rare circumstances when no data were available from a new unit, Integrated Postsecondary Education Data System (IPEDS) completions and enrollment data were used to estimate graduate student totals and details. This approach was instituted with the 2008 survey cycle based on research that demonstrated its superiority over a nearest-neighbor method under these conditions. Because IPEDS does not collect data on postdocs and doctorate-holding nonfaculty researchers, a nearest neighbor was selected from the 2009 GSS data to estimate these counts.
Known or Suspected Sources of Nonsampling Error
Review of the data, cognitive interviews, usability tests, pilot tests, site visits, and other methodological activities with the institutions have pointed to a number of possible sources of measurement error. These are discussed below, along with any steps taken to minimize the impact on the data, where applicable.
Data review and telephone interviews conducted with school coordinators have revealed overreporting of graduate students working toward practitioner degrees, particularly in health fields. Starting with the 2007 survey cycle, survey materials indicated that students pursuing master's, DDS, or MD degrees in 24 specified fields should be excluded. After the change in survey materials, school coordinators often provided a comment explaining that they were deleting a unit because the degrees it offers are practitioner based. This provides some indication that these procedures may have reduced reporting error. However, the data quality control process in 2009 indicated that some school coordinators were still reporting graduate students in practitioner-based degree programs. Many school coordinators revised downward the total count of graduate students in fields with degree exclusions, particularly among nursing units, after being contacted about questionable data. Systematic checks for this type of measurement error ensure that school coordinators are aware of the degree exclusions and are reporting data appropriately.
Data review and retrieval indicated that zeros reported by respondents sometimes represent nonresponse rather than actual zero counts. Not distinguishing between the two could result in low estimates, because data for a given variable are not imputed when item nonresponse is misinterpreted as a zero response. In 2007, to distinguish entered zeros from true nonresponse, a checkbox was added for the respondent to confirm a zero entry. Although this substantially reduced the number of ambiguous zero counts, counts for the subgroups still had similar problems. In 2008 the survey was revised to collect the subgroup counts directly, reducing such instances. In 2009 all remaining ambiguous zero counts were reviewed, and follow-up calls with respondents were made to clarify responses, as needed.
As a result of data review and retrieval, zeros for the total number of full-time students; first-time, full-time students; and doctorate-holding nonfaculty researchers were replaced with a positive count in approximately one-quarter of the instances identified as needing review (22.3%, 26.9%, and 25.3%, respectively). Zero counts for part-time students and postdocs were also revised fairly often (12.5% and 17.0%, respectively). Although some instances of nonresponse zeros masquerading as reported zeros were rectified during data review and retrieval in the past, the increased rigor of the process in the 2009 survey cycle minimized this type of reporting error. Moreover, with further revisions to the Web survey in the 2010 survey cycle, most instances of ambiguous zeros will be eliminated.
Methodological research, data review and retrieval, and feedback from respondents indicated that graduate students' financial support data were difficult for respondents to report and, therefore, more prone to measurement error than other survey data. These data are difficult for school coordinators to collect accurately, because the information may not be stored in one centralized database for the institution. Also, types of financial support that are not channeled through the institution, such as self-support, may be underreported, and foreign sources of support are not always known. Respondents may also have difficulty categorizing financial information by field, such as when a student is enrolled in one unit but receives support from another. Finally, institutions define mechanisms of support differently (e.g., fellowships vs. traineeships) and may report students according to the institution's definition rather than the definition provided by the GSS.
Usability tests conducted with respondents in 2008 revealed that there had been some misreporting of race and ethnicity, due to the unclear format of the GSS race/ethnicity questions. The format reflected NSF's interpretation of the Office of Management and Budget's (OMB's) 1997 revision of its standards on collecting these data. In 1999 the GSS began collecting data on Hispanics of one race separately from data on multiracial Hispanics, although this was not necessary for compliance with the revised OMB standards. The cognitive interviews revealed that black Hispanics and white Hispanics were sometimes counted in the "Hispanic—More than one race" category rather than in the appropriate "Only one race—Hispanic" category. In 2008 these two Hispanic categories were collapsed into one, "Hispanic/Latino ethnicity (one or more races)." Subsequent cognitive interviews indicated that the new grouping was easier for respondents to understand.
Increasing numbers of students are choosing not to report their race to their institution, leading to growth over time in the GSS "Unknown/race not stated" category and to gradual declines in the proportion of students reported in some racial and ethnic groups. This trend is not unique to the GSS.
Interviews and usability tests with respondents have found that data on postdocs and doctorate-holding nonfaculty researchers are particularly challenging for some respondents to report. Many respondents indicate in the Web survey that they are unable to provide data on their unit's postdocs or doctorate-holding nonfaculty researchers. A pilot study was conducted to evaluate alternative procedures for collecting these data so that more complete and accurate data may be collected in the future. Starting with the 2010 survey cycle, schools will be given the option of appointing a separate postdoc coordinator, someone likely to be more knowledgeable about the postdoc appointees at their school, to provide these data.
Anecdotal evidence indicated some double counting may have occurred when an institution had more than one school coordinator or offered joint programs, although written instructions emphasized that each individual should be counted only once. In order to reduce double counting, facilitate interinstitution communication, and allow sharing of reported data, a screen in the Web survey provides names and contact information for all school coordinators at the institution.
Changes in Eligibility and Degree-Granting Status
Institutions are classified as doctorate granting if at least one GSS-eligible unit confers doctoral degrees. Twelve institutions changed GSS degree-granting status in 2009. The status of four institutions or schools changed from eligible to ineligible, based on the criteria for inclusion in the GSS (see "Survey Universe," above).
Status changed to doctorate-granting from master's-granting, 10 institutions:
Status changed to master's-granting from doctorate-granting, 2 institutions:
Status changed from eligible to ineligible, 4 institutions/schools:
Institution Name Changes and Mergers
Four institutions reported a name change in 2009:
In 2007 the GSS discontinued the practice of revising previous years' data based on changes that institutions report in the current survey cycle to units' eligibility or to institutions' doctorate-granting status. Previously, reported counts for a given year fluctuated with each annual report because the current year's eligibility and doctorate-granting status changes were applied retrospectively to all years in the DSTs. Except for table 68, counts in the 2009 DSTs for 2003–06 reflect eligibility and doctorate-granting status as of fall 2006; they have not been adjusted to reflect changes in status that may have occurred between fall 2006 and fall 2009.
Table 68 historically has listed and ranked each institution that was doctorate-granting in the current survey cycle, regardless of doctoral-degree-granting status or eligibility in previous years. These rules have been continued in 2009. Thus, in table 68, data in years 2003–08 are counts of graduate students in those institutions that were doctorate granting in 2009, and totals for 2003–08 in this table differ from totals for 2003–08 in other tables for doctorate-granting institutions in this report.
When requested by the institution, the GSS will replace imputed estimates with actual data, but only for the most recent prior survey cycle. No such requests were made in the 2009 survey cycle.
Data collected in 2009 included demographic and funding information for graduate students, postdocs, and doctorate-holding nonfaculty researchers. Definitions of key terms follow.
First-time—Those students enrolled for credit in a graduate degree program in an organizational unit for the first time in fall 2009. This may include graduate students previously enrolled in another graduate degree program at the institution or at another institution. It may also include students who already hold another graduate or professional degree.
American Indian or Alaska Native—A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
Asian—A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Black or African American—A person having origins in any of the black racial groups of Africa.
Native Hawaiian or Other Pacific Islander—A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific islands.
White—A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Hispanic or Latino—A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race.
Non-Hispanic/Latino, more than one race—On the GSS form, institutions report in this category persons who indicate more than one race and are not Hispanic. The reports and DSTs combine multiracial non-Hispanics with those of unknown race because no more than 0.2% of graduate students are identified as such.
Although the survey forms began collecting Asian and Native Hawaiian/Other Pacific Islander data separately in 1999, reports and DSTs have continued to combine these categories as Asian/Other Pacific Islander because less than 0.5% of graduate students have been reported in the Native Hawaiian/Other Pacific Islander category.
From 1999 through 2007 the survey forms collected counts of Hispanics of one race separately from counts of Hispanics reporting two or more races. However, reports and DSTs in these years combined these data in a single Hispanic or Latino category because no more than 0.5% of graduate students were classified as multiracial Hispanics. In 2008 the survey forms combined these categories into a single Hispanic or Latino category.
Historically black colleges and universities (HBCUs)
Graduate Student Mechanisms of Support
Graduate traineeship—An educational award given to a student selected by the institution.
Graduate research assistantship—An assistantship where most of the student's responsibilities are devoted to research.
Graduate teaching assistantship—An assistantship where most of the student's responsibilities are devoted to teaching.
Other types of support—All other mechanisms of support for full-time students, including self-supported students and members of the armed forces whose tuition is paid by the U.S. Department of Defense.
Postdoctoral Researchers (Postdocs)
(1) Holds a recent doctoral degree, generally awarded within the last 5 years, such as
(2) Has a limited-term appointment, generally from 5 to 7 years,
Mechanisms of Postdoc Support
Federal traineeship—An educational award from the U.S. government given to a postdoc selected by the institution.
Federal research grant—A type of financial assistance award from the U.S. government to an organization or individual to conduct specific research activities.
Nonfederal support—Support from state and local government; the academic institution; foreign sources (e.g., foreign governments, foreign firms, and agencies of the United Nations); and other U.S. sources, such as support from nonprofit institutions, private industry, and all other nonfederal U.S. sources.
Doctorate-Holding Nonfaculty Researchers
Changes have been made to the coverage and content of the GSS to keep it relevant to the needs of data users. Such changes prevent precise maintenance of trend data; therefore, some data items are not available for all institutions in all years. Major changes in the data collected (with the year in which changes became effective) include the following:
Graduate Student Support
Postdocs and Doctorate-Holding Nonfaculty Researchers
Survey Universe
Institutions Surveyed
NSF's National Center for Science and Engineering Statistics (NCSES) releases the data from this survey annually in its Graduate Students and Postdoctorates in Science and Engineering InfoBrief and DSTs series. The information from this survey is also included in the publications Science and Engineering Indicators and Women, Minorities, and Persons with Disabilities in Science and Engineering. NSF includes selected data items from this survey for individual doctorate-granting institutions in the NCSES Academic Institutional Profiles series (http://www.nsf.gov/statistics/profiles/).
Data from this survey are available through the WebCASPAR data system. Public-use data files in Excel, SAS, and SPSS formats and the guide to the public-use data files are available for the years 1972–2009 at http://www.nsf.gov/statistics/srvygradpostdoc/pub_data.cfm.
The GSS public-use data structure was modified in the 2007 survey cycle. Significant changes include dropping the multirecord structure at the organizational unit level and combining all information associated with the organizational unit into a single-record-per-unit structure. Another notable addition is the inclusion of the IPEDS UNITID, which is a unique number for all postsecondary institutions to facilitate linkages to other data files. For more information, see the guide to public-use data files.
 The research doctorate is a research degree that (1) requires an original contribution of knowledge to a field (typically, but not always, in the form of a written dissertation) and (2) is not primarily intended for the practice of a profession. For additional survey information and available data related to graduate student enrollment and postdocs in S&E, see http://www.nsf.gov/statistics/srvygradpostdoc/.
 In this report, the term "school" refers to a graduate school, medical school, dental school, nursing school, or school of public health; an affiliated research center; a branch campus; or any other organizational component within an academic institution that grants an S&E or selected health degree.
 See response rate 3 calculation, page 45, in American Association for Public Opinion Research (AAPOR). 2011. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. 7th ed. AAPOR.
 The OMB standards treat Hispanics as an ethnic group rather than a racial group. Following these standards, "Hispanic" is not counted as a race in GSS. Cognitive interviews with respondents have revealed that this is a source of considerable confusion. For example, black Hispanics and white Hispanics may be counted as "Hispanic—More than one race" rather than "Only one race—Hispanic." In 2008 these two Hispanic categories were collapsed into one, "Hispanic/Latino ethnicity (one or more races)." The race/ethnicity categories were made to match IPEDS by combining the "Hispanic/Latino, more than one race" and "Hispanic/Latino, one race only" categories.