U.S. Academic Scientific Publishing
3.0 Introduction: Study Background, Historical Trends in Article Production, Methods for Counting Publications and Citations, and Attributing Counts to Institutions and Fields
This section presents the background to this study and the different ways for counting and attributing publications and citations.
Section 3.1 discusses the study background. Scientific publications and citations are increasingly used to measure research output, and the academic sector generates most U.S. publications. However, the long-term growth trend of U.S. academic scientific journal articles stalled in the 1990s. This study was undertaken to prepare a unified database on academic institutions and their publications to examine possible relationships between changes in university resources and characteristics and changes in publication trends.
Section 3.2 discusses technical issues in the analyses of publications and citations. Topics include different methods for counting and attributing publications and citations, similarities and differences between the fixed and expanding journal sets, the meaning of "institutional" authorship, the definitions of whole and fractional counting, how citations are counted and why citations may reflect more than the quality of the cited article, and why analyses of publication counts and citation counts yield very similar results. In addition, this section discusses the definition of aggregated groups of scientific fields (called field groups) used in various analyses to examine the uniformity of trends across fields, and the process used for mapping journals into fields to allow analysis at the field level, including the misalignment that this mapping might produce.
Scientists and engineers often publish their research results in peer-reviewed journal articles. The number of these articles is an indicator, admittedly imperfect, of research output. Citations to these articles are an indicator, likewise imperfect, of how influential the cited article is. In recent years, international use of these and related indicators has become widespread, as countries seek to assess their relative performance in science and engineering research.
Within the U.S. scientific community, the academic sector is critical to the overall health of the nation's research system. University-based scientists generate the most publications and, arguably, conduct much of the most important and innovative research. Developments in this sector also affect the nation's ability to attract and retain talented researchers from other countries. Specifically, this report addresses scientific publication trends in the top 200 research and development (R&D) performing U.S. academic institutions, as measured by their 1988–2001 R&D expenditures, since these institutions produce most of the output from the academic sector (figure 2). Such concentration of publications is not surprising because research is central to the overall mission of the top 200 R&D performing academic institutions. Many of these institutions achieve or aspire to worldwide recognition as research leaders.
There is evidence to suggest that the long-term growth trend in the output of scientific and technical journal articles by United States academic researchers may have changed in the early 1990s, both in the academic community as a whole and in the top 200 R&D performing academic institutions. Depending on the indicator of publication output used, the evidence suggests that the growth trend either slowed or stopped altogether at that time.
The present report is part of a series prepared by or on behalf of the Division of Science Resources Statistics (SRS). The first of those reports, "Changing U.S. Output of Scientific Articles: 1988–2003," presents descriptive data on patterns and trends in article production and citations in the 15 years beginning in 1988. The second of those reports, "Perceptions of Academic Researchers and Administrators," is based on qualitative data from interviews and focus groups and summarizes the views of experienced observers and practitioners in research universities about how the worlds of academic science and engineering research and publication have been changing over the past 15 years. Building on these earlier publications, the present report analyzes quantitative data on research inputs and outputs during the period from 1988 through 2001 in the U.S. academic sector to explore possible explanations for the various observed patterns and trends.
As part of the present study, SRS contracted with SRI International to examine how trends vary across different parts of the U.S. academic research system and how institutional characteristics may influence article production. To do so, SRI and its subcontractor, the ORC Macro Division of Macro International (ORC Macro), prepared a unified database on academic institutions that perform and publish research and analyzed this database to examine and characterize trends and potential explanatory variables. Data on publications were prepared by ipIQ, Inc. (formerly CHI Research, Inc.), using data from the Thomson ISI Science Citation Index (SCI) and Social Sciences Citation Index (SSCI).
3.2 Counting and Attributing Publications and Citations
In this report, we present counts of scientific and engineering articles, notes, and reviews published in scientific and technical journals tracked and indexed in the Thomson ISI SCI and SSCI. Counts exclude all letters to the editor, news pieces, editorials, conference proceedings and other content whose central purpose is not presentation or discussion of scientific data, theory, methods, apparatus, or experiments.
Arguably, comparisons over time are best made by examining articles in the population of influential journals. The journals in this group change over time. New journals may emerge and attain influence, while a few older journals may decline or cease to exist. As the worldwide research community grows, the net direction of change is towards more articles and journals in the Thomson ISI database (from 4,460 journals in 1988 to 5,653 in 2001). At any given time, the expanding set of journals tracked by Thomson ISI is the most suitable indicator of the mix of journals and articles. Patterns of authorship and citation in this set reflect the fields, nations, and institutions in which high-quality research is being produced. In this report, the analyses are primarily conducted using the expanding journal set. However, changes over time in journal coverage can inflate article counts and alter the national or field coverage of the journal set for reasons that have little or nothing to do with influence (for example, ISI's decisions concerning depth of coverage in different fields or languages).
To control for changes in ISI journal coverage that may have occurred for these extraneous reasons, we also performed analyses on a fixed set of journals that ISI indexed throughout the period we studied. Because the set of journals does not change, changes over time in publication outputs within that set are likely to reflect real output changes rather than yearly variations in the depth of ISI coverage of different sources of output. However, comparisons within a fixed set of journals have a major limitation: because new research communities often spawn new journals to disseminate their research findings, the fixed journal set can severely underrepresent the kinds of research that were not already well established at the outset of the period. The longer the period of time being studied, the less adequate the fixed journal set becomes as a representation of the articles published throughout the period. For comparison purposes, we present a few findings with respect to the fixed journal set. Our findings, and descriptive analyses in the SRS report, "Changing U.S. Output of Scientific Articles: 1988–2003," suggest that analyses conducted on both journal sets yield very similar results.
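The distinction between the two journal sets can be illustrated with a minimal sketch. It assumes the fixed set is simply the intersection of the journals indexed in each year of the period. The function names, journal labels, and article records below are hypothetical and are not drawn from the Thomson ISI data.

```python
from collections import Counter


def fixed_journal_set(journals_by_year):
    """Return the journals indexed in every year of the study period."""
    yearly_sets = [set(journals) for journals in journals_by_year.values()]
    if not yearly_sets:
        return set()
    return set.intersection(*yearly_sets)


def articles_per_year(articles, journal_set=None):
    """Count articles per year; restrict to journal_set when one is given."""
    counts = Counter()
    for year, journal in articles:
        if journal_set is None or journal in journal_set:
            counts[year] += 1
    return dict(counts)


# Hypothetical coverage: journal "C" is dropped, "D" and "E" are added later.
journals_by_year = {
    1988: {"A", "B", "C"},
    1995: {"A", "B", "D"},
    2001: {"A", "B", "D", "E"},
}
# Hypothetical (year, journal) article records.
articles = [(1988, "A"), (1988, "C"), (2001, "A"), (2001, "D"), (2001, "E")]

fixed = fixed_journal_set(journals_by_year)         # journals present in all years
expanding_counts = articles_per_year(articles)      # every indexed journal counts
fixed_counts = articles_per_year(articles, fixed)   # only fixed-set journals count
```

In this toy example the expanding set shows growth between 1988 and 2001, while the fixed set does not, because the growth comes entirely from journals added after 1988.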
When we refer to "authorship" in our descriptions of the Thomson ISI data, we mean institutional authorship—the institutional affiliation(s) of the person(s) in the list of authors. The Thomson ISI database does not contain information about the persons who are authors (such as their discipline, age, sex, rank or status) or even a count of the number of authors from each institution. For counting purposes we only know the institutional affiliations of the authors. Thus, for our purposes, an author might be "Harvard University," but not "Dr. Smith."
Two types of attribution counts are whole and fractional counting. In whole counting, each institution that appears in the author list receives one credit for an article (even if there are multiple authors from that institution). Thus, the number of credits for an article varies, depending on the number of distinct institutional authors. When institutions collaborate, a single article is counted more than once. As a result, the sum of the whole counts attributed to the U.S. top 200 R&D performing academic institutions does not equal the whole count for these institutions considered as a single entity.
In fractional counting, when more than one institutional author is involved, credit for the article is divided equally among the institutions that appear on the author list. The sum of these fractional credits is equal to one. Because each article is counted only once, the fractional count for the top 200 R&D performing academic institutions (considered as a single mega-institution) is equal to the sum of the fractional counts for the 200 institutions considered separately. In fractional counting, institutions other than the top 200 academic institutions may also receive some of the credit for the publication or citation. Thus, if Harvard University collaborated with two French academic institutions on a particular article, Harvard would receive one-third credit for that publication.
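The two counting methods can be sketched as follows. This is an illustrative sketch only, not the procedure used to prepare the data for this report. Each article is represented by its list of institutional authors, and all function names and institution labels other than Harvard University are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def whole_counts(articles):
    """One credit per distinct institution on each article's author list."""
    counts = defaultdict(int)
    for institutions in articles:
        for inst in set(institutions):
            counts[inst] += 1
    return dict(counts)


def fractional_counts(articles):
    """Split one credit equally among an article's distinct institutions."""
    counts = defaultdict(Fraction)
    for institutions in articles:
        distinct = set(institutions)
        if not distinct:
            continue  # no institutional author recorded; nothing to credit
        share = Fraction(1, len(distinct))
        for inst in distinct:
            counts[inst] += share
    return dict(counts)


def group_whole_count(articles, group):
    """Whole count for a group treated as a single entity."""
    return sum(1 for institutions in articles if set(institutions) & group)


# Hypothetical articles, each listed by institutional authors.
articles = [
    ["Harvard University", "French Institution A", "French Institution B"],
    ["Harvard University", "University X"],
]
# Hypothetical group standing in for the top 200 institutions.
group = {"Harvard University", "University X"}

whole = whole_counts(articles)
frac = fractional_counts(articles)

sum_of_member_whole = sum(whole[i] for i in group)   # counts article 2 twice
group_whole = group_whole_count(articles, group)     # counts each article once
sum_of_member_frac = sum(frac[i] for i in group)     # 1/3 + 1/2 + 1/2
```

Here Harvard receives one-third credit for the first article under fractional counting, and the fractional credits across all institutions sum to the number of articles. The sum of the members' whole counts (3) exceeds the whole count for the group as a single entity (2) because the second article is a collaboration within the group.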
Publications as measured by whole counts are useful indicators of how often an institution is involved in producing articles. Publications as measured by fractional counts are useful in highlighting patterns and trends in the shares of credit attributable to different institutions. Neither method adequately captures the many factors that affect how the research community allocates credit for articles. Taken together, the fractional and whole counting methods provide different perspectives on recent trends in the production of science and engineering articles in the top 200 R&D performing academic institutions. Both are addressed in this report.
The citation count for a publication is the number of times that publication is cited in the journal set. Citations in S&E articles generally credit the contribution and influence of previous research to a scientist's own research. Trends in citation patterns are indicators of the perceived influence and productivity of scientific literature across institutional boundaries. Citations may be considered a measure of the impact of the articles cited and, to a lesser extent, of their scientific quality. However, citations occur for many reasons other than the scientific quality of the article. Authors tend to cite their own work, work produced by their own scientific community, authors who are currently in vogue or generally considered to be eminent in their field, the inventors of a useful (but not necessarily high quality) experimental technique or methodology, research that is being refuted, etc. The increased use of electronic databases may exclude citation of older and/or original sources not covered by the databases. Nevertheless, more frequently cited articles are arguably more influential.
For purposes of comparing inputs and outputs, analyses of citation counts are more problematic than analyses of publication counts. While publication counts are directly related to personnel and financial inputs, citation counts are one further step removed and can only occur after the publication appears in the literature. Partially for this reason, citation counts are highly correlated with publication counts, and analysis of citation counts yields essentially identical results to analysis of publication counts. For these reasons, in the body of this paper we concentrate on analyzing input/output relationships for publications. Some parallel analyses of citation counts are presented in appendix D.
In this report, some of the analyses have been implemented for aggregated groups of scientific fields, which we call field groups. The field groups are biology, life and agricultural sciences; computer sciences; medical sciences; engineering, math, and physical sciences; and psychology and social sciences. Field groups consist of individual or aggregated WebCASPAR fields, the NSF classification system used for the various input variables.
Publication data are mapped from the original ipIQ field classification to WebCASPAR fields (and therefore to field groups) to allow field analysis of input variables and publication output measures. (For a crosswalk between WebCASPAR and ipIQ fields, see appendix E.) WebCASPAR field classification has the advantage of corresponding closely to how most universities organize their research by scientific field. The ipIQ field classification of publications is based on the journal in which they appear. Journals are classified into 134 fine fields using the patterns of the journal's citations. Each journal is assigned to one fine field, including general journals like Science and Nature. The latter types of journals are assigned to a "general fine field" such as "general biomedical research." Citations are attributed to the science field to which the publication being cited was allocated.
The allocation of journals to single scientific fields can result in some misalignment of publications and resources. For example, if a collaboration between a statistician in one university and a physician in another is published in a medical journal, the credit for that article will accrue to the medical field in the statistician's university, while the personnel and expenditures associated with that article are accrued in the field containing the statistics department. In addition, some misalignment can occur because the different sources of data on personnel, expenditures, and publication and citation counts did not use the same systems for classifying resources and outputs into fields. Although we believe that our efforts to reclassify resources and outputs into a common set of WebCASPAR fields were generally successful, there may have been occasions when the totals by field would have been slightly different had the original data sources used the WebCASPAR system.
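The journal-based field assignment described above can be sketched in a few lines. The crosswalk entries, journal titles, and fine-field labels below are hypothetical placeholders (the actual crosswalk is in appendix E), and the intermediate WebCASPAR step is collapsed into a single lookup for brevity.

```python
from collections import Counter

# Hypothetical journal-to-fine-field assignments (one fine field per journal).
JOURNAL_TO_FINE_FIELD = {
    "Hypothetical Journal of Medicine": "fine field M",
    "Hypothetical Statistics Letters": "fine field S",
}

# Hypothetical fine-field-to-field-group assignments; the pairing of
# "fine field S" with this field group is an assumption for illustration.
FINE_FIELD_TO_FIELD_GROUP = {
    "fine field M": "medical sciences",
    "fine field S": "engineering, math, and physical sciences",
}


def field_group_for_journal(journal):
    """Look up a journal's field group via its single fine field."""
    return FINE_FIELD_TO_FIELD_GROUP[JOURNAL_TO_FINE_FIELD[journal]]


def publication_counts_by_field_group(article_journals):
    """Credit each article to the field group of the journal it appears in."""
    return dict(Counter(field_group_for_journal(j) for j in article_journals))


def citation_counts_by_field_group(citations):
    """Credit each citation to the field group of the cited article's journal."""
    return dict(Counter(field_group_for_journal(cited) for _, cited in citations))


# A statistician's collaboration published in a medical journal is credited
# to medical sciences, whatever department the statistician belongs to.
article_journals = [
    "Hypothetical Journal of Medicine",
    "Hypothetical Journal of Medicine",
    "Hypothetical Statistics Letters",
]
# Hypothetical (citing journal, cited journal) pairs.
citations = [("Hypothetical Statistics Letters", "Hypothetical Journal of Medicine")]

pub_counts = publication_counts_by_field_group(article_journals)
cite_counts = citation_counts_by_field_group(citations)
```

Because the lookup depends only on the journal, the author's department never enters the calculation, which is the source of the misalignment between publication credit and resources noted above.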
WebCASPAR is an NSF database of academic science and engineering resources for individual fields of S&E at individual academic institutions. Information on WebCASPAR is available at http://ncsesdata.nsf.gov/webcaspar/.